A neuron is the most granular unit in a neural network. Let's look at the second word of "neural network." A network is nothing but a set of vertices (also called nodes) connected to each other by edges. In the case of a neural network, neurons serve as the nodes. Let's consider the following neural network architecture and try to dissect it piece by piece:
What we can see in the preceding diagram is a neural network with two hidden layers (in a neural network, a layer is a set of neurons) and a single output. In fact, this is called a two-layer neural network. The neural network consists of the following:
- One single input
- Two hidden layers, where the first hidden layer has three neurons and the second hidden layer contains two neurons
- One single output
There is no deeper psychological significance in calling the layers hidden: they are called hidden simply because the neurons in these layers are part of neither the input nor the output. One thing that is very evident here is that there is a layer before the first hidden layer. Why are we not counting that layer? In the world of neural networks, the input layer and the output are not counted in the stack of layers. In simple words, if there are n hidden layers, it is an n-layer neural network.
The initial layer (also called the input layer) is used for receiving the primary input to the neural network. After receiving the primary input, the neurons present in the input layer pass these values to the next set of neurons in the subsequent hidden layers. Before this propagation happens, the neurons multiply the inputs by weights and add a bias term. These inputs can be from various domains: for example, the inputs can be the raw pixels of an image, the frequencies of an audio signal, a collection of words, and so on. Generally, these inputs are given as feature vectors to the neural network. In this case, the input data has only one feature.
Now, what are the neurons in the next two layers doing here? This is an important question. We can consider the weighting of the inputs and the addition of the bias as the first level/layer of learning (also called the decision-making layer). The neurons in the first hidden layer repeat this process, but before sending the calculated output to the neurons in the next hidden layer, they compare this value to a threshold. Only if the threshold criteria are satisfied are the outputs propagated to the next level. This part of the whole neural network learning process bears a solid resemblance to the biological process that we discussed earlier. It also supports the philosophy of learning complex things in a layered fashion.
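As a minimal sketch of this layered, thresholded propagation, here is a NumPy forward pass through the 1-3-2-1 architecture described above. The weights and biases here are randomly chosen placeholder values, not learned ones; the point is only to show how each layer computes a weighted sum plus bias and then applies the threshold before passing its output on:

```python
import numpy as np

def step(z):
    # Threshold rule: a neuron fires (outputs 1) only when its
    # weighted sum plus bias is greater than 0
    return (z > 0).astype(float)

rng = np.random.default_rng(0)

# Illustrative (untrained) weights and biases for the architecture
# in the text: 1 input -> 3 neurons -> 2 neurons -> 1 output
W1, b1 = rng.normal(size=(3, 1)), rng.normal(size=3)
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)
W3, b3 = rng.normal(size=(1, 2)), rng.normal(size=1)

def forward(x):
    h1 = step(W1 @ x + b1)     # first hidden layer (3 neurons)
    h2 = step(W2 @ h1 + b2)    # second hidden layer (2 neurons)
    return step(W3 @ h2 + b3)  # single output neuron

print(forward(np.array([0.5])))  # a single 0.0 or 1.0
```

Because every neuron applies the same sum-threshold-propagate pattern, each layer is just a matrix-vector product followed by the step function.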
A question that arises here is, "What happens if no hidden layers are used?" It turns out that adding more levels of complexity (by adding more layers) allows a neural network to learn the underlying representations of the input data in a more concise manner than a network with just an input layer and an output. But how many layers would we need? We will get to that later.
Let's introduce some mathematical formulas here to formalize what we just studied.
We express the input features as x, the weights as w, and the bias term as b. The neural network model that we are currently trying to dissect builds upon the following rule:
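Written out for a single input x with weight w and bias b (matching the description that follows), the rule is:

```latex
\text{output} =
\begin{cases}
1 & \text{if } wx + b > 0 \\
0 & \text{if } wx + b \leq 0
\end{cases}
```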
The rule says that after calculating the sum of the weighted input and the bias, if the result is greater than 0, the neuron yields 1, and if the result is less than or equal to 0, the neuron simply produces 0; in other words, the neuron does not fire. In the case of multiple input features, the rule remains exactly the same, and the multivariate version of the rule looks like the following:
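With input features x_i and corresponding weights w_i, the multivariate rule can be written as:

```latex
\text{output} =
\begin{cases}
1 & \text{if } \sum_{i} w_i x_i + b > 0 \\
0 & \text{if } \sum_{i} w_i x_i + b \leq 0
\end{cases}
```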
Here, i indexes the input features. The preceding rule can be broken down as follows:
- We take the features individually, and then we multiply them by the weights
- After doing this for all the individual input features, we sum the weighted inputs and finally add the bias term
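The two steps above can be sketched in plain Python. The feature values, weights, and bias below are illustrative placeholders, not values from the text:

```python
# Hypothetical feature vector, weights, and bias (illustrative values)
features = [0.2, 0.5, 0.1]
weights = [0.4, -0.3, 0.8]
bias = 0.05

# Step 1: multiply each feature by its weight
# Step 2: sum the weighted inputs, then add the bias term
weighted_sum = sum(w * x for w, x in zip(weights, features)) + bias

# Apply the threshold rule: 1 if the result is greater than 0, else 0
output = 1 if weighted_sum > 0 else 0
print(weighted_sum, output)  # the sum here is positive, so the neuron fires
```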
The elements we just studied were proposed by Frank Rosenblatt in the late 1950s. The idea of assigning 0 or 1 to the weighted sum of the inputs based on a certain threshold is also known as the step function. There are many rules like this in the literature; these are called update rules.
The neurons we studied are linear neurons, capable of learning only linear functions. They are not suited for learning representations that are nonlinear in nature. In practice, almost all the inputs that neural networks are fed are nonlinear in nature. In the next section, we are going to introduce another type of neuron that is capable of capturing the nonlinearities that may be present in the data.