A nonlinear neuron is one that can respond to the nonlinearities that may be present in the data. Nonlinearity in this context means that the output does not change in proportion to a change in the input. Look at the following diagrams:
Both of the preceding figures depict the relationship between the inputs given to a neural network and the outputs it produces. In the first figure, the input data is linearly separable: a single straight line can divide the classes. In the second figure, no such line exists. A linear neuron fails in cases like this, hence the need for nonlinear neurons.
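To make the idea of linear separability concrete, the following minimal sketch tries to find weights for a single step-based linear neuron on two small datasets. The AND and XOR labelings, the function names, and the search grid are assumptions chosen for illustration and are not taken from the figures. Note that a grid search that finds nothing is not a proof in itself. XOR is simply a well-known case where no separating line exists.

```python
from itertools import product

# Four binary input points and two labelings of them (illustrative data).
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_LABELS = [0, 0, 0, 1]   # linearly separable
XOR_LABELS = [0, 1, 1, 0]   # not linearly separable


def step_neuron(x, w1, w2, b):
    """Linear neuron: outputs 1 when the weighted sum plus bias is positive."""
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0


def find_separating_weights(labels, grid):
    """Return the first (w1, w2, b) on the grid that classifies every point, else None."""
    for w1, w2, b in product(grid, repeat=3):
        if all(step_neuron(x, w1, w2, b) == y for x, y in zip(POINTS, labels)):
            return (w1, w2, b)
    return None


# Hypothetical search grid: weights and bias from -2.0 to 2.0 in steps of 0.5.
GRID = [i * 0.5 for i in range(-4, 5)]

and_weights = find_separating_weights(AND_LABELS, GRID)
xor_weights = find_separating_weights(XOR_LABELS, GRID)
print("AND:", and_weights)
print("XOR:", xor_weights)
```

The search succeeds for AND and returns `None` for XOR, which mirrors the difference between the two kinds of data described above.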
A function sits behind the operation of every neuron. We saw that the linear neuron is based on a step function. Several other functions are capable of capturing nonlinearities. The sigmoid function is one of them, and neurons that use it are often called sigmoid neurons. Unlike with the step function, the output of a sigmoid neuron is produced using the following rule:
So, our final, updated rule becomes the following:
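A standard way to write both rules is sketched below. The first line is the usual definition of the sigmoid function, and the second applies it to the weighted sum of a neuron's inputs. The symbols $w_j$ (weights), $x_j$ (inputs), and $b$ (bias) are assumed notation here and may differ from the notation used elsewhere in this book.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \sum_j w_j x_j + b

\text{output} = \frac{1}{1 + \exp\left(-\sum_j w_j x_j - b\right)}
```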
But why is the sigmoid function better than a step function at capturing nonlinearities? Let's compare the two graphically to understand this:
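The preceding two figures show the intrinsic nature of the two functions. The step function jumps abruptly between its two output values, whereas the sigmoid function changes smoothly, so a small change in the input produces a small change in the output. This is what makes the sigmoid function more sensitive to nonlinearities than the step function.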
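The same comparison can be made numerically. The following minimal sketch evaluates both functions at a few inputs on either side of zero. The threshold of zero for the step function and the sample inputs are assumptions made for illustration.

```python
import math


def step(z):
    """Step activation: output jumps from 0 to 1 once z is positive (assumed threshold of 0)."""
    return 1.0 if z > 0 else 0.0


def sigmoid(z):
    """Sigmoid activation: a smooth curve with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Compare the two activations on inputs close to and far from zero.
for z in [-2.0, -0.1, 0.1, 2.0]:
    print(f"z={z:+.1f}  step={step(z):.1f}  sigmoid={sigmoid(z):.3f}")
```

Moving the input from -0.1 to 0.1 flips the step output from 0 to 1, while the sigmoid output shifts only slightly around 0.5.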
Apart from the sigmoid function, the following are some widely used functions that give a neuron a nonlinear character:
- Leaky ReLU
In the literature, these functions, along with the two that we have just studied, are called activation functions. Currently, ReLU and its variants are among the most widely used activation functions.
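As a brief illustration of ReLU and its Leaky ReLU variant, the following sketch defines both as plain Python functions. The function names and the negative-side slope `alpha=0.01` are assumptions chosen for illustration. The slope is a commonly used value rather than one prescribed by this text.

```python
def relu(z):
    """ReLU: passes positive inputs through unchanged and maps everything else to 0."""
    return max(0.0, z)


def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: like ReLU, but negative inputs are scaled by a small slope alpha."""
    return z if z > 0 else alpha * z


# Compare the two on a negative and a positive input.
for z in [-3.0, 2.0]:
    print(f"z={z:+.1f}  relu={relu(z)}  leaky_relu={leaky_relu(z)}")
```

The two functions agree for positive inputs. They differ only for negative inputs, where Leaky ReLU keeps a small nonzero output instead of returning zero.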
A few other basics of artificial neural networks remain to be covered. Before moving on, let's summarize what we have learned so far:
- Neurons and their two main types
- Activation functions
We are now in a position to draw a line between MLPs and neural networks. Michael Nielsen, in his online book Neural Networks and Deep Learning, describes this quite well:
We will use the terms neural network and deep neural network throughout this book. We will now move forward and learn more about the input and output layers of a neural network.