Before getting into the world of convolutional neural networks, let us talk a little about artificial neural networks. Whenever the word "neuron" comes to mind, the first thing we think of is the biological neurons in our brain. So how does a biological neuron work?
Put simply, a neuron takes in information (for example, signals originating from our senses) through its dendrites, and its axon then transmits a signal from that neuron to others. At the end of the axon, contact with the dendrites of the next neuron is made through synapses. This is how a biological neuron works.
Artificial Neural Network
An artificial neural network mimics the human brain. In the diagram above, you can see that an artificial neuron works in a similar way to a biological neuron: inputs in1, in2, up to the nth input are passed into the neuron, each input is multiplied by its weight, the weighted inputs are summed and a bias is added, and an activation function then decides whether the neuron should fire or not. The amazing thing about a neural network is that you don't have to explicitly program the rules it should follow: it learns them by itself, adjusting its weights and biases from example data during training, loosely like the human brain learns from experience.
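To make the weighted-sum-plus-bias idea concrete, here is a minimal NumPy sketch of a single artificial neuron. The input values, weights, and bias are made up, and the two activation functions (a step function that either fires or stays silent, and the smooth sigmoid) are common illustrative choices, not something prescribed by the diagram.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def step(z):
    """Fire (1) if the pre-activation is positive, otherwise stay silent (0)."""
    return np.where(z > 0, 1.0, 0.0)

def neuron(inputs, weights, bias, activation=sigmoid):
    # Weighted sum of the inputs in1..inn, plus the bias
    z = np.dot(weights, inputs) + bias
    # The activation function decides how strongly the neuron "fires"
    return activation(z)

x = np.array([0.5, -1.0, 2.0])   # in1, in2, in3 (made-up values)
w = np.array([0.8, 0.2, -0.5])   # one weight per input (made-up values)
b = 0.1                          # bias (made-up value)

# z = 0.4 - 0.2 - 1.0 + 0.1 = -0.7
print(neuron(x, w, b, activation=step))  # -> 0.0 (the neuron does not fire)
print(neuron(x, w, b))                   # sigmoid(-0.7), about 0.332
```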
A typical neural network has anything from a few dozen to hundreds, thousands, or even millions of artificial neurons, called units, arranged in a series of layers, each of which connects to the layers on either side. Some of them, known as input units, are designed to receive various forms of information from the outside world that the network will attempt to learn about, recognize, or otherwise process. Other units sit on the opposite side of the network and signal how it responds to the information it has learned; those are known as output units. Between the input units and the output units are one or more layers of hidden units, which together form the majority of the artificial brain. Many neural networks are fully connected, which means each hidden unit and each output unit is connected to every unit in the layers on either side. The connection between one unit and another is represented by a number called a weight, which can be either positive (if one unit excites another) or negative (if one unit suppresses or inhibits another). The larger the magnitude of the weight, the more influence one unit has on another.
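The layered structure described above can be sketched in a few lines of NumPy, assuming a tiny fully connected network with 4 input units, 5 hidden units, and 3 output units. The random weights stand in for values a real network would learn, and the ReLU activation on the hidden layer is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def dense(x, W, b, activation):
    # Output unit j computes activation(sum_i W[j, i] * x[i] + b[j]),
    # so it is connected to every unit in the previous layer.
    return activation(W @ x + b)

n_in, n_hidden, n_out = 4, 5, 3   # illustrative layer sizes
# Random weights (positive = excitatory, negative = inhibitory); a trained
# network would have learned these values instead.
W1 = rng.normal(size=(n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_out, n_hidden))
b2 = np.zeros(n_out)

x = np.array([1.0, 0.5, -0.3, 2.0])   # values at the input units
h = dense(x, W1, b1, relu)            # hidden units
y = dense(h, W2, b2, lambda z: z)     # output units (no activation here)

print(h.shape, y.shape)                        # (5,) (3,)
print(W1.size + b1.size + W2.size + b2.size)   # 4*5 + 5 + 5*3 + 3 = 43
```

Counting the parameters shows why fully connected layers get expensive: every unit adds one weight for each unit in the previous layer.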
CONVOLUTIONAL NEURAL NETWORK
INTRODUCTION TO CONVOLUTIONAL NEURAL NETWORK
What is a Convolutional Neural Network?
A convolutional neural network (CNN) is a class of artificial neural network that has become dominant in image classification and computer vision, and it is attracting interest across various other domains. A CNN is designed to automatically and adaptively learn spatial hierarchies of features, from simple patterns such as horizontal and vertical lines in the early layers to more complex shapes in deeper layers, through backpropagation, using multiple building blocks such as convolution layers, pooling layers, and fully connected layers.
Now let us talk about the layers in the CNN architecture:
1)Convolutional Layer
2)Pooling Layer
3)Fully Connected Layer (FC layer)
CONVOLUTIONAL LAYER
The convolutional layer is the first layer in the CNN architecture. As the image above shows, the input to the convolutional layer is an image (the blue matrix), which has a 3D structure: height × width × channels, with three channels (R, G, B) for a color image.
Next, we have kernels, which are also called filters. A filter is a small matrix of numbers (weights); for a multi-channel input such as an RGB image, a filter contains one kernel per input channel. Filters are used to extract various spatial features from the image. For example, for a picture of a flower, the filters start picking out round shapes, straight lines, vertical lines, oval shapes, and many more; this is the job of the kernels (filters) in the convolutional layer. Each convolutional layer has multiple filters stacked on top of each other, and each filter has its own bias, which is a scalar. The output of this layer is the green matrix.
Now that we have seen the input and output, let us see how the convolutional layer works internally.
In the image above, we can see a kernel moving over the image matrix. This raises a question: how is the output computed as the kernel is applied to the image?
For each position of the kernel on the image, each number in the kernel is multiplied by the corresponding number in the input matrix (blue matrix), and the products are summed to give the value at the corresponding position in the output matrix (green matrix).
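The multiply-and-sum step can be written as a short sketch for a single-channel (grayscale) image, with stride 1 and no padding. The hand-made vertical-edge kernel only illustrates the kind of feature a filter can respond to; in a trained CNN the kernel values are learned. Like most deep-learning libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_single(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the output map."""
    H, W = image.shape
    K = kernel.shape[0]
    out = np.zeros((H - K + 1, W - K + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + K, j:j + K]      # the part of the image under the kernel
            out[i, j] = np.sum(patch * kernel)   # multiply element-wise, then sum
    return out

# A 5x5 image: dark (0) on the left, bright (1) on the right, i.e. a vertical edge
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)

# An illustrative hand-made 3x3 vertical-edge kernel
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

print(conv2d_single(image, kernel))  # every row is [0, 3, 3]
```

The output is 0 where the kernel sits on a flat dark region and 3 wherever it straddles the dark-to-bright edge, which is how a filter "detects" a vertical line.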
For a multi-channel input, this is done for each channel separately; the per-channel results are added together, the bias of the respective filter is added, and this forms the value at the corresponding position of the output matrix. Let's see a visualization of it. CNNs also use two further parameters, stride and padding. So what are stride and padding?
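Extending the idea to a multi-channel input, the hypothetical function `conv_one_filter` below applies one filter (one kernel per channel) to a toy 3-channel image, adds the per-channel results together, and then adds the filter's bias. The input and filter values are chosen only so the arithmetic is easy to check by hand.

```python
import numpy as np

def conv_one_filter(image, filt, bias):
    """Apply one filter to a multi-channel image (stride 1, no padding).

    image: (C, H, W) input, e.g. C = 3 for RGB
    filt:  (C, K, K) one kernel per input channel
    bias:  scalar bias of this filter
    """
    C, H, W = image.shape
    K = filt.shape[-1]
    out = np.zeros((H - K + 1, W - K + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            total = 0.0
            for c in range(C):
                # multiply-and-sum on channel c, accumulated across channels
                total += np.sum(image[c, i:i + K, j:j + K] * filt[c])
            out[i, j] = total + bias   # then add the filter's single bias
    return out

image = np.ones((3, 4, 4))   # toy 3-channel 4x4 "RGB" image of ones
# kernels of all 1s, all 2s and all -1s for the three channels
filt = np.stack([np.full((3, 3), v) for v in (1.0, 2.0, -1.0)])
print(conv_one_filter(image, filt, bias=0.5))  # a 2x2 map filled with 18.5
```

Each output value is 9 × 1 + 9 × 2 + 9 × (-1) + 0.5 = 18.5: three channels in, but a single 2D output per filter.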
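Stride is a parameter of the convolutional layer that decides how far the kernel moves over the image matrix at each step. For example, with stride = 1 the kernel moves one column to the right at each step, leaving the first column behind, and moves down one row when it reaches the end of a row; with stride = 2 it jumps two positions at a time, which produces a smaller output. This whole process is repeated for every kernel (filter) in the layer.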
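The effect of stride on the output size follows the standard formula O = floor((W - K + 2P) / S) + 1, where W is the input width, K the kernel size, P the padding, and S the stride. The sketch below (single channel, no padding; `output_size` and `conv2d_strided` are hypothetical helper names) shows a 3 × 3 kernel on a 7 × 7 input giving a 5 × 5 output with stride 1 and a 3 × 3 output with stride 2.

```python
import numpy as np

def output_size(n, k, stride=1, padding=0):
    """Output length along one dimension: floor((n - k + 2p) / s) + 1."""
    return (n - k + 2 * padding) // stride + 1

def conv2d_strided(image, kernel, stride=1):
    """Single-channel convolution (no padding) where the kernel jumps `stride` pixels."""
    K = kernel.shape[0]
    out_h = output_size(image.shape[0], K, stride)
    out_w = output_size(image.shape[1], K, stride)
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride   # top-left corner of the current window
            out[i, j] = np.sum(image[r:r + K, c:c + K] * kernel)
    return out

image = np.arange(49, dtype=float).reshape(7, 7)   # toy 7x7 input
kernel = np.ones((3, 3))                           # illustrative 3x3 kernel

print(conv2d_strided(image, kernel, stride=1).shape)  # (5, 5)
print(conv2d_strided(image, kernel, stride=2).shape)  # (3, 3)
```

With stride 2, the output keeps every other window position of the stride-1 output, so the result is smaller and cheaper to compute.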
Padding is another parameter of the convolutional layer. When we apply a kernel to an image, the output is smaller than the input: for example, a 3 × 3 kernel sliding with stride 1 over a 5 × 5 image produces only a 3 × 3 output, and the pixels at the borders are covered by fewer kernel positions than those in the middle. This can lead to losing important spatial information from the image, and this is where padding comes into the picture. As you can see in the image below, zeros are added around the sides of the matrix so that, with stride 1 and the right amount of padding, the output has the same size as the input. We have two kinds of padding:
1)Zero padding
2)Near value padding
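Both kinds of padding can be tried with NumPy's `np.pad`. Zero padding corresponds to `mode="constant"`; for near value padding, the sketch assumes it means repeating the nearest border value outward, which is what `mode="edge"` does. The last lines use NumPy's `sliding_window_view` (available since NumPy 1.20) to apply an illustrative 3 × 3 averaging kernel to the zero-padded image and check that the output keeps the input's 3 × 3 size.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)

# Zero padding: surround the matrix with a one-pixel border of zeros
zero_padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

# Near value padding, interpreted here (an assumption) as repeating the closest border value
edge_padded = np.pad(image, pad_width=1, mode="edge")

print(zero_padded)   # 5x5: the original values framed by zeros
print(edge_padded)   # 5x5: first row is [1, 1, 2, 3, 3]

# With a 3x3 kernel and stride 1, padding p = (3 - 1) // 2 = 1 keeps the size:
# (3 - 3 + 2*1) // 1 + 1 = 3
kernel = np.ones((3, 3)) / 9.0   # illustrative averaging kernel
out = (sliding_window_view(zero_padded, (3, 3)) * kernel).sum(axis=(-2, -1))
print(out.shape)  # (3, 3): same size as the input
```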
The output we obtain from the convolutional layer is a set of feature maps, one per filter: the green matrix in the diagram below.
This brings us to the end of the first layer of the CNN architecture, the convolutional layer.
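Putting the pieces together, a whole convolutional layer applies every filter to the image and stacks the results into feature maps. This is a minimal vectorized sketch assuming NumPy 1.20 or later (for `sliding_window_view`) and a (channels, height, width) layout; `conv_layer` is a hypothetical name, and the random filters stand in for learned ones.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_layer(image, filters, biases, stride=1, padding=0):
    """Apply every filter to the image; return one feature map per filter.

    image:   (C, H, W)
    filters: (F, C, K, K)  F filters, each with one K x K kernel per channel
    biases:  (F,)          one scalar bias per filter
    returns: (F, H_out, W_out) stack of feature maps
    """
    K = filters.shape[-1]
    # zero-pad only the two spatial dimensions
    x = np.pad(image, ((0, 0), (padding, padding), (padding, padding)))
    # windows[c, i, j] is the K x K patch of channel c with top-left corner (i, j);
    # taking every `stride`-th position implements the stride
    windows = sliding_window_view(x, (K, K), axis=(1, 2))[:, ::stride, ::stride]
    # multiply each patch by each filter and sum over channels and kernel positions
    maps = np.einsum("cijkl,fckl->fij", windows, filters)
    return maps + biases[:, None, None]

rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))              # toy 8x8 RGB image
filters = rng.normal(size=(4, 3, 3, 3))    # 4 random filters of shape 3x3x3
biases = np.zeros(4)

fmap = conv_layer(image, filters, biases, stride=1, padding=1)
print(fmap.shape)                                      # (4, 8, 8)
print(conv_layer(image, filters, biases, stride=2).shape)  # (4, 3, 3)
```

Four filters give four feature maps; padding 1 keeps the 8 × 8 size, while stride 2 without padding gives floor((8 - 3) / 2) + 1 = 3.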
POOLING LAYER
The second layer in the CNN architecture is the pooling layer. Let us see what happens in it. The pooling layer reduces the dimensions of the feature maps. This reduces the number of parameters the following layers have to learn and the amount of computation in the network, helps reduce overfitting, and keeps the most important features required as input to the next layer.
There are different kinds of pooling layers; the most common are max pooling and average pooling.
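A minimal sketch of 2 × 2 max pooling and average pooling with stride 2 (a common default) on a made-up 4 × 4 feature map. `pool2d` is a hypothetical helper; note that pooling has no weights of its own to learn.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def pool2d(fmap, size=2, stride=2, mode="max"):
    """Downsample a 2-D feature map by taking the max (or mean) of each window."""
    windows = sliding_window_view(fmap, (size, size))[::stride, ::stride]
    if mode == "max":
        return windows.max(axis=(-2, -1))
    if mode == "average":
        return windows.mean(axis=(-2, -1))
    raise ValueError(f"unknown pooling mode: {mode}")

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 0],
                 [1, 8, 3, 4]], dtype=float)   # made-up feature map

print(pool2d(fmap, mode="max"))      # [[6, 4], [8, 9]]
print(pool2d(fmap, mode="average"))  # [[3.75, 2.25], [4.5, 4.0]]
```

The 4 × 4 map shrinks to 2 × 2, so a fully connected layer that follows sees four times fewer inputs per feature map.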
FULLY CONNECTED LAYER
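A convolutional network can be broken up into two parts:
1)Feature extraction (the convolutional and pooling layers)
2)Classification (the fully connected layers)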
In a typical CNN classifier, the fully connected layers are the last layers of the network, and they do the classification. The fully connected layers give the CNN the ability to mix information from every input dimension into each output class, so the decision can be based on the whole image and a class can be assigned to it.
The input to the fully connected (FC) layer comes from either the pooling layer or the convolutional layer, and the layer produces a new output vector. To do this, it computes a linear combination (a weighted sum plus a bias) of the input values it receives and then applies an activation function.
Before it is sent to the FC layer, this input is flattened into a single vector, and backpropagation is applied at every training iteration. As you can see in the image below, after the model has trained for a number of epochs, it is able to tell which class an image belongs to by learning which dominant (high-level) and low-level features distinguish the classes, and it finally classifies the image using the softmax activation function, which turns the last layer's outputs into class probabilities.
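The flatten, linear-combination, and softmax steps can be sketched as follows, assuming four pooled 3 × 3 feature maps and a hypothetical 3-class problem. The weights are random here; in a real CNN they would be learned by backpropagation, so the printed probabilities only show the shapes and that they sum to 1.

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; the result sums to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
pooled = rng.random((4, 3, 3))   # e.g. 4 pooled feature maps of size 3x3 (made up)
x = pooled.reshape(-1)           # flatten to a vector of 4 * 3 * 3 = 36 values

n_classes = 3                                # hypothetical number of classes
W = rng.normal(size=(n_classes, x.size))     # one weight per (class, input) pair
b = np.zeros(n_classes)

logits = W @ x + b          # linear combination of every input value
probs = softmax(logits)     # class probabilities
print(x.shape, probs.round(3), probs.argmax())
```

The predicted class is simply the index with the highest probability (`argmax`).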
Various CNN architectures are available that help build strong AI models; some of them are given below.
CNNs have gained a lot of popularity in recent years through their various architectures, in areas such as object detection, image classification, and radiology.
In the next post, I would like to talk about some popular CNN architectures such as AlexNet, VGGNet, GoogLeNet, and ResNet.
Article by: Ashitha A Nair
PGDDS: Manipal Academy of Higher Education (MAHE)