In this article, we will see what Convolutional Neural Networks (ConvNets for short) are.
In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery.
In a CNN, the input is an image, or more specifically a 3D matrix (height × width × channels).
Now let's look at the Convolutional Neural Network itself.
A Convolutional Neural Network usually has 3 kinds of layers:
- Convolutional Layer
- Pooling Layer
- Fully Connected Layer
Now let's look at each one of them in detail.
Convolutional Layer is the first layer in a CNN.
It gets as input a matrix of the dimensions [h1 * w1 * d1], which is the blue matrix in the above image.
Next, we have kernels (filters).
A kernel is a matrix with the dimensions [h2 * w2 * d1], which is one yellow cuboid of the multiple cuboids (kernels) stacked on top of each other (in the kernels layer) in the above image.
For each convolutional layer, there are multiple such kernels stacked on top of each other, giving a kernel tensor of dimensions [h2 * w2 * d1 * d2], where d2 is the number of kernels.
For each kernel, we have its respective bias, which is a scalar quantity.
And then, we have an output for this layer, the green matrix which has dimensions [h3 * w3 * d2].
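As a quick sanity check on these shapes, here is a NumPy sketch (the sizes are illustrative, not taken from the image; stride 1 and no padding are assumed):

```python
import numpy as np

h1, w1, d1 = 32, 32, 3    # input: e.g. a 32x32 RGB image (the blue matrix)
h2, w2, d2 = 5, 5, 8      # 8 kernels of size 5x5, each with depth d1

x = np.random.rand(h1, w1, d1)            # input tensor
kernels = np.random.rand(d2, h2, w2, d1)  # the stack of yellow cuboids
biases = np.random.rand(d2)               # one scalar bias per kernel

# With stride 1 and no padding, the output spatial size shrinks:
h3, w3 = h1 - h2 + 1, w1 - w2 + 1
print((h3, w3, d2))  # (28, 28, 8) -- the green output matrix
```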
Alright, so we have inputs, kernels, and outputs. Now let's look at what happens with a 2D input and a 2D kernel.
First, we need to agree on a few parameters that define a convolutional layer: the kernel size, the stride (how far the kernel moves at each step), and the padding (extra zeros added around the border of the input).
For each position of the kernel on the image, every number in the kernel is multiplied with the corresponding number in the input matrix (the blue matrix), and the products are all summed up to give the value at the corresponding position of the output matrix (the green matrix).
With d1 > 1, the same thing occurs for each of the channels; the per-channel results are added together, the bias of the respective kernel is added, and this forms the value at the corresponding position of the output matrix.
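This sliding multiply-and-sum can be sketched in a few lines of NumPy for a single-channel input and kernel (note that deep-learning "convolution" is really cross-correlation: the kernel is not flipped):

```python
import numpy as np

def conv2d(image, kernel, bias=0.0):
    """Slide `kernel` over `image`; multiply elementwise and sum at each position."""
    h1, w1 = image.shape
    h2, w2 = kernel.shape
    out = np.zeros((h1 - h2 + 1, w1 - w2 + 1))  # stride 1, no padding
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+h2, j:j+w2] * kernel) + bias
    return out

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)
kernel = np.array([[1, 0],
                   [0, 1]], dtype=float)
print(conv2d(image, kernel))
# [[ 6.  8.]
#  [12. 14.]]
```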
The main purpose of a pooling layer is to reduce the spatial size of the input tensor, which:
- Helps reduce overfitting
- Extracts representative features from the input tensor
- Reduces computation and thus aids efficiency
There are many types of pooling; the basic ones are Max Pooling and Average Pooling.
The input to the Pooling layer is a tensor.
In the case of Max Pooling, a kernel of size n*n (2x2 in the above example) is moved across the matrix, and for each position the maximum value is taken and put in the corresponding position of the output matrix.
In the case of Average Pooling, a kernel of size n*n is moved across the matrix, and for each position the average of all the values is taken and put in the corresponding position of the output matrix.
This is repeated for each channel in the input tensor. And so we get the output tensor.
Pooling downsamples the image in its height and width, but the number of channels (depth) stays the same.
After the above operations, the resulting matrices are flattened to be passed into a fully connected layer.
What is flattening?
The output from the final (and indeed any) Pooling or Convolutional Layer is a 3-dimensional tensor; to flatten it is to unroll all its values into a vector.
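Flattening is just a reshape; a quick NumPy illustration (the shape here is a made-up tiny example):

```python
import numpy as np

t = np.arange(2 * 2 * 3).reshape(2, 2, 3)  # a tiny 3D tensor from a pooling layer
v = t.flatten()                            # unroll all values into a vector
print(v.shape)  # (12,)
```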
Fully Connected Layer
The Fully Connected Layer is simply a feed-forward neural network. Fully Connected Layers form the last few layers of the network, and this part of the network can also contain hidden layers.
The input to the fully connected layer is the output from the final Pooling or Convolutional Layer, which is flattened and then fed into the fully connected layer.
After passing through the fully connected layers, the final layer uses the softmax activation function (instead of ReLU) to get the probabilities of the input belonging to each particular class (classification).
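Softmax itself is easy to sketch in NumPy; subtracting the maximum score first is a standard trick for numerical stability (the scores below are hypothetical final-layer outputs for 3 classes):

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    z = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return z / z.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs.round(3))  # [0.659 0.242 0.099]
```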
And so finally, we have the probabilities of the object in the image belonging to the different classes!!
And that is how the Convolutional Neural Network works!!
And input images get classified with labels!!
There are several architectures in the field of Convolutional Networks that have a name. The most common are LeNet, AlexNet, VGGNet, GoogLeNet, and ResNet.
If you have any concerns or wanna contact me, you can comment down below or contact me on LinkedIn.