Single Layer Neural Network Design
(This article does not explain the theory of how a single layer NN works; its main focus is how to design your own single layer NN.)
(All diagrams are freehand, so please bear with my handwriting.)
In a single layer network we have one input layer, one hidden layer and one output layer. According to the design, we have three input features named x1, x2 and x3.
According to the network diagram there are three layers in this network (input, hidden and output). However, the input layer is not counted in the total number of layers, so the design has two layers (hidden and output).
The hidden layer has 4 neurons, and each neuron has one weight per feature. The output of the network is obtained from the output layer in the form of predicted probabilities.
This neural network has two phases. The first phase is forward propagation, where we use the weights and inputs to calculate the output probabilities. The second phase is backward propagation, where we calculate the gradients of the loss function w.r.t. the weights and then update the weights.
The input used for the network has shape (652,3), i.e. 652 rows and 3 features.
Forward Propagation Phase:
[The entire design is based on the shapes of the arrays.]
Step 1: During forward propagation we first pass the inputs to the hidden layer. In this network every feature is passed to every neuron, so each neuron receives all three features. (It should look like this.)
The input has three features (x1, x2, x3) and 652 instances (rows), so the input to each neuron has shape (652,3). As mentioned earlier, each neuron has one weight per feature, so the weights of a single neuron have shape (1,3), where 1 stands for the single neuron. Since there are 4 neurons in the hidden layer, the weights of the whole hidden layer have shape (4,3). [For a clearer understanding of how the weights are assigned, watch Andrew Ng's deep learning classes.]
Since each neuron now has its weights and its input, we do the forward propagation at the first layer (hidden layer).
In each neuron the calculation produces values of shape (652,1). Since we have 4 neurons, the hidden layer result Z has shape (652,4), and after passing it through the sigmoid activation the shape stays (652,4).
Here A is the input to the layer. After computing Z, we pass it through the sigmoid, which gives the activation A of the layer.
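The hidden layer step can be sketched in NumPy as follows. This is only a minimal illustration based on the shapes above: the random data, the variable names (X, W1, Z1, A1) and the absence of a bias term are my assumptions, not the exact code from the repository.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(652, 3))   # placeholder input: 652 rows, 3 features
W1 = rng.normal(size=(4, 3))    # one row of 3 weights per hidden neuron

Z1 = X @ W1.T                   # (652, 3) @ (3, 4) -> (652, 4)
A1 = sigmoid(Z1)                # the activation keeps the shape (652, 4)
print(Z1.shape, A1.shape)
```

Transposing the (4,3) weight matrix is what lets all 4 neurons be computed in one matrix product instead of one neuron at a time.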
Step 2: After getting A, we pass it to the second layer (output layer) as its input. This layer has one neuron with one weight for each of the 4 hidden-layer outputs, so the weights at this layer have shape (1,4).
After combining the input and the weights we get Z with shape (652,1). We then pass this Z into the sigmoid activation function, which gives the output probabilities in the same shape, (652,1).
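The output layer step looks almost identical. In this sketch A1 is a random stand-in for the hidden-layer activations, and the names W2, Z2 and A2 are assumptions chosen to match the notation used later for the gradients.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
A1 = rng.uniform(size=(652, 4))  # stand-in for the hidden-layer activations
W2 = rng.normal(size=(1, 4))     # single output neuron, one weight per hidden unit

Z2 = A1 @ W2.T                   # (652, 4) @ (4, 1) -> (652, 1)
A2 = sigmoid(Z2)                 # predicted probabilities, shape (652, 1)
print(Z2.shape, A2.shape)
```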
After the output layer, we pass the predicted probabilities into the loss function to calculate the loss. This completes the first phase.
Code for the forward propagation:
Step 1: At the beginning we initialize the weights randomly; these random weights get updated at a later stage.
Here we randomly assign the weights for each layer according to the explanation above. We loop only over the hidden layer and not the output layer, because the output layer weights must have the shape required for the output (a single neuron).
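A rough sketch of such an initialization is shown below. The function name init_weights and its parameters are hypothetical (they are not taken from the repository), and I assume standard normal random values and no bias terms.

```python
import numpy as np

def init_weights(n_features, hidden_units, seed=0):
    """Return a list of random weight matrices, one per layer (no bias terms)."""
    rng = np.random.default_rng(seed)
    weights = []
    n_in = n_features
    for n_out in hidden_units:                      # loop over hidden layers only
        weights.append(rng.standard_normal((n_out, n_in)))
        n_in = n_out
    weights.append(rng.standard_normal((1, n_in)))  # output layer: a single neuron
    return weights

weights = init_weights(n_features=3, hidden_units=[4])
print([w.shape for w in weights])  # [(4, 3), (1, 4)]
```

The output layer is added outside the loop so that its first dimension is always 1, whatever the size of the hidden layer.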
Step 2: We apply the weights to the inputs and calculate the loss.
The function “ForwardPropTrain” takes the input and assigns it to A. It then loops over the layers, using “ForwardProp” to calculate Z for each layer; Z is passed through the sigmoid and the resulting output is saved.
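The loop described above could look roughly like this. The functions forward_prop and forward_prop_train are illustrative stand-ins for “ForwardProp” and “ForwardPropTrain”; their signatures and return values are my assumptions and may differ from the actual code.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_prop(A, W):
    """Linear step of one layer: Z = A W^T."""
    return A @ W.T

def forward_prop_train(X, weights):
    """Run every layer in turn and save the activations, input included."""
    A = X
    activations = [A]
    for W in weights:
        Z = forward_prop(A, W)
        A = sigmoid(Z)
        activations.append(A)   # saved for use in backward propagation
    return activations

rng = np.random.default_rng(0)
X = rng.normal(size=(652, 3))
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
activations = forward_prop_train(X, weights)
print([a.shape for a in activations])  # [(652, 3), (652, 4), (652, 1)]
```

Saving every activation matters because the backward phase needs the input of each layer to compute that layer's gradient.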
After the calculation at the output layer, we pass the predicted probabilities into the loss function. The “logloss” function returns the loss and the predicted class based on the given threshold: I assign 1 if the probability is greater than 0.5, else 0.
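A minimal sketch of such a loss function is given below, assuming the loss is the binary cross-entropy averaged over all rows. The name log_loss, the eps clipping (which avoids log(0)) and the toy data are my own additions for illustration.

```python
import numpy as np

def log_loss(y_true, y_prob, threshold=0.5, eps=1e-12):
    """Return the mean binary cross-entropy and the thresholded class labels."""
    p = np.clip(y_prob, eps, 1 - eps)                 # avoid log(0)
    loss = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    y_pred = (y_prob > threshold).astype(int)         # 1 if probability > threshold, else 0
    return loss, y_pred

y_true = np.array([[1], [0], [1], [0]])
y_prob = np.array([[0.9], [0.2], [0.4], [0.6]])
loss, y_pred = log_loss(y_true, y_prob)
print(round(float(loss), 4), y_pred.ravel())
```

In this toy example the last two rows fall on the wrong side of the 0.5 threshold, so they are classified incorrectly and contribute most of the loss.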
The training of forward propagation is coded like this:
Backward Propagation Phase:
The aim of backward propagation is to calculate the gradients of the loss w.r.t. the weights, so that they can later be used in the weight update.
The neural network can be drawn as a flow chart of each process:
In backward propagation the weights are updated to minimize the loss, so the gradients are calculated according to the chain rule.
Here is some of the calculus I did to calculate the gradients:
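As a sketch of that calculus, assume the loss is the log loss (binary cross-entropy), both layers use the sigmoid, and there are no bias terms. Here X is the input, y the true labels, m the number of rows, and ⊙ means element-wise multiplication. The 1/m factor assumes the loss is averaged over the rows.

```latex
\begin{aligned}
\ell &= -\bigl[\, y \log A_2 + (1-y)\log(1-A_2) \,\bigr], \qquad L = \frac{1}{m}\sum_{i=1}^{m} \ell_i \\
\frac{\partial \ell}{\partial A_2} &= -\frac{y}{A_2} + \frac{1-y}{1-A_2}, \qquad
\frac{\partial A_2}{\partial Z_2} = A_2\,(1 - A_2) \\
\frac{\partial \ell}{\partial Z_2} &= \frac{\partial \ell}{\partial A_2}\,\frac{\partial A_2}{\partial Z_2} = A_2 - y \\
\frac{\partial L}{\partial W_2} &= \frac{1}{m}\,(A_2 - y)^{\top} A_1 \\
\frac{\partial L}{\partial W_1} &= \frac{1}{m}\,\Bigl[\bigl((A_2 - y)\,W_2\bigr) \odot A_1 \odot (1 - A_1)\Bigr]^{\top} X
\end{aligned}
```

The shapes work out as (1,652)·(652,4) = (1,4) for the output layer and (4,652)·(652,3) = (4,3) for the hidden layer.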
The design for the backward pass follows these steps:
dLdw2 is calculated from dLdA2, dA2dZ2 and dZ2dw2. We can combine dLdA2 and dA2dZ2 into dLdZ2 and use the result from the dLdZ derivation, which has shape (652,1). dZ2dw2 is A, the output of the hidden layer, which has shape (652,4). Multiplying the transpose of dLdZ2, (1,652), by A, (652,4), gives dw2 in the shape (1,4), which is the same shape as W at the output layer.
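In NumPy this step can be sketched as below. I assume here that the dLdZ derivation gives A2 - y (true for log loss with a sigmoid output) and that the gradient is averaged over the m rows; the random arrays only stand in for real activations and labels.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 652
A1 = rng.uniform(size=(m, 4))        # stand-in for hidden-layer activations
A2 = rng.uniform(size=(m, 1))        # stand-in for predicted probabilities
y = rng.integers(0, 2, size=(m, 1))  # stand-in for true labels (0 or 1)

dZ2 = A2 - y                         # dLdZ2, shape (652, 1)
dW2 = dZ2.T @ A1 / m                 # (1, 652) @ (652, 4) -> (1, 4)
print(dZ2.shape, dW2.shape)
```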
Now we calculate dLdw1:
To calculate dw1 we use dLdA2, dA2dZ2, dZ2da1, da1dz1 and dz1dw1. We already know that dLdA2 and dA2dZ2 combine into dLdZ2, which has shape (652,1). dZ2da1 is W of the output layer, which has shape (1,4), and da1dz1 is calculated according to the derivation of dadz, so the product up to this point has shape (652,4). dz1dw1 is the input A, which has shape (652,3).
The output for dLdw1 is shown in the image, and the shape of dw1 is the same as W of the hidden layer, that is (4,3).
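The chain for dw1 can be sketched the same way. This example runs a small forward pass first so that the gradient is consistent with the weights; it again assumes log loss, sigmoid activations, no bias and averaging over the rows, and all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
m = 652
X = rng.normal(size=(m, 3))          # placeholder input
y = rng.integers(0, 2, size=(m, 1))  # placeholder labels (0 or 1)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))

# Forward pass
A1 = sigmoid(X @ W1.T)               # (652, 4)
A2 = sigmoid(A1 @ W2.T)              # (652, 1)

# Backward pass for the hidden layer
dZ2 = A2 - y                         # dLdZ2, (652, 1)
dA1 = dZ2 @ W2                       # dZ2da1 is W2: (652, 1) @ (1, 4) -> (652, 4)
dZ1 = dA1 * A1 * (1 - A1)            # da1dz1 is the sigmoid derivative, (652, 4)
dW1 = dZ1.T @ X / m                  # dz1dw1 is X: (4, 652) @ (652, 3) -> (4, 3)
print(dW1.shape)
```

A quick way to gain confidence in such a gradient is to compare one entry against a finite-difference estimate of the loss.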
After calculating the gradients, we use them to update the weights:
eta (η) is the learning rate.
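The update is plain gradient descent, W = W - eta * dW, applied to every layer. A minimal sketch is shown below; the function name update_weights and the toy values are hypothetical.

```python
import numpy as np

def update_weights(weights, grads, eta=0.1):
    """Gradient descent step: move each weight matrix against its gradient."""
    return [W - eta * dW for W, dW in zip(weights, grads)]

weights = [np.ones((4, 3)), np.ones((1, 4))]
grads = [np.full((4, 3), 0.5), np.full((1, 4), -0.5)]
new_weights = update_weights(weights, grads, eta=0.1)
print([w.shape for w in new_weights])
```

A positive gradient lowers the weight and a negative gradient raises it, and the shapes never change because each gradient has the same shape as its weight matrix.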
The code for backward propagation follows the same method as explained above:
After training the network we achieved over 75% accuracy and a loss as low as 0.49.
The testing accuracy for each class was found with a function we created ourselves.
Sometimes the model predicts the same class every time. To check for that bias we look at the accuracy of each class, which also tells us how well the model can predict both classes.
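A per-class accuracy check can be sketched as follows. The function per_class_accuracy is a hypothetical example, not the function from the repository, and the labels are made up to show a model that always predicts class 1.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted rows, computed separately for each true class."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    return {int(c): float(np.mean(y_pred[y_true == c] == c))
            for c in np.unique(y_true)}

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1]    # the model predicts class 1 every time
print(per_class_accuracy(y_true, y_pred))  # {0: 0.0, 1: 1.0}
```

The overall accuracy here is 3/8, which hides the fact that class 0 is never predicted correctly; the per-class numbers make that obvious.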
Thus, we have now created our own functions for predicting with a single layer neural network.
In future articles I will write about creating your own Perceptron, Adaline and MLP neural network.
By creating my own neural network from scratch, I was able to understand the concept more clearly, and it also improved my problem-solving skills.
If you like our articles, please endorse us on LinkedIn; it will help us get into data science as freshers.
The code for the above is available on our GitHub.
Some Deep Learning resources are:
Thank you for reading