Design Your Own Single Layer Neural Network

Frank Rithesh Pereira Mar 14 2021 · 6 min read

    Single Layer Neural Network Design

    Single Hidden Layer Neural Network

    (This article is not about explaining the theory of how a single layer NN works; its main focus is how to design your own single layer NN.)

    (All diagrams are freehand, so please bear with my handwriting.)

    In a single layer network we have one input layer, one hidden layer, and one output layer. According to the design, we have three input features, named x1, x2, and x3.

    According to the network diagram there are three layers in this network (input, hidden, and output); however, the input layer is conventionally not counted, so this design has two layers (hidden and output).

    The hidden layer has 4 neurons, and each neuron has one weight per feature. The output of the network is obtained from the output layer, in the form of predicted probabilities.

    This neural network has two phases. The first phase is forward propagation, where we use the weights and inputs to calculate the output probabilities. The second phase is backward propagation, where we calculate the gradients of the loss function w.r.t. the weights and then update the weights.

    The input used for the network has shape (652, 3), i.e., 652 rows and 3 features.

    Forward Propagation Phase:

    [The entire design is done based on the shapes of arrays]

    Step 1: During forward propagation we first pass the inputs to the hidden layer. In this network every feature is passed to every neuron, so all three features reach each neuron. (It should look like this.)

    First hidden layer pass from input layer

    The input has three features (x1, x2, x3) and 652 instances (rows), so the input to each neuron has shape (652, 3). As I said earlier, each neuron holds one weight per feature, so the weight of each neuron has shape (1, 3), where 1 is the single neuron; since we have 4 neurons in the hidden layer, the total weight matrix of the hidden layer has shape (4, 3). [For a clearer understanding of how the weights are assigned, watch Andrew Ng's deep learning classes.]
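These shapes can be sketched directly in NumPy (the random data here is only a stand-in for the article's real dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(652, 3))   # 652 instances, 3 features -> shape (652, 3)
W1 = rng.normal(size=(4, 3))    # 4 hidden neurons, each with a (1, 3) weight row
print(X.shape, W1.shape)        # -> (652, 3) (4, 3)
```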

    Since each neuron now has its weights and inputs, we perform forward propagation at the first layer (the hidden layer).

    first layer input and weight calculation format

    We can see that the calculation in each neuron produces a result of shape (652, 1); since we have 4 neurons, the layer output has shape (652, 4), and after passing it through the sigmoid activation the shape remains (652, 4).

    Overall first layer calculation

    Here A[0] is the input. After computing Z[1] we pass it through the sigmoid, giving A[1].

    Output of first layer is passed to sigmoid activation function
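The first layer step above can be sketched as follows (a minimal version of the article's calculation; the random inputs are placeholders):

```python
import numpy as np

def sigmoid(z):
    # element-wise logistic activation
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
A0 = rng.normal(size=(652, 3))   # input A[0]
W1 = rng.normal(size=(4, 3))     # hidden layer weights, one row per neuron
Z1 = A0 @ W1.T                   # (652, 3) @ (3, 4) -> (652, 4)
A1 = sigmoid(Z1)                 # shape preserved: (652, 4)
```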

    Step 2: After getting A[1], we pass it to the second layer (the output layer) as input. This layer has one neuron with a weight for each of the 4 incoming features, so the weight at this layer has shape (1, 4).

    After calculating with the input and weights we get the output Z[2] of shape (652, 1); we then pass Z[2] into the sigmoid activation function, which gives the output probabilities in the same shape, (652, 1).

    Output layer input and weight calculation
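In the same style, the output layer step can be sketched like this (A1 is faked here; in the real network it comes from the hidden layer):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
A1 = sigmoid(rng.normal(size=(652, 4)))  # stand-in for the hidden layer output
W2 = rng.normal(size=(1, 4))             # output layer weights
Z2 = A1 @ W2.T                           # (652, 4) @ (4, 1) -> (652, 1)
A2 = sigmoid(Z2)                         # predicted probabilities, (652, 1)
```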

    Finally, we pass the predicted probabilities into the loss function to calculate the loss. This completes the first phase.

    Code for the forward propagation:

    Step 1: At the beginning we initialize the weights randomly; at a later stage these random weights get updated.

    Weight Initialization for each layer

    Here we randomly assign the weights for each layer according to the explanation above. We loop only over the hidden layer and not the output layer, because the output layer's weight must match the output shape requirements.
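A sketch of such an initializer (the function name and signature are my own; the article's code is in the image above):

```python
import numpy as np

def init_weights(n_features, hidden_units, seed=42):
    """Randomly initialize one weight matrix per layer."""
    rng = np.random.default_rng(seed)
    weights = []
    n_in = n_features
    for n_out in [hidden_units]:            # loop covers the hidden layer only
        weights.append(rng.normal(size=(n_out, n_in)))
        n_in = n_out
    # the output layer is appended separately so its weight is (1, hidden_units)
    weights.append(rng.normal(size=(1, n_in)))
    return weights

W1, W2 = init_weights(3, 4)   # W1: (4, 3), W2: (1, 4)
```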

    Step 2: We pass the inputs through the weights and calculate the loss.

    Forward propagation functions

    The function “ForwardPropTrain” takes the input and assigns it as A[0]; we then loop over the layers, using “ForwardProp” to calculate Z, pass Z through the sigmoid, and save the output.

    After the output layer's calculation, we pass the predicted probabilities into the loss function. “logloss” returns the loss and the predicted class based on the given threshold; here I have assigned class 1 if the probability is greater than 0.5, else 0.
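A snake_case sketch of these functions, under the shapes described above (the exact signatures in the article's images may differ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_prop(A, W):
    # linear step Z = A W^T for one layer
    return A @ W.T

def forward_prop_train(X, weights):
    A = X                              # the input is assigned as A[0]
    activations = [A]
    for W in weights:                  # loop over hidden and output layers
        Z = forward_prop(A, W)
        A = sigmoid(Z)                 # each Z is passed through the sigmoid
        activations.append(A)
    return A, activations              # A is the predicted probabilities

def logloss(y, p, threshold=0.5):
    eps = 1e-12                        # guard against log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    pred = (p > threshold).astype(int) # class 1 if probability > 0.5, else 0
    return loss, pred
```

With `weights = [W1, W2]` of shapes (4, 3) and (1, 4), `forward_prop_train` returns probabilities of shape (652, 1), matching the design above.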

    Training with forward propagation is coded like this:

    Forward propagation Training

    Backward Propagation:

    The aim of backward propagation is to calculate the gradients of the loss w.r.t. the weights, so that they can later be used in the weight update.

    The neural network can be drawn as a flow chart (computation graph) showing each step:

    Neural network computation graph

    In backward propagation the weights are updated to minimize the loss, so the gradients are calculated using the chain rule.

    Backward propagation gradients

    Here is some of the calculus I did to calculate the gradients:

    logloss gradient wrt activation function
    sigmoid activation function gradient wrt z
    dLdZ calculation
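For reference, the freehand derivations above arrive at the standard results for log loss with a sigmoid activation:

```latex
\frac{\partial L}{\partial A} = -\frac{y}{A} + \frac{1-y}{1-A}, \qquad
\frac{\partial A}{\partial Z} = A(1-A), \qquad
\frac{\partial L}{\partial Z}
  = \frac{\partial L}{\partial A}\cdot\frac{\partial A}{\partial Z}
  = A - y
```

The product telescopes because the A(1 - A) factor of the sigmoid cancels the denominators of the log loss gradient, leaving the simple residual A - y.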

    The design of the backward pass follows these steps:

    dLdW2 is calculated from dLdA2, dA2dZ2, and dZ2dW2. We can combine dLdA2 and dA2dZ2 into dLdZ2 and use the result of the dLdZ derivation, which has shape (652, 1). dZ2dW2 is A[1], which has shape (652, 4). Combining these values for dLdW2, we get dW2 with shape (1, 4), the same shape as W[2].

    dLdW2 gradient for W2 update
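This step can be sketched as below (the placeholder arrays stand in for the real forward-pass values; averaging over the 652 rows is one common convention and may differ from the article's exact code):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 652
A1 = rng.uniform(size=(m, 4))          # hidden layer activations
A2 = rng.uniform(size=(m, 1))          # predicted probabilities
y = rng.integers(0, 2, size=(m, 1))    # true labels

dLdZ2 = A2 - y            # from the dLdZ derivation, shape (652, 1)
dW2 = dLdZ2.T @ A1 / m    # (1, 652) @ (652, 4) -> (1, 4), same shape as W[2]
```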

    Now we calculate dLdW1:

    To calculate dW1 we use dLdA2, dA2dZ2, dZ2dA1, dA1dZ1, and dZ1dW1. We already know that dLdA2 and dA2dZ2 combine into dLdZ2, of shape (652, 1); dZ2dA1 is W[2], of shape (1, 4); and from the dA/dZ derivation we calculate dA1dZ1, so their product has shape (652, 4). dZ1dW1 is the input A[0], of shape (652, 3).

    The result for dLdW1 is shown in the image; the shape of dW1 is the same as W[1], that is (4, 3).

    dLdW1 gradient for W1 update
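A sketch of the same chain, with placeholder arrays for the forward-pass values (the /m averaging is an assumption, as before):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 652
A0 = rng.normal(size=(m, 3))           # input
A1 = rng.uniform(size=(m, 4))          # hidden layer activations
W2 = rng.normal(size=(1, 4))           # output layer weights
dLdZ2 = rng.normal(size=(m, 1))        # stand-in for A2 - y

# propagate through W2 and the sigmoid derivative A1 * (1 - A1)
dLdZ1 = (dLdZ2 @ W2) * A1 * (1 - A1)   # (652, 1) @ (1, 4) -> (652, 4)
dW1 = dLdZ1.T @ A0 / m                 # (4, 652) @ (652, 3) -> (4, 3)
```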

    After calculating the gradients, we use them to update the weights:

    Weight update using calculated gradients

    eta (η) is the learning rate.
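The update itself is a plain gradient descent step; a minimal sketch with placeholder gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))
dW1 = rng.normal(size=(4, 3))   # gradient of the loss w.r.t. W1
dW2 = rng.normal(size=(1, 4))   # gradient of the loss w.r.t. W2

eta = 0.1                       # learning rate
W1 = W1 - eta * dW1             # step down the gradient for the hidden layer
W2 = W2 - eta * dW2             # and for the output layer
```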

    The code for backward propagation follows the same method explained above:

    Backward propagation gradient function

    After training the network we were able to achieve 75%+ accuracy and a loss as low as 0.49.

    Training loss vs validation loss
    Training accuracy vs Validation accuracy

    The testing accuracy for each class was found by creating our own function.

    Sometimes the predicted class is the same every time; to check for that bias we use the per-class accuracy, which also tells us how well our model can predict both classes.

    Each class accuracy function
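A sketch of such a per-class accuracy function (the name and return format are my own; the article's version is in the image above):

```python
import numpy as np

def each_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for class 0 and class 1."""
    accs = {}
    for c in (0, 1):
        mask = (y_true == c)                       # rows whose true label is c
        accs[c] = float((y_pred[mask] == c).mean())
    return accs

y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
each_class_accuracy(y_true, y_pred)   # -> {0: 0.5, 1: 1.0}
```

A model that always predicts class 1 would score 100% on class 1 but 0% on class 0, which this check exposes immediately.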

    Thus, we have now created our own functions for prediction using a single layer neural network.

    In a future article I will write about creating your own Perceptron, Adaline, and MLP neural network.

    By creating my own neural network from scratch, I was able to understand the concepts more clearly, and it also improved my problem-solving skills.

    If you like our articles, please endorse us on LinkedIn; it will help us get into data science as freshers.

    Melroy Pereira, Frank Pereira    

    The code for the above is available on our GitHub:

    Melroy Pereira, Frank Pereira          

    Some useful Deep Learning resources:

  • Andrew Ng Coursera classes
  • Krish Naik and Sudhanshu Kumar
  • Sebastian Raschka
  • Dive into deep learning book [Free]

    Thank you for reading.
