The backpropagation algorithm has two main phases: the forward phase and the backward phase. The structure of a simple three-layer neural network is shown in Figure 1. Every neuron in one layer is connected to all neurons of the next layer, but neurons within the same layer are not interconnected. Information flows from the first-layer neurons (the input layer), through the second-layer neurons (the hidden layer), to the third-layer neurons (the output layer).

Let’s take the inputs, expected outputs, and the initial weights and biases to be:

Inputs: x1 = 0.05, x2 = 0.10

Expected outputs: y1 = 0.01, y2 = 0.99

Initial weights: w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30, w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55

Initial biases: b1 = 0.40, b2 = 0.35, b3 = 0.25, b4 = 0.60

### Forward Pass

The input layer receives the signals and, without performing any computation, simply passes them on to the hidden layer. The net input to a hidden-layer neuron is the sum of each input-layer output multiplied by its weight (weights are initialized as small random numbers), plus a bias term. The sigmoid activation function is then applied, which lets the network learn non-linear patterns and normalizes each neuron's output to a range between 0 and 1. In each successive layer, every neuron sums its weighted inputs and applies the activation function to compute its output. The output layer of the network then produces the final response, i.e., the predicted value.
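To make this concrete, here is a minimal sketch of the forward pass in NumPy, using the inputs and the initial weights and biases from the worked example (x1 = 0.05, x2 = 0.10; w1–w8 and b1–b4 as listed in the initialization section below):

```
import numpy as np

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10])                    # inputs x1, x2
W_h = np.array([[0.15, 0.20], [0.25, 0.30]])  # w1..w4 (input -> hidden)
b_h = np.array([0.40, 0.35])                  # b1, b2
W_o = np.array([[0.40, 0.45], [0.50, 0.55]])  # w5..w8 (hidden -> output)
b_o = np.array([0.25, 0.60])                  # b3, b4

# net input = weighted sum of the previous layer's outputs plus bias,
# then the sigmoid activation is applied
out_h = sigmoid(x @ W_h + b_h)     # hidden-layer outputs
out_o = sigmoid(out_h @ W_o + b_o) # predicted values
print(np.round(out_o, 4))  # ≈ [0.688, 0.7687]
```

The weight matrices follow the same layout as the Keras code later in the article, so the two can be compared directly.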

### Total Error Calculation

Now we need to calculate the total error using the mean squared error (MSE) loss function. The loss function describes how well the model performs with respect to the expected outcome.

Here, the loss function is the mean squared error.

In the backward phase, the derivative of this loss is computed with respect to the weights and biases of all layers.
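As a quick sketch, the total error can be computed from the forward-pass outputs like this (the predicted values are the approximate outputs obtained with the initial weights):

```
import numpy as np

target = np.array([0.01, 0.99])             # expected outputs y1, y2
predicted = np.array([0.687988, 0.768679])  # forward-pass outputs (approx.)

# mean squared error over the two output neurons
loss = np.mean((target - predicted) ** 2)
print(round(loss, 4))  # ≈ 0.2543
```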

### First Backward Pass

The main goal of the backward phase is to compute the gradient of the loss function with respect to the different weights and biases by using the chain rule of differential calculus. These gradients are then used to update the weights and biases. Since the gradients are computed in the backward direction, starting from the output node, this learning process is referred to as backward propagation.
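For a single weight, the chain rule unrolls into three factors. A minimal sketch for w5 (the weight from hidden neuron h1 to output neuron o1), assuming a learning rate of 0.01, which is the Keras SGD default used for validation later:

```
out_h1 = 0.606475   # output of hidden neuron h1 (approx., from the forward pass)
out_o1 = 0.687988   # predicted value at output neuron o1 (approx.)
target_o1 = 0.01
lr = 0.01           # assumed learning rate (Keras SGD default)

dE_dout = out_o1 - target_o1       # derivative of the MSE loss w.r.t. out_o1
dout_dnet = out_o1 * (1 - out_o1)  # derivative of the sigmoid
dnet_dw5 = out_h1                  # the net input is linear in w5
grad_w5 = dE_dout * dout_dnet * dnet_dw5

w5_new = 0.40 - lr * grad_w5
print(round(w5_new, 6))  # ≈ 0.399117
```

The result agrees with the w5(new) value in the table below.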

The bias input (usually the constant 1) has its own weight for each node. The bias weight in a layer is updated in the same fashion as all the other weights.

Next, we continue the backward pass to update the values of w1, w2, w3, w4 and b1, b2. The gradients with respect to these weights and biases depend on the output-layer weights w5 and w8, and we use their old values, not the updated ones.
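Under the same assumptions (learning rate 0.01, the output-layer deltas from the first backward pass), a sketch of these hidden-layer updates looks like this; following the text, only the paths through w5 and w8 are propagated back:

```
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
b1, b2 = 0.40, 0.35
w5, w8 = 0.40, 0.55    # old output-layer weights, as stated in the text
lr = 0.01              # assumed learning rate (Keras SGD default)

out_h1 = sigmoid(w1 * x1 + w3 * x2 + b1)   # hidden outputs from the forward pass
out_h2 = sigmoid(w2 * x1 + w4 * x2 + b2)
delta_o1, delta_o2 = 0.145537, -0.039353   # output-layer deltas (approx.)

# propagate the deltas back through w5 (h1 -> o1) and w8 (h2 -> o2)
delta_h1 = delta_o1 * w5 * out_h1 * (1 - out_h1)
delta_h2 = delta_o2 * w8 * out_h2 * (1 - out_h2)

w1_new = w1 - lr * delta_h1 * x1
b1_new = b1 - lr * delta_h1
w2_new = w2 - lr * delta_h2 * x1
b2_new = b2 - lr * delta_h2
print(round(w1_new, 6), round(b1_new, 6))  # ≈ 0.149993 0.399861
print(round(w2_new, 6), round(b2_new, 6))  # ≈ 0.200003 0.350052
```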

Updated weights and biases:

Weights | Bias |
---|---|
w1(new) = 0.149993053 | b1(new) = 0.399861062 |
w2(new) = 0.200002605 | b2(new) = 0.350052104 |
w3(new) = 0.249986106 | b3(new) = 0.248544627 |
w4(new) = 0.30000521 | b4(new) = 0.60039353434 |
w5(new) = 0.399117359 | |
w6(new) = 0.450238667 | |
w7(new) = 0.499132186 | |
w8(new) = 0.550234657 | |
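The output-layer rows of the table can be reproduced with a short vectorized sketch (learning rate 0.01 assumed, matching the Keras SGD default used in the validation below):

```
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10]); t = np.array([0.01, 0.99])
W_h = np.array([[0.15, 0.20], [0.25, 0.30]]); b_h = np.array([0.40, 0.35])
W_o = np.array([[0.40, 0.45], [0.50, 0.55]]); b_o = np.array([0.25, 0.60])
lr = 0.01  # assumed learning rate (Keras SGD default)

out_h = sigmoid(x @ W_h + b_h)
out_o = sigmoid(out_h @ W_o + b_o)

# delta at the output layer: dE/dnet for the mean squared error loss
delta_o = (out_o - t) * out_o * (1 - out_o)

W_o_new = W_o - lr * np.outer(out_h, delta_o)  # updates w5..w8
b_o_new = b_o - lr * delta_o                   # updates b3, b4
print(np.round(W_o_new, 6))  # ≈ w5 0.399117, w6 0.450239, w7 0.499132, w8 0.550235
print(np.round(b_o_new, 6))  # ≈ b3 0.248545, b4 0.600394
```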

### Forward Pass with Updated Weights and Bias

### Error Calculation

After the first round of backpropagation, the total error has decreased to approximately 0.2539.
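This figure can be checked with a compact sketch that performs one full gradient-descent step (learning rate 0.01 assumed; the hidden-layer gradients here use the complete chain rule) and then repeats the forward pass:

```
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10]); t = np.array([0.01, 0.99])
W_h = np.array([[0.15, 0.20], [0.25, 0.30]]); b_h = np.array([0.40, 0.35])
W_o = np.array([[0.40, 0.45], [0.50, 0.55]]); b_o = np.array([0.25, 0.60])
lr = 0.01  # assumed learning rate (Keras SGD default)

# first forward pass
out_h = sigmoid(x @ W_h + b_h)
out_o = sigmoid(out_h @ W_o + b_o)

# backward pass: deltas for both layers, then one gradient-descent update
delta_o = (out_o - t) * out_o * (1 - out_o)
delta_h = (delta_o @ W_o.T) * out_h * (1 - out_h)
W_o = W_o - lr * np.outer(out_h, delta_o); b_o = b_o - lr * delta_o
W_h = W_h - lr * np.outer(x, delta_h);     b_h = b_h - lr * delta_h

# second forward pass with the updated parameters
out_h = sigmoid(x @ W_h + b_h)
out_o = sigmoid(out_h @ W_o + b_o)
loss_new = np.mean((t - out_o) ** 2)
print(round(loss_new, 4))  # ≈ 0.2539
```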

Further, the calculated error value was also validated by building and training an artificial neural network in Keras.

### Building an Artificial Neural Network

**Import Libraries**

```
import pandas as pd
import numpy as np
```

**Create the Dataframe**

Input values: x1 = 0.05, x2 = 0.10 and Output values: y1 = 0.01, y2 = 0.99

```
df=pd.DataFrame([[0.05, 0.1, 0.01, 0.99]], columns=['x1', 'x2', 'y1', 'y2'])
df
```

x1 | x2 | y1 | y2 |
---|---|---|---|
0.05 | 0.10 | 0.01 | 0.99 |

```
target=df.iloc[:, 2:]
target=np.array(target)
print('Actual output:',target)
inputs=df.iloc[:, :2]
inputs=np.array(inputs)
print('Input values:', inputs)
```

Actual output: [[0.01 0.99]]

Input values: [[0.05 0.1 ]]

**The network**

```
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
```

**Weights and Bias Initialization**

Initial weights: w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30, w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55

Initial biases: b1 = 0.40, b2 = 0.35, b3 = 0.25, b4 = 0.60

```
model = Sequential()
model.add(Dense(units=2, activation='sigmoid', use_bias=True,
                weights=[np.array([[0.15, 0.20], [0.25, 0.30]]),
                         np.array([0.40, 0.35])]))
model.add(Dense(units=2, activation='sigmoid', use_bias=True,
                weights=[np.array([[0.40, 0.45], [0.50, 0.55]]),
                         np.array([0.25, 0.60])]))
model.compile(optimizer='SGD', loss='mean_squared_error')
```

**Fit the model**

`history = model.fit(inputs, target, epochs=10)`

**Summary of the network**

`model.summary()`

**Save the model**

`model.save("model.h5")`

**Updated Weights (w1, w2, w3 & w4) and Bias (b1 & b2)**

**Updated Weights (w5, w6, w7 & w8) and Bias (b3 & b4)**

Hence, the correctness of the performed manual calculation is validated.

### Reference

1. https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/