*Back propagation is used by optimization algorithms to adjust w and b.*

At the end of a forward propagation (see my previous post), the output layer produces a predicted value. We compare this predicted value with the corresponding actual value from our training set to figure out the difference; the function that measures this difference is called the cost function. The cost function tells us how well weight w and bias b are doing at producing a good prediction for a training item.

If the cost function is high, the network predicted a value that is far from the actual value. For example, the actual value is 6 and the network predicted 2.

If the cost function is low, the network predicted a value that is close to the actual value. For example, the actual value is 6 and the network predicted 5.
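To make the two cases concrete, here is a minimal sketch that scores both predictions. It assumes a squared-error cost, which is only one common choice and is not named in this post; the function name `squared_error` is made up for illustration.

```python
def squared_error(predicted, actual):
    """Cost for one training item: the squared difference (an assumed cost)."""
    return (predicted - actual) ** 2


actual = 6

far_cost = squared_error(2, actual)   # prediction far from the actual value
near_cost = squared_error(5, actual)  # prediction close to the actual value

print(far_cost)   # 16
print(near_cost)  # 1
```

The far prediction gets a much higher cost than the near one, which is exactly the signal the network needs in order to improve.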

So the goal is to minimize the cost function. Weight w and bias b determine how close or far the prediction is from the actual value. Optimization algorithms such as gradient descent and Adam update w and b to minimize the cost function.

Back propagation figures out how sensitive the cost function is to changes in w and b, but it does not update w and b itself. Optimization algorithms like gradient descent use that sensitivity to decide how much to change w and b, and then apply the update.
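The split between the two jobs can be sketched with a single neuron. This is a toy setup, not a full network: it assumes a prediction of `w * x + b`, a squared-error cost, and a learning rate of 0.05, and all function names are hypothetical.

```python
def forward(w, b, x):
    """Forward propagation for one neuron with no activation (assumed model)."""
    return w * x + b


def cost(prediction, actual):
    """Squared-error cost for one training item (assumed cost)."""
    return (prediction - actual) ** 2


def backward(w, b, x, actual):
    """Back propagation: return the sensitivity of the cost to w and to b.

    Note that this function only reports derivatives; it never changes w or b.
    """
    error = forward(w, b, x) - actual
    dw = 2 * error * x  # d(cost)/dw by the chain rule
    db = 2 * error      # d(cost)/db by the chain rule
    return dw, db


def gradient_descent_step(w, b, dw, db, learning_rate=0.05):
    """Optimizer: use the derivatives to actually update w and b."""
    return w - learning_rate * dw, b - learning_rate * db


w, b = 1.0, 0.0
x, actual = 2.0, 6.0  # the network starts by predicting 2 when the answer is 6

cost_before = cost(forward(w, b, x), actual)
dw, db = backward(w, b, x, actual)                  # step 1: measure sensitivity
w_new, b_new = gradient_descent_step(w, b, dw, db)  # step 2: update
cost_after = cost(forward(w_new, b_new, x), actual)

print(cost_before, cost_after)
```

Swapping in a different optimizer would mean replacing only `gradient_descent_step`; `backward` would stay the same, because the derivatives do not depend on how they are used.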

For example, in a simple 2-layer neural network, suppose back propagation determines that increasing w in layer 1 from 1 to 2 increases the cost function from 3 to 6. This means that if you increase w by one unit, the cost function goes up by 3 units, that is, by 3 times the change in w. In other words, 3 is (approximately) the derivative of the cost function with respect to w. Similarly, back propagation calculates the derivative with respect to b. Gradient descent uses these derivatives to update w and b in order to minimize the cost function.
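The arithmetic in this example can be checked in a few lines. The sketch below estimates the derivative as "change in cost divided by change in w", which is how the example is phrased; real back propagation gets the derivative from the chain rule rather than by nudging w. The learning rate of 0.1 is an assumption added only to show what an update might look like.

```python
def slope(cost_before, cost_after, w_before, w_after):
    """Change in cost per unit change in w (a finite-difference estimate)."""
    return (cost_after - cost_before) / (w_after - w_before)


# Numbers from the example: w goes from 1 to 2, cost goes from 3 to 6.
dcost_dw = slope(cost_before=3.0, cost_after=6.0, w_before=1.0, w_after=2.0)
print(dcost_dw)  # 3.0

# Gradient descent moves w against the derivative to lower the cost.
learning_rate = 0.1  # assumed value, not from the example
w = 1.0
w_updated = w - learning_rate * dcost_dw
print(w_updated < w)  # True: a positive derivative pushes w down
```

Because the derivative is positive, raising w raises the cost, so gradient descent lowers w instead.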