- When we do forward propagation, the output always has some error (the computed output differs from the real output)
- The difference between the real output and the computed output is the error
- Backpropagation is a method of updating the weights and biases by propagating this error from the output back through the whole network so that the error is reduced.
- This is effectively a way of applying gradient descent at each individual neuron, so that its weights and bias are updated toward a lower error.
- slope for a weight feeding into the output node = 2 * (predicted value from forward propagation - actual value) * value of the node the weight comes from * slope of the activation function (1 for ReLU on a positive input)
- in general, slope of a weight = value of the node feeding into the weight * slope at the node it feeds into * slope of the activation function at that node (1 for ReLU)
- updated weight = current weight - learning rate * slope of that weight (see the numeric sketch after this list)
- slope backpropagated to a node other than an input = sum over its outgoing weights of (weight value * slope at the node that weight feeds) * slope of the node's activation function (also sketched after this list)
- It is common to calculate slopes on only a subset of the data (a batch) for computational efficiency
- Use a different batch of data to calculate the next update
- Start over from the beginning once all data is used
- Each full pass through the training data is called an epoch
- When slopes are calculated on one batch at a time, this is (mini-batch) stochastic gradient descent (a mini-batch sketch follows the full code example below)
- Backpropagation takes the prediction error from the output layer and propagates it back toward the input layer through the hidden layers.
- This lets gradient descent update every weight in the network; it is an application of the chain rule of calculus.
- The slope for a node's value is the sum of the slopes flowing back through all the weights that come out of it
- The contribution from each of those outgoing weights = the weight's value * the slope at the node the weight feeds into
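
A minimal numeric sketch of the slope and update formulas above, using made-up values, a squared-error loss, and an activation slope of 1 (as for ReLU on a positive input):

import numpy as np

# Made-up values for one data point: two hidden-node activations and
# the two weights connecting them to a single output node.
hidden_values = np.array([3.0, 2.0])   # values of the nodes feeding the output node
weights = np.array([1.0, 2.0])         # weights into the output node
actual = 10.0                          # target value

predicted = (hidden_values * weights).sum()      # forward pass: 3*1 + 2*2 = 7
error_slope = 2 * (predicted - actual)           # d(loss)/d(prediction) = 2 * (7 - 10) = -6
activation_slope = 1.0                           # 1 here, as for ReLU on a positive input
weight_slopes = hidden_values * error_slope * activation_slope

learning_rate = 0.01
updated_weights = weights - learning_rate * weight_slopes
print(weight_slopes)     # [-18. -12.]
print(updated_weights)   # [1.18 2.12]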
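
And a sketch of how the slope reaches a hidden node: sum the contributions from all of the node's outgoing weights, then scale by the activation slope (again with made-up numbers):

import numpy as np

# Made-up values: one hidden node connected to two output nodes.
outgoing_weights = np.array([0.5, -1.0])   # weights leaving the hidden node
downstream_slopes = np.array([-6.0, 4.0])  # slopes already computed at the nodes they feed
activation_slope = 1.0                     # 1 for ReLU when the node's input is positive

# Chain rule: sum the contribution from every outgoing weight.
hidden_node_slope = (outgoing_weights * downstream_slopes).sum() * activation_slope
print(hidden_node_slope)   # 0.5*(-6) + (-1)*4 = -7.0

# That slope is then used for the weights feeding this hidden node.
input_value = 2.0          # value of a node feeding into this hidden node
weight_slope = input_value * hidden_node_slope
print(weight_slope)        # -14.0

The full example below puts these pieces together: a small sigmoid network with one hidden layer, trained with backpropagation and gradient descent to learn XOR.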
import numpy as np

# Define the activation function (sigmoid in this case)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Define the derivative of the activation function
# (takes the sigmoid output, not the raw input)
def sigmoid_derivative(x):
    return x * (1 - x)

# Define the neural network architecture
input_layer_size = 2
hidden_layer_size = 4
output_layer_size = 1

# Initialize the weights for each layer
w1 = np.random.uniform(size=(input_layer_size, hidden_layer_size))
w2 = np.random.uniform(size=(hidden_layer_size, output_layer_size))

# Define the input and target output (XOR)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Define the learning rate and number of epochs
learning_rate = 0.1
epochs = 10000

# Train the network
for i in range(epochs):
    # Forward propagation
    layer1_output = sigmoid(np.dot(X, w1))
    output = sigmoid(np.dot(layer1_output, w2))

    # Backpropagation: error at the output, then deltas for each layer
    error = y - output
    output_delta = error * sigmoid_derivative(output)
    layer1_error = output_delta.dot(w2.T)
    layer1_delta = layer1_error * sigmoid_derivative(layer1_output)

    # Update weights (gradient descent step)
    w2 += layer1_output.T.dot(output_delta) * learning_rate
    w1 += X.T.dot(layer1_delta) * learning_rate

# Test the network
test_input = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
test_output = sigmoid(np.dot(sigmoid(np.dot(test_input, w1)), w2))
print(test_output)
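
The loop above computes slopes on the full dataset for every update. As a sketch of the batching idea from the notes (mini-batch stochastic gradient descent), the same training could be restructured as below; the batch size of 2 is just an illustrative choice for this 4-row dataset:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)

# Same XOR setup as above.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
w1 = np.random.uniform(size=(2, 4))
w2 = np.random.uniform(size=(4, 1))
learning_rate = 0.1
epochs = 10000
batch_size = 2  # illustrative choice

for epoch in range(epochs):
    # Shuffle once per epoch, then walk through the data batch by batch.
    order = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        X_batch, y_batch = X[batch], y[batch]

        # Forward propagation on this batch only
        layer1_output = sigmoid(np.dot(X_batch, w1))
        output = sigmoid(np.dot(layer1_output, w2))

        # Backpropagation, exactly as before but on the batch
        error = y_batch - output
        output_delta = error * sigmoid_derivative(output)
        layer1_error = output_delta.dot(w2.T)
        layer1_delta = layer1_error * sigmoid_derivative(layer1_output)

        # Update weights once per batch
        w2 += layer1_output.T.dot(output_delta) * learning_rate
        w1 += X_batch.T.dot(layer1_delta) * learning_rate

Each pass over all the batches is one epoch, and the weights are updated once per batch instead of once per full pass over the data.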