Neural Network Learning XOR
The XOR Problem
The XOR (exclusive OR) function is a classic problem in neural networks. It outputs 1 when exactly one of its inputs is 1, and 0 otherwise:
| Input 1 | Input 2 | XOR Output |
|---------|---------|------------|
| 0       | 0       | 0          |
| 0       | 1       | 1          |
| 1       | 0       | 1          |
| 1       | 1       | 0          |
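As a concrete reference for the examples that follow, the four cases can be written as a tiny dataset in Python. The names `xor_inputs` and `xor_targets` are illustrative, not taken from the visualization's code:

```python
# The four XOR training cases as (input pair, target) lists.
# Names are illustrative; the visualization may store these differently.
xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_targets = [0, 1, 1, 0]

for (x1, x2), t in zip(xor_inputs, xor_targets):
    assert (x1 ^ x2) == t  # bitwise XOR matches the table above
```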
XOR is not linearly separable, meaning a single neuron cannot solve it. Solving it requires at least one hidden layer, which makes XOR a compact demonstration of what a neural network can learn.
Network Architecture
This visualization uses a 3-layer neural network:
- Input Layer: 2 neurons (receiving the two binary inputs)
- Hidden Layer: 4 neurons (with sigmoid activation)
- Output Layer: 1 neuron (with sigmoid activation)
Each connection between neurons has an associated weight, and each neuron in the hidden and output layers has a bias term.
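Below is a minimal sketch of the parameters this architecture implies, assuming fully connected layers and a hypothetical `params` dictionary; the visualization's own code may organize these differently:

```python
import random

# 2 inputs -> 4 hidden neurons -> 1 output neuron, fully connected.
# Weights and biases start as random values in [-1, 1], matching the
# initialization described in the Training Process section below.
params = {
    "weights_ih": [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)],  # hidden x input
    "bias_h":     [random.uniform(-1, 1) for _ in range(4)],                      # one bias per hidden neuron
    "weights_ho": [random.uniform(-1, 1) for _ in range(4)],                      # hidden -> output weights
    "bias_o":     random.uniform(-1, 1),                                          # single output bias
}
```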
Forward Pass: Making Predictions
During the forward pass, the network calculates a prediction given the inputs:
Mathematical Formulation:
For each neuron in the hidden and output layers:
1. Calculate weighted sum plus bias:
z = b + ∑(w_i * a_i)
where:
- z is the neuron's pre-activation
- b is the bias term
- w_i are the weights from previous layer neurons
- a_i are the activations from previous layer neurons
2. Apply sigmoid activation function:
a = σ(z) = 1 / (1 + e^(-z))
where a is the neuron's activation (output)
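As a sketch in Python, these two steps map onto two small helper functions; the names `weighted_sum` and `sigmoid` are illustrative:

```python
import math

def weighted_sum(weights, activations, bias):
    """z = b + sum(w_i * a_i) over the previous layer's activations."""
    return bias + sum(w * a for w, a in zip(weights, activations))

def sigmoid(z):
    """a = 1 / (1 + e^(-z)), squashing z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))
```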
Step-by-step:
- Input values are set as activations of the input layer neurons
- For each hidden layer neuron, compute the weighted sum of inputs plus bias
- Apply the sigmoid function to get the hidden layer activations
- For the output neuron, compute the weighted sum of hidden layer outputs plus bias
- Apply sigmoid to get the final prediction (between 0 and 1)
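Putting the steps together, a minimal forward pass over the 2-4-1 network might look like the following, reusing the helpers and `params` dictionary sketched above:

```python
def forward(params, inputs):
    """Forward pass for the 2-4-1 network sketched above.

    Returns the hidden-layer activations (kept for backpropagation)
    and the final prediction, all in (0, 1).
    """
    # Hidden layer: weighted sum of the inputs plus bias, then sigmoid.
    hidden = [sigmoid(weighted_sum(w_row, inputs, b))
              for w_row, b in zip(params["weights_ih"], params["bias_h"])]
    # Output neuron: weighted sum of hidden activations plus bias, then sigmoid.
    prediction = sigmoid(weighted_sum(params["weights_ho"], hidden, params["bias_o"]))
    return hidden, prediction
```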
Backpropagation: Learning
The neural network learns through a process called backpropagation, which adjusts weights and biases to minimize prediction error:
Mathematical Formulation:
1. Calculate the error:
E = target - prediction
2. For the output layer, calculate the delta (error signal):
δ_output = E * σ(z) * (1 - σ(z))
where σ(z) * (1 - σ(z)) is the derivative of the sigmoid function
3. For each hidden neuron, propagate the delta backward:
δ_hidden = (w_output * δ_output) * σ(z_hidden) * (1 - σ(z_hidden))
4. Update weights and biases:
Δw = learning_rate * δ * activation_input
Δb = learning_rate * δ
5. Apply updates:
w_new = w_old + Δw
b_new = b_old + Δb
Step-by-step:
- Calculate the error between predicted and target output
- Calculate the error gradient for the output layer
- Propagate this error backward to the hidden layer
- Update all weights proportionally to their contribution to the error
- Update all biases based on the error gradient
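A minimal sketch of one backpropagation step, following the formulation above and reusing the earlier `params` dictionary; the learning-rate value here is an illustrative assumption, not taken from the visualization:

```python
def backward(params, inputs, hidden, prediction, target, learning_rate=0.5):
    """One backpropagation step; mutates `params` in place and returns |error|.

    Deltas use the sigmoid derivative written as a * (1 - a), and each
    weight/bias moves by learning_rate * delta * incoming activation.
    """
    error = target - prediction                         # E = target - prediction
    delta_out = error * prediction * (1 - prediction)   # output delta

    # Hidden deltas: push the output delta back through each hidden-to-output weight.
    delta_hidden = [params["weights_ho"][j] * delta_out * h * (1 - h)
                    for j, h in enumerate(hidden)]

    # Output layer updates: Δw = lr * δ_output * hidden activation, Δb = lr * δ_output.
    for j, h in enumerate(hidden):
        params["weights_ho"][j] += learning_rate * delta_out * h
    params["bias_o"] += learning_rate * delta_out

    # Hidden layer updates: Δw = lr * δ_hidden * input activation, Δb = lr * δ_hidden.
    for j, d in enumerate(delta_hidden):
        for i, x in enumerate(inputs):
            params["weights_ih"][j][i] += learning_rate * d * x
        params["bias_h"][j] += learning_rate * d

    return abs(error)
```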
Training Process
The network is trained through these steps:
- Initialize weights and biases randomly (between -1 and 1)
- For each training example (all 4 XOR cases):
  - Perform forward pass to get prediction
  - Calculate error
  - Perform backpropagation to update weights and biases
- One complete pass through all training examples is called an "epoch"
- Continue training until the average error across all examples is below a threshold (0.05)
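A training loop matching this description might look like the sketch below, reusing the earlier `forward`, `backward`, and dataset sketches. The learning rate and epoch cap are assumptions; the 0.05 threshold comes from the description above:

```python
def train(params, learning_rate=0.5, threshold=0.05, max_epochs=100_000):
    """Train on all four XOR cases until the average absolute error drops below the threshold."""
    for epoch in range(1, max_epochs + 1):
        total_error = 0.0
        for (x1, x2), target in zip(xor_inputs, xor_targets):
            hidden, prediction = forward(params, [x1, x2])
            total_error += backward(params, [x1, x2], hidden, prediction,
                                    target, learning_rate)
        if total_error / len(xor_inputs) < threshold:
            return epoch  # converged: average error below the threshold
    return max_epochs
```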
Visualization Elements
- Neurons (circles): Brightness indicates activation level (0 to 1)
- Connections (lines): Green indicates positive weights, red indicates negative weights. Line thickness shows the magnitude of the weight.
- Training Stats: Shows current error, epochs completed, and current prediction
- Test Examples: Allow you to see how the network responds to each of the four XOR cases
Why XOR Needs a Hidden Layer
XOR is not linearly separable: no single straight line in the 2D input plane can separate the positive cases, (0,1) and (1,0), from the negative cases, (0,0) and (1,1).
The hidden layer transforms the input space into a representation where the problem becomes linearly separable. Each hidden neuron effectively learns its own linear boundary, and combining them produces the more complex decision boundary needed to solve XOR.
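To make this concrete, here is a hand-picked (not learned) set of weights for a smaller 2-2-1 sigmoid network that solves XOR, reusing the `forward` sketch from earlier. These values are purely illustrative and are not what the visualization learns:

```python
# Hand-picked weights for a 2-2-1 sigmoid network (illustrative, not learned).
# Hidden neuron 1 approximates OR(x1, x2); hidden neuron 2 approximates AND(x1, x2).
# The output fires only when "OR is on and AND is off", which is exactly XOR.
hand_params = {
    "weights_ih": [[10.0, 10.0], [10.0, 10.0]],
    "bias_h":     [-5.0, -15.0],   # OR-like threshold, AND-like threshold
    "weights_ho": [10.0, -20.0],   # reward OR, heavily penalize AND
    "bias_o":     -5.0,
}

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    _, p = forward(hand_params, [x1, x2])
    print(f"{x1} XOR {x2} ≈ {p:.3f}")  # close to 0, 1, 1, 0
```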