Neural Network Learning XOR

Neural Network Visualization
[Interactive demo: a network diagram showing the input, hidden, and output layers, with live training stats (epochs, current error, status, current example, prediction), a training-speed slider, and a Test Network button.]
How Neural Networks Work

The XOR Problem

The XOR (exclusive OR) function is a classic test problem for neural networks. It outputs 1 when exactly one of its two inputs is 1, and 0 otherwise:

Input 1   Input 2   XOR Output
   0         0          0
   0         1          1
   1         0          1
   1         1          0

XOR is not linearly separable, meaning a single neuron (which can only draw a linear boundary) cannot solve it. Solving it requires at least one hidden layer, which makes XOR a compact demonstration of why multi-layer networks are more capable than single neurons.
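
Written out as data, these four cases are the entire training set used later on this page. A minimal sketch in Python (the variable name is illustrative):

    # The four XOR training examples: ([input1, input2], target)
    xor_data = [
        ([0, 0], 0),
        ([0, 1], 1),
        ([1, 0], 1),
        ([1, 1], 0),
    ]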

Network Architecture

This visualization uses a 3-layer neural network:

  • Input Layer: 2 neurons (receiving the two binary inputs)
  • Hidden Layer: 4 neurons (with sigmoid activation)
  • Output Layer: 1 neuron (with sigmoid activation)

Each connection between neurons has an associated weight, and each neuron in the hidden and output layers has a bias term.
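
A minimal sketch of how this 2-4-1 network's parameters could be laid out, assuming the random initialization in the -1 to 1 range described under Training Process below (all names are illustrative, not taken from the page's own code):

    import random

    N_INPUT, N_HIDDEN, N_OUTPUT = 2, 4, 1

    # One weight per connection: hidden_weights[i][j] links input neuron i to hidden neuron j.
    hidden_weights = [[random.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(N_INPUT)]
    hidden_biases = [random.uniform(-1, 1) for _ in range(N_HIDDEN)]

    # output_weights[j] links hidden neuron j to the single output neuron.
    output_weights = [random.uniform(-1, 1) for _ in range(N_HIDDEN)]
    output_bias = random.uniform(-1, 1)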

Forward Pass: Making Predictions

During the forward pass, the network calculates a prediction given the inputs:

Mathematical Formulation:

For each neuron in the hidden and output layers:

1. Calculate weighted sum plus bias:

z = b + ∑(w_i * a_i)

where:

  • z is the neuron's pre-activation
  • b is the bias term
  • w_i are the weights on the connections from the previous layer's neurons
  • a_i are the activations of the previous layer's neurons

2. Apply sigmoid activation function:

a = σ(z) = 1 / (1 + e^(-z))

where a is the neuron's activation (output)
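
As a worked example with made-up numbers: suppose a hidden neuron has weights w = [0.5, -0.3], bias b = 0.1, and receives input activations a = [1, 0]. Then:

z = 0.1 + (0.5 * 1) + (-0.3 * 0) = 0.6

a = σ(0.6) = 1 / (1 + e^(-0.6)) ≈ 0.646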

Step-by-step:

  1. Input values are set as activations of the input layer neurons
  2. For each hidden layer neuron, compute the weighted sum of inputs plus bias
  3. Apply the sigmoid function to get the hidden layer activations
  4. For the output neuron, compute the weighted sum of hidden layer outputs plus bias
  5. Apply sigmoid to get the final prediction (between 0 and 1)
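
Putting the two formulas together, a complete forward pass for a 2-4-1 network can be sketched as follows (an illustrative implementation, not the page's own code; it expects parameters shaped like the earlier architecture sketch):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
        # Hidden layer: weighted sum of the inputs plus bias, then sigmoid.
        hidden_activations = []
        for j in range(len(hidden_biases)):
            z = hidden_biases[j] + sum(w[j] * a for w, a in zip(hidden_weights, inputs))
            hidden_activations.append(sigmoid(z))

        # Output neuron: weighted sum of the hidden activations plus bias, then sigmoid.
        z_out = output_bias + sum(w * h for w, h in zip(output_weights, hidden_activations))
        prediction = sigmoid(z_out)
        return hidden_activations, prediction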

Backpropagation: Learning

The neural network learns through a process called backpropagation, which adjusts weights and biases to minimize prediction error:

Mathematical Formulation:

1. Calculate the error:

E = target - prediction

2. For the output layer, calculate the delta (error signal):

δ_output = E * σ(z) * (1 - σ(z))

where σ(z) * (1 - σ(z)) is the derivative of the sigmoid function

3. For each hidden layer neuron, propagate the delta backward:

δ_hidden = (w_output * δ_output) * σ(z_hidden) * (1 - σ(z_hidden))

4. Update weights and biases:

Δw = learning_rate * δ * activation_input

Δb = learning_rate * δ

where δ is the delta of the neuron the connection feeds into and activation_input is the activation of the neuron the connection comes from

5. Apply updates:

w_new = w_old + Δw

b_new = b_old + Δb
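
As a worked example with made-up numbers: suppose the output neuron's prediction is 0.646, the target is 1, the learning rate is 0.5, and one incoming connection carries a hidden activation of 0.8. Then:

E = 1 - 0.646 = 0.354

δ_output = 0.354 * 0.646 * (1 - 0.646) ≈ 0.081

Δw = 0.5 * 0.081 * 0.8 ≈ 0.032

Δb = 0.5 * 0.081 ≈ 0.040

Both updates are positive, so the weight and bias grow slightly and push the next prediction toward the target of 1.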

Step-by-step:

  1. Calculate the error between predicted and target output
  2. Calculate the error gradient for the output layer
  3. Propagate this error backward to the hidden layer
  4. Update all weights proportionally to their contribution to the error
  5. Update all biases based on the error gradient
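
These five steps translate into a short update routine. A sketch continuing the illustrative variable layout from the earlier snippets (the learning rate of 0.5 is an assumed value; the page does not say which one the visualization uses):

    def backpropagate(inputs, hidden_activations, prediction, target,
                      hidden_weights, hidden_biases, output_weights, output_bias,
                      learning_rate=0.5):  # learning rate is an assumed, illustrative value
        # 1-2. Error and output-layer delta (the sigmoid derivative is a * (1 - a)).
        error = target - prediction
        delta_output = error * prediction * (1 - prediction)

        # 3. Propagate the delta back to each hidden neuron.
        delta_hidden = [output_weights[j] * delta_output *
                        hidden_activations[j] * (1 - hidden_activations[j])
                        for j in range(len(hidden_activations))]

        # 4-5. Update the output neuron's weights and bias.
        for j in range(len(output_weights)):
            output_weights[j] += learning_rate * delta_output * hidden_activations[j]
        output_bias += learning_rate * delta_output

        # 4-5. Update the hidden layer's weights and biases.
        for i in range(len(inputs)):
            for j in range(len(hidden_biases)):
                hidden_weights[i][j] += learning_rate * delta_hidden[j] * inputs[i]
        for j in range(len(hidden_biases)):
            hidden_biases[j] += learning_rate * delta_hidden[j]

        # Lists are modified in place; the scalar bias has to be returned explicitly.
        return output_bias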

Training Process

The network is trained through these steps:

  1. Initialize weights and biases randomly (between -1 and 1)
  2. For each training example (all 4 XOR cases):
    • Perform forward pass to get prediction
    • Calculate error
    • Perform backpropagation to update weights and biases
  3. One complete pass through all training examples is called an "epoch"
  4. Continue training until the average error across all examples is below a threshold (0.05)
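
Tying the pieces together, the full training loop might look like the sketch below, reusing the xor_data, parameter variables, forward, and backpropagate definitions from the earlier snippets (again illustrative, not the page's actual code):

    average_error = 1.0
    epochs = 0

    while average_error > 0.05:  # the convergence threshold from step 4
        total_error = 0.0
        for inputs, target in xor_data:
            hidden_activations, prediction = forward(
                inputs, hidden_weights, hidden_biases, output_weights, output_bias)
            output_bias = backpropagate(
                inputs, hidden_activations, prediction, target,
                hidden_weights, hidden_biases, output_weights, output_bias)
            total_error += abs(target - prediction)
        average_error = total_error / len(xor_data)
        epochs += 1  # one epoch = one pass over all four examples

    print(f"Converged after {epochs} epochs, average error {average_error:.4f}")

Note that the weights are updated after every individual example rather than once per epoch, matching the per-example update order described in step 2 above.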

Visualization Elements

  • Neurons (circles): Brightness indicates activation level (0 to 1)
  • Connections (lines): Green indicates positive weights, red indicates negative weights. Line thickness shows the magnitude of the weight.
  • Training Stats: Shows current error, epochs completed, and current prediction
  • Test Examples: Allow you to see how the network responds to each of the four XOR cases

Why XOR Needs a Hidden Layer

XOR is not linearly separable, which means we cannot draw a straight line to separate the positive cases (0,1 and 1,0) from the negative cases (0,0 and 1,1) in 2D space.

The hidden layer transforms the input space into a representation in which the problem becomes linearly separable. Each hidden neuron essentially learns a line (a linear boundary) over the inputs, and combining several such boundaries produces the more complex decision boundary needed to solve XOR.
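
One concrete way to see this: XOR can be assembled from two linearly separable functions, OR and NAND, whose outputs are then combined with AND. The sketch below hard-codes weights with exactly that structure (the numbers are hand-picked for illustration rather than learned, and it uses only two hidden neurons where the visualization uses four; two are mathematically sufficient):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def xor_by_hand(x1, x2):
        # Hidden neuron 1 approximates OR(x1, x2): fires when either input is 1.
        h1 = sigmoid(-5 + 10 * x1 + 10 * x2)
        # Hidden neuron 2 approximates NAND(x1, x2): fires unless both inputs are 1.
        h2 = sigmoid(15 - 10 * x1 - 10 * x2)
        # Output approximates AND(h1, h2): OR together with NAND gives XOR.
        return sigmoid(-15 + 10 * h1 + 10 * h2)

    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x1, x2, round(xor_by_hand(x1, x2), 3))

Running this prints values near 0, 1, 1, 0 for the four input pairs, showing that two straight-line boundaries in the hidden layer are enough to carve out the XOR pattern.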