
Understanding Backpropagation Through Computational Graphs

Backpropagation is the cornerstone algorithm for training neural networks. This interactive visualization demonstrates how gradients flow backward through a computational graph using the chain rule.

Interactive Demo

This example computes L = (a × b + c) × f with values a=2, b=3, c=-10, f=5.

[Interactive computational graph: inputs a=2, b=3, c=-10, f=5; intermediate nodes e and d; output L]

Forward Pass

Computes values from inputs to output. Each node calculates its value based on its inputs.

Backward Pass

Computes gradients from output to inputs using the chain rule. Shows how much each input affects the output.

How It Works

Forward Pass

The forward pass computes values from inputs to output in topological order:

  1. e = a × b = 2 × 3 = 6
  2. d = e + c = 6 + (-10) = -4
  3. L = d × f = (-4) × 5 = -20
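The three steps above can be written directly in Python (an illustrative sketch; variable names mirror the graph):

```python
# Forward pass for L = (a * b + c) * f, evaluated in topological order.
a, b, c, f = 2, 3, -10, 5

e = a * b   # e = 6
d = e + c   # d = -4
L = d * f   # L = -20

print(e, d, L)  # 6 -4 -20
```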

Backward Pass (Backpropagation)

The backward pass computes gradients from output to inputs using the chain rule:

  1. Initialize: ∂L/∂L = 1 (the gradient of the output with respect to itself)
  2. Gradient at d: ∂L/∂d = ∂L/∂L × f = 1 × 5 = 5
    Since L = d × f, the local derivative of L with respect to d is f
  3. Gradient at f: ∂L/∂f = ∂L/∂L × d = 1 × (-4) = -4
    Since L = d × f, the local derivative of L with respect to f is d
  4. Gradient at e: ∂L/∂e = ∂L/∂d × ∂d/∂e = 5 × 1 = 5
    Since d = e + c, the local derivative ∂d/∂e = 1
  5. Gradient at c: ∂L/∂c = ∂L/∂d × ∂d/∂c = 5 × 1 = 5
    Since d = e + c, the local derivative ∂d/∂c = 1
  6. Gradient at a: ∂L/∂a = ∂L/∂e × ∂e/∂a = 5 × b = 5 × 3 = 15
    Since e = a × b, the local derivative ∂e/∂a = b
  7. Gradient at b: ∂L/∂b = ∂L/∂e × ∂e/∂b = 5 × a = 5 × 2 = 10
    Since e = a × b, the local derivative ∂e/∂b = a
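The same chain of multiplications can be carried out explicitly in code. This is a hand-unrolled sketch of the backward pass for this specific graph, not a general autodiff engine:

```python
# Backward pass: apply the chain rule from the output back to the inputs.
a, b, c, f = 2, 3, -10, 5
e = a * b
d = e + c
L = d * f

dL_dL = 1.0
dL_dd = dL_dL * f    # L = d * f  ->  local derivative w.r.t. d is f
dL_df = dL_dL * d    # L = d * f  ->  local derivative w.r.t. f is d
dL_de = dL_dd * 1.0  # d = e + c  ->  local derivative w.r.t. e is 1
dL_dc = dL_dd * 1.0  # d = e + c  ->  local derivative w.r.t. c is 1
dL_da = dL_de * b    # e = a * b  ->  local derivative w.r.t. a is b
dL_db = dL_de * a    # e = a * b  ->  local derivative w.r.t. b is a

print(dL_da, dL_db, dL_dc, dL_df)  # 15.0 10.0 5.0 -4.0
```

Note that every line is an upstream gradient times a local derivative, which is exactly the chain rule applied once per edge of the graph.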

The Chain Rule

The chain rule is the mathematical foundation of backpropagation. For a function composed as y = g(f(x)), the chain rule states:

dy/dx = (dy/du) × (du/dx)

where u = f(x). In our computational graph, we apply this rule repeatedly to compute how the output L changes with respect to each input variable.
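A useful way to check a chain-rule gradient is a numerical finite difference. This sketch (the function name `L_of` is our own) compares the analytic result ∂L/∂a = 15 against a central difference:

```python
# Central finite difference as a sanity check on the analytic gradient.
def L_of(a, b=3, c=-10, f=5):
    return (a * b + c) * f

h = 1e-6
numeric = (L_of(2 + h) - L_of(2 - h)) / (2 * h)
print(numeric)  # close to 15.0, matching the chain-rule result
```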

Local Derivatives

Each operation in the graph has local derivatives that describe how its output changes with respect to its inputs:

  • Multiplication: For z = x × y, we have ∂z/∂x = y and ∂z/∂y = x
  • Addition: For z = x + y, we have ∂z/∂x = 1 and ∂z/∂y = 1

These local derivatives are combined via the chain rule during backpropagation to compute the gradients of the output with respect to all inputs.
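These two rules are enough to build a tiny reverse-mode autodiff node. The following is a minimal sketch in the style of scalar autograd engines (the class name `Value` and its fields are our own choices, not a specific library's API): each operation records its inputs and their local derivatives, and `backward` multiplies the upstream gradient by each local derivative.

```python
# Minimal reverse-mode autodiff node (illustrative sketch).
class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # input nodes of this operation
        self._local_grads = local_grads  # d(output)/d(input) per parent

    def __mul__(self, other):
        # Local derivatives of x*y: y w.r.t. x, and x w.r.t. y.
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def __add__(self, other):
        # Local derivatives of x+y: 1 w.r.t. each input.
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def backward(self, upstream=1.0):
        # Chain rule: accumulate upstream gradient, pass it on
        # scaled by each local derivative.
        self.grad += upstream
        for parent, local in zip(self._parents, self._local_grads):
            parent.backward(upstream * local)

a, b, c, f = Value(2.0), Value(3.0), Value(-10.0), Value(5.0)
L = (a * b + c) * f
L.backward()
print(a.grad, b.grad, c.grad, f.grad)  # 15.0 10.0 5.0 -4.0
```

The simple recursion works here because the graph is a tree; production engines instead traverse nodes once in reverse topological order so that shared subgraphs are not revisited.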

Why This Matters

In neural networks, the computational graph is much larger (with thousands to billions of nodes), but the principle remains the same. Backpropagation efficiently computes gradients for all parameters in a single backward pass, enabling gradient descent optimization to train the network.

The gradients tell us how to adjust each parameter to reduce the loss function. A large gradient (like ∂L/∂a = 15) means that variable has a strong influence on the output, while a small gradient means less influence.
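As a toy illustration of how these gradients drive a descent step, the sketch below treats L as the quantity to minimize and updates a and b against their gradients (the learning rate 0.01 is an arbitrary choice):

```python
# One gradient-descent step using the gradients from the worked example.
def loss(a, b, c=-10, f=5):
    return (a * b + c) * f

a, b = 2.0, 3.0
grad_a, grad_b = 15.0, 10.0  # dL/da and dL/db from backpropagation
lr = 0.01                    # learning rate (arbitrary choice)

a -= lr * grad_a             # a: 2.0 -> 1.85
b -= lr * grad_b             # b: 3.0 -> 2.9
new_L = loss(a, b)
print(new_L)                 # -23.175, lower than the original -20
```

Moving each variable opposite to its gradient decreases L, which is exactly what gradient descent does with a network's loss.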

Key Takeaways

  • Forward pass: Compute values from inputs to output
  • Backward pass: Compute gradients from output to inputs
  • Chain rule: Multiply gradients along each path
  • Local derivatives: Simple rules for each operation type
  • Efficiency: One backward pass computes all gradients
  • Applications: Training neural networks via gradient descent