🧠 Activation Functions Explained

ReLU (Rectified Linear Unit)

f(x) = max(0, x)
How it works: "If input is positive, pass it through. If negative, output zero."

Think of it like a one-way gate - only positive signals get through!
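Here's a minimal NumPy sketch of that gate (the function name and sample inputs are just illustrative):

```python
import numpy as np

def relu(x):
    # One-way gate: positive values pass through, negatives become zero
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))  # roughly [0.  0.  0.  0.5 2. ]
```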
✅ Pros:
  • Simple & fast
  • No vanishing gradient for positive inputs
  • Works great in practice
❌ Cons:
  • "Dead neurons" problem
  • Not zero-centered
  • Can't learn negative patterns

Sigmoid

f(x) = 1 / (1 + e^(-x))
How it works: "Squashes any input to a value between 0 and 1."

Think of it like a probability converter - turns any number into a value between 0 and 1!
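A small NumPy sketch (illustrative names and inputs) showing both the squashing and why the gradient shrinks at the extremes:

```python
import numpy as np

def sigmoid(x):
    # Squash any real number into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative sigmoid(x) * (1 - sigmoid(x)): at most 0.25 (at x = 0)
    # and close to zero for large |x| -- the "vanishing gradient" problem
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, 0.0, 10.0])
print(sigmoid(x))       # roughly [0.00005  0.5   0.99995]
print(sigmoid_grad(x))  # roughly [0.00005  0.25  0.00005]
```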
✅ Pros:
  • Nice probability output
  • Smooth gradient
  • Historical importance
❌ Cons:
  • Vanishing gradient
  • Computationally expensive
  • Not zero-centered

Tanh (Hyperbolic Tangent)

f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
How it works: "Squashes input to values between -1 and 1."

Think of it like a balanced scale - can represent both positive and negative signals!
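A small NumPy sketch (illustrative inputs) showing the squashing into (-1, 1) and the gradient, which peaks at 1 compared to sigmoid's 0.25:

```python
import numpy as np

x = np.array([-2.0, 0.0, 2.0])

out = np.tanh(x)             # (e^x - e^-x) / (e^x + e^-x)
grad = 1.0 - np.tanh(x)**2   # derivative of tanh

print(out)   # roughly [-0.96  0.    0.96]
print(grad)  # roughly [ 0.07  1.    0.07] -- peaks at 1 (vs 0.25 for sigmoid),
             # but still shrinks toward zero when the function saturates
```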
✅ Pros:
  • Zero-centered
  • Stronger gradients than sigmoid
  • Can represent negative values
❌ Cons:
  • Still has vanishing gradient
  • Computationally expensive
  • Saturates at extremes

🎮 Interactive Demo

Slider demo: move the input value and watch how each function transforms it. At x = 0:

ReLU: 0 | Sigmoid: 0.5 | Tanh: 0
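Without the slider, a few lines of NumPy (a sketch standing in for the widget) reproduce the same comparison for any input value:

```python
import numpy as np

def activations(x):
    # Evaluate all three activations at the same input, side by side
    return {
        "ReLU": float(np.maximum(0, x)),
        "Sigmoid": float(1.0 / (1.0 + np.exp(-x))),
        "Tanh": float(np.tanh(x)),
    }

for x in [-2.0, 0.0, 2.0]:
    print(x, activations(x))
# x = 0.0 prints ReLU: 0.0, Sigmoid: 0.5, Tanh: 0.0 -- matching the values above
```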

📊 When to Use Each?

Activation | Best For                 | Use Case                         | Layer Position
ReLU       | Hidden layers            | Image recognition, deep networks | First choice for hidden layers
Sigmoid    | Output layer (binary)    | Yes/No decisions, probabilities  | Final layer for classification
Tanh       | Hidden layers (RNN/LSTM) | When you need negative outputs   | Hidden layers, especially RNNs
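As a concrete illustration of the ReLU and Sigmoid rows, here's a tiny hypothetical binary classifier, assuming PyTorch is available (the layer sizes are made up):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 32),
    nn.ReLU(),        # hidden layer: ReLU is the usual first choice
    nn.Linear(32, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
    nn.Sigmoid(),     # output layer: squashes to (0, 1) for a yes/no probability
)
```

For the Tanh row: PyTorch's recurrent layers (e.g. nn.RNN) apply tanh to the hidden state by default.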

🤔 Why Do We Need Activation Functions?

Without activation functions: no matter how many layers you stack, the whole network collapses into one big linear equation - it could only learn straight lines!

With activation functions: Networks can learn curves, circles, spirals, and any complex pattern!
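A tiny NumPy sketch (made-up weights, purely illustrative) makes this concrete: two stacked linear layers with no activation collapse into one linear layer, while a ReLU in between breaks the collapse.

```python
import numpy as np

W1 = np.array([[1.0, -1.0],
               [2.0,  0.5]])
W2 = np.array([[0.5, 1.0]])
x = np.array([1.0, 3.0])

# Without an activation, stacking two linear layers is still one linear map:
two_layers = W2 @ (W1 @ x)    # [0.5, 1.0] applied to [-2.0, 3.5] -> [2.5]
one_layer = (W2 @ W1) @ x     # the combined matrix gives the same result
print(two_layers, one_layer)  # [2.5] [2.5]

# With ReLU in between, negatives are clipped and the collapse breaks:
with_relu = W2 @ np.maximum(0, W1 @ x)
print(with_relu)              # [3.5]  (the -2.0 was clipped to 0)
```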


Think of it like this:

  • 📏 No activation = Can only draw straight lines
  • 🎨 With activation = Can draw any shape!