🧠 Activation Functions Explained

ReLU (Rectified Linear Unit)

f(x) = max(0, x)
How it works: "If input is positive, pass it through. If negative, output zero."

Think of it like a one-way gate - only positive signals get through!
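Here's a minimal NumPy sketch of that gate (the function name and sample inputs are just illustrative):

```python
import numpy as np

def relu(x):
    # One-way gate: positive values pass through, negatives become zero
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))  # roughly [0.  0.  0.  0.5 2. ]
```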
✅ Pros:
  • Simple & fast
  • No vanishing gradient for positive inputs
  • Works great in practice
❌ Cons:
  • "Dead neurons" problem
  • Not zero-centered
  • Can't learn negative patterns

Sigmoid

f(x) = 1 / (1 + e^(-x))
How it works: "Squashes any input to a value between 0 and 1."

Think of it like a probability converter - turns any number into a value between 0 and 1!
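A small NumPy sketch (illustrative names and inputs) showing both the squashing and why the gradient shrinks at the extremes:

```python
import numpy as np

def sigmoid(x):
    # Squash any real number into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative sigmoid(x) * (1 - sigmoid(x)): at most 0.25 (at x = 0)
    # and close to zero for large |x| -- the "vanishing gradient" problem
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, 0.0, 10.0])
print(sigmoid(x))       # roughly [0.00005  0.5   0.99995]
print(sigmoid_grad(x))  # roughly [0.00005  0.25  0.00005]
```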
✅ Pros:
  • Nice probability output
  • Smooth gradient
  • Historical importance
❌ Cons:
  • Vanishing gradient
  • Computationally expensive
  • Not zero-centered

Tanh (Hyperbolic Tangent)

f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
How it works: "Squashes input to values between -1 and 1."

Think of it like a balanced scale - can represent both positive and negative signals!
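A small NumPy sketch (illustrative inputs) showing the squashing into (-1, 1) and the gradient, which peaks at 1 compared to sigmoid's 0.25:

```python
import numpy as np

x = np.array([-2.0, 0.0, 2.0])

out = np.tanh(x)             # (e^x - e^-x) / (e^x + e^-x)
grad = 1.0 - np.tanh(x)**2   # derivative of tanh

print(out)   # roughly [-0.96  0.    0.96]
print(grad)  # roughly [ 0.07  1.    0.07] -- peaks at 1 (vs 0.25 for sigmoid),
             # but still shrinks toward zero when the function saturates
```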
✅ Pros:
  • Zero-centered
  • Stronger gradients than sigmoid
  • Can represent negative values
❌ Cons:
  • Still has vanishing gradient
  • Computationally expensive
  • Saturates at extremes

🎮 Interactive Demo

Slider demo: move the input value and watch how each function transforms it. At x = 0:

ReLU: 0 | Sigmoid: 0.5 | Tanh: 0
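Without the slider, a few lines of NumPy (a sketch standing in for the widget) reproduce the same comparison for any input value:

```python
import numpy as np

def activations(x):
    # Evaluate all three activations at the same input, side by side
    return {
        "ReLU": float(np.maximum(0, x)),
        "Sigmoid": float(1.0 / (1.0 + np.exp(-x))),
        "Tanh": float(np.tanh(x)),
    }

for x in [-2.0, 0.0, 2.0]:
    print(x, activations(x))
# x = 0.0 prints ReLU: 0.0, Sigmoid: 0.5, Tanh: 0.0 -- matching the values above
```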

📊 When to Use Each?

Activation | Best For                 | Use Case                         | Layer Position
ReLU       | Hidden layers            | Image recognition, deep networks | First choice for hidden layers
Sigmoid    | Output layer (binary)    | Yes/No decisions, probabilities  | Final layer for classification
Tanh       | Hidden layers (RNN/LSTM) | When you need negative outputs   | Hidden layers, especially RNNs
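As a concrete illustration of the ReLU and Sigmoid rows, here's a tiny hypothetical binary classifier, assuming PyTorch is available (the layer sizes are made up):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 32),
    nn.ReLU(),        # hidden layer: ReLU is the usual first choice
    nn.Linear(32, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
    nn.Sigmoid(),     # output layer: squashes to (0, 1) for a yes/no probability
)
```

For the Tanh row: PyTorch's recurrent layers (e.g. nn.RNN) apply tanh to the hidden state by default.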

🤔 Why Do We Need Activation Functions?

Without activation functions: no matter how many layers you stack, the whole network collapses into one big linear equation - it could only learn straight lines!

With activation functions: Networks can learn curves, circles, spirals, and any complex pattern!
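A tiny NumPy sketch (made-up weights, purely illustrative) makes this concrete: two stacked linear layers with no activation collapse into one linear layer, while a ReLU in between breaks the collapse.

```python
import numpy as np

W1 = np.array([[1.0, -1.0],
               [2.0,  0.5]])
W2 = np.array([[0.5, 1.0]])
x = np.array([1.0, 3.0])

# Without an activation, stacking two linear layers is still one linear map:
two_layers = W2 @ (W1 @ x)    # [0.5, 1.0] applied to [-2.0, 3.5] -> [2.5]
one_layer = (W2 @ W1) @ x     # the combined matrix gives the same result
print(two_layers, one_layer)  # [2.5] [2.5]

# With ReLU in between, negatives are clipped and the collapse breaks:
with_relu = W2 @ np.maximum(0, W1 @ x)
print(with_relu)              # [3.5]  (the -2.0 was clipped to 0)
```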


Think of it like this:

  • 📏 No activation = Can only draw straight lines
  • 🎨 With activation = Can draw any shape!