### What is an activation function?

In the simplest terms, an activation function is just a mathematical function that receives an input, performs some predefined mathematical operations on it, and produces a result as output. The term 'activation' comes from the fact that the output of these functions determines whether a neuron is active or not. Activation functions are also used for normalization and regularization of data, and for introducing non-linearity into a neural network.

The following are some important activation functions every data scientist should be aware of:

### 1. Sigmoid function

The sigmoid function has a characteristic S-shaped curve; it is bounded, has a non-negative derivative at every point, and has exactly one inflection point.

In the figure above, the red curve is the sigmoid function and the green curve is its derivative.

### Mathematical function:

f(x) = 1 / (1 + exp(-x))

From the function above, it is evident that exp(-x) can never be negative, which means the denominator is always greater than 1. Hence, the value of f(x) is always positive but less than 1.

### Acceptable input:

A real number ranging from -inf to +inf.

### Output range:

As x tends to -inf, output tends to 0 and as x tends to +inf, output tends to 1.

### Derivative:

The derivative of the sigmoid function is smooth, symmetric about the y-axis, and always positive. In terms of the sigmoid function itself, it is given as f'(x) = f(x)(1 - f(x)).
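As a quick sanity check (a minimal sketch in plain Python, not from the original article), both the function and its derivative take only a couple of lines:

```python
import math

def sigmoid(x: float) -> float:
    """Squash any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    """f'(x) = f(x) * (1 - f(x)); always positive, peaks at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

Note how the derivative is computed from the sigmoid value itself, which is why frameworks cache the forward activation during backpropagation.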

### Pros of Sigmoid Function:

- The function and its gradient are smooth everywhere.
- The output is bounded in (0, 1), which makes it convenient for probability-like outputs in binary classification.

### Cons of Sigmoid Function:

- The gradient becomes vanishingly small for large positive or negative inputs (the vanishing-gradient problem).
- The output is not zero-centered, and the exponential makes it relatively expensive to compute.

It is a general misconception that the sigmoid is a probability function, but it is only a probability-like function (the sum of the outputs over all inputs is not necessarily 1).

### 2. Tanh/Hyperbolic tangent

It is similar to the sigmoid function, except that it normalizes data to the range (-1, 1) instead of (0, 1).

In the figure above, the red curve is the tanh function and the green curve is its derivative.

### Mathematical function:

f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

### Acceptable input:

A real number ranging from -inf to +inf.

### Output range:

As x tends to -inf, output tends to -1 and as x tends to +inf, output tends to 1.

### Derivative:

In terms of the function itself, f'(x) = 1 - f(x)^2.
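The function and its derivative can be sketched in plain Python (a quick check, not part of the original article):

```python
import math

def tanh(x: float) -> float:
    """(e^x - e^-x) / (e^x + e^-x): zero-centered, bounded in (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def tanh_derivative(x: float) -> float:
    """f'(x) = 1 - f(x)^2; equals 1 at x = 0 and decays towards 0."""
    return 1.0 - tanh(x) ** 2
```

In practice you would call `math.tanh` directly; the explicit form above just mirrors the formula.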

### Pros of tanh function:

- The output is zero-centered, which generally makes optimization easier than with sigmoid.
- Gradients are stronger than sigmoid gradients near zero.

### Cons of tanh function:

- It still suffers from vanishing gradients for large positive or negative inputs.

### 3. ReLU : Rectified Linear Unit

In their normalization process, the sigmoid and tanh functions tend to lose information about the magnitude of the input. ReLU was introduced to tackle this problem.

### Mathematical function:

f(x) = max(0, x)

### Acceptable input:

Real number ranging from -inf to +inf.

### Output range:

For all negative numbers the output is zero, and for positive numbers the output is the number itself.

### Derivative:

For all negative numbers the derivative is 0, and for positive numbers the derivative is 1. It is a step function, as shown in the figure below.
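A minimal plain-Python sketch of the function and its derivative (not from the original article; the value at exactly zero is a convention, since the true derivative is undefined there):

```python
def relu(x: float) -> float:
    """max(0, x): passes positive inputs through unchanged, zeros out negatives."""
    return max(0.0, x)

def relu_derivative(x: float) -> float:
    """0 for negative inputs, 1 for positive; 0 at x = 0 by convention."""
    return 1.0 if x > 0 else 0.0
```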

### Pros of ReLU function:

- Very cheap to compute.
- No vanishing gradient for positive inputs, so deep networks train faster.
- Produces sparse activations (many exact zeros).

### Cons of ReLU function:

- The "dying ReLU" problem: neurons whose inputs stay negative receive zero gradient and may stop learning permanently.
- It is not differentiable at zero, and its outputs are not zero-centered.

There are many variants of ReLU; some of them are discussed below.

### 4. Leaky ReLU : Leaky Rectified Linear Unit

Instead of discarding negative inputs altogether, leaky ReLU provides a small output for them too.

### Mathematical function:

f(x) = x for x > 0, and f(x) = 0.01x for x <= 0

For negative numbers, instead of zero, leaky ReLU gives an output that is 0.01 times the input; for positive numbers, the output is the same as the input.

### Acceptable input:

Real number ranging from -inf to +inf.

### Output range:

Negative inputs give small negative outputs (0.01 times the input) and positive inputs pass through unchanged, so the range is (-inf, +inf).

### Derivative:

For all negative numbers the derivative is 0.01, and for positive numbers the derivative is 1. It is a step function, as shown in the figure below.
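A minimal plain-Python sketch (not from the original article), with the conventional 0.01 slope hard-coded:

```python
def leaky_relu(x: float) -> float:
    """x for positive inputs, 0.01 * x for negative ones."""
    return x if x > 0 else 0.01 * x

def leaky_relu_derivative(x: float) -> float:
    """1 for positive inputs, 0.01 for negative ones."""
    return 1.0 if x > 0 else 0.01
```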

### Pros of Leaky ReLU:

- Fixes the dying-ReLU problem: negative inputs still receive a small gradient.
- Almost as cheap to compute as ReLU.

### Cons of Leaky ReLU:

- The 0.01 slope is an arbitrary choice and is not learned from the data.

### 5. P-ReLU : Parametric Rectified Linear Unit

For negative inputs, instead of the fixed 0.01 factor, a parameter 'a' is used. A related variant in which 'a' is sampled randomly is called randomized ReLU.

### Mathematical function:

f(x) = x for x > 0, and f(x) = ax for x <= 0

for a = 0, it is ReLU

for a = 0.01, it is leaky ReLU

'a' is a learnable parameter

### Acceptable input:

Real number ranging from -inf to +inf.

### Output range:

For a > 0, the range is (-inf, +inf), as in leaky ReLU.

### Derivative:

It is similar to leaky ReLU; only the slope of the gradient for negative inputs changes with the value of 'a'.
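A plain-Python sketch (not from the original article) in which 'a' is passed explicitly; in a real network it would be a trainable parameter updated by gradient descent:

```python
def prelu(x: float, a: float) -> float:
    """x for positive inputs, a * x for negative ones; 'a' is learned during training."""
    return x if x > 0 else a * x
```

Setting a = 0 recovers ReLU, and a = 0.01 recovers leaky ReLU.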

### Pros of P-ReLU:

- The negative slope is learned from the data instead of being fixed by hand.

### Cons of P-ReLU:

- It adds extra parameters and may overfit on small datasets.

### 6. ELU : Exponential Linear Unit

ReLU, leaky ReLU, and P-ReLU all have a sharp corner in their curve at zero. To get a smoother curve around zero, ELU comes in handy.

### Mathematical function:

f(x) = x for x > 0, and f(x) = a(exp(x) - 1) for x <= 0, where a > 0 is a hyperparameter (commonly a = 1)

### Acceptable input:

Real number ranging from -inf to +inf.

### Output range:

(-a, +inf): negative inputs saturate towards -a, and positive inputs pass through unchanged.

### Derivative:

f'(x) = 1 for x > 0, and f'(x) = a * exp(x) = f(x) + a for x <= 0
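A plain-Python sketch with a = 1 as the default (not from the original article):

```python
import math

def elu(x: float, a: float = 1.0) -> float:
    """x for positive inputs, a * (e^x - 1) for negatives; smooth near zero, bounded below by -a."""
    return x if x > 0 else a * (math.exp(x) - 1.0)

def elu_derivative(x: float, a: float = 1.0) -> float:
    """1 for positive inputs, a * e^x (= f(x) + a) for negatives."""
    return 1.0 if x > 0 else a * math.exp(x)
```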

### Pros of ELU:

- Smooth around zero, unlike the ReLU family.
- Negative outputs push mean activations closer to zero, which can speed up learning.

### Cons of ELU:

- The exponential makes it more expensive to compute than ReLU.

### 7. Softplus

It is an activation function whose graph is similar to that of ReLU but smooth throughout. In the figure below, the curve in red is softplus and the blue one is ReLU.

### Mathematical function:

f(x) = ln(1 + exp(x))

### Acceptable input:

Real number ranging from -inf to +inf.

### Output range:

(0, +inf): the output approaches 0 for large negative inputs and approaches x for large positive inputs.

### Derivative:

The derivative of the softplus function is the sigmoid function: f'(x) = 1 / (1 + exp(-x)).
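A plain-Python sketch (not from the original article); `log1p` is used because it is more accurate than `log(1 + ...)` for small arguments:

```python
import math

def softplus(x: float) -> float:
    """ln(1 + e^x): a smooth approximation of ReLU."""
    return math.log1p(math.exp(x))

def softplus_derivative(x: float) -> float:
    """The derivative is exactly the sigmoid function."""
    return 1.0 / (1.0 + math.exp(-x))
```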

### Pros of Softplus function:

- Smooth and differentiable everywhere, unlike ReLU.

### Cons of Softplus function:

- More expensive to compute than ReLU, and its outputs are never exactly zero, so activations are not sparse.

### 8. Swish function

This function was proposed by the Google Brain team. It is a non-monotonic, smooth, self-gated function.

### Mathematical function:

f(x) = x * sigmoid(x) = x / (1 + exp(-x))

### Acceptable input:

Real number ranging from -inf to +inf.

### Output range:

Bounded below (the minimum value is roughly -0.28) and unbounded above.

### Derivative:

In terms of the function itself, f'(x) = f(x) + sigmoid(x) * (1 - f(x)).
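A plain-Python sketch of the function and its derivative (not from the original article); the last assertion below checks the non-monotonic dip for negative inputs:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def swish(x: float) -> float:
    """Self-gated: x * sigmoid(x)."""
    return x * sigmoid(x)

def swish_derivative(x: float) -> float:
    """f'(x) = f(x) + sigmoid(x) * (1 - f(x))."""
    f = swish(x)
    return f + sigmoid(x) * (1.0 - f)
```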

### Pros of Swish function:

- Smooth and non-monotonic; reported to outperform ReLU on some deep models.

### Cons of Swish function:

- More expensive to compute than ReLU.

### 9. Maxout function

The name of the maxout function is very intuitive, 'max + out', and that is exactly what it does: it takes the maximum over the pre-activations of several linear units and gives it as the output. Because those linear units have trainable weights, it is a learnable activation function.
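A simplified, hypothetical sketch in plain Python (not from the original article): each unit is a (weight, bias) pair acting on a scalar input, whereas in practice the weights are learned vectors.

```python
def maxout(x: float, units) -> float:
    """Max over the affine pre-activations w * x + b of several linear units."""
    return max(w * x + b for w, b in units)
```

With units [(1, 0), (0, 0)] maxout reduces to ReLU, and with [(1, 0), (a, 0)] it reduces to leaky/parametric ReLU, which is why maxout generalizes that whole family.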

### Mathematical function:

f(x) = max(w1.x + b1, w2.x + b2, ..., wk.x + bk)

### Acceptable input:

Real number ranging from -inf to +inf.

### Output range:

(-inf, +inf), since it is the maximum of unbounded linear functions.

### Derivative:

The derivative is that of the winning linear unit: the gradient flows only through the unit that achieved the maximum.

### Pros of Maxout function:

- Generalizes ReLU and leaky ReLU (both are special cases) and does not suffer from dying units.

### Cons of Maxout function:

- Multiplies the number of parameters per neuron by the number of linear units.

### 10. Softmax function

The softmax function is a probabilistic function used in the output layer for multi-class classification problems. In this case, the sum of the outputs for each input is 1.

### Mathematical function:

f(x_i) = exp(x_i) / sum_j exp(x_j)

### Acceptable input:

A vector of real numbers, each ranging from -inf to +inf.

### Output range:

As it gives a probabilistic output, each value is always in the range 0 to 1.
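A plain-Python sketch (not from the original article); subtracting the maximum before exponentiating is the standard trick to avoid overflow for large inputs:

```python
import math

def softmax(xs):
    """Exponentiate each input and normalize so the outputs sum to 1."""
    m = max(xs)  # shift for numerical stability; does not change the result
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```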

### Derivative:

d f_i / d x_j = f_i (delta_ij - f_j), where delta_ij is 1 when i = j and 0 otherwise.

### Pros of Softmax function:

- Outputs can be interpreted directly as class probabilities.

### Cons of Softmax function:

- Numerically unstable for large inputs unless the maximum is subtracted first, and it is normally used only in the output layer.

*** * ***

Thanks for reading the article! Wanna connect with me?

Here is a link to my LinkedIn Profile