ReLU

The Rectified Linear Unit is the most basic activation function. For each element of the input vector, it returns 0 if the value is negative, and the value itself otherwise.

The forward message can be rewritten as

$$\mathrm{ReLU}(X) = \max(0, X) = X \cdot [X > 0],$$

where the maximum, the indicator $[X > 0]$ and the product are all taken element-wise. The Python code in NumPy can therefore be either

def forward(self, X):
	self.X = X  # store the input for the backward message
	return X * (X > 0)

or

def forward(self, X):
	self.X = X  # store the input for the backward message
	return (np.abs(X) + X) / 2  # (|X| + X) / 2 equals X where X > 0 and 0 elsewhere

However, the first version will almost surely require fewer operations, though it's not as fancy. Notice that we need to store the input matrix X for the backward message, or better, store only the boolean mask (X > 0), as sketched below.
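A minimal sketch of this mask-based variant could look like the following; it assumes the same layer-style class as the snippets above, and the backward message would then reuse self.mask instead of recomputing (self.X > 0):

def forward(self, X):
	# Keep only the boolean mask; the actual input values are not needed later.
	self.mask = X > 0
	return X * self.mask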

For the backward message, the activation function doesn't have any parameters, but we still need a backward message for the previous layers. For a single sample input, the derivative with respect to $x_i$ is given by

$$\frac{\partial y_i}{\partial x_i} = [x_i > 0].$$

The full chain rule results in

$$\frac{\partial L}{\partial x_i} = \frac{\partial L}{\partial y_i} \cdot [x_i > 0].$$

def backward(self, dY):
	# The gradient passes through only where the input was positive.
	return dY * (self.X > 0)

In all cases, the comparison X > 0 is performed element-wise, and so is the * operator. Beware that at 0 the gradient technically doesn't exist; however, it is standard to set it to 0 (as this implementation does).
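As a quick sanity check, here is a small usage example; the ReLU class name is hypothetical and is assumed to wrap the forward and backward methods above, with NumPy imported as np:

import numpy as np

relu = ReLU()                        # hypothetical class holding the methods above
X = np.array([[-1.5, 0.0, 2.0]])
Y = relu.forward(X)                  # negative entry is zeroed out: 0, 0, 2
dX = relu.backward(np.ones_like(X))  # gradient mask: 0, 0, 1 (the entry at exactly 0 gets gradient 0)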