ReLU
The Rectified Linear Unit (ReLU) is the most basic activation function. For each element of the input vector it returns 0 if the value is negative, and the value itself otherwise: ReLU(x) = max(0, x).
The forward message can be rewritten as Y = max(0, X), taken element-wise. The Python code in numpy can therefore be either
def forward(self, X):
    self.X = X
    return X * (X > 0)

or
def forward(self, X):
    self.X = X
    return (np.abs(X) + X) / 2

However, the first version will almost surely result in fewer operations, though it is not as fancy. Notice that we need to store the input matrix X for the backward message. Or better, store only the boolean mask (X > 0).
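The mask-storing variant could be sketched as follows (the attribute name mask is my own choice, not part of the original code):

```python
import numpy as np

class ReLU:
    def forward(self, X):
        # Store only the boolean mask instead of the full input X;
        # it is all the backward pass needs, and it is cheaper to keep.
        self.mask = X > 0
        return X * self.mask

    def backward(self, dY):
        # The gradient passes through only where the input was positive.
        return dY * self.mask
```

This trades the full input matrix for a boolean array of the same shape, which uses less memory and makes the backward pass a single element-wise multiplication.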
For the backward message: the activation function doesn't have any parameters, but we still need to pass a backward message to the previous layers. For a single sample input, the derivative with respect to x_i is 1 if x_i > 0 and 0 otherwise. The full chain rule therefore gives dL/dx_i = dL/dy_i if x_i > 0, and 0 otherwise, which in numpy is
def backward(self, dY):
    return dY * (self.X > 0)

In all the cases, the X > 0 comparison is performed element-wise, and so is the * operator. Beware that at 0 the gradient technically doesn't exist. However, it is standard to set it to 0 (as this implementation does).