Mean squared error is defined as

$$\mathrm{MSE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$
where both $y$ (the ground truth) and $\hat{y}$ (the prediction) are 1D arrays of $n$ values. The loss is always non-negative, the larger the difference from the truth, the bigger the penalty, and it's easily differentiable.
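As a quick sanity check with made-up numbers, take $y = (1, 2, 3)$ and $\hat{y} = (1, 2, 5)$:

$$\mathrm{MSE}\big((1, 2, 3),\, (1, 2, 5)\big) = \frac{(1-1)^2 + (2-2)^2 + (3-5)^2}{3} = \frac{4}{3}.$$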
We are interested in the derivative with respect to the prediction $\hat{y}$. By the chain rule, the derivative for the $i$-th label is

$$\frac{\partial\, \mathrm{MSE}}{\partial \hat{y}_i} = \frac{1}{n} \cdot 2 (y_i - \hat{y}_i) \cdot (-1) = \frac{2}{n} (\hat{y}_i - y_i).$$
Therefore, the gradient is

$$\nabla_{\hat{y}}\, \mathrm{MSE} = \frac{2}{n} (\hat{y} - y).$$
Some people like to omit the factor $\frac{2}{n}$. While optimizing, a constant positive factor doesn't really matter, meaning the optimum stays the same.
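To spell that out: for any constant $c > 0$, scaling the loss leaves the minimizer unchanged,

$$\arg\min_{\hat{y}} \; c \cdot \mathrm{MSE}(y, \hat{y}) = \arg\min_{\hat{y}} \; \mathrm{MSE}(y, \hat{y}).$$

The factor does change the gradient's magnitude, though, so in practice it is absorbed into the learning rate.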
## Implementation
Let's jump directly into the code:
```python
import numpy as np


class MSELoss:
    def forward(self, y, y_pred):
        """Return the mean squared error between truth and prediction."""
        assert len(y.shape) == 1 and len(y_pred.shape) == 1, "Not a 1D array."
        assert y.shape == y_pred.shape, "Dimension mismatch"
        return np.mean(np.power(y - y_pred, 2))

    def backward(self, y, y_pred):
        """Return the gradient of the loss with respect to the prediction."""
        assert len(y.shape) == 1 and len(y_pred.shape) == 1, "Not a 1D array."
        assert y.shape == y_pred.shape, "Dimension mismatch"
        n = y.shape[0]
        return (2.0 / n) * (y_pred - y)
```
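Here is a minimal usage sketch with the same made-up arrays as above, plus a finite-difference check that the `backward` pass matches the analytical gradient (the epsilon value and tolerance are illustrative choices, not part of the original class):

```python
y = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])

loss = MSELoss()
print(loss.forward(y, y_pred))   # 1.3333... = 4/3
print(loss.backward(y, y_pred))  # [0. 0. 1.3333] = (2/3) * (y_pred - y)

# Finite-difference check: bump each prediction by a small epsilon
# and compare the numerical slope against the analytical gradient.
eps = 1e-6
num_grad = np.empty_like(y_pred)
for i in range(y_pred.shape[0]):
    bumped = y_pred.copy()
    bumped[i] += eps
    num_grad[i] = (loss.forward(y, bumped) - loss.forward(y, y_pred)) / eps

assert np.allclose(num_grad, loss.backward(y, y_pred), atol=1e-4)
```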