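Given a training set, learn a predictor that minimizes the expected (true) risk.
However, the expected risk cannot be computed because the data distribution is unknown; we only have the empirical risk, measured on the training set. So we gather enough samples and find a predictor that minimizes the empirical risk (empirical risk minimization, ERM).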
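For reference, the two risks can be written out as below. The notation is an assumption, since these notes do not fix any symbols: h is a predictor, ℓ a loss function, P the unknown data distribution, and (x_i, y_i), i = 1..n, the training samples.

```latex
% Expected (true) risk: average loss over the unknown distribution P
R(h) = \mathbb{E}_{(x,y) \sim P}\big[\ell(h(x), y)\big]

% Empirical risk: average loss over the n training samples
\hat{R}_n(h) = \frac{1}{n} \sum_{i=1}^{n} \ell(h(x_i), y_i)

% ERM picks the predictor in the hypothesis class H with the lowest empirical risk
\hat{h} = \arg\min_{h \in \mathcal{H}} \hat{R}_n(h)
```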
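The number of samples needed to bring the empirical risk within a given tolerance of the true risk, at a given confidence level, can be estimated using Hoeffding's inequality (this requires a bounded loss function).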
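A minimal sketch of that estimate, under stated assumptions: samples are i.i.d., the loss takes values in an interval of width loss_range, and the predictor is either fixed in advance or chosen from a finite class of num_hypotheses candidates (handled by a union bound). The function name and parameters are hypothetical, chosen only for illustration. It solves 2 * |H| * exp(-2 * n * eps^2 / range^2) <= delta for n.

```python
import math


def hoeffding_sample_size(epsilon, delta, loss_range=1.0, num_hypotheses=1):
    """Samples needed so that |empirical risk - true risk| <= epsilon
    holds with probability at least 1 - delta.

    Assumes i.i.d. samples and a loss bounded in an interval of width
    loss_range. For num_hypotheses > 1 a union bound over a finite
    hypothesis class is applied.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be at least 1")
    n = (loss_range ** 2) * math.log(2 * num_hypotheses / delta) / (2 * epsilon ** 2)
    return math.ceil(n)


# 0-1 loss, tolerance 0.05, confidence 95 %, single fixed predictor
print(hoeffding_sample_size(epsilon=0.05, delta=0.05))
# Same tolerance and confidence, but uniformly over 1000 candidate predictors
print(hoeffding_sample_size(epsilon=0.05, delta=0.05, num_hypotheses=1000))
```

Halving epsilon roughly quadruples the required sample size, while tightening delta or enlarging the hypothesis class only costs a logarithmic factor.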
However, ERM does not tell us whether the chosen method is suitable for a given problem. The theory only gives us hope that the empirical risk is a good enough substitute for the true risk. If the true risk stays high (= the method is not suitable), we can train as much as we want, but we will never get a good predictor by ERM alone.
Overfitting = empirical risk is much lower than true risk.
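A small illustration of that gap, as a sketch on assumed toy data rather than anything from these notes: a 1-nearest-neighbour predictor memorizes a noisy training set, so its empirical risk is zero, while its risk on fresh samples (used here as a stand-in for the true risk) is much higher. The data generator, noise level and sample sizes are all made up for the example.

```python
import random


def sample(n, rng, noise=0.2):
    """Draw n points: x uniform on [0, 1), label 1 if x > 0.5,
    with the label flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def one_nn_predict(train, x):
    """Return the label of the training point closest to x."""
    _, nearest_y = min(train, key=lambda p: abs(p[0] - x))
    return nearest_y


def zero_one_risk(predict, data):
    """Average 0-1 loss of `predict` over `data`."""
    return sum(predict(x) != y for x, y in data) / len(data)


rng = random.Random(0)
train = sample(200, rng)
held_out = sample(5000, rng)  # fresh samples approximate the true risk


def predict(x):
    return one_nn_predict(train, x)


empirical_risk = zero_one_risk(predict, train)
estimated_true_risk = zero_one_risk(predict, held_out)
print("empirical risk:", empirical_risk)
print("estimated true risk:", estimated_true_risk)
```

Each training point is its own nearest neighbour, so the memorizer fits the label noise perfectly. Minimizing empirical risk alone therefore says nothing about whether this learner suits the problem.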
Takeaway - domain knowledge is important when selecting a learner.