SAN

Picking the best model

Residual plots: what to look at

Mainly to check whether the F-statistic, R-squared etc. are valid. That is, check whether the residuals are normally distributed and homoscedastic (constant variance), including when the model has multiple variables.

Normally distributed residuals are spread evenly (symmetrically) around some mean value, ideally zero. Heteroscedastic residuals have a spread that changes, typically grows, along some variable (not even).
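A rough numeric stand-in for eyeballing the residual plot, as a sketch only: fit a straight line, sort the residuals by x, and compare the mean squared residual in the upper half of x with that in the lower half. A ratio far above 1 suggests the spread grows with x. The function names and the toy data are made up for illustration, and this is a simplification, not a formal test.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def spread_ratio(x, y):
    """Mean squared residual for the upper half of x divided by the lower half."""
    a, b = fit_line(x, y)
    pairs = sorted(zip(x, y))  # order observations by x
    resid = [yi - (a + b * xi) for xi, yi in pairs]
    half = len(resid) // 2
    lower = sum(r * r for r in resid[:half]) / half
    upper = sum(r * r for r in resid[-half:]) / half
    return upper / lower


# Toy data (hypothetical): same trend, different noise pattern.
xs = list(range(1, 11))
signs = [(-1) ** i for i in xs]
y_even = [2 * xi + 0.5 * s for xi, s in zip(xs, signs)]         # constant spread
y_fan = [2 * xi + 0.1 * xi * s for xi, s in zip(xs, signs)]     # spread grows with x

ratio_even = spread_ratio(xs, y_even)
ratio_fan = spread_ratio(xs, y_fan)
```

Here ratio_even stays close to 1, while ratio_fan is clearly larger than 1.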

Removing variables

Cook’s distance

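Cook's distance (CD) measures how much the fitted model changes when a single sample is left out. Samples with a large CD are probably outliers. Outliers influence the model too much, so check them and consider removing them.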
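A minimal sketch of Cook's distance for a regression with one predictor plus intercept, assuming the usual formula D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2, with residual e_i, leverage h_i, p = 2 coefficients and s^2 = RSS / (n - p). The function name and the data are hypothetical. With several predictors the leverage comes from the hat matrix instead.

```python
def cooks_distance(x, y):
    """Cook's distance of every sample for the model y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted coefficients: intercept and slope
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - p)  # residual variance estimate

    distances = []
    for xi, r in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx   # leverage of this sample
        distances.append(r * r / (p * s2) * h / (1 - h) ** 2)
    return distances


# Toy data (hypothetical): the last sample lies far off the trend.
x_demo = [1, 2, 3, 4, 5, 6]
y_demo = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
cd = cooks_distance(x_demo, y_demo)
most_influential = cd.index(max(cd))  # index of the sample with the largest CD
```

A common rule of thumb, not a strict cutoff, is to look more closely at samples with CD above 4/n or above 1.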

Variance inflation factor (VIF)

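VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing variable j on all the other variables. A large VIF means the variable is largely explained by the others (multicollinearity).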
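A small sketch of the VIF for the special case of exactly two predictors. In that case the R-squared from regressing one predictor on the other equals their squared correlation, so no multiple regression is needed. The function names and data are made up for illustration; with more predictors, R-squared has to come from a full regression of predictor j on all the others.

```python
def r_squared(x, y):
    """R-squared of the simple regression of y on x (squared correlation)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF of x1 (and, by symmetry, of x2) in a model with only these two predictors."""
    # Perfectly collinear predictors give R-squared = 1 and a division by zero.
    return 1 / (1 - r_squared(x2, x1))


# Toy predictors (hypothetical values).
x1 = [1, 2, 3, 4, 5]
x2_related = [2, 1, 4, 3, 5]       # correlation 0.8 with x1
x2_unrelated = [2, -1, -2, -1, 2]  # correlation 0 with x1

vif_related = vif_two_predictors(x1, x2_related)      # 1 / (1 - 0.64)
vif_unrelated = vif_two_predictors(x1, x2_unrelated)  # no inflation
```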

Penalty (Shrinkage)

Ridge

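Minimise RSS plus a penalty on the size of the coefficients: RSS + lambda * (sum of squared coefficients). The parameter lambda sets how strong the penalty is; lambda = 0 gives ordinary least squares, larger lambda shrinks the coefficients towards zero.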
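To see the shrinkage in the simplest case, here is a sketch of ridge with a single predictor, assuming the intercept is not penalised (the data are centred first). Minimising sum((y - b*x)^2) + lambda * b^2 on centred data gives b = Sxy / (Sxx + lambda). The function name and the numbers are illustrative only.

```python
def ridge_slope(x, y, lam):
    """Ridge estimate of the slope for one predictor with an unpenalised intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Setting the derivative of RSS + lam * b^2 to zero yields this closed form.
    return sxy / (sxx + lam)


# Toy data (hypothetical): y = 2 * x exactly.
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 6, 8, 10]

slope_ols = ridge_slope(x_demo, y_demo, 0)       # lambda = 0: ordinary least squares, 2.0
slope_shrunk = ridge_slope(x_demo, y_demo, 10)   # lambda = 10: slope shrinks to 1.0
```

The slope never reaches exactly zero for any finite lambda; it only gets smaller as lambda grows.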

Lasso