Picking the best model
Residual plots
What to look at
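Mainly to check whether the F-statistic, R-squared, etc. are valid, i.e. whether the residuals are normally distributed and homoscedastic (constant spread). This applies with multiple variables as well.
Normally distributed residuals are evenly (symmetrically) scattered around some mean value, which is zero for a least-squares fit with an intercept.
Heteroscedastic residuals have a spread that increases (or otherwise changes) along some variable, so the scatter is not even.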
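A minimal sketch of the residual check, assuming NumPy and simulated data (the data, the helper names `ols_residuals` and `spread_ratio`, and the "compare the two halves" idea are illustrative assumptions, not a formal test). It fits a straight line, then compares the residual spread in the lower and upper half of the fitted values. A ratio near 1 suggests even spread; a ratio well above 1 suggests the fan shape of heteroscedastic residuals.

```python
import numpy as np

def ols_residuals(x, y):
    """Fit y = b0 + b1*x by least squares; return fitted values and residuals."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted

def spread_ratio(fitted, resid):
    """Residual std in the upper half of fitted values divided by the lower half."""
    order = np.argsort(fitted)
    half = len(order) // 2
    low, high = resid[order[:half]], resid[order[half:]]
    return high.std() / low.std()

# Simulated data (assumption): same line, two different noise patterns.
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 400)
y_even = 2 + 3 * x + rng.normal(0, 1.0, size=x.size)     # constant spread
y_fan = 2 + 3 * x + rng.normal(0, 1.0, size=x.size) * x  # spread grows with x

for name, y in [("even spread", y_even), ("growing spread", y_fan)]:
    fitted, resid = ols_residuals(x, y)
    print(name, "| residual mean:", round(resid.mean(), 6),
          "| spread ratio:", round(spread_ratio(fitted, resid), 2))
```

In practice the same residuals would usually also be plotted against the fitted values; the ratio is only a quick numeric stand-in for reading that plot.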
Removing variables
Cook’s distance
Samples with a large Cook's distance (CD) are influential and are probably outliers. Such samples influence the fitted model too much.
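A rough sketch of how Cook's distance can be computed by hand, assuming NumPy and simulated data with one planted outlier. It uses the standard formula D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2, where e_i is the residual, h_i the leverage, p the number of fitted parameters and s^2 the residual variance. The function name `cooks_distance` and the 4/n cutoff are assumptions; 4/n is only a common rule of thumb.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each row of the design matrix X (intercept included)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverage h_i is the diagonal of the hat matrix; via QR it is the row sum of Q**2.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q**2, axis=1)
    s2 = resid @ resid / (n - p)  # residual variance estimate
    return resid**2 / (p * s2) * leverage / (1 - leverage) ** 2

# Simulated data (assumption): a clean line plus one planted outlier.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
y = 1 + 2 * x + rng.normal(0, 0.5, size=x.size)
y[-1] += 15  # outlier placed at the largest x, where leverage is highest

X = np.column_stack([np.ones_like(x), x])
d = cooks_distance(X, y)
flagged = np.flatnonzero(d > 4 / len(y))  # rule-of-thumb cutoff, not a strict test
print("most influential sample:", int(np.argmax(d)))
print("flagged samples:", flagged)
```

Flagged samples are candidates to inspect, not to delete automatically; refitting without them shows how much they move the coefficients.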
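Variance inflation factor (VIF)
VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing variable j on all the other variables. A large VIF means variable j is nearly a linear combination of the others (multicollinearity).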
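A small sketch of the VIF calculation, assuming NumPy and simulated predictors (the function name `vif` and the data are illustrative assumptions). Each predictor is regressed on the others, and 1 / (1 - R^2) is taken from that auxiliary regression. Here the third predictor is built as a noisy copy of the first, so both should get a high VIF while the second stays close to 1.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (predictors only, no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress predictor j on an intercept plus all the other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1 / (1 - r2)
    return out

# Simulated predictors (assumption): x3 is almost a copy of x1.
rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
print("VIF per variable:", np.round(vifs, 2))
```

A common heuristic is to look closely at variables with a VIF above roughly 5 or 10 and consider removing one variable from each collinear group; the exact cutoff is a judgment call.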
Penalty (Shrinkage)
Ridge
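Minimise RSS plus a penalty: RSS + lambda * sum_j (beta_j^2). The parameter lambda controls the strength of the penalty: lambda = 0 gives ordinary least squares, and a larger lambda shrinks the coefficients further towards zero.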
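A minimal sketch of ridge regression using the closed-form solution beta = (X'X + lambda * I)^(-1) X'y, assuming NumPy and simulated data. The function name `ridge_fit`, the data and the lambda values are illustrative assumptions. The predictors and response are centred first so that the intercept is not penalised, which is the usual convention.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge fit; returns (intercept, coefficients). Intercept is not penalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    k = X.shape[1]
    # Solve (Xc'Xc + lam * I) beta = Xc'yc instead of forming an explicit inverse.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(k), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Simulated data (assumption): three predictors with known true coefficients.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = 1.0 + X @ np.array([3.0, -2.0, 0.5]) + rng.normal(0, 1.0, size=100)

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
norms = []
for lam in lambdas:
    intercept, beta = ridge_fit(X, y, lam)
    norms.append(float(np.linalg.norm(beta)))
    print(f"lambda={lam:>7}: coefficients={np.round(beta, 3)}")
```

The coefficient norm shrinks as lambda grows. In practice lambda is usually chosen by cross-validation, and the predictors are often standardised first because the penalty depends on their scale.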