What forward selection is
In forward selection, candidate predictor variables are considered for inclusion in a regression model one at a time. At each step, the candidate that most improves the model under a preset criterion (here, the lowest p-value) is added, and the process stops when no remaining candidate meets that criterion. The steps in forward selection are as follows:
1. Start with no predictors in the model (intercept only).
2. Fit a separate model for each candidate predictor and identify the one with the lowest p-value.
3. Add that predictor to the model if its p-value is below the chosen significance threshold.
4. For each remaining predictor, fit the current model plus that predictor and identify the one with the lowest p-value.
5. Add that predictor to the model if its p-value is below the threshold.
6. Repeat Steps 4 and 5 until no remaining predictor has a statistically significant p-value.
7. The final model consists of the predictors added up to that point.
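The procedure above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes ordinary least squares with an intercept, a two-sided t-test p-value for each candidate coefficient, and a significance threshold of 0.05. The function names `coef_p_values` and `forward_select`, the `alpha` parameter, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np
from scipy import stats


def coef_p_values(design, y):
    """Two-sided t-test p-values for the OLS coefficients of `design`."""
    n, k = design.shape
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = n - k
    sigma2 = resid @ resid / dof                      # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)   # coefficient covariance
    t_stats = beta / np.sqrt(np.diag(cov))
    return 2 * stats.t.sf(np.abs(t_stats), dof)


def forward_select(X, y, alpha=0.05):
    """Return column indices of X in the order they were added."""
    n, p = X.shape
    intercept = np.ones((n, 1))
    selected, remaining = [], list(range(p))
    while remaining:
        # p-value of each candidate when added to the current model
        pvals = {}
        for j in remaining:
            design = np.hstack([intercept, X[:, selected + [j]]])
            pvals[j] = coef_p_values(design, y)[-1]   # candidate is last column
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break                                     # nothing significant left
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data (assumption for illustration): y depends on columns 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=200)

selected = forward_select(X, y, alpha=0.05)
print(selected)
```

Because every step tests several candidates, the p-values are not adjusted for multiple comparisons, so a looser threshold can admit noise variables. In practice the threshold is often tightened, or a criterion such as AIC or cross-validated error is used in place of the p-value.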
Examples
- Forward selection is commonly used to identify the most important predictors in a linear regression model.
- Forward selection can also be used to select a subset of features for a classification problem.
- Forward selection is used to find a good subset of variables for a regression problem; because the search is greedy, the result is not guaranteed to be the best possible subset.
- Forward selection is used to reduce the number of predictor variables in a logistic regression model.
- Forward selection is used to decide which factors to retain in an ANOVA model.
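For the classification and logistic regression uses listed above, one possible sketch relies on scikit-learn's `SequentialFeatureSelector`. Note that this selector ranks candidates by cross-validated score, not by p-value, so it is a variant of the procedure described earlier. The synthetic dataset, the choice of three features, and the accuracy metric are assumptions made purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic data (illustrative only): 8 features, 3 of them informative.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

# Greedily add one feature at a time, keeping the one that gives the
# best 5-fold cross-validated accuracy, until 3 features are selected.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    direction="forward",
    scoring="accuracy",
    cv=5,
)
selector.fit(X, y)

chosen = selector.get_support(indices=True)  # indices of the selected columns
X_reduced = selector.transform(X)            # keeps only those columns
print(chosen, X_reduced.shape)
```

Fixing the number of features in advance is the simplest stopping rule; a score-improvement tolerance is an alternative when the right model size is not known beforehand.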