Here is Moody's (2012) take on variable (feature) selection in the context of risk analysis (Methodology for Forecasting and Stress-Testing U.S. Vehicles ABS Deals):
A key aspect of model development is variable selection - identifying which credit and economic variables best explain the dynamic behavior of the dependent variable in question. Aligned with principles of modern econometrics, we prefer to choose the variables based on a combination of economic theory or intuition, together with a consideration of the statistical properties of the estimated model.
We believe models built using pure data-mining techniques or principles such as machine learning, though they may fit the existing data well, are more likely to fail in a changing external environment because they lack theoretical underpinnings. The best prediction models employ a combination of statistical rigor with a healthy dose of economic principle. Models built this way enjoy the additional benefit of ease of interpretation.
Adding each economic variable helps the model improve predictive power. Generally speaking, the economic variables should be useful in both producing accurate out-of-sample forecasts and providing good in-sample fit. However, we sometimes have to make tradeoff decisions to balance out between these two goals when they are conflicting. If the discrepancy is unavoidable and very significant, we prioritize forecast accuracy rather than in-sample fit, as forecasts are end results of our models.
Translated: when the above practice fails, we fudge by taking whatever works better, which is exactly the approach they (Moody's) dismissed earlier.
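The conflict between in-sample fit and out-of-sample accuracy is easy to reproduce. What follows is a minimal sketch, not Moody's actual procedure: the data are synthetic, and the candidate names (`unemployment`, `spurious_index`), the 8/4 train/holdout split and the one-regressor-at-a-time setup are all assumptions made for illustration.

```python
from math import sqrt

def fit_line(x, y):
    """One-regressor ordinary least squares; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def rmse(x, y, intercept, slope):
    """Root mean squared error of the fitted line on (x, y)."""
    errors = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]
    return sqrt(sum(errors) / len(errors))

# Synthetic target: roughly 2 * unemployment + 1, with alternating +/-0.5 noise.
y = [3.5, 4.5, 7.5, 8.5, 11.5, 12.5, 15.5, 16.5, 19.5, 20.5, 23.5, 24.5]
SPLIT = 8  # first 8 points are the estimation window, last 4 the holdout

candidates = {
    # Hypothetical variable with a stable relationship to the target.
    "unemployment": [float(t) for t in range(1, 13)],
    # Hypothetical variable that tracks the target perfectly in the
    # estimation window, then goes flat once the environment changes.
    "spurious_index": y[:SPLIT] + [5.0] * (len(y) - SPLIT),
}

def score(x):
    """Fit on the estimation window; return (in-sample RMSE, holdout RMSE)."""
    intercept, slope = fit_line(x[:SPLIT], y[:SPLIT])
    return (rmse(x[:SPLIT], y[:SPLIT], intercept, slope),
            rmse(x[SPLIT:], y[SPLIT:], intercept, slope))

scores = {name: score(x) for name, x in candidates.items()}
best_in_sample = min(scores, key=lambda name: scores[name][0])
best_out_of_sample = min(scores, key=lambda name: scores[name][1])

for name, (in_rmse, out_rmse) in scores.items():
    print(f"{name}: in-sample RMSE={in_rmse:.3f}, holdout RMSE={out_rmse:.3f}")
print("picked by in-sample fit:", best_in_sample)
print("picked by holdout accuracy:", best_out_of_sample)
```

On this toy data the two criteria disagree: the best in-sample fit goes to `spurious_index`, while the holdout prefers `unemployment`. Picking by holdout error is precisely the "whatever works better" rule, whatever the theory said beforehand.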
Here they finally convince us that this is actually an alchemy approach, based on art and intuition (which doesn't prevent them from sprinkling in some scary-looking math, just for the artistic impression):
And here Moody's finally leaves no shadow of a doubt that we are dealing with artists, entertainers and illusionists:
Variable selection is more art than science. The criteria mentioned above are not black or white. The bottom line is to build a theoretically sound and empirically workable model and get reasonable and consistent forecasts that are supported by both economic intuition and statistical significance.
To their credit, and unlike many in-house modeling practices, Moody's actually checks how the model performs, but they rarely admit the model is wrong:
The consistency check is the comparison of model performance across different production runs. We keep track of the model performance by comparing the forecast statistics over time. The results of the analysis may suggest revisions to the model. However, differences do not necessarily indicate that the model is in error. We should look into what causes the discrepancy and how this affects the end results. If the statistics get really worse and fall into an unacceptable range, we should modify the original model to accommodate revised performance data and changing economic conditions and make sure that the model reflects the most recent development in the auto ABS market.
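The consistency check described in that quote can be sketched in a few lines. This is only an illustration under stated assumptions: the quote does not say which forecast statistic is tracked or where the "unacceptable range" begins, so the choice of MAPE, the thresholds `WATCH` and `UNACCEPTABLE`, the run labels and the numbers are all hypothetical.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical thresholds: the source does not state any numbers.
WATCH = 5.0          # above this, look into what causes the discrepancy
UNACCEPTABLE = 10.0  # above this, modify the original model

# Made-up (actuals, forecasts) pairs for three successive production runs.
runs = {
    "run_1": ([100.0, 100.0], [102.0, 98.0]),
    "run_2": ([100.0, 100.0], [107.0, 93.0]),
    "run_3": ([100.0, 100.0], [115.0, 88.0]),
}

def review(production_runs):
    """Return (run, statistic, status) for each run, in insertion order."""
    report = []
    for name, (actuals, forecasts) in production_runs.items():
        stat = mape(actuals, forecasts)
        if stat > UNACCEPTABLE:
            status = "revise model"
        elif stat > WATCH:
            status = "investigate"  # a difference alone is not proof of error
        else:
            status = "ok"
        report.append((name, stat, status))
    return report

report = review(runs)
for name, stat, status in report:
    print(f"{name}: MAPE={stat:.1f}% -> {status}")
```

The middle band mirrors the quote's caveat that a difference does not necessarily mean the model is in error; how wide that band is remains a judgment call, which is where the art comes back in.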