Predictive Modeling Tutorial - Appendix: Variable Transformation Techniques

Created by Steve Hoover, Modified on Fri, Aug 16 at 4:28 PM by Steve Hoover

Box-Cox transform the predictors

In predictive models, the distribution of some variables can be highly skewed. The number of past transactions or purchases per customer is a typical example: many customers have made just 1 purchase, others have made approximately 10 purchases, and a handful have made 100 purchases or more. The same problem often occurs with purchase amounts, income, etc.


Since many predictive models (such as linear and logistic regressions) tend to work best when predictors and target variables follow a roughly Normal distribution, the Box-Cox transformation recomputes skewed variables so that their distribution becomes more symmetric.


A Box-Cox transformation automatically transforms a variable X into a new variable Y. Even though there is an assignment from every X to Y (i.e., X -> Y), the same may not be true for Y -> X. For this reason, while a Box-Cox transform can be applied to predictors, it cannot be applied to the target variable, because predictions made on a transformed target have to be mapped back to the original scale. For target variables, only log transforms are available.
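
As a rough illustration of the transformation described above, the sketch below applies the standard one-parameter Box-Cox formula, y = (x^lambda - 1) / lambda for lambda != 0 and y = ln(x) for lambda = 0, to a made-up list of purchase counts. The tutorial does not say how its software chooses lambda, so the small grid search that minimizes absolute skewness is an assumption made for this example only (maximum likelihood is the more common choice). The names box_cox, skewness and purchases are hypothetical.

```python
import math


def box_cox(values, lam):
    """Standard one-parameter Box-Cox transform; inputs must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical purchase counts: mostly small, with a few very large values.
purchases = [1, 1, 1, 1, 2, 2, 3, 5, 8, 10, 12, 40, 100, 150]

# Assumption: choose lambda from a small grid by minimizing absolute skewness.
candidate_lambdas = [-1.0, -0.5, 0.0, 0.5, 1.0]
best_lambda = min(
    candidate_lambdas,
    key=lambda lam: abs(skewness(box_cox(purchases, lam))),
)
transformed = box_cox(purchases, best_lambda)

print("skewness before:", round(skewness(purchases), 2))
print("chosen lambda:", best_lambda)
print("skewness after:", round(skewness(transformed), 2))
```

Note that the formula is only defined for strictly positive values, so a variable containing zeros or negative values would need to be shifted first.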


Log transform the target variable

When the target variable is Continuous or Discrete-continuous, the log transformation can be used to make a highly skewed distribution less skewed. This can make patterns in the data easier to interpret.
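
A minimal sketch of a log-transformed target, assuming strictly positive values: the natural log compresses the long right tail, and the exponential maps values back to the original scale. The tutorial does not state how its software handles zeros, which have no logarithm, so this example simply rejects them. The function names and the purchase amounts are hypothetical.

```python
import math


def log_transform(target):
    """Natural-log transform of a strictly positive target variable."""
    if any(y <= 0 for y in target):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(y) for y in target]


def inverse_log_transform(log_values):
    """Map log-scale values (e.g. model predictions) back to the original scale."""
    return [math.exp(v) for v in log_values]


# Hypothetical purchase amounts with a long right tail.
amounts = [12.0, 15.5, 20.0, 22.0, 35.0, 60.0, 250.0, 1200.0]

log_amounts = log_transform(amounts)
restored = inverse_log_transform(log_amounts)

print("range before:", max(amounts) - min(amounts))
print("range after: ", round(max(log_amounts) - min(log_amounts), 2))
```

A model trained on the log-scale target produces log-scale predictions, so they would be passed through inverse_log_transform before being reported.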


Cross-validation

Cross-validation is a technique for evaluating predictive models by partitioning the original sample into a training set used to train the model and a test set used to evaluate it. In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. In the case of 10-fold cross-validation, for instance, the model is estimated on 90% of the data set and tested on the remaining 10%. The operation is repeated 10 times, with a different test set each time, so that every observation is used for testing exactly once.
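
The partitioning described above can be sketched as follows. This is an illustrative implementation that uses only the Python standard library, not the routine used by the tutorial's software. The function name k_fold_indices and the seed parameter are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random partition, reproducible via seed
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 10-fold cross-validation on 100 observations: 90 for training, 10 for testing.
splits = list(k_fold_indices(100, 10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train={len(train)} test={len(test)}")
```

In practice, a model would be fitted on each train set and scored on the matching test set, and the k scores would then be averaged.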
