When should I standardize my segmentation data?

Created by Steve Hoover, Modified on Sun, Jan 14, 2024 at 11:08 AM by Steve Hoover

Segmentation computes a “distance” (a measure of dissimilarity) between each pair of customers you are trying to segment, and clusters the most similar customers first.

That distance is computed on the differences observed between the segmentation variables. If these variables are measured on widely different scales, however, distances will be distorted. For instance, if X1 is measured on a scale from 0 to 1, and X2 is measured on a scale from 0 to 100, then X2 will have 100 times more impact on these distance computations than X1 will.

In other words, customers will mostly be segmented based on the similarities observed along X2, whereas similarities along X1 will be mostly neglected.
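To make the distortion concrete, here is a minimal sketch. It assumes a plain Euclidean distance (this article does not state which distance Enginius actually uses), and the three customers and their values are made up for illustration.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two customers (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical customers: (X1 on a 0-1 scale, X2 on a 0-100 scale)
alice = (0.0, 50.0)
bob = (1.0, 50.0)    # opposite extreme on X1, identical on X2
carol = (0.0, 55.0)  # identical on X1, only 5 points apart on X2

d_alice_bob = euclidean(alice, bob)
d_alice_carol = euclidean(alice, carol)
print(d_alice_bob, d_alice_carol)
```

Bob is as far from Alice as possible on X1, yet he ends up closer to her than Carol, who differs by only 5% of the X2 range.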

In short, when segmentation variables do not use the same scales, you need to standardize your data before segmenting it. Otherwise, results will be biased, and the segmentation variables measured on the widest scales will have an undue influence on the results.
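As an illustration, one common way to standardize is the z-score: subtract each variable's mean and divide by its standard deviation. This is an assumption made for the sketch; the article does not specify which method Enginius applies, and the data below are invented.

```python
import statistics

def zscore(values):
    """Center on the mean and divide by the (population) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical data: the same three customers measured on two scales
x1 = [0.0, 0.5, 1.0]      # 0-1 scale
x2 = [0.0, 50.0, 100.0]   # 0-100 scale

z1 = zscore(x1)
z2 = zscore(x2)
print(z1)
print(z2)
```

After standardizing, both variables cover the same range, so neither dominates the distance computation simply because of its units.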

When all your segmentation variables use the same scale, however, it is preferable NOT to standardize anything. The reason goes as follows:

Suppose two of your questions use a 1-10 scale and ask respondents how much they use:
(1) emails and
(2) Instagram.

Most of your respondents will answer 9 or 10 to the first question, but may answer anything between 1 and 10 to the second.

If you ask Enginius to standardize these data, it does not know the scales originally used, so it will rescale the answers until both questions fall on similar ranges. This artificially inflates the small differences found in the first question while shrinking the large differences found in the second. It might not change the solution much, but why take the risk?
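The effect can be sketched with a few made-up answers. The example assumes z-score standardization (the exact method used by Enginius is not stated here), and the six respondents are hypothetical.

```python
import statistics

def zscore(values):
    """Center on the mean and divide by the (population) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical answers from six respondents, both on a 1-10 scale
email = [9, 10, 9, 10, 10, 9]      # almost everyone answers 9 or 10
instagram = [1, 10, 4, 7, 2, 9]    # answers spread over the whole scale

z_email = zscore(email)
z_insta = zscore(instagram)

# Compare respondent 0 with respondent 1 on each question
email_gap = z_email[1] - z_email[0]   # raw gap: 1 point
insta_gap = z_insta[1] - z_insta[0]   # raw gap: 9 points
print(email_gap, insta_gap)
```

Before standardizing, the Instagram gap between these two respondents is nine times the email gap. Afterwards, the two gaps are of similar size (about 2.0 versus about 2.6), so a one-point difference in email usage weighs almost as much as a nine-point difference in Instagram usage.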

In short: never standardize your data when your segmentation variables use the same scale; always standardize when they use different scales or are measured in different units (e.g., if you compare dollars, months, and the number of purchases).