Festive season special: Building models on seasonal data
Has your model ever failed on out-of-time validation because of seasonality?
If yes, then you need to know that one of the reasons this happens is seasonality in performance or seasonality in cohort. This article explains how to identify both and then takes you through the industry-standard techniques for taking seasonality into account while building a model.
Most industries have a seasonal business trend. Seasonality in a business implies that customers' propensity to buy a product is biased towards certain periods of the year. This bias can arise for numerous reasons. One of the most common is a market environment shared by all customers. For example, the Indian financial year ends on 31st March. Because there are tax rebates on insurance products, people tend to buy more insurance products just before the end of the financial year to claim these rebates. Hence, March is the peak season for the insurance industry in India. An analysis of the past 10 years of insurance business data shows that 25-30% of the insurance industry's business in India comes in the month of March. Similarly, there is a surge in sales of consumer goods in the UK and US leading up to Christmas.
The figure below shows a plot of the monthly sales trend of a seasonal industry. It can be noted that the sales peaks and troughs happen in the same months year on year. In other words, the shape of the sales trend remains the same year on year.
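One simple way to check for this pattern is a monthly seasonal index: each month's sales divided by the average monthly sales of the same year, averaged across years. The following is a minimal sketch only; the function name seasonal_index and the sales figures are made up for illustration and are not taken from any real data set.

```python
from collections import defaultdict

def seasonal_index(monthly_sales):
    """Return {month: index}; an index of 1.0 means an average month.

    monthly_sales: iterable of (year, month, sales) tuples.
    """
    by_year = defaultdict(dict)
    for year, month, sales in monthly_sales:
        by_year[year][month] = sales

    ratios = defaultdict(list)
    for months in by_year.values():
        year_mean = sum(months.values()) / len(months)
        for month, sales in months.items():
            # Sales relative to the average month of the same year.
            ratios[month].append(sales / year_mean)

    # Average each month's ratio across the available years.
    return {m: sum(r) / len(r) for m, r in sorted(ratios.items())}

# Made-up data: flat sales of 100 with a March spike of 300, for two years.
sales = [(y, m, 300 if m == 3 else 100)
         for y in (2012, 2013) for m in range(1, 13)]
index = seasonal_index(sales)
peak_month = max(index, key=index.get)
```

If the same months show an index well above or below 1.0 in every year, the business is likely seasonal and the techniques discussed in this article become relevant.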
Seasonality has a negative impact on both predictive and descriptive models if it is not treated explicitly. Let's take the case of a descriptive model and see how seasonality affects it. The following is a simple decision tree, in which we have segmented the customer portfolio into 3 segments based on their attrition rate in the next 1 month. Let's say that the month in which we are observing attrition is January (portfolio attrition rate 30%).
Now say we apply the model to predict the probability of attrition for the month of July. The following errors can be induced by seasonality:
1. The rank ordering among segments might change from 1-2-3.
2. The overall attrition rate might change from 30%, thereby changing the individual attrition rates.
Both errors reduce the effectiveness of the model when it is applied to a different month.
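Both errors can be checked directly by comparing segment-level attrition in the development month with the month being scored. The sketch below is illustrative only: the helper names and the customer counts are hypothetical, chosen so that the January portfolio rate matches the 30% used above.

```python
def make_segment(segment, customers, attrited):
    """Build (segment, flag) records: flag is 1 for attrition, else 0."""
    return [(segment, 1)] * attrited + [(segment, 0)] * (customers - attrited)

def segment_rates(records):
    """Attrition rate per segment."""
    totals, events = {}, {}
    for segment, flag in records:
        totals[segment] = totals.get(segment, 0) + 1
        events[segment] = events.get(segment, 0) + flag
    return {s: events[s] / totals[s] for s in totals}

def rank_order(rates):
    """Segments sorted from highest to lowest attrition rate."""
    return sorted(rates, key=rates.get, reverse=True)

def overall_rate(records):
    """Portfolio-level attrition rate."""
    return sum(flag for _, flag in records) / len(records)

# Hypothetical counts: 10 customers per segment in each month.
january = make_segment(1, 10, 5) + make_segment(2, 10, 3) + make_segment(3, 10, 1)
july = make_segment(1, 10, 2) + make_segment(2, 10, 4) + make_segment(3, 10, 1)

jan_order = rank_order(segment_rates(january))
jul_order = rank_order(segment_rates(july))

# Error 1: the rank ordering differs between the two months.
rank_order_changed = jan_order != jul_order
# Error 2: the portfolio-level rate differs between the two months.
rate_shift = overall_rate(july) - overall_rate(january)
```

In this made-up example both errors appear at once: segments 1 and 2 swap places in July and the portfolio rate falls below the January level.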
Predictive models are even more seriously affected by seasonality, because the second error leads to a big deviation in the predictive power of the model. To build an accurate predictive or descriptive model on seasonal data, we need at least 12 months of data for training the model.
There are two types of seasonality that need to be addressed in any model:
1. Seasonality in performance: This is the simpler seasonality to address. The insurance industry example at the beginning of the article illustrates this type well. Say we want to predict the performance (business sourced) of a sales agent in the next 3 months. Because the business is seasonal, performance seasonality needs to be addressed to build a stable predictive model.
2. Seasonality in cohort: This is the tougher seasonality to address. Seasonality in cohort is driven by differences in the characteristics of the base population in different months. Whenever the cohort is seasonal, performance is seasonal as well. Say we want to predict the performance (business sourced) of a sales agent in their first 12 months. Now, we know that sourcing business in March is much easier than in any other month. Hence, sales agents on-boarding in Jan, Feb and Mar will have a higher average first-3-month performance than agents on-boarding in any other month. A good start is highly correlated with a higher overall 12-month performance. Hence, we have seasonality in the cohort, and agents on-boarding in Jan, Feb and Mar should be treated differently.
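A quick check for cohort seasonality is to group agents by on-boarding month and compare their average early performance. This is a rough sketch under assumed inputs; the function name and the business figures are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

def mean_by_onboarding_month(agents):
    """agents: iterable of (onboarding_month, first_3_month_business)."""
    groups = defaultdict(list)
    for month, business in agents:
        groups[month].append(business)
    return {m: mean(values) for m, values in sorted(groups.items())}

# Invented figures: agents who on-board in Jan-Mar catch the March peak.
agents = [(1, 120), (1, 140), (2, 150), (3, 160),
          (6, 60), (6, 80), (9, 70)]
cohort_means = mean_by_onboarding_month(agents)

peak_cohorts = [cohort_means[m] for m in (1, 2, 3)]
other_cohorts = [v for m, v in cohort_means.items() if m not in (1, 2, 3)]
# True when every Jan-Mar cohort outperforms every other cohort.
cohort_is_seasonal = min(peak_cohorts) > max(other_cohorts)
```

A consistent gap between on-boarding months, as in this toy data, suggests that the month of entry itself should be treated as a characteristic of the cohort.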
There are three methods followed industry-wide to address the two types of seasonality mentioned in the last section:
1. Long-interval target function: Seasonality in performance can be addressed by taking a 12-month performance window. But this method fails if seasonality exists in the cohort.
2. Use of the same training and scoring target month: This technique addresses both seasonality issues. Say we want to predict March attrition. We train the model on last year's March and then use the same model to predict March attrition this year. This technique is robust but fails if there is any difference in characteristics between the training and scoring months. Say the company changed the definition of attrition after March last year. In this case, the technique will not take recent attrition trends into account and will give a false prediction for this March.
3. Mix of cohorts: This technique is used mainly in risk modelling. It addresses both seasonality issues. We take a mix of samples from all the different types of cohort and use it as the training population. This is the most robust technique for addressing seasonality in both performance and cohort. This method also takes recent trends into account when making the prediction, and is hence better than the previous technique in cases where there is some difference in characteristics between the target and training months. But in cases where the target and training months are exactly the same, the previous technique will give a better prediction, because mixing cohorts will offset the target variable away from its level in the target month.
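The second and third methods differ mainly in how the training population is selected. The sketch below shows one possible way to build each sample. It assumes records are dictionaries with a "month" key, and it assumes an equal number of records per calendar month for the mixed sample; the article does not prescribe a mixing ratio, so treat both as illustrative choices.

```python
import random
from collections import Counter, defaultdict

def same_month_training(records, target_month):
    """Method 2: keep only records observed in the target calendar month."""
    return [r for r in records if r["month"] == target_month]

def mixed_cohort_training(records, per_month, seed=0):
    """Method 3: draw up to per_month random records from every month."""
    rng = random.Random(seed)
    by_month = defaultdict(list)
    for r in records:
        by_month[r["month"]].append(r)
    sample = []
    for month in sorted(by_month):
        rows = by_month[month]
        # Sample without replacement; take all rows if the month is small.
        sample.extend(rng.sample(rows, min(per_month, len(rows))))
    return sample

# Hypothetical portfolio: 5 customers observed in each of 12 months.
records = [{"month": m, "customer_id": i}
           for m in range(1, 13) for i in range(5)]

march_only = same_month_training(records, target_month=3)
mixed = mixed_cohort_training(records, per_month=2)
months_in_mix = Counter(r["month"] for r in mixed)
```

The same-month sample preserves the March level of the target, while the mixed sample gives every month equal weight, which is the trade-off described in the third method.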
Out-of-time validation helps us identify whether the model’s performance is being altered by seasonality. Techniques like bootstrapping and the jackknife can only check the stability of the model and cannot check the effect of seasonality on it.
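The difference can be seen in a toy comparison. Bootstrap resamples drawn from the development month all reflect that month's attrition level, so they cannot reveal a seasonal shift; only a sample from a different period can. The numbers below are hypothetical (30% attrition in January, 15% in July), and the helper names are assumptions made for this sketch.

```python
import random

def event_rate(flags):
    """Share of records with the event (flag == 1)."""
    return sum(flags) / len(flags)

def bootstrap_rates(flags, n_resamples, seed=0):
    """Event rates of resamples drawn with replacement from flags."""
    rng = random.Random(seed)
    return [event_rate(rng.choices(flags, k=len(flags)))
            for _ in range(n_resamples)]

# Hypothetical attrition flags for 1,000 customers in each month.
january = [1] * 300 + [0] * 700   # development month, 30% attrition
july = [1] * 150 + [0] * 850      # out-of-time month, 15% attrition

boot = bootstrap_rates(january, n_resamples=200)
boot_mean = sum(boot) / len(boot)

# Bootstrap rates cluster around the January level ...
bootstrap_spread = max(boot) - min(boot)
# ... while the out-of-time month sits far outside that cluster.
out_of_time_gap = event_rate(january) - event_rate(july)
```

Here the bootstrap spread describes sampling noise within January only, while the gap to July is the seasonal effect that out-of-time validation is meant to expose.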
Do you think this provides a solution to any problem you face? How do you address the problem of seasonality in your modelling? Are there any other techniques you use to improve the performance of your models (prediction or stability)? Do let us know your thoughts in the comments below.