Feature engineering is the essential foundation of any successful machine learning pipeline. For time series data, two of the most powerful techniques are lag features and rolling features. Mastering them will improve your model's performance on tasks such as sales forecasting, stock price prediction, and demand planning.
This guide explains what lag and rolling features are, why they matter, how to implement them in Python, and which pitfalls to watch for, with working code examples throughout.
Time series feature engineering transforms raw temporal data into input variables that help machine learning models detect temporal patterns. Unlike static datasets, time series data has a sequential structure: past observations influence what comes next.

Conventional machine learning models such as XGBoost, LightGBM, and Random Forests have no built-in understanding of time. They need explicit input signals describing what happened before each observation, and that is exactly what lag and rolling features provide.
A lag feature is simply a past value of a variable, shifted forward in time so that it lines up with the current row. To predict today's sales, for example, you might give the model yesterday's sales along with sales from seven and thirty days ago.

```python
import pandas as pd
import numpy as np

# Create a sample time series dataset
np.random.seed(42)
dates = pd.date_range(start='2024-01-01', periods=15, freq='D')
sales = [200, 215, 198, 230, 245, 210, 225, 260, 275, 240, 255, 290, 305, 270, 285]
df = pd.DataFrame({'date': dates, 'sales': sales})
df.set_index('date', inplace=True)

# Create lag features
df['lag_1'] = df['sales'].shift(1)
df['lag_3'] = df['sales'].shift(3)
df['lag_7'] = df['sales'].shift(7)
print(df.head(12))
```
Output:

```
            sales  lag_1  lag_3  lag_7
date
2024-01-01    200    NaN    NaN    NaN
2024-01-02    215  200.0    NaN    NaN
2024-01-03    198  215.0    NaN    NaN
2024-01-04    230  198.0  200.0    NaN
2024-01-05    245  230.0  215.0    NaN
2024-01-06    210  245.0  198.0    NaN
2024-01-07    225  210.0  230.0    NaN
2024-01-08    260  225.0  245.0  200.0
2024-01-09    275  260.0  210.0  215.0
2024-01-10    240  275.0  225.0  198.0
2024-01-11    255  240.0  260.0  230.0
2024-01-12    290  255.0  275.0  245.0
```
The NaN values at the top of each lag column represent a real cost of lagging: a lag-k feature has no value for the first k rows. This trade-off matters when deciding how many lags to create.
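The simplest way to deal with these NaN rows, sketched below on a made-up series, is to drop them before training; forward-filling or imputing are alternatives when data is scarce:

```python
import pandas as pd

# Hypothetical short sales series
sales = pd.Series([200, 215, 198, 230, 245, 210, 225])
df = pd.DataFrame({'sales': sales})
df['lag_1'] = df['sales'].shift(1)
df['lag_3'] = df['sales'].shift(3)

# Keep only rows where every lag has real history behind it
clean = df.dropna()
print(len(df), len(clean))  # 7 rows before, 4 after (3 lost to lag_3)
```

Note that the number of rows lost equals the largest lag, so very long lags can be expensive on short series.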
Choosing which lags to use should be a data-driven decision, not guesswork. Approaches that work well in practice include inspecting autocorrelation (ACF) and partial autocorrelation (PACF) plots, applying domain knowledge about seasonality (for example, lag-7 for weekly cycles), and comparing candidate lags on held-out validation performance.
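One quick data-driven check, a minimal sketch on a synthetic series with an assumed weekly cycle, is to compute the autocorrelation at candidate lags with pandas' `Series.autocorr` (statsmodels' `acf`/`pacf` plots offer a richer view):

```python
import numpy as np
import pandas as pd

np.random.seed(42)
# Synthetic series with a period-7 (weekly) cycle plus noise
t = np.arange(120)
series = pd.Series(10 * np.sin(2 * np.pi * t / 7) + np.random.randn(120))

# High autocorrelation at a lag suggests it will make a useful feature
for lag in [1, 3, 7, 14]:
    print(f"lag {lag}: autocorr = {series.autocorr(lag):.2f}")
```

Lags with high absolute autocorrelation are the strongest candidates; here lag-7 and lag-14 stand out because the series repeats weekly.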
Rolling features, also called window features, slide a window through time and compute statistics over it. Instead of a single past value, they give the model aggregated summaries of the last N periods, such as the mean, median, standard deviation, minimum, and maximum.

Rolling features excel at smoothing noise and summarizing recent behavior. The standard aggregations (mean, standard deviation, max, min) are all available directly in pandas:
```python
import pandas as pd
import numpy as np

np.random.seed(42)
dates = pd.date_range(start='2024-01-01', periods=15, freq='D')
sales = [200, 215, 198, 230, 245, 210, 225, 260, 275, 240, 255, 290, 305, 270, 285]
df = pd.DataFrame({'date': dates, 'sales': sales})
df.set_index('date', inplace=True)

# Rolling features with window sizes of 3 and 7
df['roll_mean_3'] = df['sales'].shift(1).rolling(window=3).mean()
df['roll_std_3'] = df['sales'].shift(1).rolling(window=3).std()
df['roll_max_3'] = df['sales'].shift(1).rolling(window=3).max()
df['roll_mean_7'] = df['sales'].shift(1).rolling(window=7).mean()
print(df.round(2))
```
Output:

Notice the `.shift(1)` applied before `.rolling()`: it is essential. Shifting first means each rolling window ends at the previous time step, so the statistics depend only on historical data and never include the current value.
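To see the difference the shift makes, compare the two versions on a tiny made-up series:

```python
import pandas as pd

sales = pd.Series([100, 200, 300, 400, 500])

# Without the shift, the window includes the current (target) value
leaky = sales.rolling(window=3).mean()
# With the shift, the window ends at the previous time step
safe = sales.shift(1).rolling(window=3).mean()

print(leaky.iloc[3])  # 300.0 -- averages 200, 300 and today's 400
print(safe.iloc[3])   # 200.0 -- averages only 100, 200, 300
```

If the model sees `leaky` at training time, it is effectively peeking at the value it is supposed to predict.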
In real-world time series ML workflows, practitioners combine lag and rolling features into a single hybrid feature set. The following feature engineering function can be dropped into any project:
```python
import pandas as pd
import numpy as np

def create_time_features(df, target_col, lags=[1, 3, 7], windows=[3, 7]):
    """
    Create lag and rolling features for time series ML.

    Parameters:
        df : DataFrame with datetime index
        target_col : Name of the target column
        lags : List of lag periods
        windows : List of rolling window sizes

    Returns:
        DataFrame with new features
    """
    df = df.copy()

    # Lag features
    for lag in lags:
        df[f'lag_{lag}'] = df[target_col].shift(lag)

    # Rolling features (shift by 1 to avoid leakage)
    for window in windows:
        shifted = df[target_col].shift(1)
        df[f'roll_mean_{window}'] = shifted.rolling(window).mean()
        df[f'roll_std_{window}'] = shifted.rolling(window).std()
        df[f'roll_max_{window}'] = shifted.rolling(window).max()
        df[f'roll_min_{window}'] = shifted.rolling(window).min()

    return df.dropna()  # Drop rows with NaN from lag/rolling

# Sample usage
np.random.seed(0)
dates = pd.date_range('2024-01-01', periods=60, freq='D')
sales = 200 + np.cumsum(np.random.randn(60) * 5)
df = pd.DataFrame({'sales': sales}, index=dates)

df_features = create_time_features(df, 'sales', lags=[1, 3, 7], windows=[3, 7])
print(f"Original shape: {df.shape}")
print(f"Engineered shape: {df_features.shape}")
print(f"\nFeature columns:\n{list(df_features.columns)}")
print(f"\nFirst few rows:\n{df_features.head(3).round(2)}")
```
Output:

The most severe error in time series feature engineering is data leakage: letting information from the future seep into the features, which produces misleadingly optimistic model performance.
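The evaluation side matters as much as the features themselves: train/test splits must respect time order. A minimal sketch, assuming scikit-learn is installed (Ridge is just an illustrative model choice):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

np.random.seed(0)
dates = pd.date_range('2024-01-01', periods=100, freq='D')
df = pd.DataFrame({'sales': 200 + np.cumsum(np.random.randn(100) * 5)}, index=dates)
df['lag_1'] = df['sales'].shift(1)
df['roll_mean_7'] = df['sales'].shift(1).rolling(7).mean()
df = df.dropna()

# Split by position in time, never randomly: train on the past, test on the future
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]

features = ['lag_1', 'roll_mean_7']
model = Ridge().fit(train[features], train['sales'])
mae = mean_absolute_error(test['sales'], model.predict(test[features]))
print(f"Test MAE: {mae:.2f}")
```

scikit-learn's `TimeSeriesSplit` generalizes this idea to cross-validation with multiple expanding windows.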
Key mistakes to watch out for:

- Computing rolling statistics without shifting first, so the window includes the current (target) value
- Splitting train and test sets randomly instead of by time, so the model trains on the future
- Fitting scalers, encoders, or imputers on the full dataset before splitting

Which features to reach for depends on the use case:
| Use Case | Recommended Features |
|---|---|
| Strong autocorrelation in data | Lag features (lag-1, lag-7) |
| Noisy signal, need smoothing | Rolling mean |
| Seasonal patterns (weekly) | Lag-7, lag-14, lag-28 |
| Trend detection | Rolling mean over long windows |
| Anomaly detection | Deviation from rolling mean |
| Capturing variability / risk | Rolling standard deviation, rolling range |
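The last rows of the table can be made concrete: flagging points that stray too far from a rolling baseline is a simple anomaly detector. A minimal sketch with made-up numbers:

```python
import pandas as pd

sales = pd.Series([200, 205, 198, 202, 400, 201, 199, 203])

# Rolling baseline built only from past values (note the shift)
roll_mean = sales.shift(1).rolling(window=3).mean()
roll_std = sales.shift(1).rolling(window=3).std()

# Flag points more than 3 rolling standard deviations from the rolling mean
z = (sales - roll_mean) / roll_std
anomalies = sales[z.abs() > 3]
print(anomalies)
```

Here the spike of 400 is flagged because it sits far outside the variability of the preceding window.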
Lag features and rolling features are essential building blocks of time series machine learning. They form the bridge from raw sequential data to the structured format that models need for training. Executed with careful data handling, sensible window selection, and an understanding of the domain, they are often the highest-impact lever for forecasting accuracy.
The best part? They are easy to interpret, cheap to compute, and work with any machine learning model, whether you use XGBoost for demand forecasting, an LSTM for anomaly detection, or linear regression as a baseline.