Exercise - Forecasting with Linear Factor Pricing Models#

In this exercise, we use linear factor pricing models (LFPMs) to forecast excess returns. We will:

  1. Build point-in-time forecasts using rolling betas and factor premia estimates.

  2. Evaluate forecast quality via OOS R-squared and Information Coefficient (IC).

  3. Compare results across different factor models, lambda methods, and beta windows.

  4. Construct a long-short trading strategy based on the forecasts.

LFPM Forecast#

The expected excess return of security \(i\) at time \(t\) is:

\[ \mathbb{E}_t[\tilde{r}^i_{t+1}] = \boldsymbol{\beta}^{i\prime}_{t} \, \boldsymbol{\lambda}_t \]

where:

  • \(\boldsymbol{\beta}^{i}_{t}\) is a vector of rolling betas for security \(i\), estimated using the last WINDOW observations.

  • \(\boldsymbol{\lambda}_t\) is the vector of factor premia estimated using data available at time \(t\).

The linear factor decomposition used for beta estimation is:

\[ \tilde{r}^i_t = \alpha^i + \boldsymbol{\beta}^{i\prime} \tilde{\mathbf{f}}_t + \varepsilon^i_t \]

Estimate this regression with an intercept, but only use the beta estimates for forecasting.

We compare LFPM forecasts against a baseline: the expanding mean of each security’s own return.

Data#

  • Factors: Monthly excess returns for MKT, SMB, HML, RMW, CMA, UMD from data/factor_pricing_data_monthly.xlsx.

  • Test assets: The 25 Fama-French size/value portfolios (5×5 sorts on ME and BE/ME), pulled via pandas_datareader.

```python
import pandas_datareader.data as web

ds = web.DataReader('25_Portfolios_5x5', 'famafrench', start='1980-01')
rets_total = ds[0]  # Table 0: value-weighted returns
```

Note that these are total returns in percent. Convert to decimal and subtract the risk-free rate to get excess returns.
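The conversion described above might look like the following minimal sketch. It runs on synthetic stand-in data; the `rf` series and the portfolio labels are hypothetical, since the exercise does not say where the risk-free rate is stored. If the returns and the risk-free rate sit on different index types, they would need a common index before subtracting.

```python
import pandas as pd

# Synthetic stand-ins (assumption): total returns in percent, monthly risk-free rate in decimal.
idx = pd.period_range("2000-01", periods=4, freq="M")
rets_total = pd.DataFrame(
    {"SMALL LoBM": [1.0, -2.0, 0.5, 3.0], "BIG HiBM": [0.8, 1.2, -0.4, 2.0]},
    index=idx,
)
rf = pd.Series([0.004, 0.004, 0.003, 0.003], index=idx, name="RF")  # hypothetical series

# Percent -> decimal, then subtract RF from every column, aligned on the date index.
rets_excess = (rets_total / 100).sub(rf, axis=0)
print(rets_excess)
```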

Factor Models#

Consider three factor models:

| Model | Factors |
|---|---|
| CAPM | MKT |
| AQR 4-Factor | MKT, HML, RMW, UMD |
| FF 5-Factor | MKT, SMB, HML, RMW, CMA |

Factor Premia (\(\lambda\)) Estimation#

The LFPM does not tell us how to forecast the factors themselves. Consider two approaches:

  • Expanding mean: use all historical factor returns up to time \(t\).

  • Constant: use fixed annualized premia (converted to monthly):

| Factor | Annualized \(\lambda\) |
|---|---|
| MKT | 8% |
| SMB | 4% |
| HML | 4% |
| RMW | 4% |
| CMA | 4% |
| UMD | 4% |
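As a rough sketch of the constant approach, the annualized premia can be turned into a monthly panel with the same shape as the expanding-mean estimates. The helper name `constant_lambda` is hypothetical, and dividing by 12 (rather than de-compounding) is an assumption about what "converted to monthly" means here.

```python
import pandas as pd

# Annualized premia from the table above, in decimal.
ANNUAL_LAMBDA = pd.Series(
    {"MKT": 0.08, "SMB": 0.04, "HML": 0.04, "RMW": 0.04, "CMA": 0.04, "UMD": 0.04}
)

def constant_lambda(index, factor_names):
    """Hypothetical helper: repeat the same monthly premia on every date."""
    monthly = ANNUAL_LAMBDA.loc[list(factor_names)] / 12  # assumption: simple division by 12
    return pd.DataFrame(
        [monthly.to_numpy()] * len(index), index=index, columns=list(factor_names)
    )

idx = pd.period_range("2000-01", periods=3, freq="M")
lam = constant_lambda(idx, ["MKT", "HML", "RMW", "UMD"])
print(lam)
```

Returning a date-indexed DataFrame keeps the constant case interchangeable with `factors.expanding().mean()` in any later code.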

1. Build LFPM Forecasts#

Using the AQR 4-Factor model with expanding-mean factor premia and a rolling window of 60 months (5 years):

  • Calculate the expanding mean of the four factors. These are your point-in-time factor premia, \(\boldsymbol{\lambda}_t\). Make sure to shift one period so the estimate at time \(t\) uses only data through \(t-1\).

  • For each of the \(n\) securities, estimate the linear factor decomposition over a rolling window of 60 months. Estimate with an intercept, but only save the beta estimates.

  • For every security \(i\) and at every month \(t\) (after the first 60), calculate \(\hat{r}^i_{t+1} = \boldsymbol{\beta}^{i\prime}_t \boldsymbol{\lambda}_t\). Shift the forecasts one period so the timestamps refer to the period being forecasted.

  • Build the baseline forecast: the expanding mean of each security’s own excess return, shifted identically.
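The four steps above can be sketched as follows, on synthetic data. This is one possible reading, not a reference solution: the rolling regression is done with `numpy.linalg.lstsq` rather than `RollingOLS` so the block depends only on numpy and pandas, the helper `rolling_betas` is hypothetical, and the baseline is shifted once so that its value at \(t\) uses returns through \(t-1\).

```python
import numpy as np
import pandas as pd

def rolling_betas(y, X, window):
    """OLS of y on [1, X] over each trailing window; keeps slopes, drops the intercept."""
    out = pd.DataFrame(np.nan, index=y.index, columns=X.columns)
    Xc = np.column_stack([np.ones(len(X)), X.to_numpy()])
    yv = y.to_numpy()
    for end in range(window, len(y) + 1):
        coef, *_ = np.linalg.lstsq(Xc[end - window:end], yv[end - window:end], rcond=None)
        out.iloc[end - 1] = coef[1:]  # betas stamped at the last date of the window
    return out

# Synthetic stand-ins (assumption): 120 months, 4 factors, 3 securities.
rng = np.random.default_rng(0)
idx = pd.period_range("2000-01", periods=120, freq="M")
names = ["MKT", "HML", "RMW", "UMD"]
factors = pd.DataFrame(rng.normal(0.005, 0.03, (120, 4)), index=idx, columns=names)
true_b = rng.normal(1.0, 0.3, (4, 3))
rets = pd.DataFrame(
    factors.to_numpy() @ true_b + rng.normal(0, 0.02, (120, 3)),
    index=idx, columns=["P1", "P2", "P3"],
)

WINDOW = 60

# Step 1: expanding-mean premia, shifted so the row at t uses data through t-1.
lam = factors.expanding().mean().shift(1)

# Steps 2-3: rolling betas per security, then beta_t' lambda_t,
# shifted so the timestamp is the month being forecasted.
forecast = pd.DataFrame(index=idx, columns=rets.columns, dtype=float)
for col in rets.columns:
    betas = rolling_betas(rets[col], factors, WINDOW)
    forecast[col] = (betas * lam).sum(axis=1, min_count=len(names)).shift(1)

# Step 4: baseline is the expanding mean of each security's own excess return, shifted.
baseline = rets.expanding().mean().shift(1)
```

Passing `min_count` to `sum` keeps the forecast at NaN until every beta and premium is available, instead of silently summing a partial row to zero.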

Warning!#

The OOS R-squared calculation will be wrong if your forecasts have NaN values where the baseline does not. Ensure both DataFrames have NaN in the same time periods, either by requiring a minimum number of observations in the expanding mean, or by explicitly dropping dates where either series is NaN.
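A small sketch of the explicit alignment route: blank out every cell that is missing in any of the inputs, so forecast and baseline errors are summed over exactly the same dates. The helper name `align_nans` and the toy numbers are hypothetical.

```python
import numpy as np
import pandas as pd

def align_nans(*frames):
    """Blank out every (date, security) cell that is NaN in any of the inputs."""
    valid = frames[0].notna()
    for df in frames[1:]:
        valid = valid & df.notna()
    return [df.where(valid) for df in frames]

idx = pd.period_range("2000-01", periods=5, freq="M")
realized = pd.DataFrame({"P1": [0.01, -0.02, 0.03, 0.00, 0.01]}, index=idx)
baseline = realized.expanding().mean().shift(1)  # NaN only in the first month
forecast = pd.DataFrame({"P1": [np.nan, np.nan, np.nan, 0.005, 0.004]}, index=idx)

forecast_a, baseline_a, realized_a = align_nans(forecast, baseline, realized)
print(baseline_a["P1"].notna().sum())  # 2 usable months, matching the forecast
```

The other route mentioned above would be something like `rets.expanding(min_periods=WINDOW).mean()`, which delays the baseline until the rolling betas exist; whether that lines up exactly depends on how many shifts each series receives.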

2. Evaluate Forecasts#

2.1.#

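Report the OOS R-squared for each of the \(n\) security forecasts, where each SSE is the sum of squared differences between the realized excess return and the corresponding forecast (LFPM or baseline) over the same out-of-sample dates: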

\[ \text{OOS R-squared} = 1 - \frac{\text{SSE}_{\text{forecast}}}{\text{SSE}_{\text{baseline}}} \]

Also compute the Information Coefficient (IC) for each security: the correlation between the forecast and realized return.

Report summary statistics (mean, median, min, max, % positive) across all securities.
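The evaluation metrics could be computed along these lines. The function names are hypothetical, and the toy forecast deliberately peeks at the realized return so that the expected OOS R-squared is known in advance (errors shrink by a factor of 0.9, so the value is 1 - 0.81 = 0.19); it is a check of the arithmetic, not a usable forecast.

```python
import numpy as np
import pandas as pd

def oos_r2(forecast, baseline, realized):
    """Per-security 1 - SSE_forecast / SSE_baseline over dates where all three exist."""
    valid = forecast.notna() & baseline.notna() & realized.notna()
    sse_f = ((realized - forecast) ** 2).where(valid).sum()
    sse_b = ((realized - baseline) ** 2).where(valid).sum()
    return 1 - sse_f / sse_b

def info_coef(forecast, realized):
    """Per-security time-series correlation between forecast and realized return."""
    return forecast.corrwith(realized)

def summarize(stat):
    """Cross-sectional summary of a per-security statistic."""
    return pd.Series({
        "mean": stat.mean(), "median": stat.median(),
        "min": stat.min(), "max": stat.max(),
        "% positive": 100 * (stat > 0).mean(),
    })

# Synthetic stand-ins (assumption).
rng = np.random.default_rng(1)
realized = pd.DataFrame(rng.normal(0, 0.05, (50, 3)), columns=list("ABC"))
baseline = realized.expanding().mean().shift(1)
forecast = baseline + 0.1 * (realized - baseline)  # toy forecast that peeks at the outcome

r2 = oos_r2(forecast, baseline, realized)
ic = info_coef(forecast, realized)
print(pd.DataFrame({"OOS R2": summarize(r2), "IC": summarize(ic)}))
```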

2.2.#

Does the LFPM do a good job of forecasting monthly returns? For which portfolio does it perform best? And worst?

3. Compare Models and Choices#

Refactor the forecasting and evaluation pipeline into a reusable function so you can easily compare across different configurations.

3.1.#

Build a comparison table with rows for each combination of factor model (CAPM, AQR, FF5) and lambda method (expanding, constant). Report the Mean OOS R-squared and Mean IC for each.
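One way the refactored pipeline and the comparison grid might be organized is sketched below on synthetic data. `MODELS`, `run_config`, and the data are hypothetical; rolling betas come from `numpy.linalg.lstsq` instead of `RollingOLS`, and constant premia are assumed to be annual values divided by 12.

```python
import itertools
import numpy as np
import pandas as pd

MODELS = {
    "CAPM": ["MKT"],
    "AQR": ["MKT", "HML", "RMW", "UMD"],
    "FF5": ["MKT", "SMB", "HML", "RMW", "CMA"],
}
ANNUAL_LAMBDA = pd.Series(
    {"MKT": 0.08, "SMB": 0.04, "HML": 0.04, "RMW": 0.04, "CMA": 0.04, "UMD": 0.04}
)

def run_config(rets, factors, names, lam_method, window=60):
    """Hypothetical pipeline: forecasts for one configuration, then mean OOS R2 and mean IC."""
    X = factors[names]
    if lam_method == "expanding":
        lam = X.expanding().mean().shift(1)
    else:  # constant premia, assumed to be annual / 12
        row = ANNUAL_LAMBDA.loc[names].to_numpy() / 12
        lam = pd.DataFrame(np.tile(row, (len(X), 1)), index=X.index, columns=names)
    Xc = np.column_stack([np.ones(len(X)), X.to_numpy()])
    Y = rets.to_numpy()
    fc = pd.DataFrame(np.nan, index=rets.index, columns=rets.columns)
    for end in range(window, len(X)):
        # Betas for all securities from the window ending at row end-1.
        coef, *_ = np.linalg.lstsq(Xc[end - window:end], Y[end - window:end], rcond=None)
        fc.iloc[end] = lam.iloc[end - 1].to_numpy() @ coef[1:]  # forecast for row `end`
    base = rets.expanding().mean().shift(1)
    valid = fc.notna() & base.notna()
    sse_f = ((rets - fc) ** 2).where(valid).sum()
    sse_b = ((rets - base) ** 2).where(valid).sum()
    return {"Mean OOS R2": (1 - sse_f / sse_b).mean(), "Mean IC": fc.corrwith(rets).mean()}

# Synthetic stand-ins (assumption): 150 months, 6 factors, 5 securities.
rng = np.random.default_rng(2)
idx = pd.period_range("2000-01", periods=150, freq="M")
factors = pd.DataFrame(rng.normal(0.005, 0.03, (150, 6)), index=idx, columns=ANNUAL_LAMBDA.index)
rets = pd.DataFrame(
    factors.to_numpy() @ rng.normal(0.5, 0.3, (6, 5)) + rng.normal(0, 0.02, (150, 5)),
    index=idx, columns=[f"P{i}" for i in range(1, 6)],
)

records = []
for (model, names), method in itertools.product(MODELS.items(), ["expanding", "constant"]):
    records.append({"model": model, "lambda": method, **run_config(rets, factors, names, method)})
table = pd.DataFrame(records).set_index(["model", "lambda"])
print(table)
```

Because `window` is already a parameter, the same function could be reused for the beta-window comparison by looping over window lengths instead of lambda methods.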

3.2.#

Which combination works best? Does the choice of lambda method matter more than the choice of factor model? Note that constant lambda uses values informed by the full historical sample, so it has look-ahead bias.

3.3.#

Using constant lambda (to isolate the effect of the beta window), build a table comparing forecast quality across beta estimation windows (36, 48, 60, 120, 180, 240 months) for CAPM and AQR 4-Factor. Does the beta window matter much?

4. Long-Short Strategy#

Even if individual forecasts are noisy, the cross-sectional ranking may contain useful information.

Each period:

  1. Rank securities by LFPM forecast.

  2. Go long the top 20% with equal weight.

  3. Go short the bottom 20% with equal weight.
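The ranking rule above can be sketched as follows. The function name `long_short_returns` is hypothetical. The sketch assumes forecast timestamps already refer to the month being forecasted, so weights and realized returns are multiplied on the same date, and it rounds the 20% bucket down to a whole number of securities (5 of 25).

```python
import numpy as np
import pandas as pd

def long_short_returns(forecast, realized, frac=0.2):
    """Equal-weight long top `frac`, short bottom `frac`, ranked by forecast each period."""
    f = forecast.where(realized.notna())
    n = f.notna().sum(axis=1)                         # securities available each date
    k = np.floor(frac * n + 1e-9).clip(lower=1)       # names per leg, at least one
    top = f.rank(axis=1, method="first", ascending=False).le(k, axis=0)
    bottom = f.rank(axis=1, method="first", ascending=True).le(k, axis=0)
    weights = (top.astype(float) - bottom.astype(float)).div(k, axis=0)
    ls = (weights * realized).sum(axis=1)
    return ls.where(n >= 2), weights                  # NaN where no ranking is possible

# Toy example: one month, ten securities, forecasts 1..10, realized = forecast / 100.
cols = [f"P{i}" for i in range(1, 11)]
forecast = pd.DataFrame([np.arange(1.0, 11.0)], columns=cols)
realized = forecast / 100

ls, weights = long_short_returns(forecast, realized)
print(ls.iloc[0])  # long {P9, P10} averages 0.095, short {P1, P2} averages 0.015
```

Each leg sums to one in absolute value, so the strategy is dollar-neutral and its return is the difference between the two leg averages.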

4.1.#

Using the AQR 4-Factor model with expanding lambda and a 60-month window:

  • Report annualized Mean, Vol, and Sharpe of the long-short strategy, compared to MKT.

  • Report alpha, beta, R-squared, and Info Ratio vs MKT.

  • Plot cumulative returns for both the strategy and MKT.
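The reporting items above could be computed roughly as in this sketch, using synthetic series in place of the actual strategy and MKT returns. The function names are hypothetical. Means are annualized by 12 and volatilities by \(\sqrt{12}\); the Info Ratio is taken to be annualized alpha over annualized residual volatility, which is one common convention and an assumption here.

```python
import numpy as np
import pandas as pd

def performance(r, periods=12):
    """Annualized mean, volatility, and Sharpe ratio of a monthly excess-return series."""
    mean = r.mean() * periods
    vol = r.std() * np.sqrt(periods)
    return pd.Series({"Mean": mean, "Vol": vol, "Sharpe": mean / vol})

def market_attribution(r, mkt, periods=12):
    """Regress r on MKT with an intercept; report annualized alpha, beta, R-squared, Info Ratio."""
    df = pd.concat([r, mkt], axis=1, keys=["r", "mkt"]).dropna()
    y = df["r"].to_numpy()
    X = np.column_stack([np.ones(len(df)), df["mkt"].to_numpy()])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    alpha, beta = coef
    resid = y - X @ coef
    return pd.Series({
        "alpha": alpha * periods,
        "beta": beta,
        "R-squared": 1 - resid.var() / y.var(),
        "Info Ratio": alpha / resid.std(ddof=1) * np.sqrt(periods),
    })

# Synthetic stand-ins (assumption): a strategy with beta 0.5 to MKT plus noise.
rng = np.random.default_rng(3)
idx = pd.period_range("2000-01", periods=240, freq="M")
mkt = pd.Series(rng.normal(0.006, 0.04, 240), index=idx, name="MKT")
ls = pd.Series(0.001 + 0.5 * mkt.to_numpy() + rng.normal(0, 0.01, 240), index=idx, name="LS")

stats = pd.DataFrame({"LS": performance(ls), "MKT": performance(mkt)})
attribution = market_attribution(ls, mkt)
cumulative = (1 + pd.concat([ls, mkt], axis=1)).cumprod()  # ready for cumulative.plot()
print(stats, attribution, sep="\n\n")
```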

4.2.#

Compare the long-short strategy across all three factor models (CAPM, AQR, FF5). Report:

  • Forecast evaluation (Mean OOS R-squared and Mean IC)

  • Strategy performance (Mean, Vol, Sharpe)

  • MKT attribution (alpha, beta, R-squared, Info Ratio)

  • Cumulative returns plot

Hints#

  • Use pandas_datareader.data.DataReader('25_Portfolios_5x5', 'famafrench', start='1980-01') to pull the FF 25 portfolios. Table 0 has value-weighted returns in percent.

  • You may find .expanding().mean() in pandas helpful for factor premia estimation.

  • Use from statsmodels.regression.rolling import RollingOLS for rolling regressions.

  • Use .shift(1) to ensure point-in-time alignment.

  • Annualize monthly statistics: multiply means (and alpha) by 12 and volatilities by \(\sqrt{12}\), so Sharpe and Info Ratios scale by \(\sqrt{12}\).