Exercise - Forecasting with Linear Factor Pricing Models

Thanks to Tobias Rodriguez del Pozo

Let’s use the “AQR” model in (3) to forecast excess returns. We will do this at each point in time to build a point-in-time series of forecasts, and then evaluate how well they perform.

\[\mathbb{E}[\tilde{r}^i] = \beta^{i,\mathrm{MKT}} \, \mathbb{E}[\tilde{f}^{\mathrm{MKT}}] + \beta^{i,\mathrm{HML}} \, \mathbb{E}[\tilde{f}^{\mathrm{HML}}] + \beta^{i,\mathrm{RMW}} \, \mathbb{E}[\tilde{f}^{\mathrm{RMW}}] + \beta^{i,\mathrm{UMD}} \, \mathbb{E}[\tilde{f}^{\mathrm{UMD}}] \tag{3}\]

\[\tilde{r}^i_t = \alpha^i + \beta^{i,\mathrm{MKT}} \tilde{f}^{\mathrm{MKT}}_t + \beta^{i,\mathrm{HML}} \tilde{f}^{\mathrm{HML}}_t + \beta^{i,\mathrm{RMW}} \tilde{f}^{\mathrm{RMW}}_t + \beta^{i,\mathrm{UMD}} \tilde{f}^{\mathrm{UMD}}_t + \varepsilon_t \tag{4}\]

The model does not give us any information about forecasting the factors themselves. Accordingly, calculate the “expanding” mean of each of the four factors. We will use these as our point-in-time factor premia.
For each of the \(n\) securities, estimate (4) over a rolling window of 60 months. Make sure to estimate these rolling regressions WITH an intercept, but note that we only need to save the beta estimates.
For every security \(i\) and every month \(t\) (after the first 60), calculate (3) using the point-in-time factor premia and betas from the prior two steps. This is your forecast, made at the end of period \(t\), for \(\tilde{r}^i_{t+1}\): because the estimation uses information through the end of \(t\), it is a forecast for \(t+1\). To align it with the realized data, shift the dataframe of forecasts one month later. (The Feb value is now a March value.) The forecast timestamp then refers to the period being forecasted rather than the time the forecast was made.
This gives you a series of forecasts \(\widehat{\tilde{r}^i_{t}}\).
To decide whether these forecasts are good, we need a comparison. Use the point-in-time sample average of \(\tilde{r}^i_t\): calculate the expanding mean and, once again, shift it one period into the future so that the timestamps refer to the period being forecasted. This gives us the benchmark forecast, \(\bar{\tilde{r}}^i_t\).
Compare our Linear Factor Pricing forecasts with the naive forecasts using Out-of-Sample (OOS) R-squared.

\[\text{OOS R-squared} = 1 - \frac{\text{MSE}_{\text{forecast}}}{\text{MSE}_{\text{baseline}}}\]

where MSE stands for Mean Squared Error.
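The forecasting steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the exercise solution: the helper names `rolling_betas` and `lfpm_forecasts`, the security names, and the simulated returns are all assumptions made for the example, and the rolling regression is written as a plain NumPy least-squares loop rather than a library routine.

```python
import numpy as np
import pandas as pd

def rolling_betas(y, X, window):
    """Rolling OLS with an intercept; returns only the factor betas.

    The row dated t is estimated from the `window` observations ending at t.
    (Hypothetical helper, written for illustration.)
    """
    Z = np.column_stack([np.ones(len(X)), X.to_numpy()])  # intercept + factors
    yv = y.to_numpy()
    out = np.full((len(y), X.shape[1]), np.nan)
    for end in range(window, len(y) + 1):
        rows = slice(end - window, end)
        coef, *_ = np.linalg.lstsq(Z[rows], yv[rows], rcond=None)
        out[end - 1] = coef[1:]  # drop the intercept, keep the betas
    return pd.DataFrame(out, index=y.index, columns=X.columns)

def lfpm_forecasts(rets, facs, window=60):
    """Forecasts from (3), time-stamped with the period being forecasted."""
    premia = facs.expanding().mean()  # point-in-time factor premia
    fc = {}
    for name in rets.columns:
        betas = rolling_betas(rets[name], facs, window)
        # beta' * premia at each date; NaN until the first full window
        fc[name] = (betas * premia).sum(axis=1, skipna=False)
    return pd.DataFrame(fc).shift(1)  # forecast made at t is stored at t+1

# Synthetic monthly data (assumed, for illustration only)
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=120, freq="MS")
facs = pd.DataFrame(rng.normal(0.005, 0.03, (120, 4)),
                    index=dates, columns=["MKT", "HML", "RMW", "UMD"])
true_betas = rng.normal(1.0, 0.3, size=(3, 4))
rets = pd.DataFrame(facs.to_numpy() @ true_betas.T + rng.normal(0, 0.02, (120, 3)),
                    index=dates, columns=["A", "B", "C"])

forecasts = lfpm_forecasts(rets, facs, window=60)

# Benchmark: expanding mean of each security, shifted the same way.
# min_periods=60 makes its NaN pattern match the rolling-regression forecasts.
benchmark = rets.expanding(min_periods=60).mean().shift(1)
```

With a 60-month window, the first 60 rows of both `forecasts` and `benchmark` are NaN, so the two dataframes cover the same dates when they are compared.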

Warning!#

This calculation will be wrong if your forecasts have NaN values where the benchmark does not, because the two MSEs would then average over different sets of dates. For this reason, it is important to eliminate any date where either series has a NaN value. Once both series cover exactly the same dates, the number of observations cancels, and you can write the OOS R-squared as one minus a ratio of SSEs.
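A small sketch of this alignment issue, using made-up numbers. The function name `oos_r2` and the toy series are assumptions for illustration; the point is only that dropping mismatched dates first changes the answer.

```python
import numpy as np
import pandas as pd

def oos_r2(actual, forecast, benchmark):
    """OOS R-squared computed only over dates where all three series are observed."""
    df = pd.concat({"y": actual, "f": forecast, "b": benchmark}, axis=1).dropna()
    sse_forecast = ((df["y"] - df["f"]) ** 2).sum()
    sse_baseline = ((df["y"] - df["b"]) ** 2).sum()
    # Same dates in both sums, so the SSE ratio equals the MSE ratio
    return 1 - sse_forecast / sse_baseline

# Toy data (assumed): the forecast is missing in the first period, the benchmark is not
y = pd.Series([0.01, 0.02, -0.01, 0.03])
f = pd.Series([np.nan, 0.02, 0.00, 0.01])
b = pd.Series([0.00, 0.00, 0.00, 0.00])

aligned = oos_r2(y, f, b)

# The mistake the warning describes: each MSE silently skips its own NaNs,
# so the numerator averages over 3 dates and the denominator over 4.
naive = 1 - ((y - f) ** 2).mean() / ((y - b) ** 2).mean()
```

Here `aligned` and `naive` differ even though the inputs are identical, which is why the NaN patterns need to be matched before the ratio is taken.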

Data#

Use the data found in data/factor_pricing_data_weekly.xlsx.

Factors: Monthly excess return data for the overall equity market, \(\tilde{r}^{\text{MKT}}\).

The column header for the market factor is MKT rather than MKT-RF, but it is indeed already in excess return form.
The sheet also contains data on five additional factors.
All factor data is already provided as excess returns.

1.#

Report the OOS r-squared for each of the n security forecasts.

2.#

Does the linear factor pricing model do a good job of forecasting monthly returns? For which asset does it perform best? And worst?

3.#

Redo the exercise using a window of 36 months, and again using a window of 96 months. Does either of these windows work better?

4.#

Redo the exercise using the FF 5-Factor Model instead of the AQR model, and again using the CAPM. Does either of these models improve the forecasts?

Hints#

You may find the following pandas command helpful: .expanding().mean()
You may wish to use from statsmodels.regression.rolling import RollingOLS
This will take longer to compute: we are estimating a multifactor regression at every month in time and for every security. So we are running roughly T × N regressions.
See .shift() in pandas.
Be careful with NaN values: if you use the rolling regressions, your initial forecast values will be NaN, but the expanding-mean calculation for the baseline will not have any NaN. Thus, it is important to require a minimum number of observations in the expanding mean, or to enforce explicitly that both dataframes have NaN in the same time periods.
