Solution - Open Midterm 2

FINM 36700 - 2025

UChicago Financial Mathematics

Mark Hendricks
hendricks@uchicago.edu

Scoring

Problem	Points
1	30
2	20
3	20
4	30

Numbered problems are worth 5pts unless specified otherwise.

Submission

You should submit a single Jupyter notebook (.ipynb) file containing all of your code and answers to Canvas.

Note: If any other files are required to run your notebook, please include them and only them in a single .zip file.

Data

All data files are found at the course web-book:

https://markhendricks.github.io/finm-portfolio/.

The exam uses the data found in commodity_factors.xlsx

sheet factors
sheet returns

Both tabs contain daily returns from January 2010 to October 2025: the factors tab holds the three commodity factors, and the returns tab holds individual commodity futures.

Assume approximately 252 observations per year for purposes of annualization.

Factors are

LVL: A level factor of commodity data
HMS: Hard Minus Soft commodities
IMO: Input Minus Output commodities

Returns are

various commodity futures across energy, metals, livestock, and agriculture

import pandas as pd
import numpy as np
import statsmodels.api as sm

ANN_FACTOR = 252
FACTOR = "LVL"

factors = pd.read_excel(
    "./data/commodity_factors.xlsx", sheet_name="factors", index_col=0, parse_dates=True
)
returns = pd.read_excel(
    "./data/commodity_factors.xlsx", sheet_name="returns", index_col=0, parse_dates=True
)

factors

	LVL	HMS	IMO
Date
2010-01-05	0.001741	-0.003471	0.006597
2010-01-06	0.012889	0.016399	-0.020703
2010-01-07	-0.011484	0.011919	-0.001153
2010-01-08	0.000436	0.007519	0.000570
2010-01-11	-0.007377	0.006306	-0.000623
...	...	...	...
2025-10-27	-0.001392	-0.004590	0.007534
2025-10-28	-0.002979	-0.012895	0.002998
2025-10-29	0.008762	0.006220	0.006162
2025-10-30	0.019519	0.027967	-0.011074
2025-10-31	0.001617	-0.005249	-0.006371

3972 rows × 3 columns

returns

	CL	GC	HO	LE	NG	PL	RB	SB	SI	ZC	ZL	ZM	ZS
Date
2010-01-05	0.003190	0.000358	0.001643	0.011127	-0.041978	0.008897	0.009789	0.000724	0.019553	0.000597	-0.004646	0.010759	0.002620
2010-01-06	0.017244	0.015920	0.004148	-0.004344	0.065993	0.013980	0.005459	0.027858	0.021484	0.007164	-0.000983	-0.004696	-0.001663
2010-01-07	-0.006251	-0.002465	-0.008896	-0.000291	-0.033783	0.000515	-0.000796	-0.014432	0.009360	-0.010077	-0.016720	-0.034287	-0.031176
2010-01-08	0.001089	0.004501	0.007648	-0.001164	-0.009817	0.007469	0.009555	-0.016786	0.006818	0.013174	-0.011503	-0.000652	-0.004667
2010-01-11	-0.002779	0.010982	-0.009181	-0.009030	-0.051313	0.015148	-0.005846	-0.028333	0.012190	-0.001182	-0.008601	-0.006845	-0.011106
...	...	...	...	...	...	...	...	...	...	...	...	...	...
2025-10-27	-0.003089	-0.028288	0.013732	-0.021070	0.041768	-0.009725	-0.001196	-0.034068	-0.037518	0.012995	0.009946	0.013941	0.024478
2025-10-28	-0.018920	-0.008921	-0.020073	-0.005790	-0.028181	-0.000887	0.002499	-0.006224	0.012091	0.007580	-0.010045	0.027834	0.010307
2025-10-29	0.005486	0.004412	0.015541	0.017143	0.009268	0.009068	0.025192	0.003479	0.012647	0.004630	-0.001990	0.007178	0.001855
2025-10-30	0.001488	0.004418	0.014726	0.016746	0.171801	0.010683	0.015048	-0.009709	0.014815	-0.008641	-0.010167	0.022352	0.010183
2025-10-31	0.006769	-0.004773	-0.011707	0.005632	0.042467	-0.023938	-0.005141	0.010504	-0.008962	0.002905	-0.019537	0.019011	0.007789

3972 rows × 13 columns

1. Commodity Returns and Factors

1.1 Factor Summary Statistics

For each of the three factors, report only the following summary statistics (rounded to at least 6 decimal places):

Annualized Mean Return
Annualized Volatility
Annualized Sharpe Ratio

sol_11 = pd.concat(
    [
        factors.mean() * ANN_FACTOR,
        factors.std() * np.sqrt(ANN_FACTOR),
        (factors.mean() / factors.std()) * np.sqrt(ANN_FACTOR),
    ],
    axis=1,
    keys=["Annualized Return", "Annualized Volatility", "Annualized Sharpe Ratio"],
)

sol_11

	Annualized Return	Annualized Volatility	Annualized Sharpe Ratio
LVL	0.064439	0.154613	0.416775
HMS	0.008842	0.226259	0.039078
IMO	0.007281	0.153281	0.047500
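The annualization in 1.1 scales the daily mean by 252 and the daily volatility by √252 (the usual i.i.d. scaling), so the annualized Sharpe ratio is the daily Sharpe ratio times √252, or equivalently the annualized mean divided by the annualized volatility. A minimal sketch on simulated daily returns (the series and its parameters are assumptions, not the exam data), plus a check that the LVL row of the table is internally consistent:

```python
import numpy as np
import pandas as pd

ANN_FACTOR = 252  # observations per year, as assumed in the exam


def annualized_stats(daily: pd.Series, periods: int = ANN_FACTOR) -> dict:
    """Annualized mean, volatility, and Sharpe ratio of a daily return series."""
    mean = daily.mean() * periods          # means scale linearly with time
    vol = daily.std() * np.sqrt(periods)   # vol scales with sqrt(time) if returns are i.i.d.; pandas uses ddof=1
    return {"mean": mean, "vol": vol, "sharpe": mean / vol}


# Simulated daily returns (hypothetical parameters, not the exam data)
rng = np.random.default_rng(42)
daily = pd.Series(rng.normal(0.0003, 0.01, 4000))
stats = annualized_stats(daily)

# The same Sharpe ratio computed directly from daily moments, as in the 1.1 code
daily_sharpe = daily.mean() / daily.std() * np.sqrt(ANN_FACTOR)
print(stats, daily_sharpe)

# Consistency check on the LVL row of the 1.1 table: annualized mean / annualized vol
lvl_sharpe_from_table = 0.064439 / 0.154613
print(f"{lvl_sharpe_from_table:.6f}")
```

The ratio of the rounded table entries agrees with the reported 0.416775 up to rounding in the last digit.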

1.2 Factor Correlations

Calculate and report the correlation matrix between the three factors (rounded to at least 6 decimal places).

sol_12 = factors.corr()
sol_12

	LVL	HMS	IMO
LVL	1.000000	0.400202	0.010439
HMS	0.400202	1.000000	-0.071747
IMO	0.010439	-0.071747	1.000000

1.3 Interpretation

Does the factor construction make sense given the correlations you observe?

Overall, yes. The correlations across the factors are low, and IMO in particular is nearly uncorrelated with the other two (0.010 with LVL and -0.072 with HMS). The only concern is that LVL and HMS are about 40% correlated.

1.4 Tangency Portfolio Weights

Build a tangency portfolio using the three factors as assets.

Report the weights of each factor in the tangency portfolio, rounded to at least 6 decimal places. You may assume that the factor returns are excess returns.

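def tangency_weights(returns, scale_cov=1):
    # Optionally shrink the covariance matrix toward its diagonal:
    # scale_cov=1 uses the full sample covariance, scale_cov=0 drops all correlations.
    covmat_full = returns.cov()
    covmat_diag = np.diag(np.diag(covmat_full))
    covmat = scale_cov * covmat_full + (1 - scale_cov) * covmat_diag

    # Tangency weights are proportional to Sigma^{-1} mu; rescale them to sum to one.
    weights = np.linalg.solve(covmat, returns.mean())
    weights = weights / weights.sum()

    return pd.DataFrame(weights, index=returns.columns, columns=["Weight"])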


sol_14 = tangency_weights(factors)
sol_14

	Weight
LVL	1.171917
HMS	-0.250927
IMO	0.079011

1.5 Interpretation

What do the tangency portfolio weights suggest about the relative importance of each factor?

The tangency portfolio implies that LVL is by far the most important factor (weight 1.17), followed by HMS, which enters with a sizable short position (-0.25). IMO is less important, but its weight is not zero (0.08), indicating that it still contributes, albeit more marginally than the other two.
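Why HMS is shorted despite its positive mean: its Sharpe ratio is tiny (0.039) and it is about 40% correlated with LVL, so a short HMS position hedges part of LVL's variance at little cost in expected return. Because normalized tangency weights are unchanged when both the means and the covariance matrix are annualized, the weights can be rebuilt from the rounded statistics in 1.1 and 1.2 alone. A sketch under that assumption; with these inputs it should land close to the 1.4 weights, and $\sqrt{\mu' \Sigma^{-1} \mu}$ should come out near the 0.44 factor tangency Sharpe ratio reported later in 4.2:

```python
import numpy as np

# Annualized means and vols from 1.1, correlations from 1.2 (all rounded to 6 dp)
names = ["LVL", "HMS", "IMO"]
mu = np.array([0.064439, 0.008842, 0.007281])
vol = np.array([0.154613, 0.226259, 0.153281])
corr = np.array(
    [
        [1.000000, 0.400202, 0.010439],
        [0.400202, 1.000000, -0.071747],
        [0.010439, -0.071747, 1.000000],
    ]
)

# Covariance = D @ Corr @ D, with D the diagonal matrix of volatilities
cov = np.outer(vol, vol) * corr

# Tangency weights: Sigma^{-1} mu, normalized to sum to one
raw = np.linalg.solve(cov, mu)
weights = raw / raw.sum()

# Maximum (tangency) Sharpe ratio: sqrt(mu' Sigma^{-1} mu)
max_sharpe = np.sqrt(mu @ raw)

for name, w in zip(names, weights):
    print(f"{name}: {w: .4f}")
print(f"Tangency Sharpe: {max_sharpe:.4f}")
```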

1.6

Estimate an autoregression of the LVL factor:

\[r_t = \gamma + \rho\, r_{t-1} + \epsilon_t\]

Only report $\rho$ (rounded to at least 6 decimal places).

Does the LVL factor exhibit momentum?

lvl = factors["LVL"]
lvl_lag = lvl.shift(1)

sol_16 = sm.OLS(lvl, sm.add_constant(lvl_lag), missing="drop").fit().params["LVL"]
print(f"Rho: {sol_16:.6f}")

Rho: 0.017041

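$\rho \approx 0.017$ is positive, indicating that the LVL factor exhibits momentum, though it is very weak: a 1% LVL return today raises the forecast of tomorrow's return by only about 0.017%.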
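The AR(1) slope is, up to small-sample differences, the lag-1 autocorrelation of the series: a positive $\rho$ means a high return today predicts a slightly higher return tomorrow. A sketch on a simulated series with a known $\rho = 0.3$ (an assumption, far more persistent than LVL) shows the OLS slope recovering $\rho$ and matching pandas' Series.autocorr:

```python
import numpy as np
import pandas as pd

# Simulated AR(1) series with a known rho (hypothetical parameters)
rng = np.random.default_rng(7)
T, gamma, rho_true = 5000, 0.0002, 0.3
r = np.empty(T)
r[0] = gamma / (1 - rho_true)  # start at the unconditional mean
for t in range(1, T):
    r[t] = gamma + rho_true * r[t - 1] + rng.normal(0.0, 0.01)

s = pd.Series(r)
y = s.iloc[1:].to_numpy()           # r_t
x = s.shift(1).iloc[1:].to_numpy()  # r_{t-1}

# OLS of r_t on [1, r_{t-1}]
X = np.column_stack([np.ones_like(x), x])
gamma_hat, rho_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Lag-1 autocorrelation for comparison
rho_acf = s.autocorr(lag=1)
print(f"OLS rho: {rho_hat:.4f}  lag-1 autocorrelation: {rho_acf:.4f}")
```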

2. Single Factor Model

2.1 LVL Factor

We want to test the hypothesis that:

\[ \mathbb{E}[r_{i}] = \beta_{i,LVL} \cdot \mathbb{E}[r_{LVL}] \]

Regress each commodity’s returns against the LVL factor, and report the mean absolute alpha, $\bar{\alpha}$, and the mean $r^2$, $\bar{r^2}$, across all commodities.

Annualize the alpha.

Output exactly the following two numbers rounded to 6 decimal places:

Mean Absolute Annualized Alpha across all commodities
Mean R-squared across all commodities

sol_21_dict = {"Annualized Alpha": [], "R-Squared": [], "Beta": []}

for asset in returns.columns:
    model = sm.OLS(returns[asset], sm.add_constant(factors["LVL"])).fit()
    sol_21_dict["Annualized Alpha"].append(model.params["const"] * ANN_FACTOR)
    sol_21_dict["R-Squared"].append(model.rsquared)
    sol_21_dict["Beta"].append(model.params["LVL"])

sol_21 = pd.DataFrame(sol_21_dict, index=returns.columns)
sol_21[["Annualized Alpha", "R-Squared"]].abs().mean().to_frame("Summary")

	Summary
Annualized Alpha	0.034876
R-Squared	0.259890

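2.2 Interpretation

If our hypothesis were true, what would you expect the values of $\bar{\alpha}$ and $\bar{r^2}$ to be?

If this model were true, then $\bar{\alpha}$ should be zero (more precisely, every individual alpha should be zero). The model makes no prediction about $\bar{r^2}$: it restricts mean returns, not how much of each commodity's day-to-day variation LVL explains, so a low $r^2$ is not evidence against it.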
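Why alpha, and not $r^2$, is the test statistic: with an intercept, the OLS alpha equals $\bar{r}_i - \hat{\beta}_i \bar{r}_{LVL}$, the gap between the asset's sample mean and the model's prediction, so the hypothesis is exactly "every alpha is zero." The $r^2$ instead depends on idiosyncratic noise. A sketch on simulated data (all parameters are assumptions):

```python
import numpy as np

# Hypothetical daily factor and asset returns (parameters are assumptions)
rng = np.random.default_rng(1)
T = 4000
f = rng.normal(0.0003, 0.01, T)
r = 0.0001 + 0.8 * f + rng.normal(0.0, 0.015, T)

# Time-series regression r_t = alpha + beta * f_t + e_t
X = np.column_stack([np.ones(T), f])
alpha, beta = np.linalg.lstsq(X, r, rcond=None)[0]

# With an intercept, the OLS residuals average to zero, so alpha equals the gap
# between the asset's mean return and the model's prediction beta * mean(f)
pricing_error = r.mean() - beta * f.mean()

# R^2 depends on the idiosyncratic noise, which the pricing hypothesis does not restrict
fitted = alpha + beta * f
r2 = 1 - np.var(r - fitted) / np.var(r)
print(f"alpha: {alpha:.8f}  mean(r) - beta * mean(f): {pricing_error:.8f}  R^2: {r2:.3f}")
```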

2.3 Cross-Sectional Test

Let’s test the one-factor LVL model directly. From 2.1, we already have what we need:

The dependent variable (y): the mean excess return of each commodity.
The regressor (x): the LVL beta of each commodity from the time-series regressions.

Then we can estimate the following equation:

\[ \underbrace{\mathbb{E}\left[\tilde{r}^{i}\right]}_{n\times 1\text{ data}} = \textcolor{ForestGreen}{\underbrace{\eta}_{\text{regression intercept}}} + \underbrace{\beta^{i,\text{LVL}}\;}_{n\times 1\text{ data}}~ \textcolor{ForestGreen}{\underbrace{\lambda_{\text{LVL}}}_{\text{regression estimate}}} + \textcolor{ForestGreen}{\underbrace{\upsilon}_{n\times 1\text{ residuals}}} \]

Report exactly the following 3 numbers (rounded to at least 6 decimal places):

The R-squared of this regression
The intercept estimate, $\hat{\eta}$
The regression coefficient $\lambda_{LVL}$

means = returns.mean()

model_23 = sm.OLS(means, sm.add_constant(sol_21["Beta"])).fit()

sol_23 = pd.DataFrame(
    {
        "R-Squared": model_23.rsquared,
        "Eta": model_23.params["const"],
        "Lambda_LVL": model_23.params["Beta"],
    },
    index=["Value"],
)
sol_23.T

	Value
R-Squared	0.093357
Eta	0.000163
Lambda_LVL	0.000092

2.4

Does your time-series or cross-sectional estimate give a higher premium to the LVL factor?

The cross-sectional premium is $\lambda_{\text{LVL}}$, while the time-series estimate is simply the mean return of the LVL factor. The annualized mean return of LVL is 0.064439, whereas the annualized LVL lambda ($252 \times \hat{\lambda}_{\text{LVL}}$) is 0.023302. The time-series estimate therefore gives the higher premium: the cross-sectional premium is only about 36% of it.
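Why the comparison is informative: if the one-factor model held exactly in the sample, every time-series alpha would be zero, mean returns would equal $\hat{\beta}_i \bar{r}_{LVL}$, and the cross-sectional regression would return $\hat{\eta} = 0$, $\hat{\lambda}_{LVL} = \bar{r}_{LVL}$, and $r^2 = 1$. The sketch below builds such a sample on purpose (simulated data; the idiosyncratic noise is projected to be orthogonal to the constant and the factor, an assumption made only to get an exact fit) and runs both steps of the test:

```python
import numpy as np

rng = np.random.default_rng(0)
T, ANN = 4000, 252

# Hypothetical daily factor returns and true betas (assumptions, not the exam data)
f = rng.normal(0.0003, 0.01, T)
betas_true = np.array([0.3, 0.8, 1.2, 1.9])

# Idiosyncratic noise projected to be exactly orthogonal to [1, f], so all alphas are zero
X = np.column_stack([np.ones(T), f])
eps = rng.normal(0.0, 0.015, (T, betas_true.size))
eps -= X @ np.linalg.lstsq(X, eps, rcond=None)[0]
r = np.outer(f, betas_true) + eps

# Step 1: time-series regressions, one per asset (row 0: alphas, row 1: betas)
alphas, betas = np.linalg.lstsq(X, r, rcond=None)[0]

# Step 2: cross-sectional regression of mean returns on the estimated betas
means = r.mean(axis=0)
Z = np.column_stack([np.ones(betas.size), betas])
eta, lam = np.linalg.lstsq(Z, means, rcond=None)[0]
resid = means - (eta + lam * betas)
r2 = 1 - resid.var() / means.var()

print(f"eta (ann.): {eta * ANN:.6f}")
print(f"lambda (ann.): {lam * ANN:.6f} vs factor mean (ann.): {f.mean() * ANN:.6f}")
print(f"R^2: {r2:.6f}")
```

In the actual data the cross-sectional $\lambda_{\text{LVL}}$ falls well short of the LVL mean, which is one symptom that the one-factor model does not hold.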

3. Trading the Model

3.1 Beta Estimation

For each commodity, report its LVL beta, rounded to at least 6 decimal places.

Display a table of these betas, sorted from lowest to highest.

Remember

We estimated the betas in the time-series regression in 2.1.

Hint

Use df.sort_values(by=<COLUMN_NAME>, ascending=True) to sort your results.

sol_21.sort_values(by="Beta", ascending=True)[["Beta"]]

	Beta
LE	0.243876
GC	0.441604
ZM	0.698967
SB	0.742221
ZS	0.770218
ZC	0.823626
PL	0.834771
ZL	0.850391
SI	1.038445
NG	1.407854
HO	1.533688
RB	1.695943
CL	1.918395

3.2 Portfolio Formation

Regardless of your answer to 3.1, allocate your portfolio as follows:

Go long GC, LE, and ZM
Go short CL, HO, and RB

Put a total weight of 1 on the long leg and 0.25 on the short leg, split equally within each leg.

That is, your portfolio should be:

\[ r_{port} = 1 \cdot \left( \frac{1}{3} \cdot r_{GC} + \frac{1}{3} \cdot r_{LE} + \frac{1}{3} \cdot r_{ZM} \right) - 0.25 \cdot \left(\frac{1}{3} \cdot r_{CL} + \frac{1}{3} \cdot r_{HO} + \frac{1}{3} \cdot r_{RB}\right) \]

Report the last 5 daily returns of your betting-against-beta portfolio, rounded to at least 6 decimal places.

sol_32 = (
    1 * (returns[["GC", "LE", "ZM"]].mean(axis=1))
    - 0.25 * (returns[["CL", "HO", "RB"]].mean(axis=1))
).to_frame("BAB Portfolio")
sol_32.tail(5)

	BAB Portfolio
Date
2025-10-27	-0.012593
2025-10-28	0.007415
2025-10-29	0.005726
2025-10-30	0.011900
2025-10-31	0.007463
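The leg formula in 3.2 is the same as a single weight vector over the six contracts: +1/3 on each long contract and -0.25/3 ≈ -0.083333 on each short contract, for a net exposure of 0.75 and a gross exposure of 1.25. A sketch checking that both forms agree (the daily returns are simulated, an assumption):

```python
import numpy as np
import pandas as pd

# Leg weights from 3.2: +1/3 on each long contract, -0.25/3 on each short contract
long_leg, short_leg = ["GC", "LE", "ZM"], ["CL", "HO", "RB"]
weights = pd.Series(0.0, index=long_leg + short_leg)
weights.loc[long_leg] = 1.0 / 3
weights.loc[short_leg] = -0.25 / 3

# Hypothetical daily returns for the six contracts (simulated, not the exam data)
rng = np.random.default_rng(3)
rets = pd.DataFrame(rng.normal(0.0, 0.015, (10, 6)), columns=weights.index)

# One weight vector vs. the leg-by-leg formula used in 3.2
port_dot = rets @ weights
port_legs = rets[long_leg].mean(axis=1) - 0.25 * rets[short_leg].mean(axis=1)

print(weights.round(6))
print(f"net exposure: {weights.sum():.2f}, gross exposure: {weights.abs().sum():.2f}")
```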

3.3 Performance Evaluation

For your portfolio, report the following performance statistics (rounded to at least 6 decimal places):

Annualized Return
Annualized Volatility
Annualized Sharpe Ratio

sol_33 = pd.concat(
    [
        sol_32.mean() * ANN_FACTOR,
        sol_32.std() * np.sqrt(ANN_FACTOR),
        np.sqrt(ANN_FACTOR) * sol_32.mean() / sol_32.std(),
    ],
    axis=1,
)
sol_33.columns = ["Annualized Mean", "Annualized Volatility", "Annualized Sharpe"]

sol_33.T

	BAB Portfolio
Annualized Mean	0.053383
Annualized Volatility	0.144502
Annualized Sharpe	0.369425

3.4

For your portfolio, test the hypothesis that its premium can be explained by the LVL factor alone:

\[ \mathbb{E}[r_{port}] = \beta_{port,LVL} \cdot \mathbb{E}[r_{LVL}] \]

Report exactly the following 3 numbers (rounded to at least 6 decimal places):

Annualized Alpha
LVL Beta
R-squared for the regression

model_34 = sm.OLS(sol_32, sm.add_constant(lvl), missing="drop").fit()

sol_34 = pd.DataFrame(
    {
        "Annualized Alpha": model_34.params["const"] * ANN_FACTOR,
        "LVL Beta": model_34.params["LVL"],
        "R-Squared": model_34.rsquared,
    },
    index=["Summary"],
)
sol_34.T

	Summary
Annualized Alpha	0.051290
LVL Beta	0.032480
R-Squared	0.001208
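Two cross-checks on these numbers. Because OLS coefficients are linear in the dependent variable, the portfolio's LVL beta equals the same weighted combination of the individual betas from 3.1 (same regressor, same sample): 1 × (0.243876 + 0.441604 + 0.698967)/3 - 0.25 × (1.918395 + 1.533688 + 1.695943)/3 ≈ 0.461482 - 0.429002 = 0.032480. By the alpha identity, the annualized alpha is then the portfolio mean from 3.3 minus beta times the LVL mean from 1.1: 0.053383 - 0.032480 × 0.064439 ≈ 0.051290. The sketch reproduces both from the rounded table values and adds a linearity check on simulated data (hypothetical):

```python
import numpy as np

# LVL betas from the 3.1 table (rounded to 6 dp)
beta = {"GC": 0.441604, "LE": 0.243876, "ZM": 0.698967,
        "CL": 1.918395, "HO": 1.533688, "RB": 1.695943}

# Apply the 3.2 weights to the individual betas
beta_port = np.mean([beta[k] for k in ("GC", "LE", "ZM")]) - 0.25 * np.mean(
    [beta[k] for k in ("CL", "HO", "RB")]
)

# Implied annualized alpha: portfolio mean (3.3) minus beta times the LVL mean (1.1)
alpha_implied = 0.053383 - beta_port * 0.064439
print(f"portfolio beta: {beta_port:.6f}, implied alpha: {alpha_implied:.6f}")

# Linearity check on simulated data: the beta of a weighted sum of assets equals
# the weighted sum of their betas when the regressor and sample are the same
rng = np.random.default_rng(5)
T = 1000
f = rng.normal(0.0, 0.01, T)
assets = np.outer(f, [0.5, 1.5]) + rng.normal(0.0, 0.01, (T, 2))
X = np.column_stack([np.ones(T), f])
w = np.array([1.0, -0.25])
betas_individual = np.linalg.lstsq(X, assets, rcond=None)[0][1]
beta_of_portfolio = np.linalg.lstsq(X, assets @ w, rcond=None)[0][1]
```

The near-zero beta is what the beta-weighting buys: almost all of the portfolio's mean return shows up as alpha.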

4. Multi-Factor Model

We now want to test a multi-factor model using LVL, HMS, and IMO as factors:

\[ \mathbb{E}[r_{i}] = \beta_{i,LVL} \cdot \mathbb{E}[r_{LVL}] + \beta_{i,HMS} \cdot \mathbb{E}[r_{HMS}] + \beta_{i,IMO} \cdot \mathbb{E}[r_{IMO}] \]

4.1 Time Series Test

Estimate the time series test of this pricing model. Regress each commodity’s returns against the three factors, and report the following for each commodity (rounded to at least 6 decimal places):

Annualized Alpha
LVL, HMS, and IMO Betas
R-squared

sol_41_dict = {
    "Annualized Alpha": [],
    "LVL Beta": [],
    "HMS Beta": [],
    "IMO Beta": [],
    "R-Squared": [],
}

for asset in returns.columns:
    model = sm.OLS(returns[asset], sm.add_constant(factors), missing="drop").fit()
    sol_41_dict["Annualized Alpha"].append(model.params["const"] * ANN_FACTOR)
    sol_41_dict["LVL Beta"].append(model.params["LVL"])
    sol_41_dict["HMS Beta"].append(model.params["HMS"])
    sol_41_dict["IMO Beta"].append(model.params["IMO"])
    sol_41_dict["R-Squared"].append(model.rsquared)

sol_41 = pd.DataFrame(sol_41_dict, index=returns.columns)
sol_41

	Annualized Alpha	LVL Beta	HMS Beta	IMO Beta	R-Squared
CL	-0.035599	1.527030	0.656709	0.653682	0.637664
GC	0.071595	0.373286	0.123614	-0.393933	0.348973
HO	-0.023058	1.205122	0.543109	1.014183	0.759600
LE	0.059421	0.328136	-0.148049	0.236226	0.119500
NG	0.080357	1.035345	0.662592	-1.501411	0.373050
PL	-0.009842	0.711887	0.217593	-0.439602	0.364752
RB	-0.015995	1.243184	0.749484	1.335847	0.786582
SB	-0.053905	1.065017	-0.542152	-0.510662	0.321713
SI	0.056488	0.901595	0.246072	-0.701687	0.434255
ZC	-0.031637	1.246405	-0.717257	-0.262703	0.514722
ZL	-0.026659	1.055894	-0.356494	0.316702	0.436060
ZM	-0.028435	1.159886	-0.789761	0.154944	0.511819
ZS	-0.042733	1.147212	-0.645459	0.098414	0.707169

4.2

Report the annualized Sharpe ratio (rounded to at least 6 decimal places) of the tangency portfolio formed from

the individual commodities.
the three factors

What should be true of the Sharpe ratios if the factor pricing model is accurate?

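# Tangency portfolio of the 13 commodities
wttan_commodities = tangency_weights(returns=returns)
rets_commodities = returns.values @ wttan_commodities.values
# NumPy's .std() defaults to ddof=0 (pandas uses ddof=1); the gap is negligible with ~4,000 days
sharpe_commodities = rets_commodities.mean() / rets_commodities.std() * np.sqrt(ANN_FACTOR)

# Tangency portfolio of the 3 factors
wttan_factors = tangency_weights(returns=factors)
rets_factors = factors.values @ wttan_factors.values
sharpe_factors = rets_factors.mean() / rets_factors.std() * np.sqrt(ANN_FACTOR)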

sol_42 = pd.DataFrame(
    {"Commodities": sharpe_commodities, "Factors": sharpe_factors},
    index=["Annualized Sharpe Ratio"],
)
sol_42

	Commodities	Factors
Annualized Sharpe Ratio	0.852347	0.440656

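If the pricing model were accurate, the tangency portfolio of the factors would have the highest Sharpe ratio, at least as high as that of the commodities: there would be no risk premia to earn outside the factors, so any exposure not explained by the factors would add variance without adding expected excess return. Instead, the commodity tangency Sharpe ratio (0.85) is nearly double that of the factors (0.44), which tells us that the pricing model is not accurate.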
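What the comparison should look like when the model does hold: if every asset's expected return is fully explained by its factor beta (zero alphas), combining the assets with the factor cannot raise the maximum Sharpe ratio above the factor's own, because any non-factor exposure adds idiosyncratic variance but no expected return. A population-moment sketch of a hypothetical one-factor world (all parameters are assumptions), using the tangency Sharpe ratio $\sqrt{\mu' \Sigma^{-1} \mu}$:

```python
import numpy as np


def max_sharpe(mu: np.ndarray, cov: np.ndarray) -> float:
    """Tangency (maximum) Sharpe ratio: sqrt(mu' Sigma^{-1} mu)."""
    return float(np.sqrt(mu @ np.linalg.solve(cov, mu)))


# Hypothetical one-factor world in annual units (all parameters are assumptions)
mu_f, vol_f = 0.06, 0.15
betas = np.array([0.5, 1.0, 1.5])
idio_vol = np.array([0.10, 0.20, 0.15])

# Assets priced exactly by the factor: zero alphas, idiosyncratic risk uncorrelated with f
mu_assets = betas * mu_f
cov_assets = np.outer(betas, betas) * vol_f**2 + np.diag(idio_vol**2)

# Stack the factor itself (beta 1, no idiosyncratic risk) with the assets
b_all = np.concatenate([[1.0], betas])
mu_all = b_all * mu_f
cov_all = np.outer(b_all, b_all) * vol_f**2 + np.diag(np.concatenate([[0.0], idio_vol**2]))

sr_factor = mu_f / vol_f
print(f"factor: {sr_factor:.4f}, assets only: {max_sharpe(mu_assets, cov_assets):.4f}, "
      f"assets + factor: {max_sharpe(mu_all, cov_all):.4f}")
```

In this world the assets-only tangency stays below the factor's 0.4, and adding the assets to the factor leaves the maximum at exactly 0.4. The exam data show the opposite pattern.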

4.3 Cross-Sectional Test

Run the cross-sectional test of this multi-factor model:

\[ \mathbb{E}\left[\tilde{r}^{i}\right] = \lambda_{0} + \lambda_{LVL} \cdot \beta_{i,LVL} + \lambda_{HMS} \cdot \beta_{i,HMS} + \lambda_{IMO} \cdot \beta_{i,IMO} + \nu_{i} \]

Report exactly the following numbers (rounded to at least 6 decimal places):

$\lambda_{0}$
$\lambda_{LVL}$
$\lambda_{HMS}$
$\lambda_{IMO}$
$r^2$ of the cross-sectional regression
MAE of the cross-sectional regression

Annualize the MAE.

sol_43_model = sm.OLS(
    returns.mean(),
    sm.add_constant(sol_41[["LVL Beta", "HMS Beta", "IMO Beta"]]),
    missing="drop",
).fit()

sol_43 = pd.DataFrame(
    {
        "Lambda_0": sol_43_model.params["const"],
        "Lambda_LVL": sol_43_model.params["LVL Beta"],
        "Lambda_HMS": sol_43_model.params["HMS Beta"],
        "Lambda_IMO": sol_43_model.params["IMO Beta"],
        "R-squared": sol_43_model.rsquared,
        "MAE of residuals": np.mean(np.abs(sol_43_model.resid)),
    },
    index=["Summary"],
)
sol_43.T

	Summary
Lambda_0	0.000321
Lambda_LVL	-0.000065
Lambda_HMS	0.000200
Lambda_IMO	-0.000067
R-squared	0.646543
MAE of residuals	0.000062

The MAE above is in daily units; annualized (× 252), it is about 0.016, or roughly 1.6% per year.

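4.4 Interpretation

Do the results of the cross-sectional test support the multi-factor model?

No. The cross-sectional $r^2$ is only 0.65; if the multi-factor model were accurate, the betas would explain mean returns exactly and the $r^2$ should be 1. The intercept points the same way: the model implies $\lambda_0 = 0$, yet the estimate is 0.000321 daily (about 8% annualized), larger in magnitude than any of the estimated factor premia.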

4.5 Risk Premia

Compare the risk premia ($\lambda$’s) from the cross-sectional test to the average returns of each factor. Report exactly the following table (rounded to at least 6 decimal places):

Average return of each factor
Risk premia from the cross-sectional test

Annualize the estimates.

# Annualize both the time-series factor means and the cross-sectional premia from 4.3
sol_45 = (
    pd.DataFrame(
        {
            "Average Returns": factors.mean().values,
            "Risk Premia": sol_43.loc[
                "Summary", ["Lambda_LVL", "Lambda_HMS", "Lambda_IMO"]
            ].values,
        },
        index=factors.columns,
    )
    * ANN_FACTOR
)
sol_45

	Average Returns	Risk Premia
LVL	0.064439	-0.016413
HMS	0.008842	0.050369
IMO	0.007281	-0.016984
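Annualizing the cross-sectional output (the MAE in 4.3, the premia here) is a pure rescaling: multiplying every mean return by 252 multiplies $\lambda_0$, each $\lambda$, and every residual by 252, while the betas and the $r^2$ are unchanged. A small check on a simulated cross-section of 13 hypothetical assets (mirroring the number of commodities; the data are assumptions):

```python
import numpy as np

# Hypothetical cross-section: 13 assets, 3 factor betas, daily mean returns (all simulated)
rng = np.random.default_rng(11)
n = 13
B = rng.normal(1.0, 0.5, (n, 3))
daily_means = 0.0002 + B @ np.array([0.0001, 0.00005, -0.00005]) + rng.normal(0.0, 0.0001, n)


def cross_section(means: np.ndarray, betas: np.ndarray):
    """OLS of mean returns on [1, betas]; returns (coefficients, R^2, MAE of residuals)."""
    Z = np.column_stack([np.ones(len(means)), betas])
    coef = np.linalg.lstsq(Z, means, rcond=None)[0]
    resid = means - Z @ coef
    r2 = 1 - resid.var() / means.var()
    return coef, r2, np.abs(resid).mean()


coef_d, r2_d, mae_d = cross_section(daily_means, B)
coef_a, r2_a, mae_a = cross_section(252 * daily_means, B)
print(coef_a / coef_d, r2_d, r2_a, mae_a / mae_d)
```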

4.6 Interpretation

What do you observe from the comparison of average returns and risk premia?

Theoretically, what could cause the risk premium of a factor to deviate from its average return?

For LVL and IMO, the average returns are opposite in sign to their risk premia: both have positive average returns (6.4% and 0.7%, respectively), but negative risk premia (-1.6% and -1.7%). For HMS, the risk premium is much higher than the average return: the average return is 0.9%, but the risk premium is 5.0%.

If the model held and the factors are traded excess returns, each factor would price itself, so its risk premium would equal its average return. A deviation signals that the model is misspecified: factor returns are not always compensation for their own unique risk!
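One concrete mechanism is an omitted factor. If assets are priced by a factor that the cross-sectional regression leaves out, and exposure to that factor is correlated with the included betas, the included factor's $\lambda$ absorbs part of the omitted premium and no longer equals its own average return. A hypothetical two-factor example with four assets (all numbers are assumptions): the full regression recovers the true premia 0.05 and 0.04, but dropping factor 2 inflates $\lambda_1$ to 0.0676, because across these assets $\beta_2$ rises by 0.44 per unit of $\beta_1$.

```python
import numpy as np

# Hypothetical betas of four assets on two factors, and true factor premia (assumptions)
beta1 = np.array([0.5, 1.0, 1.5, 2.0])
beta2 = np.array([0.2, 0.9, 0.4, 1.1])
mu1, mu2 = 0.05, 0.04

# Expected returns priced exactly by both factors (zero intercept)
means = beta1 * mu1 + beta2 * mu2


def cs_lambdas(means: np.ndarray, *betas: np.ndarray) -> np.ndarray:
    """Cross-sectional OLS of means on [1, betas...]; returns [lambda_0, lambda_1, ...]."""
    Z = np.column_stack([np.ones(len(means)), *betas])
    return np.linalg.lstsq(Z, means, rcond=None)[0]


full = cs_lambdas(means, beta1, beta2)  # recovers [0, mu1, mu2]
short = cs_lambdas(means, beta1)        # omits factor 2
print("full model:", full.round(6))
print("factor 1 only:", short.round(6))
```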