Solution - Open Midterm 2#
FINM 36700 - 2025#
UChicago Financial Mathematics#
Mark Hendricks
Scoring#
| Problem | Points |
|---|---|
| 1 | 30 |
| 2 | 20 |
| 3 | 20 |
| 4 | 30 |
Numbered problems are worth 5pts unless specified otherwise.
Submission#
You should submit a single Jupyter notebook (.ipynb) file containing all of your code and answers to Canvas.
Note: If any other files are required to run your notebook, please include them and only them in a single .zip file.
Data#
All data files are found in the course web-book.
https://markhendricks.github.io/finm-portfolio/.
The exam uses the data found in commodity_factors.xlsx:
sheet factors
sheet returns
Both tabs contain daily returns for a set of commodity futures from January 2010 to October 2025.
Use 252 observations per year as an approximation for purposes of annualization.
Factors are
LVL: A level factor of commodity data
HMS: Hard Minus Soft commodities
IMO: Input Minus Output commodities
Returns are
various commodity futures across energy, metals, livestock, and agriculture
import pandas as pd
import numpy as np
import statsmodels.api as sm
ANN_FACTOR = 252
FACTOR = "LVL"
factors = pd.read_excel(
"./data/commodity_factors.xlsx", sheet_name="factors", index_col=0, parse_dates=True
)
returns = pd.read_excel(
"./data/commodity_factors.xlsx", sheet_name="returns", index_col=0, parse_dates=True
)
factors
| Date | LVL | HMS | IMO |
|---|---|---|---|
| 2010-01-05 | 0.001741 | -0.003471 | 0.006597 |
| 2010-01-06 | 0.012889 | 0.016399 | -0.020703 |
| 2010-01-07 | -0.011484 | 0.011919 | -0.001153 |
| 2010-01-08 | 0.000436 | 0.007519 | 0.000570 |
| 2010-01-11 | -0.007377 | 0.006306 | -0.000623 |
| ... | ... | ... | ... |
| 2025-10-27 | -0.001392 | -0.004590 | 0.007534 |
| 2025-10-28 | -0.002979 | -0.012895 | 0.002998 |
| 2025-10-29 | 0.008762 | 0.006220 | 0.006162 |
| 2025-10-30 | 0.019519 | 0.027967 | -0.011074 |
| 2025-10-31 | 0.001617 | -0.005249 | -0.006371 |
3972 rows × 3 columns
returns
| Date | CL | GC | HO | LE | NG | PL | RB | SB | SI | ZC | ZL | ZM | ZS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-01-05 | 0.003190 | 0.000358 | 0.001643 | 0.011127 | -0.041978 | 0.008897 | 0.009789 | 0.000724 | 0.019553 | 0.000597 | -0.004646 | 0.010759 | 0.002620 |
| 2010-01-06 | 0.017244 | 0.015920 | 0.004148 | -0.004344 | 0.065993 | 0.013980 | 0.005459 | 0.027858 | 0.021484 | 0.007164 | -0.000983 | -0.004696 | -0.001663 |
| 2010-01-07 | -0.006251 | -0.002465 | -0.008896 | -0.000291 | -0.033783 | 0.000515 | -0.000796 | -0.014432 | 0.009360 | -0.010077 | -0.016720 | -0.034287 | -0.031176 |
| 2010-01-08 | 0.001089 | 0.004501 | 0.007648 | -0.001164 | -0.009817 | 0.007469 | 0.009555 | -0.016786 | 0.006818 | 0.013174 | -0.011503 | -0.000652 | -0.004667 |
| 2010-01-11 | -0.002779 | 0.010982 | -0.009181 | -0.009030 | -0.051313 | 0.015148 | -0.005846 | -0.028333 | 0.012190 | -0.001182 | -0.008601 | -0.006845 | -0.011106 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2025-10-27 | -0.003089 | -0.028288 | 0.013732 | -0.021070 | 0.041768 | -0.009725 | -0.001196 | -0.034068 | -0.037518 | 0.012995 | 0.009946 | 0.013941 | 0.024478 |
| 2025-10-28 | -0.018920 | -0.008921 | -0.020073 | -0.005790 | -0.028181 | -0.000887 | 0.002499 | -0.006224 | 0.012091 | 0.007580 | -0.010045 | 0.027834 | 0.010307 |
| 2025-10-29 | 0.005486 | 0.004412 | 0.015541 | 0.017143 | 0.009268 | 0.009068 | 0.025192 | 0.003479 | 0.012647 | 0.004630 | -0.001990 | 0.007178 | 0.001855 |
| 2025-10-30 | 0.001488 | 0.004418 | 0.014726 | 0.016746 | 0.171801 | 0.010683 | 0.015048 | -0.009709 | 0.014815 | -0.008641 | -0.010167 | 0.022352 | 0.010183 |
| 2025-10-31 | 0.006769 | -0.004773 | -0.011707 | 0.005632 | 0.042467 | -0.023938 | -0.005141 | 0.010504 | -0.008962 | 0.002905 | -0.019537 | 0.019011 | 0.007789 |
3972 rows × 13 columns
1. Commodity Returns and Factors#
1.1 Factor Summary Statistics#
For each of the three factors, report only the following summary statistics (rounded to at least 6 decimal places):
Annualized Mean Return
Annualized Volatility
Annualized Sharpe Ratio
sol_11 = pd.concat(
[
factors.mean() * ANN_FACTOR,
factors.std() * np.sqrt(ANN_FACTOR),
(factors.mean() / factors.std()) * np.sqrt(ANN_FACTOR),
],
axis=1,
keys=["Annualized Return", "Annualized Volatility", "Annualized Sharpe Ratio"],
)
sol_11
| | Annualized Return | Annualized Volatility | Annualized Sharpe Ratio |
|---|---|---|---|
| LVL | 0.064439 | 0.154613 | 0.416775 |
| HMS | 0.008842 | 0.226259 | 0.039078 |
| IMO | 0.007281 | 0.153281 | 0.047500 |
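The scaling in the code above can be written out as follows. This is a sketch under the usual assumption that daily returns are roughly i.i.d., so means scale with the number of periods and volatilities with its square root; the symbols \(\mu_d\) and \(\sigma_d\) (daily mean and standard deviation) are notation introduced here, not taken from the exam.

\[
\mu_{ann} = 252\,\mu_d, \qquad
\sigma_{ann} = \sqrt{252}\,\sigma_d, \qquad
SR_{ann} = \sqrt{252}\,\frac{\mu_d}{\sigma_d} = \frac{\mu_{ann}}{\sigma_{ann}}
\]

As a consistency check on the table, the LVL row gives \(0.064439 / 0.154613 \approx 0.4168\), in line with the reported Sharpe ratio.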
1.2 Factor Correlations#
Calculate and report the correlation matrix between the three factors (rounded to at least 6 decimal places).
sol_12 = factors.corr()
sol_12
| | LVL | HMS | IMO |
|---|---|---|---|
| LVL | 1.000000 | 0.400202 | 0.010439 |
| HMS | 0.400202 | 1.000000 | -0.071747 |
| IMO | 0.010439 | -0.071747 | 1.000000 |
1.3 Interpretation#
Does the factor construction make sense given the correlations you observe?
Overall, yes. The correlations across the factors are low, and IMO in particular is nearly uncorrelated with the other two factors (0.01 with LVL and -0.07 with HMS). The one concern is that LVL and HMS have a correlation of about 0.40, so they are not fully distinct sources of variation.
1.4 Tangency Portfolio Weights#
Build a tangency portfolio using the three factors as assets.
Report the weights of each factor in the tangency portfolio, rounded to at least 6 decimal places. You may assume that the factors are excess returns.
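def tangency_weights(returns, scale_cov=1):
    """Tangency weights from excess returns; scale_cov < 1 shrinks the covariance toward its diagonal."""
    covmat_full = returns.cov()
    covmat_diag = np.diag(np.diag(covmat_full))
    # Blend the full and diagonal covariance matrices (scale_cov=1 uses the full matrix).
    covmat = scale_cov * covmat_full + (1 - scale_cov) * covmat_diag
    # Solve covmat @ w = mean, then normalize the weights to sum to 1.
    weights = np.linalg.solve(covmat, returns.mean())
    weights = weights / weights.sum()
    return pd.DataFrame(weights, index=returns.columns, columns=["Weight"])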
sol_14 = tangency_weights(factors)
sol_14
| | Weight |
|---|---|
| LVL | 1.171917 |
| HMS | -0.250927 |
| IMO | 0.079011 |
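As a small illustration of what the tangency calculation does, the sketch below applies the same solve-and-normalize step to two made-up assets and checks that the resulting Sharpe ratio matches the closed-form maximum. The inputs `mu` and `cov` and the helper `sharpe` are hypothetical and are not drawn from the exam data.

```python
import numpy as np

# Hypothetical daily excess-return inputs for two assets (not from the exam data).
mu = np.array([0.0004, 0.0001])             # mean excess returns
cov = np.array([[1.0e-4, 2.0e-5],
                [2.0e-5, 4.0e-4]])           # covariance matrix

# Tangency weights: solve cov @ w = mu, then rescale so the weights sum to 1.
raw = np.linalg.solve(cov, mu)
w_tan = raw / raw.sum()


def sharpe(w):
    """Sharpe ratio (daily units) of a portfolio with weights w."""
    return (w @ mu) / np.sqrt(w @ cov @ w)


# Closed-form maximum Sharpe ratio, sqrt(mu' cov^{-1} mu).
# It matches the tangency portfolio when raw.sum() > 0, as it is here.
max_sharpe = np.sqrt(mu @ np.linalg.solve(cov, mu))

print(w_tan, sharpe(w_tan), max_sharpe)
```

Any other fully invested mix of the two assets, such as equal weights, has a Sharpe ratio no higher than `max_sharpe`.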
1.5 Interpretation#
What do the tangency portfolio weights suggest about the relative importance of each factor?
The tangency portfolio puts by far the largest weight on LVL (1.17), followed by a short position in HMS (-0.25). IMO gets a small weight (0.08), but it is not zero, indicating that it contributes, albeit more marginally than the other two.
1.6.#
Estimate an autoregression of the factor LVL…
Only report \(\rho\) (rounded to at least 6 decimal places).
Does the LVL factor exhibit momentum?
lvl = factors["LVL"]
lvl_lag = lvl.shift(1)
sol_16 = sm.OLS(lvl, sm.add_constant(lvl_lag), missing="drop").fit().params["LVL"]
print(f"Rho: {sol_16:.6f}")
Rho: 0.017041
\(\rho\) is 0.017, which is positive, so the point estimate indicates that the LVL factor exhibits momentum, although the magnitude is small.
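The sign of \(\rho\) answers the question as posed, but how far the estimate is from zero relative to its standard error is also informative. The sketch below is only an illustration on a simulated series, not the LVL data: it estimates an AR(1) coefficient by OLS and computes a classical (homoskedastic) t-statistic. The names `true_rho`, `rho_hat`, and `t_stat` are chosen here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a hypothetical AR(1) series r_t = rho * r_{t-1} + eps_t (not the LVL data).
true_rho, n = 0.3, 5000
eps = rng.normal(scale=0.01, size=n)
r = np.zeros(n)
for t in range(1, n):
    r[t] = true_rho * r[t - 1] + eps[t]

# OLS of r_t on a constant and r_{t-1}.
y = r[1:]
X = np.column_stack([np.ones(n - 1), r[:-1]])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
rho_hat = coef[1]

# Classical (homoskedastic) standard error and t-statistic for rho.
resid = y - X @ coef
sigma2 = resid @ resid / (len(y) - X.shape[1])
se_rho = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = rho_hat / se_rho

print(f"rho_hat={rho_hat:.4f}, t={t_stat:.2f}")
```

With statsmodels, the same quantities are available from the fitted model as `.params` and `.tvalues`.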
2. Single Factor Model#
2.1 LVL Factor#
We want to test the hypothesis that:
Regress each commodity’s returns against the LVL factor, and report the mean absolute alpha, \(\bar{\alpha}\), and the mean \(r^2\), \(\bar{r^2}\), across all commodities.
Annualize the alpha.
Output exactly the following two numbers rounded to 6 decimal places:
Mean Absolute Annualized Alpha across all commodities
Mean R-squared across all commodities
sol_21_dict = {"Annualized Alpha": [], "R-Squared": [], "Beta": []}
for asset in returns.columns:
    # Time-series regression of each commodity on a constant and LVL.
    model = sm.OLS(returns[asset], sm.add_constant(factors["LVL"])).fit()
    sol_21_dict["Annualized Alpha"].append(model.params["const"] * ANN_FACTOR)
    sol_21_dict["R-Squared"].append(model.rsquared)
    sol_21_dict["Beta"].append(model.params["LVL"])
sol_21 = pd.DataFrame(sol_21_dict, index=returns.columns)
sol_21[["Annualized Alpha", "R-Squared"]].abs().mean().to_frame("Summary")
| | Summary |
|---|---|
| Annualized Alpha | 0.034876 |
| R-Squared | 0.259890 |
2.2. Interpretation#
If our hypothesis were true, what would you expect the values of \(\bar{\alpha}\) and \(\bar{r^2}\) to be?
If this model were true, then every alpha would be zero, so \(\bar{\alpha}\) (the mean absolute alpha) should be zero. The model places no restriction on the time-series \(r^2\), so \(\bar{r^2}\) could take any value.
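For reference, the time-series regression run in the code for 2.1 and the restriction being tested can be written as below. The notation (\(r^i_t\) for commodity \(i\)'s return, \(\beta^{i,LVL}\), \(\epsilon^i_t\)) is an assumption made here to match the code, since the exam's own statement of the hypothesis is not reproduced on this page.

\[
r^{i}_{t} = \alpha^{i} + \beta^{i,LVL}\, LVL_{t} + \epsilon^{i}_{t},
\qquad
H_0:\ \alpha^{i} = 0 \ \text{ for all } i
\]

Under \(H_0\), taking expectations gives \(E[r^{i}] = \beta^{i,LVL}\, E[LVL]\), which says nothing about how much of the day-to-day variation the factor explains.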
2.3 Cross-Sectional Test#
Let’s test the one-factor LVL model directly. From 2.1, we already have what we need:
The dependent variable, (y): mean excess returns from each of the commodities.
The regressor, (x): the LVL beta for each commodity from the time-series regressions.
Then we can estimate the following equation:
Report exactly the following 3 numbers (rounded to at least 6 decimal places):
The R-squared of this regression
The intercept estimate, \(\hat{\eta}\)
The regression coefficient \(\lambda_{LVL}\)
means = returns.mean()
model_23 = sm.OLS(means, sm.add_constant(sol_21["Beta"])).fit()
sol_23 = pd.DataFrame(
{
"R-Squared": model_23.rsquared,
"Eta": model_23.params["const"],
"Lambda_LVL": model_23.params["Beta"],
},
index=["Value"],
)
sol_23.T
| | Value |
|---|---|
| R-Squared | 0.093357 |
| Eta | 0.000163 |
| Lambda_LVL | 0.000092 |
2.4.#
Does your time-series or cross-sectional estimate give a higher premium to the LVL factor?
Note that the cross-sectional premium is \(\lambda_{\text{LVL}}\), and the time-series estimate is just the mean return of the factor. The annualized mean return of LVL is 0.064439, whereas the annualized LVL lambda is 0.023302. As such, the time-series estimate gives the higher premium to the LVL factor.
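The comparison can be laid out explicitly. The cross-sectional regression estimated in the code for 2.3 is written below with \(\bar{r}^{i}\) for the sample mean return of commodity \(i\) and \(\upsilon^{i}\) for the pricing error; both symbols are assumed notation.

\[
\bar{r}^{i} = \eta + \lambda_{LVL}\, \beta^{i,LVL} + \upsilon^{i}
\]

\[
252 \times \hat{\lambda}_{LVL} \approx 0.023302
\;<\;
0.064439 \approx 252 \times \overline{LVL}
\]

The 0.023302 figure uses the unrounded daily estimate; multiplying the rounded table value 0.000092 by 252 gives roughly 0.0232.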
3. Trading the Model#
3.1 Beta Estimation#
For each commodity, report their LVL beta rounded to at least 6 decimal places.
Display a table of these betas, sorted from lowest to highest.
Remember#
We estimated the betas in the time-series regression in 2.1.
Hint#
Use df.sort_values(by=<YOUR_DF>, ascending=True) to sort your results.
sol_21.sort_values(by="Beta", ascending=True)[["Beta"]]
| | Beta |
|---|---|
| LE | 0.243876 |
| GC | 0.441604 |
| ZM | 0.698967 |
| SB | 0.742221 |
| ZS | 0.770218 |
| ZC | 0.823626 |
| PL | 0.834771 |
| ZL | 0.850391 |
| SI | 1.038445 |
| NG | 1.407854 |
| HO | 1.533688 |
| RB | 1.695943 |
| CL | 1.918395 |
3.2 Portfolio Formation#
Regardless of your answer to 3.1, allocate your portfolio as follows:
Go long GC, LE, and ZM
Go short CL, HO, and RB
Go long 1 and short 0.25.
That is, your portfolio should be:
\[ r_{port} = 1 \cdot \left( \frac{1}{3} \cdot r_{GC} + \frac{1}{3} \cdot r_{LE} + \frac{1}{3} \cdot r_{ZM} \right) - 0.25 \cdot \left(\frac{1}{3} \cdot r_{CL} + \frac{1}{3} \cdot r_{HO} + \frac{1}{3} \cdot r_{RB}\right) \]
Report the last 5 daily returns of your betting against beta portfolio, rounded to at least 6 decimal places.
sol_32 = (
1 * (returns[["GC", "LE", "ZM"]].mean(axis=1))
- 0.25 * (returns[["CL", "HO", "RB"]].mean(axis=1))
).to_frame("BAB Portfolio")
sol_32.tail(5)
| Date | BAB Portfolio |
|---|---|
| 2025-10-27 | -0.012593 |
| 2025-10-28 | 0.007415 |
| 2025-10-29 | 0.005726 |
| 2025-10-30 | 0.011900 |
| 2025-10-31 | 0.007463 |
3.3 Performance Evaluation#
For your portfolio, report the following performance statistics (rounded to at least 6 decimal places):
Annualized Return
Annualized Volatility
Annualized Sharpe Ratio
sol_33 = pd.concat(
[
sol_32.mean() * ANN_FACTOR,
sol_32.std() * np.sqrt(ANN_FACTOR),
np.sqrt(ANN_FACTOR) * sol_32.mean() / sol_32.std(),
],
axis=1,
)
sol_33.columns = ["Annualized Mean", "Annualized Volatility", "Annualized Sharpe"]
sol_33.T
| | BAB Portfolio |
|---|---|
| Annualized Mean | 0.053383 |
| Annualized Volatility | 0.144502 |
| Annualized Sharpe | 0.369425 |
3.4#
For your portfolio, test the hypothesis that its premium can be explained by the LVL factor alone:
Report exactly the following 3 numbers (rounded to at least 6 decimal places):
Annualized Alpha
LVL Beta
R-squared for the regression
model_34 = sm.OLS(sol_32, sm.add_constant(lvl), missing="drop").fit()
sol_34 = pd.DataFrame(
{
"Annualized Alpha": model_34.params["const"] * ANN_FACTOR,
"LVL Beta": model_34.params["LVL"],
"R-Squared": model_34.rsquared,
},
index=["Summary"],
)
sol_34.T
| | Summary |
|---|---|
| Annualized Alpha | 0.051290 |
| LVL Beta | 0.032480 |
| R-Squared | 0.001208 |
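The near-zero beta is what the 1 versus 0.25 sizing is built to deliver. OLS betas on a common regressor are linear in portfolio weights, so the portfolio beta can be recovered from the betas in 3.1. Likewise, the intercept of a regression with a constant equals the mean of the dependent variable minus beta times the mean of the regressor. The sketch below redoes that arithmetic from the rounded figures reported above, so it agrees with the regression output only up to rounding.

```python
# LVL betas copied from the table in 3.1.
beta = {"GC": 0.441604, "LE": 0.243876, "ZM": 0.698967,
        "CL": 1.918395, "HO": 1.533688, "RB": 1.695943}

# Equal-weighted beta of each leg.
long_beta = sum(beta[k] for k in ("GC", "LE", "ZM")) / 3
short_beta = sum(beta[k] for k in ("CL", "HO", "RB")) / 3

# Portfolio beta is linear in the weights: 1 * long leg - 0.25 * short leg.
port_beta = 1.0 * long_beta - 0.25 * short_beta

# Implied annualized alpha = portfolio mean - beta * factor mean
# (annualized means taken from the tables in 3.3 and 1.1).
port_mean, lvl_mean = 0.053383, 0.064439
implied_alpha = port_mean - port_beta * lvl_mean

print(round(port_beta, 6), round(implied_alpha, 6))
```

Both numbers line up with the regression table: the long leg's average beta of about 0.46 is almost exactly offset by a quarter of the short leg's average beta of about 1.72, and nearly all of the portfolio's mean return shows up as alpha.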
4. Multi-Factor Model#
We now want to test a multi-factor model using LVL and HMS and IMO as factors:
4.1 Time Series Test#
Estimate the time series test of this pricing model. Regress each commodity’s returns against the three factors, and report the following for each commodity (rounded to at least 6 decimal places):
Annualized Alpha
LVL, HMS, and IMO Betas
R-squared
sol_41_dict = {
"Annualized Alpha": [],
"LVL Beta": [],
"HMS Beta": [],
"IMO Beta": [],
"R-Squared": [],
}
for asset in returns.columns:
    # Time-series regression of each commodity on a constant and all three factors.
    model = sm.OLS(returns[asset], sm.add_constant(factors), missing="drop").fit()
    sol_41_dict["Annualized Alpha"].append(model.params["const"] * ANN_FACTOR)
    sol_41_dict["LVL Beta"].append(model.params["LVL"])
    sol_41_dict["HMS Beta"].append(model.params["HMS"])
    sol_41_dict["IMO Beta"].append(model.params["IMO"])
    sol_41_dict["R-Squared"].append(model.rsquared)
sol_41 = pd.DataFrame(sol_41_dict, index=returns.columns)
sol_41
| | Annualized Alpha | LVL Beta | HMS Beta | IMO Beta | R-Squared |
|---|---|---|---|---|---|
| CL | -0.035599 | 1.527030 | 0.656709 | 0.653682 | 0.637664 |
| GC | 0.071595 | 0.373286 | 0.123614 | -0.393933 | 0.348973 |
| HO | -0.023058 | 1.205122 | 0.543109 | 1.014183 | 0.759600 |
| LE | 0.059421 | 0.328136 | -0.148049 | 0.236226 | 0.119500 |
| NG | 0.080357 | 1.035345 | 0.662592 | -1.501411 | 0.373050 |
| PL | -0.009842 | 0.711887 | 0.217593 | -0.439602 | 0.364752 |
| RB | -0.015995 | 1.243184 | 0.749484 | 1.335847 | 0.786582 |
| SB | -0.053905 | 1.065017 | -0.542152 | -0.510662 | 0.321713 |
| SI | 0.056488 | 0.901595 | 0.246072 | -0.701687 | 0.434255 |
| ZC | -0.031637 | 1.246405 | -0.717257 | -0.262703 | 0.514722 |
| ZL | -0.026659 | 1.055894 | -0.356494 | 0.316702 | 0.436060 |
| ZM | -0.028435 | 1.159886 | -0.789761 | 0.154944 | 0.511819 |
| ZS | -0.042733 | 1.147212 | -0.645459 | 0.098414 | 0.707169 |
4.2#
Report the annualized Sharpe ratio (rounded to at least 6 decimal places) of the tangency portfolio formed from
the individual commodities.
the three factors
What should be true of the Sharpe ratios if the factor pricing model is accurate?
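# Tangency portfolio of the 13 commodities and its annualized Sharpe ratio.
# Note: NumPy's .std() defaults to ddof=0, unlike pandas.
wttan_commodities = tangency_weights(returns=returns)
rets_commodities = returns.values @ wttan_commodities.values
sharpe_commodities = rets_commodities.mean() / rets_commodities.std() * np.sqrt(ANN_FACTOR)
# Same calculation using the three factors as the assets.
wttan_factors = tangency_weights(returns=factors)
rets_factors = factors.values @ wttan_factors.values
sharpe_factors = rets_factors.mean() / rets_factors.std() * np.sqrt(ANN_FACTOR)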
sol_42 = pd.DataFrame(
{"Commodities": sharpe_commodities, "Factors": sharpe_factors},
index=["Annualized Sharpe Ratio"],
)
sol_42
| | Commodities | Factors |
|---|---|---|
| Annualized Sharpe Ratio | 0.852347 | 0.440656 |
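If the pricing model were accurate, then the tangency portfolio of the factors would have the highest attainable Sharpe ratio: there would be no risk premia to be earned outside of the factors, so adding the commodities would bring no extra excess return and only extra variance. The commodity tangency portfolio should therefore not beat the factor tangency portfolio. Here it does, in sample (0.852 versus 0.441), which tells us that the pricing model is not accurate.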
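The Sharpe ratios being compared have a closed form, which makes the comparison easier to read. Writing \(\mu\) and \(\Sigma\) for the daily mean vector and covariance matrix of whichever set of assets is used (notation assumed here), and provided the unnormalized weights \(\Sigma^{-1}\mu\) sum to a positive number, the tangency portfolio attains

\[
SR_{tan} = \sqrt{252}\,\sqrt{\mu^{\top} \Sigma^{-1} \mu}
\]

up to the small degrees-of-freedom difference between the pandas covariance and the NumPy standard deviation used in the code. If the factor model holds, the commodities have zero alphas, so their \(\mu\) is spanned by factor exposures and their \(SR_{tan}\) cannot exceed that of the factors.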
4.3 Cross-Sectional Test#
Run the cross-sectional test of this multi-factor model:
Report exactly the following numbers (rounded to at least 6 decimal places):
\(\lambda_{0}\)
\(\lambda_{LVL}\)
\(\lambda_{HMS}\)
\(\lambda_{IMO}\)
\(r^2\) of the cross-sectional regression
MAE of the cross-sectional regression
Annualize the MAE.
sol_43_model = sm.OLS(
returns.mean(),
sm.add_constant(sol_41[["LVL Beta", "HMS Beta", "IMO Beta"]]),
missing="drop",
).fit()
sol_43 = pd.DataFrame(
{
"Lambda_0": sol_43_model.params["const"],
"Lambda_LVL": sol_43_model.params["LVL Beta"],
"Lambda_HMS": sol_43_model.params["HMS Beta"],
"Lambda_IMO": sol_43_model.params["IMO Beta"],
"R-squared": sol_43_model.rsquared,
"MAE of residuals": np.mean(np.abs(sol_43_model.resid)),
},
index=["Summary"],
)
sol_43.T
| | Summary |
|---|---|
| Lambda_0 | 0.000321 |
| Lambda_LVL | -0.000065 |
| Lambda_HMS | 0.000200 |
| Lambda_IMO | -0.000067 |
| R-squared | 0.646543 |
| MAE of residuals | 0.000062 |
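The values in this table are in daily units, including the MAE: the code computes `np.mean(np.abs(sol_43_model.resid))` without multiplying by `ANN_FACTOR`. Since the residuals are in mean-return units, annualizing amounts to multiplying by 252, which from the rounded table value gives roughly 0.016. The sketch below shows the step on made-up inputs; `mean_ret` and `betas` are hypothetical and are not the exam data.

```python
import numpy as np

ANN_FACTOR = 252

# Hypothetical daily mean returns and two betas for five assets (not the exam data).
mean_ret = np.array([0.00030, 0.00010, 0.00025, 0.00040, 0.00005])
betas = np.array([[1.2, 0.3],
                  [0.4, -0.2],
                  [0.9, 0.1],
                  [1.5, 0.6],
                  [0.2, -0.5]])

# Cross-sectional OLS with an intercept: mean_ret = lambda_0 + betas @ lambdas + resid.
X = np.column_stack([np.ones(len(mean_ret)), betas])
lambdas, *_ = np.linalg.lstsq(X, mean_ret, rcond=None)
resid = mean_ret - X @ lambdas

# MAE of the pricing errors, in daily units and then annualized.
mae_daily = np.mean(np.abs(resid))
mae_annual = mae_daily * ANN_FACTOR   # linear scaling, the same as for mean returns

print(mae_daily, mae_annual)
```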
4.4 Interpretation#
Do the results of the cross-sectional test support the multi-factor model?
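No! The \(r^2\) is only 0.65. If the multi-factor model were accurate, this value should be 1, with no pricing errors and an intercept \(\lambda_0\) of zero. Instead, the intercept is 0.000321 per day, or about 8% annualized.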
4.5 Risk Premia#
Compare the risk premia (\(\lambda\)’s) from the cross-sectional test to the average returns of each factor. Report exactly the following table (rounded to at least 6 decimal places):
Average return of each factor
Risk premia from the cross-sectional test
Annualize the estimates.
sol_45_premia = sol_43.loc["Summary", ["Lambda_LVL", "Lambda_HMS", "Lambda_IMO"]].T
sol_45_avg = factors.mean()
sol_45 = pd.concat(
[sol_45_avg.reset_index(drop=True), sol_45_premia.reset_index(drop=True)], axis=1
)
sol_45.index = factors.columns
sol_45.columns = ["Average Returns", "Risk Premia"]
sol_45 = sol_45 * ANN_FACTOR
sol_45
| | Average Returns | Risk Premia |
|---|---|---|
| LVL | 0.064439 | -0.016413 |
| HMS | 0.008842 | 0.050369 |
| IMO | 0.007281 | -0.016984 |
4.6 Interpretation#
What do you observe from the comparison of average returns and risk premia?
Theoretically, what could cause the risk premium of a factor to deviate from its average return?
For LVL and IMO, the average returns and the risk premia have opposite signs. Both factors have positive average returns (6.4% and 0.7%, respectively), but their risk premia are negative (-1.6% and -1.7%). For HMS, the risk premium is higher than the average return: the return is 0.9%, but the risk premium is 5%.
Factor returns are not always compensation for their own unique risk!