Statistics of the DFA Factors#

import numpy as np
import pandas as pd
pd.options.display.float_format = "{:,.4f}".format

import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

from scipy import stats
%matplotlib inline
plt.rcParams['figure.figsize'] = (12,6)
plt.rcParams['font.size'] = 15
plt.rcParams['legend.fontsize'] = 13

This uses factor_pricing.py, which you do not have.#

Thus, this notebook is just for demonstration.#

import sys
sys.path.insert(0, '../cmds')
sys.path.insert(0, '../DEV')
from portfolio import *
from factor_pricing import *

Load Data#

filepath_data = '../data/dfa_analysis_data.xlsx'
info = pd.read_excel(filepath_data,sheet_name='descriptions')
info.rename(columns={'Unnamed: 0':'Symbol'},inplace=True)
info.set_index('Symbol',inplace=True)

facs = pd.read_excel(filepath_data,sheet_name='factors')
facs.set_index('Date',inplace=True)
rf = facs['RF'].copy()
facs.drop(columns=['RF'],inplace=True)

2. Factors#

2.1#

dts = dict()
# 'ME' (month-end) replaces the deprecated 'M' alias; it requires pandas 2.2 or later
dts['early'] = pd.date_range(start=facs.index[0], end='1980-12-31', freq='ME')
dts['founding'] = pd.date_range(start='1981-01-01', end='2001-12-31', freq='ME')
dts['recent'] = pd.date_range(start='2002-01-01', end=facs.index[-1], freq='ME')

dts['1990s'] = pd.date_range(start='1991-01-01', end='1999-12-31', freq='ME')
dts['modern'] = dts['founding'].union(dts['recent'])
dts['all'] = facs.index
for era in dts.keys():
    print(f'\n========================================================\n')
    print(f'Period: {dts[era][0]} to {dts[era][-1]}')
    print(f'\n========================================================')
    display(performanceMetrics(facs.loc[dts[era]],annualization=12).join(tailMetrics(facs.loc[dts[era]],quantile=.05)['VaR (0.05)']))
========================================================

Period: 1926-07-31 00:00:00 to 1980-12-31 00:00:00

========================================================
          Mean     Vol  Sharpe     Min     Max  VaR (0.05)
Mkt-RF  0.0810  0.2050  0.3949 -0.2874  0.3881     -0.0841
SMB     0.0339  0.1143  0.2968 -0.0989  0.3596     -0.0419
HML     0.0503  0.1342  0.3749 -0.1319  0.3552     -0.0442
========================================================

Period: 1981-01-31 00:00:00 to 2001-12-31 00:00:00

========================================================
          Mean     Vol  Sharpe     Min     Max  VaR (0.05)
Mkt-RF  0.0779  0.1572  0.4953 -0.2319  0.1245     -0.0641
SMB    -0.0020  0.1173 -0.0172 -0.1741  0.2125     -0.0459
HML     0.0646  0.1099  0.5876 -0.0977  0.1224     -0.0416
========================================================

Period: 2002-01-31 00:00:00 to 2025-08-31 00:00:00

========================================================
          Mean     Vol  Sharpe     Min     Max  VaR (0.05)
Mkt-RF  0.0913  0.1535  0.5947 -0.1720  0.1360     -0.0773
SMB     0.0079  0.0884  0.0897 -0.0593  0.0714     -0.0392
HML     0.0012  0.1064  0.0113 -0.1383  0.1286     -0.0415
========================================================

Period: 1991-01-31 00:00:00 to 1999-12-31 00:00:00

========================================================
          Mean     Vol  Sharpe     Min     Max  VaR (0.05)
Mkt-RF  0.1556  0.1298  1.1988 -0.1605  0.1085     -0.0416
SMB    -0.0007  0.1042 -0.0070 -0.0694  0.0848     -0.0501
HML     0.0142  0.0922  0.1539 -0.0766  0.0651     -0.0406
========================================================

Period: 1981-01-31 00:00:00 to 2025-08-31 00:00:00

========================================================
          Mean     Vol  Sharpe     Min     Max  VaR (0.05)
Mkt-RF  0.0850  0.1551  0.5478 -0.2319  0.1360     -0.0723
SMB     0.0033  0.1029  0.0316 -0.1741  0.2125     -0.0418
HML     0.0310  0.1083  0.2860 -0.1383  0.1286     -0.0415
========================================================

Period: 1926-07-31 00:00:00 to 2025-08-31 00:00:00

========================================================
          Mean     Vol  Sharpe     Min     Max  VaR (0.05)
Mkt-RF  0.0828  0.1841  0.4495 -0.2874  0.3881     -0.0792
SMB     0.0201  0.1093  0.1839 -0.1741  0.3596     -0.0418
HML     0.0416  0.1232  0.3377 -0.1383  0.3552     -0.0424
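
performanceMetrics and tailMetrics come from the course's helper modules, which are not reproduced here. The sketch below shows one way a table like those above could be computed. The scaling conventions are assumptions inferred from the output (annualized Mean, Vol, and Sharpe; per-period Min, Max, and VaR), and the name performance_metrics_sketch is hypothetical.

```python
import numpy as np
import pandas as pd

def performance_metrics_sketch(returns, annualization=12, quantile=0.05):
    """Summary statistics for a DataFrame of periodic excess returns.

    Assumed conventions (not taken from the helper modules): the mean is
    scaled by `annualization`, the volatility by its square root, and
    Min, Max, and VaR are left in per-period units.
    """
    out = pd.DataFrame(index=returns.columns)
    out['Mean'] = returns.mean() * annualization
    out['Vol'] = returns.std() * np.sqrt(annualization)
    out['Sharpe'] = out['Mean'] / out['Vol']
    out['Min'] = returns.min()
    out['Max'] = returns.max()
    # Historical VaR: the empirical quantile of per-period returns
    out[f'VaR ({quantile})'] = returns.quantile(quantile)
    return out

# Synthetic monthly data standing in for the factor returns
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(0.005, 0.04, size=(240, 3)),
                    columns=['Mkt-RF', 'SMB', 'HML'])
table = performance_metrics_sketch(demo)
print(table)
```

Because the Sharpe column is the ratio of two annualized columns, it scales with the square root of the annualization factor, which is worth remembering when comparing against per-period figures.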

2.2#

Regarding the factor premia, we see that…

  • The SMB premium is small in most subsamples and is slightly negative during 1981-2001 (and during 1991-1999).

  • The HML premium is positive in the full sample and in most subsamples, but it is close to zero (0.0012) in the post-case sample that begins in 2002.

  • HML’s premium drops substantially starting in the 1990s.

  • The MKT premium is positive in every subsample, and strongly so.

2.3#

The correlations are low relative to those among most equity portfolios, which tend to be highly correlated.

  • The correlation between MKT and SMB is relatively low, but clearly positive.

  • The correlation between MKT and HML is negative since DFA’s founding, but it was positive before that, such that the correlation over the full sample of roughly 100 years is positive.

from cmds.plot_tools import plot_corr_matrix

for era in ['founding','recent','modern','all']:
    plot_corr_matrix(facs.loc[dts[era]],figsize=(4,4),triangle='lower');
    plt.title(era);
    plt.show()
[Figures: lower-triangle correlation matrices of the factors, one each for the 'founding', 'recent', 'modern', and 'all' samples]
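
plot_corr_matrix is imported from the course's plotting tools, which are not reproduced here. As a rough stand-in for inspecting the same numbers without the plot, the sketch below computes the correlation matrix of each subsample and blanks out the diagonal and upper triangle. The function name lower_triangle_corr and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd

def lower_triangle_corr(returns):
    """Correlation matrix with the diagonal and upper triangle set to NaN."""
    corr = returns.corr()
    keep = np.tril(np.ones(corr.shape, dtype=bool), k=-1)
    return corr.where(keep)

# Synthetic returns standing in for the factor data
rng = np.random.default_rng(1)
demo = pd.DataFrame(rng.normal(size=(120, 3)),
                    columns=['Mkt-RF', 'SMB', 'HML'])

# Two illustrative subsamples, split by row position
eras = {'first half': demo.iloc[:60], 'second half': demo.iloc[60:]}
for name, sample in eras.items():
    print(name)
    print(lower_triangle_corr(sample))

tri = lower_triangle_corr(demo)
```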

2.4#

display((1+facs.loc[dts['founding']]).cumprod().plot());
display((1+facs.loc[dts['recent']]).cumprod().plot());
[Figures: cumulative returns of the three factors over the 'founding' sample (1981-2001) and over the 'recent' sample (2002 onward)]
  • The plots above show that HML had high returns in the era of the case: 1981-2001.

  • In the post-case period (2002 onward), HML has a mean return close to zero. Nonetheless, it may be valuable given that it has negative correlation to MKT during this time.

  • However, SMB has small or negative returns in this period while being positively correlated to MKT. Thus, it may be reasonable to drop SMB.

Tangency Weights of Factors#

Another way to consider whether all three factors matter is to remember that if a factor helps price assets, then by the Fundamental Theorem of Asset Pricing it must be part of the tangency portfolio.

Calculate the tangency portfolio formed by the factors. Do they all get substantial weight? We see that SMB has very little weight in the three-factor tangency portfolio.

Thus, we again see that SMB may be unnecessary.
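
The calculation described above is not shown in this notebook. A minimal sketch of it, assuming the standard mean-variance result that tangency weights are proportional to the inverse covariance matrix times the mean excess returns, could look like the following. The function name tangency_weights_sketch and the synthetic factor data are hypothetical, and the helper tangency_weights used later in the notebook may differ in its details.

```python
import numpy as np

def tangency_weights_sketch(excess_returns):
    """Weights proportional to inv(Sigma) @ mu, rescaled to sum to one."""
    mu = excess_returns.mean(axis=0)
    sigma = np.cov(excess_returns, rowvar=False)
    raw = np.linalg.solve(sigma, mu)
    return raw / raw.sum()

# Synthetic monthly factors: a strong premium, a near-zero premium,
# and a moderate premium (illustrative numbers, not the DFA data)
rng = np.random.default_rng(2)
means = np.array([0.007, 0.0003, 0.003])
vols = np.array([0.045, 0.030, 0.031])
demo = rng.normal(means, vols, size=(1200, 3))

weights = tangency_weights_sketch(demo)
print(dict(zip(['Mkt-RF', 'SMB', 'HML'], weights.round(3))))
```

A factor whose mean is small relative to its variance, and which is not needed to hedge the others, tends to receive little weight in this calculation.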

3. CAPM#

3.1#

# load data
rets = pd.read_excel(filepath_data,sheet_name='portfolios (total returns)')
rets.set_index('Date',inplace=True)
# excess portfolio returns
retsx = rets.subtract(rf,axis=0)
# subsample of modern
facsT = facs.loc[dts['modern']]
retsxT = retsx.loc[dts['modern']]
mets = performanceMetrics(retsxT,annualization=12)
tail = tailMetrics(retsxT)

display(mets.join(tail['VaR (0.05)']))
              Mean     Vol  Sharpe     Min     Max  VaR (0.05)
SMALL LoBM  0.0117  0.2717  0.0431 -0.3477  0.3596     -0.1249
ME1 BM2     0.0884  0.2354  0.3756 -0.3128  0.4309     -0.0949
ME1 BM3     0.0902  0.2008  0.4493 -0.2919  0.1999     -0.0848
ME1 BM4     0.1125  0.1940  0.5800 -0.2896  0.2563     -0.0776
SMALL HiBM  0.1273  0.2084  0.6110 -0.2908  0.4121     -0.0882
ME2 BM1     0.0609  0.2447  0.2490 -0.3308  0.3024     -0.1032
ME2 BM2     0.0984  0.2054  0.4790 -0.3254  0.1892     -0.0834
ME2 BM3     0.1052  0.1864  0.5640 -0.2926  0.1832     -0.0803
ME2 BM4     0.1081  0.1819  0.5942 -0.2526  0.1869     -0.0753
ME2 BM5     0.1132  0.2137  0.5298 -0.3209  0.2596     -0.0933
ME3 BM1     0.0694  0.2237  0.3103 -0.3030  0.2217     -0.0995
ME3 BM2     0.1040  0.1871  0.5555 -0.2908  0.1874     -0.0786
ME3 BM3     0.0910  0.1729  0.5264 -0.2551  0.1669     -0.0732
ME3 BM4     0.1053  0.1797  0.5863 -0.2667  0.1666     -0.0722
ME3 BM5     0.1240  0.2024  0.6125 -0.3131  0.2291     -0.0845
ME4 BM1     0.0919  0.2008  0.4575 -0.2615  0.2538     -0.0835
ME4 BM2     0.0937  0.1761  0.5323 -0.2929  0.1553     -0.0723
ME4 BM3     0.0920  0.1742  0.5282 -0.2514  0.1681     -0.0762
ME4 BM4     0.1059  0.1741  0.6081 -0.3220  0.1627     -0.0688
ME4 BM5     0.1056  0.1990  0.5307 -0.3264  0.2037     -0.0857
BIG LoBM    0.0949  0.1634  0.5807 -0.2225  0.1490     -0.0758
ME5 BM2     0.0857  0.1544  0.5552 -0.2305  0.1607     -0.0641
ME5 BM3     0.0796  0.1529  0.5211 -0.2263  0.1425     -0.0711
ME5 BM4     0.0726  0.1704  0.4258 -0.2759  0.1636     -0.0738
BIG HiBM    0.1035  0.2040  0.5076 -0.2858  0.2199     -0.0890

Explaining premia by risk metrics#

lfd = get_ols_metrics(facsT['Mkt-RF'],retsxT,annualization=12)

# Select each column as a Series so the assignment aligns on the portfolio index
mets['Beta'] = lfd['Mkt-RF']
mets['VaR'] = tail['VaR (0.05)']
mets['Skew'] = tail['Skewness']

mets.plot.scatter(x='Vol',y='Mean');
mets.plot.scatter(x='VaR',y='Mean');
mets.plot.scatter(x='Skew',y='Mean');
mets.plot.scatter(x='Beta',y='Mean');
[Figures: scatter plots of Mean against Vol, VaR, Skew, and Beta across the 25 portfolios]

3.2 CAPM Tests: Time-Series Metrics#

display(lfd)
             alpha  Mkt-RF  r-squared  Treynor Ratio  Info Ratio
SMALL LoBM -0.1037  1.3585     0.6014         0.0086     -0.6047
ME1 BM2    -0.0106  1.1658     0.5900         0.0759     -0.0705
ME1 BM3     0.0010  1.0495     0.6571         0.0860      0.0088
ME1 BM4     0.0295  0.9773     0.6105         0.1151      0.2435
SMALL HiBM  0.0429  0.9939     0.5476         0.1281      0.3058
ME2 BM1    -0.0524  1.3341     0.7154         0.0457     -0.4018
ME2 BM2     0.0016  1.1390     0.7401         0.0864      0.0151
ME2 BM3     0.0171  1.0357     0.7426         0.1015      0.1812
ME2 BM4     0.0251  0.9765     0.6937         0.1107      0.2493
ME2 BM5     0.0188  1.1108     0.6505         0.1019      0.1488
ME3 BM1    -0.0383  1.2677     0.7725         0.0548     -0.3589
ME3 BM2     0.0112  1.0911     0.8179         0.0953      0.1408
ME3 BM3     0.0069  0.9895     0.7879         0.0920      0.0872
ME3 BM4     0.0211  0.9911     0.7320         0.1063      0.2271
ME3 BM5     0.0344  1.0543     0.6528         0.1176      0.2883
ME4 BM1    -0.0084  1.1800     0.8308         0.0779     -0.1016
ME4 BM2     0.0038  1.0577     0.8685         0.0886      0.0600
ME4 BM3     0.0065  1.0068     0.8036         0.0914      0.0838
ME4 BM4     0.0226  0.9803     0.7630         0.1080      0.2662
ME4 BM5     0.0165  1.0484     0.6679         0.1007      0.1439
BIG LoBM    0.0103  0.9955     0.8935         0.0953      0.1926
ME5 BM2     0.0071  0.9247     0.8631         0.0927      0.1249
ME5 BM3     0.0052  0.8760     0.7904         0.0909      0.0743
ME5 BM4    -0.0048  0.9102     0.6863         0.0797     -0.0501
BIG HiBM    0.0164  1.0260     0.6087         0.1009      0.1282

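If CAPM were true, then we would have the following implications:

  • The Treynor ratio would be the same for every asset, and it would equal the MKT premium (0.0850 in this sample).

  • The alphas would all be zero.

  • The Information Ratios would all be zero.

In the table above, the Treynor ratios instead range from 0.0086 to 0.1281, and the annualized alphas range from -0.1037 to 0.0429.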
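
get_ols_metrics is part of the course's helper code and is not reproduced here. The sketch below shows how the time-series metrics could be computed with plain NumPy, under the assumption (consistent with the numbers in the table) that alpha and the Treynor ratio are annualized by the factor 12 and the Information Ratio by its square root. The function name capm_metrics_sketch and the simulated data are hypothetical.

```python
import numpy as np

def capm_metrics_sketch(asset_excess, market_excess, annualization=12):
    """OLS of each asset's excess return on the market excess return."""
    T = len(market_excess)
    X = np.column_stack([np.ones(T), market_excess])
    coef, *_ = np.linalg.lstsq(X, asset_excess, rcond=None)
    alpha, beta = coef[0], coef[1]
    resid = asset_excess - X @ coef
    mean = asset_excess.mean(axis=0)
    return {
        'alpha': alpha * annualization,
        'beta': beta,
        # Treynor ratio: mean excess return per unit of market beta
        'treynor': mean * annualization / beta,
        # Information ratio: alpha per unit of residual volatility
        'info_ratio': alpha / resid.std(axis=0) * np.sqrt(annualization),
    }

rng = np.random.default_rng(5)
T = 540
market = rng.normal(0.007, 0.045, size=T)
betas = np.array([0.8, 1.0, 1.3])
# Simulated under CAPM: no true alpha, only beta exposure plus noise
assets = np.outer(market, betas) + rng.normal(0.0, 0.03, size=(T, 3))

m = capm_metrics_sketch(assets, market)
market_premium = market.mean() * 12
# The gap between each Treynor ratio and the market premium is alpha / beta
print(m['treynor'] - market_premium)
print(m['alpha'] / m['beta'])
```

The two printed vectors agree, which is the algebra behind the first bullet: when every alpha is zero, every Treynor ratio collapses to the market premium.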

3.3 Cross-Sectional Test: Yes Intercept#

LFPtests(retsxT,facsT[['Mkt-RF']],annualization=12,useIntCS=True)
[Figures: Time-Series Test Plots]
[Figures: Cross-Sectional Test Plots]
ESTIMATES
           premium-TS  premium-CS
Mkt-RF         0.0850     -0.1059
intercept         NaN      0.2058
MODEL FIT
       MAE-TS  MAE-CS  rsquared
error  0.0207  0.0141    0.3132
STATISTICAL SIGNIFICANCE
time-series  priced  premium
p-values
Mkt-RF       0.0003   0.0001
error        0.0000      NaN
"premium" p-value is the usual t-stat on the time-series factor mean.
"priced" p-value of factor is the t-stat of forming the tangency portfolio.
"priced" p-value of "error" is the joint-chi-squared test of the time-series alphas

3.4#

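These results support the idea that the MKT premium alone is not sufficient to explain all premia in the market: the joint test of the time-series alphas has a p-value of 0.0000, the cross-sectional estimate of the MKT premium is negative (-0.1059) with a large intercept (0.2058), and the cross-sectional r-squared is only 0.3132.

Given that MKT cannot explain these assets, which are sorted by size and value, the results suggest that there may be premia associated with size and value.

But while we have ruled out MKT as the only factor determining premia, we have not shown that size and value matter.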
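
The cross-sectional estimates in 3.3 come from LFPtests, which is not available to the reader. A minimal sketch of a two-pass cross-sectional regression with an intercept, assuming the usual recipe (time-series betas first, then annualized mean excess returns regressed on those betas), is shown below. The function name cross_sectional_fit and the simulated data are hypothetical, and LFPtests may differ in its details.

```python
import numpy as np

def cross_sectional_fit(asset_excess, factors, annualization=12):
    """Two-pass estimate: time-series betas, then mean returns on betas."""
    T, n = asset_excess.shape
    # Pass 1: time-series OLS of every asset on the factors (with a constant)
    X = np.column_stack([np.ones(T), factors])
    coef, *_ = np.linalg.lstsq(X, asset_excess, rcond=None)
    betas = coef[1:].T                      # shape (n_assets, n_factors)
    # Pass 2: cross-sectional OLS of annualized mean returns on the betas
    mean_ret = asset_excess.mean(axis=0) * annualization
    Z = np.column_stack([np.ones(n), betas])
    lam, *_ = np.linalg.lstsq(Z, mean_ret, rcond=None)
    resid = mean_ret - Z @ lam
    r2 = 1 - resid @ resid / np.sum((mean_ret - mean_ret.mean()) ** 2)
    return {'intercept': lam[0], 'premia': lam[1:],
            'mae': np.abs(resid).mean(), 'rsquared': r2, 'residuals': resid}

# Simulated single-factor world: ten assets with betas from 0.6 to 1.4
rng = np.random.default_rng(3)
T, n = 480, 10
market = rng.normal(0.007, 0.045, size=(T, 1))
true_betas = np.linspace(0.6, 1.4, n)
assets = market * true_betas + rng.normal(0.0, 0.03, size=(T, n))

fit = cross_sectional_fit(assets, market)
print(fit['intercept'], fit['premia'], fit['mae'], fit['rsquared'])
```

If the factor model held, the intercept would be near zero and the cross-sectional premium would be close to the factor's time-series mean; a large intercept paired with a negative premium, as in the CAPM table above, is evidence against the model.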

4.1 Fama-French 3-Factor Tests#

Time-Series Test#

Cross-Sectional Test: Yes Intercept#

LFPtests(retsxT,facsT,annualization=12,useIntCS=True)
[Figures: Time-Series Test Plots]
[Figures: Cross-Sectional Test Plots]
ESTIMATES
           premium-TS  premium-CS
HML            0.0310      0.0346
Mkt-RF         0.0850     -0.0937
SMB            0.0033     -0.0025
intercept         NaN      0.1800
MODEL FIT
       MAE-TS  MAE-CS  rsquared
error  0.0141  0.0114    0.4798
STATISTICAL SIGNIFICANCE
time-series  priced  premium
p-values
Mkt-RF       0.0001   0.0001
SMB          0.8189   0.4163
HML          0.0098   0.0282
error        0.0000      NaN
"premium" p-value is the usual t-stat on the time-series factor mean.
"priced" p-value of factor is the t-stat of forming the tangency portfolio.
"priced" p-value of "error" is the joint-chi-squared test of the time-series alphas

4.2 Tangency Tests#

Time-Series Test#

Cross-Sectional Test: Yes Intercept#

tangency = retsxT @ tangency_weights(retsxT)
tangency.columns= ['Tangency']

LFPtests(retsxT,tangency,annualization=12,useIntCS=True)
[Figures: Time-Series Test Plots]
[Figures: Cross-Sectional Test Plots]
ESTIMATES
           premium-TS  premium-CS
Tangency       0.3584      0.3584
intercept         NaN      0.0000
MODEL FIT
       MAE-TS  MAE-CS  rsquared
error  0.0000  0.0000    1.0000
STATISTICAL SIGNIFICANCE
time-series  priced  premium
p-values
Tangency     0.0000   0.0000
error        1.0000      NaN
"premium" p-value is the usual t-stat on the time-series factor mean.
"priced" p-value of factor is the t-stat of forming the tangency portfolio.
"priced" p-value of "error" is the joint-chi-squared test of the time-series alphas

4.3#

The MAE#

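The mean absolute error (MAE) of each model is shown in the tables above labeled MODEL FIT. The time-series MAE is 0.0207 for CAPM, 0.0141 for the Fama-French 3-factor model, and 0.0000 for the tangency factor.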

Joint Test of Alphas#

These test statistics can be found in the tables above labeled “STATISTICAL SIGNIFICANCE”, in the cell at row “error” and column “priced”.

We note that the p-value for CAPM is 0 to many decimal places (it displays as 0.0000).

So is the p-value for the Fama-French 3-factor model.

However, the p-value for the Tangency factor is 1, meaning that its errors are certainly not significant, because they are 0 to machine precision.
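
The zero errors for the tangency factor are an in-sample algebraic identity rather than an empirical finding. The sketch below illustrates this on synthetic data: when the factor is the tangency portfolio built from the same sample means and covariances as the test assets, every time-series alpha is zero up to floating-point error. The data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 360, 5
# Synthetic excess returns with positive means and a common component
common = rng.normal(0.006, 0.04, size=(T, 1))
rets = 0.8 * common + rng.normal(0.002, 0.03, size=(T, n))

# In-sample tangency portfolio of the test assets
mu = rets.mean(axis=0)
sigma = np.cov(rets, rowvar=False)
w = np.linalg.solve(sigma, mu)
w = w / w.sum()
tangency = rets @ w

# Time-series regression of every asset on the tangency return
X = np.column_stack([np.ones(T), tangency])
coef, *_ = np.linalg.lstsq(X, rets, rcond=None)
alphas = coef[0]
print(np.abs(alphas).max())
```

The reason is that each asset's beta on the tangency portfolio equals its mean return divided by the tangency portfolio's mean return, so beta times the factor mean reproduces the asset's mean exactly.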

Stricter test#

Checking individual alphas for significance via t-tests may rule out a model.

But checking the alphas for joint significance via a chi-squared test is a stricter test of the model: even if no individual alpha is significant, the alphas may be jointly significant. Here the joint-test p-values for CAPM and the Fama-French 3-factor model are 0 to many decimal places, so both models are rejected.

Testing the Tangency#

The test statistic can be viewed as the squared Sharpe Ratio of the tangency portfolio formed by alphas compared to the squared Sharpe Ratio of the model tangency portfolio (formed via the factors).
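
One common form of this statistic is the asymptotic chi-squared test of time-series alphas, H = T * alpha' inv(Sigma) alpha / (1 + SR_f^2), where alpha' inv(Sigma) alpha is the squared Sharpe ratio attainable from the alphas and SR_f^2 is the squared Sharpe ratio of the factors' tangency portfolio, both in per-period units. The sketch below implements that form. It is an assumption that LFPtests uses exactly this version, and the function name joint_alpha_test and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats

def joint_alpha_test(asset_excess, factors):
    """Asymptotic chi-squared test that all time-series alphas are zero."""
    T, n = asset_excess.shape
    X = np.column_stack([np.ones(T), factors])
    coef, *_ = np.linalg.lstsq(X, asset_excess, rcond=None)
    alpha = coef[0]
    resid = asset_excess - X @ coef
    sigma_eps = resid.T @ resid / T
    # Squared Sharpe ratio of the tangency portfolio formed from the factors
    mu_f = factors.mean(axis=0)
    omega_f = np.atleast_2d(np.cov(factors, rowvar=False, ddof=0))
    sr2_factors = mu_f @ np.linalg.solve(omega_f, mu_f)
    # Squared Sharpe ratio attainable from the alphas
    sr2_alpha = alpha @ np.linalg.solve(sigma_eps, alpha)
    stat = T * sr2_alpha / (1 + sr2_factors)
    pval = stats.chi2.sf(stat, df=n)
    return stat, pval

rng = np.random.default_rng(6)
T, n = 600, 4
market = rng.normal(0.007, 0.045, size=(T, 1))
betas = np.array([0.7, 0.9, 1.1, 1.3])
priced = market * betas + rng.normal(0.0, 0.03, size=(T, n))
mispriced = priced + 0.005          # add a monthly alpha to every asset

stat_ok, p_ok = joint_alpha_test(priced, market)
stat_mis, p_mis = joint_alpha_test(mispriced, market)
print(stat_ok, p_ok)
print(stat_mis, p_mis)
```

The statistic grows when the alphas offer a high Sharpe ratio relative to what the factors already deliver, which is why a model can fail the joint test even when each individual alpha looks modest.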