Data Dictionary#

This document maps notebooks to their data dependencies, build files, and data characteristics.

Auto-generated from build_data/manifest.yml. To update, run: python scripts/generate-data-dictionary.py

Legend#

Source Codes:

  • CRSP = Center for Research in Security Prices

  • BB = Bloomberg Terminal

  • DB = Databento

  • FRED = Federal Reserve Economic Data

  • Derived = Computed from other sources

Frequency Codes:

  • D = Daily

  • W = Weekly

  • M = Monthly

  • Q = Quarterly

  • Snap = Point-in-time snapshot



Build Files Reference#

| Build File | Source | Years | Freq | Description |
|---|---|---|---|---|
| Build BB - GMO Returns | | | | TODO: add description |
| Build BB - ProShares Analysis | | | | TODO: add description |
| Build BB - SPX Stocks | | | | TODO: add description |
| Build BB - SPY Forecasting | | | | TODO: add description |
| Build PDR - Factor Pricing | | | | TODO: add description |
| Build PDR - FamaFrench | | | | TODO: add description |
| Build PDR - Momentum Portfolios | | | | TODO: add description |
| Build PDR - Value Portfolios | | | | TODO: add description |
| Build WRDS - Barnstable | | | | TODO: add description |
| Build WRDS - CRSP Market | | | | TODO: add description |
| Build WRDS - SPX Stocks | | | | TODO: add description |
| Build Yahoo - GMO | | | | TODO: add description |
| Build Yahoo - Global Indexes | | | | TODO: add description |
| Build Yahoo - Multi-Asset ETFs | | | | TODO: add description |
| Build Yahoo - PE Adjacent Funds | | | | TODO: add description |
| Build Yahoo - Risk Assets | | | | TODO: add description |
| Build Yahoo - SPY History | | | | TODO: add description |
| Build Yahoo - Sector ETFs | | | | TODO: add description |
| OLD Build BB - SPX Stocks | | | | TODO: add description |
| Process Midterm 1 | | | | TODO: add description |
| Process Midterm 2 | | | | TODO: add description |


Notes#

  • {DATE} = Variable date in YYYY-MM-DD format

  • {DATE_NODASH} = Variable date in YYYYMMDD format (Databento)

  • {CONTRACT} = Futures contract code (e.g., FVM5, TYU5)

  • {TAG} = Identifier tag (e.g., 3m, M2025)

  • Snap = Point-in-time snapshot, not a time series
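The placeholder notation above can be read as a small pattern language for file names. The following is a minimal sketch of how such a pattern might be matched against a concrete file name. The regular expressions, the function names, and the example file names `quotes_...` and `fut_...` are assumptions made for illustration; the actual matching rules in scripts/generate-data-dictionary.py are not shown in this document.

```python
import re

# Assumed regex for each placeholder, based only on the formats in the Notes.
PLACEHOLDER_PATTERNS = {
    "DATE": r"\d{4}-\d{2}-\d{2}",                 # YYYY-MM-DD
    "DATE_NODASH": r"\d{8}",                      # YYYYMMDD
    "CONTRACT": r"[A-Z]{1,3}[FGHJKMNQUVXZ]\d",    # root + month code + year digit (assumed shape)
    "TAG": r"[A-Za-z0-9]+",                       # e.g. 3m, M2025
}

# Placeholders not listed above (e.g. {SAMPLING}) fall back to this token.
FALLBACK = r"[A-Za-z0-9]+"


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile an output pattern such as 'quotes_{DATE}.xlsx' into a regex."""
    pieces = []
    for part in re.split(r"(\{[A-Z_]+\})", pattern):
        if re.fullmatch(r"\{[A-Z_]+\}", part):
            name = part[1:-1]
            pieces.append("(?:%s)" % PLACEHOLDER_PATTERNS.get(name, FALLBACK))
        else:
            # Literal text is escaped so '.' only matches a dot.
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


def matches(pattern: str, filename: str) -> bool:
    """Return True if the whole file name fits the pattern."""
    return pattern_to_regex(pattern).fullmatch(filename) is not None


if __name__ == "__main__":
    print(matches("quotes_{DATE}.xlsx", "quotes_2025-01-31.xlsx"))
    print(matches("fut_{CONTRACT}.xlsx", "fut_FVM5.xlsx"))
```

Matching the full file name (rather than a prefix) keeps `{DATE}` and `{DATE_NODASH}` from being confused with each other.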


Diagnostics#

These checks help keep build_data/manifest.yml authoritative and the repo tidy.
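Conceptually, the checks in this section reduce to set comparisons between three inputs: the files notebooks reference, the files present in data/, and the output patterns declared in the manifest. The sketch below illustrates that idea only. The function name, the result keys, and the use of glob-style patterns via `fnmatchcase` are assumptions; the real generator script may work differently.

```python
from fnmatch import fnmatchcase


def run_diagnostics(referenced, present, manifest_patterns):
    """Compare notebook references, files on disk, and manifest patterns.

    referenced:        file names that notebooks load
    present:           file names found in data/
    manifest_patterns: glob-style output patterns (an assumed stand-in
                       for whatever syntax the manifest actually uses)
    """
    referenced, present = set(referenced), set(present)

    def covered(name):
        return any(fnmatchcase(name, pat) for pat in manifest_patterns)

    return {
        "missing_from_repo": sorted(referenced - present),
        "unreferenced_data": sorted(present - referenced),
        "data_not_in_manifest": sorted(f for f in present if not covered(f)),
        "refs_not_in_manifest": sorted(r for r in referenced if not covered(r)),
    }


if __name__ == "__main__":
    report = run_diagnostics(
        referenced=["spy_data.xlsx", "gmo_data.xlsx"],
        present=["spy_data.xlsx", "reversal_data.xlsx"],
        manifest_patterns=["spy_*.xlsx"],  # hypothetical pattern
    )
    for check, names in report.items():
        print(check, names)
```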

Referenced files missing from the repo#

| File | Referenced in | Expected folder | Manifest build file |
|---|---|---|---|
| crsp_corp_fin_2013.xlsx | 4.X.9. TA Discussion - CAPM, 6.X.8. TA Review - Multi-Factor Models | data | |
| factor_pricing_data.xlsx | C.6.0. Smart Beta and Factor Investing | data | |
| gmo_data.xlsx | 7.3. TA Review - Forecasting, C.7.0. GMO Forecasting | data/ or build_data/ | |
| ltcm exhibits data.xlsx | C.8.0. LTCM | data/ or build_data/ | |
| proshares analysis data.xlsx | C.2.0. ProShares Replication | data/ or build_data/ | |
| spx_weekly_returns.xlsx | E.1.2. Unconstrained Optimization | data | |
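Some of the missing references above, such as "ltcm exhibits data.xlsx" and "proshares analysis data.xlsx", differ from files listed elsewhere in this section only by spaces versus underscores. That may indicate a naming mismatch rather than an absent file, though this report alone cannot confirm it. A minimal sketch of a check that flags such near-misses follows; `normalize` and `suggest_renames` are hypothetical helpers, not part of the generator script.

```python
import re


def normalize(name: str) -> str:
    """Lowercase the name and collapse spaces, hyphens, and underscores."""
    return re.sub(r"[\s_\-]+", "_", name.strip().lower())


def suggest_renames(missing, present):
    """Map each missing reference to a present file with the same normalized name."""
    by_norm = {normalize(p): p for p in present}
    suggestions = {}
    for m in missing:
        candidate = by_norm.get(normalize(m))
        if candidate is not None and candidate != m:
            suggestions[m] = candidate
    return suggestions


if __name__ == "__main__":
    missing = ["ltcm exhibits data.xlsx", "proshares analysis data.xlsx", "gmo_data.xlsx"]
    present = ["ltcm_exhibits_data.xlsx", "proshares_analysis_data.xlsx", "gmo_returns_data.xlsx"]
    for ref, match in suggest_renames(missing, present).items():
        print(f"{ref!r} may refer to {match!r}")
```

Only separator and case differences are treated as matches here; references such as gmo_data.xlsx, which differ in their actual words, are left for manual review.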

Data files present in data/ but not referenced by any notebook#

| Data file | Manifest build file |
|---|---|
| commodity_factors.xlsx | |
| factor_pricing_data_monthly.xlsx | |
| global_index_data.xlsx | |
| gmo_returns_data.xlsx | |
| harvard_tips_exhibits.xlsx | |
| ltcm_exhibits_data.xlsx | |
| market_returns_dividend_price_ratio.xlsx | |
| midterm_1_fund_returns.xlsx | |
| midterm_1_stock_returns.xlsx | |
| proshares_analysis_data.xlsx | |
| reversal_data.xlsx | |
| spx_data_daily.xlsx | |
| spy_forecasting_data.xlsx | |

Data files in data/ not covered by any manifest output pattern#

If any of the files listed here are in use, add the appropriate output pattern(s) to build_data/manifest.yml.

  • barnstable_analysis_data.xlsx

  • commodity_factors.xlsx

  • crsp_market_data.xlsx

  • dfa_analysis_data.xlsx

  • factor_pricing_data_monthly.xlsx

  • factor_pricing_data_weekly.xlsx

  • global_index_data.xlsx

  • gmo_analysis_data.xlsx

  • gmo_returns.xlsx

  • gmo_returns_data.xlsx

  • gmo_returns_weekly.xlsx

  • harvard_tips_exhibits.xlsx

  • ltcm_exhibits_data.xlsx

  • market_returns_dividend_price_ratio.xlsx

  • midterm_1_fund_returns.xlsx

  • midterm_1_stock_returns.xlsx

  • momentum_data.xlsx

  • multi_asset_etf_data.xlsx

  • port_decomp_example.xlsx

  • private_equity_data.xlsx

  • proshares_analysis_data.xlsx

  • reversal_data.xlsx

  • risk_etf_data.xlsx

  • sector_etf_data.xlsx

  • single_stock_data.xlsx

  • spx_data_daily.xlsx

  • spx_data_weekly.xlsx

  • spx_returns_weekly.xlsx

  • spy_data.xlsx

  • spy_forecasting_data.xlsx

Notebook references not covered by build_data/manifest.yml#

| Reference | Referenced in |
|---|---|
| barnstable_analysis_data.xlsx | C.3.0. Barnstable and Long-Run Risk |
| crsp_corp_fin_2013.xlsx | 4.X.9. TA Discussion - CAPM, 6.X.8. TA Review - Multi-Factor Models |
| crsp_market_data.xlsx | 7.2. Long-Horizon Prediction with Persistent Signals, 9.2. Tail Risk and Short-Term Capital Management |
| dfa_analysis_data.xlsx | 6.X.8. TA Review - Multi-Factor Models, C.4.0. DFA and Factor Investing |
| factor_pricing_data.xlsx | C.6.0. Smart Beta and Factor Investing |
| factor_pricing_data_weekly.xlsx | E.7.2. Forecasting with LFPM’s |
| factor_pricing_data_{SAMPLING}.xlsx | E.6.1. Single-Stock Factor Pricing |
| factor_pricing_data_{TAG_FREQUENCY}.xlsx | 6.X.1. Factor Models and Tangency Portfolios |
| gmo_analysis_data.xlsx | 8.1. TA Review |
| gmo_data.xlsx | 7.3. TA Review - Forecasting, C.7.0. GMO Forecasting |
| gmo_returns.xlsx | C.7.2. GMO Performance |
| gmo_returns_weekly.xlsx | C.7.2. GMO Performance |
| ltcm exhibits data.xlsx | C.8.0. LTCM |
| momentum_data.xlsx | 6.X.9. TA Review - Momentum, C.6.1. AQR Momentum Strategies |
| multi_asset_etf_data.xlsx | 5.1. Practical Optimization, C.1.0. Harvard’s Endowment |
| port_decomp_example.xlsx | E.2.1 Replicating Regressions |
| private_equity_data.xlsx | E.2.2. Decomposing PE |
| proshares analysis data.xlsx | C.2.0. ProShares Replication |
| risk_etf_data.xlsx | 1.1. Risk and Return Metrics, 1.2. Optimizing Risk and Return, 1.X.2. MV Optimization via Regression, 2.1. Linear Factor Decomposition, 3.1. Value-at-Risk, 3.X.1. Coherent Risk Measures, 3.X.9. TA Discussion - VaR and Barnstable, 5.2. Managing Tail Risk, E.1.1. Risk Metrics |
| sector_etf_data.xlsx | 2.2. LFD for Dimension Reduction |
| single_stock_data.xlsx | 5.2. Managing Tail Risk |
| spx_data_weekly.xlsx | E.8.1. Forecasting with Fundamentals |
| spx_returns_weekly.xlsx | 1.3. MV of S&P500, 2.2. LFD for Dimension Reduction, 3.X.9. TA Discussion - VaR and Barnstable, E.1.1. Risk Metrics, E.1.2. Unconstrained Optimization, E.1.3. Constrained Optimization, E.3.1. VaR of Equity Portfolio, E.4.0. Compensated Risk |
| spx_returns_{SAMPLING}.xlsx | E.6.1. Single-Stock Factor Pricing |
| spx_weekly_returns.xlsx | E.1.2. Unconstrained Optimization |
| spy_data.xlsx | 9.2. Tail Risk and Short-Term Capital Management, C.8.0. LTCM |