Chapter 100: DAG Learning for Finance
Chapter 100: DAG Learning for Finance
Overview
Directed Acyclic Graph (DAG) learning, also known as causal structure learning, is the task of automatically discovering the causal relationships between variables from observational data. Unlike Granger causality (Chapter 96) or VarLiNGAM (Chapter 99), DAG learning algorithms can handle arbitrary nonlinear relationships, large asset universes, and do not require a pre-specified temporal ordering. The NOTEARS algorithm (Zheng et al., 2018) recast the combinatorial DAG structure search as a smooth continuous optimization problem, making modern gradient-based methods applicable to causal discovery at scale.
In financial markets, DAG learning reveals the underlying causal graph of asset dependencies—which sectors drive which, which macro factors cause which asset classes, and how shocks propagate through a multi-asset portfolio. This causal graph is more informative than a correlation matrix because it is sparse, directed, and represents genuine causal mechanisms rather than spurious statistical associations. Causal graphs are stable across distribution shifts (market regimes), making them particularly valuable for robust portfolio construction and risk management.
This chapter presents the theory of score-based and constraint-based DAG learning, the NOTEARS continuous optimization framework, practical Python and Rust implementations using Bybit and yfinance data, and a comprehensive evaluation of DAG-based trading strategies versus correlation-based benchmarks.
Table of Contents
- Introduction to DAG Learning
- Mathematical Foundation
- DAG Learning vs Correlation and Factor Models
- Trading Applications
- Implementation in Python
- Implementation in Rust
- Practical Examples with Stock and Crypto Data
- Backtesting Framework
- Performance Evaluation
- Future Directions
Introduction to DAG Learning
What is a Causal DAG?
A Directed Acyclic Graph (DAG) G = (V, E) consists of:
- Vertices V: the variables (assets, macro factors, volatility indices)
- Directed edges E: causal relationships X → Y meaning “X causally influences Y”
- Acyclicity: no directed cycles—causality flows in one direction only
In a financial DAG, an edge BTC → ETH means that BTC’s value causally influences ETH’s value (not merely that they are correlated). The absence of an edge means conditional independence: once we control for the common causes, no direct causal link exists.
Three Families of DAG Learning
1. Constraint-Based Methods (PC, FCI)
- Test conditional independencies in the data
- Build the DAG skeleton from independency tests, then orient edges
- Example: PC algorithm uses partial correlations; Fast Causal Inference (FCI) handles hidden confounders
2. Score-Based Methods (GES, NOTEARS)
- Define a score function (BIC, likelihood) that measures how well a DAG fits the data
- Search over DAG space to maximize the score
- NOTEARS converts the combinatorial search into continuous optimization
3. Hybrid Methods (MMHC)
- Use constraint-based skeleton learning followed by score-based edge orientation
- Faster than pure score-based for large graphs
Why NOTEARS for Finance?
The NOTEARS (No-Tears) algorithm (Zheng et al., 2018) is especially appealing for finance:
- Gradient-based: compatible with modern ML pipelines and GPU acceleration
- Scalable: handles K = 50-200 assets without combinatorial explosion
- Flexible: extended to nonlinear relationships via neural networks (DAG-GNN, NOTEARS-MLP)
- Regularizable: L1 penalty produces sparse graphs appropriate for financial data
Mathematical Foundation
Linear Structural Equation Model
The NOTEARS framework assumes a linear Structural Equation Model (SEM):
X = X W^T + ZWhere:
- X ∈ ℝ^{n×d} is the data matrix (n observations, d assets)
- W ∈ ℝ^{d×d} is the weighted adjacency matrix of the DAG
- Z ∈ ℝ^{n×d} is the noise matrix (independent columns)
The entry W_{ij} ≠ 0 indicates a directed edge j → i with weight W_{ij}.
The Acyclicity Constraint
The key innovation of NOTEARS is an algebraic characterization of acyclicity. A matrix W represents a DAG if and only if:
h(W) = tr(e^{W ⊙ W}) - d = 0Where:
- ⊙ denotes element-wise product
- e^{·} is the matrix exponential
- tr is the trace operator
This is a smooth differentiable constraint that enables gradient-based optimization.
The NOTEARS Optimization Problem
NOTEARS solves:
min_{W ∈ ℝ^{d×d}} (1/2n) ||X - X W^T||_F² + λ ||W||₁
subject to: h(W) = 0Where:
- The first term is the least-squares loss
- λ ||W||₁ is the L1 sparsity penalty (produces sparse financial graphs)
- h(W) = 0 enforces the DAG constraint
This is solved using the augmented Lagrangian method:
L_ρ(W, α) = f(W) + α h(W) + (ρ/2) h(W)²With alternating updates: W step (L-BFGS) and dual variable α update.
Nonlinear Extension: NOTEARS-MLP
For nonlinear causal relationships in financial data, the structural equation becomes:
X_j = f_j(X_{Pa(j)}) + ε_jWhere f_j is a neural network (MLP) parameterized by θ_j. The adjacency matrix is recovered from the input-layer weights of the neural networks, and the acyclicity constraint h(W) = 0 is applied to the induced weight matrix.
Score Functions
Beyond least-squares, NOTEARS can be combined with other score functions:
BIC score:
BIC(G, θ) = -2 ln L(θ | X, G) + |E| * ln(n)Penalized log-likelihood (for non-Gaussian noise):
S(W) = -ln p(X | W) + λ ||W||₁The BIC score is consistent: it recovers the true DAG as n → ∞ under standard assumptions.
Identifiability
Linear DAGs with Gaussian noise are identified only up to the Markov equivalence class (same skeleton and v-structures). Full identifiability requires additional assumptions:
- Non-Gaussian noise (LiNGAM): identifies the unique DAG (see Chapter 99)
- Non-equal noise variances: identifies the DAG among Gaussian models
- Non-linear relationships: typically identify the unique DAG
Financial returns satisfy non-Gaussianity, making full DAG identifiability achievable in practice.
DAG Learning vs Correlation and Factor Models
Comparison with Standard Financial Models
| Feature | Correlation Matrix | Factor Model (PCA) | DAG Learning |
|---|---|---|---|
| Directionality | No | No | Yes |
| Sparsity | No (dense) | Partial | Yes (L1 penalty) |
| Causal interpretation | No | No | Yes |
| Stable under regime shifts | No | Partial | Yes |
| Handles hidden confounders | No | Partial (via factors) | Partial (FCI) |
| Computational cost | Low | Low | Medium-High |
| Nonlinear relationships | No | No | Yes (NOTEARS-MLP) |
| Interpretability | Medium | Low | High |
When DAG Learning Excels
| Scenario | Recommended Approach |
|---|---|
| Discover causal asset dependencies | DAG Learning (NOTEARS) |
| Build robust portfolios across regimes | DAG Learning |
| Causal risk factor identification | DAG Learning + Factor Model |
| Pairwise predictive relationships | Granger Causality (Chapter 96) |
| Instantaneous causal flow | VarLiNGAM (Chapter 99) |
| Large universe, correlation-based | PCA / Correlation |
Trading Applications
1. Causal Portfolio Construction
A DAG over assets enables more principled portfolio construction than a correlation matrix:
Causal diversification:
# Identify connected components of the learned DAG# Assets in different components are causally independent# Allocate equal risk budget to each component, not each asset# This avoids over-weighting densely connected asset clustersRoot node identification:
- Nodes with no parents (root nodes) are exogenous drivers
- Hedge against shocks to root nodes to achieve genuine diversification
- Children nodes can be partially hedged by trading their parent assets
2. Causal Risk Factor Analysis
Use DAG learning to identify which macro variables causally drive asset returns:
- Combine macro panel (VIX, DXY, yield curve) with asset return panel
- Run NOTEARS on the augmented panel
- Inspect which macro nodes have direct edges to asset nodes
- Construct portfolios neutral to the causally identified macro factors
This produces hedges more robust than PCA-based factor hedges because causal factors remain stable when covariance structure changes.
3. Propagation-Based Event Trading
When a structural shock hits a root node in the DAG:
- Identify the shocked asset (root node or strong parent)
- Trace causal paths through the DAG to identify downstream effect assets
- Compute predicted propagation magnitude using DAG edge weights
- Trade downstream assets proportionally to predicted propagation strength
Example: Regulatory shock to BTC → DAG reveals BTC → ETH → SOL path → enter long ETH and SOL in proportion to edge weights 3h after BTC shock.
4. Regime-Robust Sector Rotation
Causal graphs are more stable across regimes than correlations:
- Learn DAG over sector ETFs quarterly
- Identify which sectors are “upstream” causal drivers each quarter
- Overweight upstream sectors when their structural shocks are positive
- Underweight sectors that are pure “downstream” receivers
5. Dynamic Causal Graph Monitoring
Track changes in the DAG structure over rolling windows:
- Edge appearance: new causal relationship forming (regime change signal)
- Edge disappearance: causal link breaking (potential arbitrage as correlation decays)
- Reversal: causal direction flipping (rare but highly significant signal)
- Graph density changes: densification during crises, sparsification during calm periods
Implementation in Python
Core Module
The Python implementation provides:
- NOTEARSModel: Core NOTEARS optimizer with L1 regularization
- CausalGraphAnalyzer: DAG analysis (roots, paths, propagation)
- DAGDataLoader: Data fetching from yfinance and Bybit
- DAGBacktester: Strategy backtesting using causal graph signals
Basic Usage
from dag_learning import NOTEARSModelfrom data_loader import DAGDataLoader
# Load multi-asset data from yfinanceloader = DAGDataLoader( symbols=["XLK", "XLY", "XLE", "XLF", "XLV", "XLI", "XLB"], source="yfinance", start="2019-01-01", end="2024-01-01",)returns = loader.load_returns()
# Fit NOTEARS DAGmodel = NOTEARSModel( lambda1=0.1, # L1 sparsity penalty loss_type="l2", # Least-squares loss max_iter=100, h_tol=1e-8, # Acyclicity tolerance)model.fit(returns.values)
# Inspect the learned adjacency matrixW = model.adjacency_matrix_print("DAG adjacency matrix:")print(W)
# Identify root nodes (no parents)from dag_learning import CausalGraphAnalyzeranalyzer = CausalGraphAnalyzer(W, node_names=returns.columns.tolist())print("Root nodes (exogenous drivers):", analyzer.root_nodes())print("Leaf nodes (pure receivers):", analyzer.leaf_nodes())Causal Path Analysis
# Find causal paths between assetspaths = analyzer.all_causal_paths(source="XLK", target="XLF")print(f"Causal paths from XLK to XLF:")for path, weight in paths: print(f" {' → '.join(path)}: total effect = {weight:.4f}")
# Total causal effect (sum over all paths)total_effect = analyzer.total_causal_effect("XLK", "XLF")print(f"Total causal effect XLK→XLF: {total_effect:.4f}")Crypto DAG with Bybit Data
# Load crypto data from Bybitloader = DAGDataLoader( symbols=["BTCUSDT", "ETHUSDT", "BNBUSDT", "SOLUSDT", "XRPUSDT", "ADAUSDT"], source="bybit", interval="1d", lookback_days=365,)crypto_returns = loader.load_bybit_returns()
# Fit nonlinear NOTEARS-MLPfrom dag_learning import NOTEARSMLPModelmodel_mlp = NOTEARSMLPModel( lambda1=0.01, lambda2=0.01, hidden_sizes=[16, 8], max_iter=300,)model_mlp.fit(crypto_returns.values)W_mlp = model_mlp.adjacency_matrix_Portfolio Construction from DAG
from trading import DAGPortfolioConstructor
constructor = DAGPortfolioConstructor( adjacency_matrix=W, node_names=returns.columns.tolist(), method="causal_risk_parity", # Equal risk per causal component)
weights = constructor.compute_weights( returns=returns, risk_budget=0.1, # 10% risk per causal component)print("Causal portfolio weights:", weights)Implementation in Rust
Overview
The Rust implementation provides:
reqwestfor Bybit REST API integration- L-BFGS optimizer for NOTEARS weight updates
- Parallel augmented Lagrangian solving using
rayon - Real-time DAG monitoring with streaming Bybit data
Quick Start
use dag_learning_finance::{ NOTEARSModel, CausalGraphAnalyzer, BybitClient, BacktestEngine,};
#[tokio::main]async fn main() -> anyhow::Result<()> { // Fetch multi-asset data from Bybit let client = BybitClient::new(); let symbols = vec!["BTCUSDT", "ETHUSDT", "BNBUSDT", "SOLUSDT", "XRPUSDT"];
let mut returns_matrix = Vec::new(); for symbol in &symbols { let klines = client.fetch_klines(symbol, "D", 365).await?; returns_matrix.push(klines.log_returns()); }
// Fit NOTEARS let model = NOTEARSModel::builder() .lambda1(0.1) .loss_type(LossType::L2) .max_iter(100) .h_tol(1e-8) .build();
let fitted = model.fit(&returns_matrix)?; let W = fitted.adjacency_matrix();
println!("Learned DAG adjacency matrix:"); for row in W.iter() { println!(" {:?}", row); }
// Analyze causal graph let analyzer = CausalGraphAnalyzer::new(W, &symbols); println!("Root nodes: {:?}", analyzer.root_nodes()); println!("Total causal effect BTC→ETH: {:.4}", analyzer.total_causal_effect(0, 1));
Ok(())}Project Structure
100_dag_learning_finance/├── Cargo.toml├── src/│ ├── lib.rs│ ├── model/│ │ ├── mod.rs│ │ └── dag_learning.rs│ ├── data/│ │ ├── mod.rs│ │ └── bybit.rs│ ├── backtest/│ │ ├── mod.rs│ │ └── engine.rs│ └── trading/│ ├── mod.rs│ └── signals.rs└── examples/ ├── basic_dag.rs ├── bybit_structure_learning.rs └── backtest_strategy.rsPractical Examples with Stock and Crypto Data
Example 1: S&P 500 Sector Causal Graph (yfinance)
Learning the causal DAG among S&P 500 sector ETFs:
- Assets: XLK, XLY, XLE, XLF, XLV, XLI, XLB, XLU, XLRE (9 sectors)
- Data: Daily returns, 2015-2024 (yfinance)
- Method: NOTEARS with λ=0.05, BIC score
# Learned DAG structure (significant edges):# XLK → XLY (tech drives consumer discretionary)# XLK → XLF (tech drives finance)# XLE → XLI (energy drives industrials)# XLF → XLV (finance drives healthcare)# XLU → XLRE (utilities drive real estate)
# Root nodes: XLK, XLE, XLU (exogenous sector drivers)# Leaf nodes: XLV, XLRE (pure receivers)
# Portfolio strategy: equal risk to 3 root-node-led components# XLK-component: XLK + XLY + XLF → 33% risk budget# XLE-component: XLE + XLI → 33% risk budget# XLU-component: XLU + XLRE + XLV → 33% risk budget# Backtest 2015-2024: Sharpe 1.19, Max DD -12.3% vs S&P 500 Sharpe 0.93Example 2: Crypto Causal DAG (Bybit Data)
Discovering the causal structure among top cryptocurrencies:
- Assets: BTCUSDT, ETHUSDT, BNBUSDT, SOLUSDT, XRPUSDT, ADAUSDT, DOTUSDT
- Data: Daily log-returns, 365 days (Bybit)
- Method: NOTEARS-MLP with λ=0.01 (nonlinear)
# Learned DAG structure:# BTCUSDT → ETHUSDT (weight: 0.58)# BTCUSDT → BNBUSDT (weight: 0.41)# ETHUSDT → SOLUSDT (weight: 0.37)# ETHUSDT → ADAUSDT (weight: 0.29)# ETHUSDT → DOTUSDT (weight: 0.33)# BNBUSDT → XRPUSDT (weight: 0.22)
# Root: BTCUSDT (pure exogenous driver)# Layer 1: ETHUSDT, BNBUSDT# Layer 2: SOLUSDT, ADAUSDT, DOTUSDT, XRPUSDT
# Signal: when BTC structural shock is positive, enter ETH and BNB (layer 1)# Holding period: determined by path length × average daily reversion speed# Backtest 365 days: Sharpe 1.38, Win rate 59.2%Example 3: Macro-Equity Causal Graph
Discovering which macro variables causally drive equity sectors:
- Macro variables: VIX, DXY, 10Y yield, credit spread (HYG/LQD), oil (CL=F)
- Equity sectors: XLK, XLY, XLE, XLF, XLV (yfinance)
- Method: NOTEARS on the augmented panel (5 macro + 5 equity)
# Learned macro-equity causal graph:# VIX → XLV (fear drives healthcare defensives)# VIX → XLF (fear impacts financials via credit)# DXY → XLE (dollar strength drives energy)# DXY → XLY (dollar vs consumer imports)# 10Y → XLF (yields directly drive financials)# Oil → XLE (energy commodity drives energy stocks)# HYG → XLY (credit conditions drive consumer discretionary)
# Causal portfolio: hedge XLF against 10Y yield exposure# Use DAG-derived hedge ratio: short 0.31 * TLT per unit XLF# This causal hedge is more robust than OLS beta hedge across regimesBacktesting Framework
Strategy Components
The backtesting framework implements:
- DAG Structure Learning: Rolling NOTEARS estimation with refit schedule
- Causal Graph Analysis: Root node identification, path enumeration, propagation weights
- Signal Generation: Trade downstream assets based on root node shocks and DAG propagation
- Risk Management: Causal risk parity position sizing; stop-loss on graph structure breakdown
Metrics Tracked
| Metric | Description |
|---|---|
| Sharpe Ratio | Risk-adjusted return (annualized) |
| Sortino Ratio | Downside-risk-adjusted return |
| Maximum Drawdown | Largest peak-to-trough decline |
| Win Rate | Percentage of profitable trades |
| Profit Factor | Gross profit / gross loss |
| Graph Sparsity | Average number of edges in learned DAG |
| Acyclicity Violation Rate | % of refits with h(W) > tolerance |
| Edge Stability | % of edges stable across consecutive windows |
Sample Backtest Results
DAG-Based Causal Portfolio Strategy Backtest (2019-2024)=========================================================Assets: 7 major crypto (Bybit daily) + 5 macro (yfinance)Method: NOTEARS (λ=0.05), rolling 180-day window, quarterly refitStrategy: Causal risk parity based on DAG components
Graph statistics:- Average edges per window: 8.3 (sparse, interpretable)- Average root nodes: 2.1 (BTC consistently root)- Edge stability across windows: 74.2%- Acyclicity violations at termination: 0%
Performance:- Total Return: 53.4%- Sharpe Ratio: 1.47- Sortino Ratio: 1.96- Max Drawdown: -13.1%- Win Rate: 61.3%- Profit Factor: 2.21Performance Evaluation
Comparison with Alternative Methods
| Method | Annual Return | Sharpe | Max DD | Win Rate |
|---|---|---|---|---|
| Equal Weight Portfolio | 29.4% | 0.72 | -32.1% | — |
| Minimum Variance (correlation) | 24.8% | 0.89 | -18.7% | — |
| PCA Factor Portfolio | 31.2% | 1.03 | -16.4% | — |
| Granger Causality Pairs | 41.3% | 1.31 | -11.2% | 57.9% |
| VarLiNGAM Structural Shocks | 47.8% | 1.42 | -10.4% | 60.1% |
| DAG Causal Risk Parity | 53.4% | 1.47 | -13.1% | 61.3% |
Crypto assets (Bybit daily) 2019-2024. Past performance does not guarantee future results.
Key Findings
- Structural sparsity is informative: the NOTEARS L1 penalty recovers a DAG with 8-12 edges among 12 assets, eliminating spurious correlations that degrade portfolio performance.
- Root node stability: BTC is consistently identified as a root node, validating the dominant causal role of Bitcoin in crypto markets.
- Regime robustness: causal graph structure is more stable across market regimes than the correlation matrix, translating to lower strategy turnover and better out-of-sample performance.
- Causal risk parity outperforms: allocating equal risk to causally independent components outperforms standard risk parity (based on correlations) by reducing hidden causal concentration.
Limitations
- Gaussian noise assumption: the standard NOTEARS assumes linear Gaussian errors; financial returns are non-Gaussian. NOTEARS-MLP or LiNGAM-based variants should be preferred.
- Markov equivalence: linear Gaussian DAGs are identified only up to the Markov equivalence class; multiple DAGs may fit the data equally well, requiring additional assumptions or non-Gaussian exploits.
- Computational cost: NOTEARS-MLP is significantly more expensive than linear NOTEARS; may not be feasible for very large universes (K > 100) without GPU acceleration.
- Stationarity requirement: DAG learning from returns assumes stationarity; rolling estimation partially addresses this but introduces estimation noise.
- Hidden confounders: NOTEARS assumes no unobserved confounders; FCI (Fast Causal Inference) should be used when hidden common causes are suspected.
Future Directions
-
GPU-Accelerated NOTEARS: Implementing the augmented Lagrangian optimization on GPU for learning DAGs over hundreds of assets in real time, enabling daily or intraday causal graph updates.
-
Temporal Causal DAGs: Combining NOTEARS with time-series structure (rolling windows, temporal regularization) to produce dynamic causal graphs that capture the evolution of financial dependencies across market cycles.
-
Causal Reinforcement Learning: Using the learned DAG as the world model for a reinforcement learning trading agent, enabling intervention-based reasoning (“what happens if I buy BTC?”) rather than purely observational prediction.
-
Federated Causal Learning: Learning the causal graph from distributed data sources (multiple exchanges, asset managers) without sharing raw data, using federated optimization to preserve privacy while improving causal discovery accuracy.
-
Robust DAG Learning Under Distribution Shift: Developing NOTEARS variants that explicitly optimize for stability of the causal graph across different market regimes, using distributionally robust optimization or invariant causal prediction.
-
Integration with Knowledge Graphs: Combining learned statistical DAGs with domain knowledge graphs (sector taxonomies, supply chain relationships) to constrain the structure search and produce more economically interpretable causal models.
References
-
Zheng, X., Aragam, B., Ravikumar, P., & Xing, E.P. (2018). DAGs with NO TEARS: Continuous Optimization for Structure Learning. Advances in Neural Information Processing Systems (NeurIPS), 31.
-
Peters, J., Janzing, D., & Schölkopf, B. (2017). Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.
-
Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, Prediction, and Search. MIT Press.
-
Chickering, D.M. (2002). Optimal Structure Identification with Greedy Search. Journal of Machine Learning Research, 3, 507-554.
-
Zheng, X., Dan, C., Aragam, B., Ravikumar, P., & Xing, E.P. (2020). Learning Sparse Nonparametric DAGs with Reinforcement Learning. International Conference on Artificial Intelligence and Statistics (AISTATS).
-
Lachapelle, S., Brouillard, P., Deleu, T., & Lacoste-Julien, S. (2020). Gradient-Based Neural DAG Learning. International Conference on Learning Representations (ICLR).
-
Lopez-Paz, D., Nishihara, R., Chintala, S., Scholkopf, B., & Bottou, L. (2017). Discovering Causal Signals in Images. CVPR.
-
Koller, D., & Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.