Chapter 7: Linear Methods for Crypto Return Prediction and Risk Decomposition
Chapter 7: Linear Methods for Crypto Return Prediction and Risk Decomposition
Overview
Linear models remain among the most powerful tools for financial analysis despite the rise of complex machine learning methods. Their interpretability, computational efficiency, and well-understood statistical properties make them indispensable for crypto return prediction and risk decomposition. Ordinary Least Squares (OLS) regression provides the foundation for factor models that decompose crypto asset returns into systematic risk exposures and idiosyncratic components, enabling portfolio managers to understand what drives returns and to hedge unwanted exposures.
The application of linear methods to cryptocurrency markets requires addressing several challenges unique to digital assets. Crypto factor models must incorporate novel factors beyond the traditional market, size, and momentum: on-chain activity (active addresses, transaction volume), network effects, and tokenomics-specific metrics. Cross-sectional regression across the altcoin universe enables estimation of risk premia for these factors, extending the Fama-Macbeth methodology to the crypto domain. However, the high dimensionality and multicollinearity of crypto features demands regularization through Ridge (L2), Lasso (L1), and Elastic Net approaches.
This chapter covers the complete spectrum of linear methods adapted for crypto trading: from OLS regression on crypto factors through regularized methods for feature selection, to logistic regression for binary direction prediction. Rolling regression analysis reveals how factor loadings shift across market regimes, providing early warning signals for correlation breakdowns. Both Python and Rust implementations are provided, with practical examples using Bybit market data and yfinance for supplementary data sources.
Table of Contents
- Introduction to Linear Methods in Crypto
- Mathematical Foundation
- Comparison of Linear Methods
- Trading Applications
- Implementation in Python
- Implementation in Rust
- Practical Examples
- Backtesting Framework
- Performance Evaluation
- Future Directions
Section 1: Introduction to Linear Methods in Crypto
Why Linear Models for Crypto?
Despite the non-linear dynamics of cryptocurrency markets, linear models offer critical advantages:
- Interpretability: Coefficients directly represent factor exposures and marginal effects
- Statistical inference: Standard errors, confidence intervals, and hypothesis tests are well-defined
- Computational speed: Training and prediction are orders of magnitude faster than deep learning
- Regularization theory: L1/L2 penalties have clear Bayesian interpretations and proven convergence
- Baseline performance: Linear models often outperform complex models on noisy financial data
In crypto markets where signal-to-noise ratios are extremely low, the bias-variance tradeoff favors simpler models. A Ridge regression with 20 features often outperforms a neural network with the same features, because the network overfits to noise while Ridge shrinks coefficients toward zero.
Factor Models for Digital Assets
The Capital Asset Pricing Model (CAPM) provides the simplest factor model:
R_i - R_f = alpha_i + beta_i * (R_market - R_f) + epsilon_iFor crypto, “market” is typically BTC or a market-cap-weighted crypto index. The beta coefficient measures systematic risk exposure, while alpha captures excess return not explained by market movements.
A multi-factor crypto model extends this:
R_i = alpha + beta_mkt * MKT + beta_size * SIZE + beta_mom * MOM + beta_chain * CHAIN + epsilonWhere SIZE is a small-minus-big factor, MOM is momentum, and CHAIN is an on-chain activity factor.
The Gauss-Markov Theorem and Its Limitations
The Gauss-Markov theorem states that under classical assumptions (linearity, exogeneity, homoskedasticity, no serial correlation, no perfect multicollinearity), OLS is the Best Linear Unbiased Estimator (BLUE). In crypto markets, nearly all these assumptions are violated:
- Heteroskedasticity: Crypto volatility clusters (GARCH effects)
- Serial correlation: Features based on overlapping windows
- Non-normality: Extreme kurtosis in return distributions
- Multicollinearity: Many crypto features are highly correlated
These violations do not make OLS useless but require robust standard errors (HAC estimators) and regularization.
Section 2: Mathematical Foundation
Ordinary Least Squares (OLS)
Given the linear model y = X * beta + epsilon, the OLS estimator minimizes the sum of squared residuals:
beta_hat = argmin ||y - X * beta||^2 = (X^T * X)^{-1} * X^T * y
Variance: Var(beta_hat) = sigma^2 * (X^T * X)^{-1} where sigma^2 = ||y - X * beta_hat||^2 / (n - p)Ridge Regression (L2 Regularization)
Ridge adds an L2 penalty to prevent coefficient explosion:
beta_ridge = argmin { ||y - X * beta||^2 + lambda * ||beta||^2 } = (X^T * X + lambda * I)^{-1} * X^T * y
Properties:- Shrinks all coefficients toward zero (never exactly zero)- Handles multicollinearity by stabilizing (X^T X + lambda I)- Bayesian interpretation: Gaussian prior on coefficients- Effective degrees of freedom: df(lambda) = tr(X(X^T X + lambda I)^{-1} X^T)Lasso Regression (L1 Regularization)
Lasso uses an L1 penalty that induces sparsity:
beta_lasso = argmin { ||y - X * beta||^2 + lambda * ||beta||_1 }
Properties:- Can shrink coefficients exactly to zero (feature selection)- Selects at most n features when p > n- Bayesian interpretation: Laplace prior on coefficients- No closed-form solution; requires coordinate descent or LARSElastic Net
Elastic Net combines L1 and L2 penalties:
beta_enet = argmin { ||y - X * beta||^2 + lambda_1 * ||beta||_1 + lambda_2 * ||beta||^2 }
Mixing parameter: alpha = lambda_1 / (lambda_1 + lambda_2)- alpha = 1: pure Lasso- alpha = 0: pure Ridge- 0 < alpha < 1: hybridLogistic Regression for Direction Prediction
For binary classification (up/down):
P(y=1|x) = sigma(x^T * beta) = 1 / (1 + exp(-x^T * beta))
Loss function: L = -sum[ y_i * log(p_i) + (1-y_i) * log(1-p_i) ] + lambda * penalty(beta) (Ridge, Lasso, or Elastic Net)Fama-Macbeth Regression
Two-pass regression for estimating risk premia:
Pass 1 (Time Series): For each asset i, estimate factor loadings: R_it = alpha_i + beta_i^T * F_t + epsilon_it
Pass 2 (Cross-Section): At each time t, estimate risk premia: R_it = gamma_0t + gamma_t^T * beta_hat_i + eta_it
Risk premia estimates: gamma_bar = (1/T) * sum_t gamma_t t-stat = gamma_bar / (se(gamma_t) / sqrt(T))Rolling Regression for Regime Detection
For each window [t-w, t]: beta_t = (X_window^T * X_window)^{-1} * X_window^T * y_window
Monitor:- beta stability: large changes signal regime shifts- R^2 evolution: declining R^2 means model is losing explanatory power- Residual patterns: autocorrelation in residuals indicates model misspecificationSection 3: Comparison of Linear Methods
| Method | Sparsity | Multicollinearity Handling | Interpretability | Feature Selection | Computational Cost |
|---|---|---|---|---|---|
| OLS | No | Poor | High | No | Very Low |
| Ridge (L2) | No | Excellent | High | No | Low |
| Lasso (L1) | Yes | Moderate | High | Yes | Low |
| Elastic Net | Yes | Good | High | Yes | Low |
| Logistic Regression | Optional | Good (with regularization) | High | With L1 | Low |
| Rolling OLS | No | Poor | High | No | Medium |
| Fama-Macbeth | No | Moderate | High | No | Medium |
| Crypto Factor | Description | Typical Loading (BTC) | Typical Loading (ALT) | Significance |
|---|---|---|---|---|
| Market (MKT) | BTC excess return | 1.00 (by def.) | 0.8 - 1.5 | Very High |
| Size (SMB) | Small minus big cap | -0.05 | 0.3 - 0.8 | Moderate |
| Momentum (MOM) | Winner minus loser | 0.02 | 0.1 - 0.4 | Moderate |
| On-Chain (CHAIN) | Active addresses factor | 0.15 | 0.2 - 0.6 | Low-Moderate |
| Volatility (VOL) | Low minus high vol | -0.10 | -0.2 - 0.3 | Low |
| Funding (FUND) | Funding rate factor | -0.08 | -0.1 - 0.2 | Low |
Section 4: Trading Applications
4.1 Crypto Factor Model Construction
Building a multi-factor model for the crypto cross-section:
- Market factor: BTC excess return over risk-free rate
- Size factor: Return difference between small-cap and large-cap tokens (sorted by market cap)
- Momentum factor: Return difference between recent winners and losers (30-day returns)
- On-chain factor: Return difference between high and low on-chain activity tokens
- Funding factor: Weighted average funding rate differential
This crypto factor model enables risk decomposition: “How much of SOL’s return is due to market beta vs. its own momentum vs. on-chain growth?“
4.2 Cross-Sectional Regression for Risk Premia
At each time period, regress asset returns on their estimated factor loadings to obtain risk premia. For a universe of 50 altcoins:
- Estimate betas for each altcoin using 90-day rolling windows
- Run monthly cross-sectional regressions
- Average the resulting gamma coefficients to estimate risk premia
- Test significance using Newey-West adjusted standard errors
4.3 Lasso for Crypto Feature Selection
With dozens of potential predictors (technical indicators, on-chain metrics, order flow features), Lasso selects the most relevant:
- Start with 50+ candidate features
- Lasso path: vary lambda from high (all zeros) to low (all features)
- Cross-validate to find optimal lambda
- Selected features typically include: funding rate, volume imbalance, BTC correlation, and volatility regime
4.4 Logistic Regression for Direction Prediction
Binary prediction of next-period return direction using regularized logistic regression:
- Features: momentum, volatility ratio, volume profile, funding rate
- Regularization prevents overfitting to noise
- Predicted probabilities can be used for position sizing
- Calibrated probabilities improve Kelly criterion sizing
4.5 Rolling Regression for Regime Detection
Track factor loadings over time to detect regime changes:
- 30-day rolling window for beta estimation
- Monitor beta_market: rising beta indicates increasing systematic risk
- Monitor R^2: declining R^2 suggests structural break
- Monitor alpha: persistent positive alpha may indicate mispricing
Section 5: Implementation in Python
Crypto Factor Model
import numpy as npimport pandas as pdfrom sklearn.linear_model import ( LinearRegression, Ridge, Lasso, ElasticNet, LogisticRegression)from sklearn.preprocessing import StandardScalerfrom sklearn.metrics import accuracy_score, r2_scoreimport requestsimport yfinance as yffrom typing import Dict, List, Tuple, Optional
class CryptoFactorModel: """Multi-factor model for crypto asset returns."""
def __init__(self, universe: List[str], market_symbol: str = "BTCUSDT"): self.universe = universe self.market_symbol = market_symbol self.returns = None self.factors = None self.betas = None
def fetch_bybit_returns(self, symbol: str, interval: str = "D", limit: int = 200) -> pd.Series: """Fetch daily returns from Bybit.""" url = "https://api.bybit.com/v5/market/kline" params = { "category": "linear", "symbol": symbol, "interval": interval, "limit": limit } response = requests.get(url, params=params) data = response.json()["result"]["list"] df = pd.DataFrame(data, columns=[ "timestamp", "open", "high", "low", "close", "volume", "turnover" ]) df["close"] = df["close"].astype(float) df["timestamp"] = pd.to_datetime(df["timestamp"].astype(int), unit="ms") df = df.sort_values("timestamp").set_index("timestamp") return df["close"].pct_change().dropna()
def construct_factors(self) -> pd.DataFrame: """Construct crypto factor returns.""" all_returns = {} for sym in self.universe + [self.market_symbol]: all_returns[sym] = self.fetch_bybit_returns(sym) self.returns = pd.DataFrame(all_returns).dropna()
factors = pd.DataFrame(index=self.returns.index)
# Market factor: BTC excess return factors["MKT"] = self.returns[self.market_symbol]
# Size factor: proxy using volatility (lower vol ~ larger cap) vols = self.returns[self.universe].rolling(30).std() median_vol = vols.median(axis=1) small = self.returns[self.universe].where(vols > median_vol.values[:, None]) big = self.returns[self.universe].where(vols <= median_vol.values[:, None]) factors["SIZE"] = small.mean(axis=1) - big.mean(axis=1)
# Momentum factor: 30-day momentum mom_30d = self.returns[self.universe].rolling(30).sum() median_mom = mom_30d.median(axis=1) winners = self.returns[self.universe].where( mom_30d > median_mom.values[:, None]) losers = self.returns[self.universe].where( mom_30d <= median_mom.values[:, None]) factors["MOM"] = winners.mean(axis=1) - losers.mean(axis=1)
self.factors = factors.dropna() return self.factors
def estimate_betas(self, window: int = 90) -> Dict[str, pd.DataFrame]: """Estimate rolling factor loadings for each asset.""" self.betas = {} common_idx = self.returns.index.intersection(self.factors.index) factors_aligned = self.factors.loc[common_idx]
for sym in self.universe: returns_aligned = self.returns[sym].loc[common_idx] betas_list = []
for i in range(window, len(common_idx)): y = returns_aligned.iloc[i - window:i].values X = factors_aligned.iloc[i - window:i].values
model = LinearRegression() model.fit(X, y)
betas_list.append({ "timestamp": common_idx[i], "alpha": model.intercept_, **{f"beta_{col}": coef for col, coef in zip(self.factors.columns, model.coef_)}, "r_squared": model.score(X, y) })
self.betas[sym] = pd.DataFrame(betas_list).set_index("timestamp")
return self.betas
def fama_macbeth(self) -> pd.DataFrame: """Fama-Macbeth cross-sectional regression.""" common_idx = self.returns.index.intersection(self.factors.index) if self.betas is None: self.estimate_betas()
# Collect beta estimates at each time point gammas = [] beta_cols = [c for c in list(self.betas.values())[0].columns if c.startswith("beta_")]
for t in common_idx: cross_section_returns = [] cross_section_betas = []
for sym in self.universe: if t in self.betas[sym].index and t in self.returns.index: cross_section_returns.append(self.returns[sym].loc[t]) cross_section_betas.append( self.betas[sym].loc[t][beta_cols].values)
if len(cross_section_returns) < 3: continue
y = np.array(cross_section_returns) X = np.array(cross_section_betas) X = np.column_stack([np.ones(len(y)), X])
try: model = LinearRegression(fit_intercept=False) model.fit(X, y) gammas.append({ "timestamp": t, "gamma_0": model.coef_[0], **{f"gamma_{col.replace('beta_', '')}": coef for col, coef in zip(beta_cols, model.coef_[1:])} }) except Exception: continue
result = pd.DataFrame(gammas).set_index("timestamp") # Compute risk premia and t-statistics summary = pd.DataFrame({ "mean": result.mean(), "std": result.std(), "t_stat": result.mean() / (result.std() / np.sqrt(len(result))), "annualized": result.mean() * 365 }) return summary
class RegularizedCryptoRegression: """Ridge, Lasso, and Elastic Net for crypto prediction."""
def __init__(self): self.scaler = StandardScaler() self.model = None
def fit_ridge(self, X: pd.DataFrame, y: pd.Series, alpha: float = 1.0) -> 'RegularizedCryptoRegression': X_scaled = self.scaler.fit_transform(X) self.model = Ridge(alpha=alpha) self.model.fit(X_scaled, y) return self
def fit_lasso(self, X: pd.DataFrame, y: pd.Series, alpha: float = 0.01) -> 'RegularizedCryptoRegression': X_scaled = self.scaler.fit_transform(X) self.model = Lasso(alpha=alpha, max_iter=10000) self.model.fit(X_scaled, y) return self
def fit_elastic_net(self, X: pd.DataFrame, y: pd.Series, alpha: float = 0.01, l1_ratio: float = 0.5) -> 'RegularizedCryptoRegression': X_scaled = self.scaler.fit_transform(X) self.model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10000) self.model.fit(X_scaled, y) return self
def predict(self, X: pd.DataFrame) -> np.ndarray: X_scaled = self.scaler.transform(X) return self.model.predict(X_scaled)
def feature_importance(self, feature_names: List[str]) -> pd.Series: return pd.Series( np.abs(self.model.coef_), index=feature_names ).sort_values(ascending=False)
def selected_features(self, feature_names: List[str]) -> List[str]: """For Lasso/ElasticNet: return non-zero features.""" mask = np.abs(self.model.coef_) > 1e-10 return [f for f, m in zip(feature_names, mask) if m]
class CryptoLogisticModel: """Logistic regression for crypto direction prediction."""
def __init__(self, penalty: str = "l1", C: float = 1.0): self.scaler = StandardScaler() self.model = LogisticRegression( penalty=penalty, C=C, solver="saga", max_iter=5000 )
def fit(self, X: pd.DataFrame, y: pd.Series) -> 'CryptoLogisticModel': X_scaled = self.scaler.fit_transform(X) self.model.fit(X_scaled, y) return self
def predict_proba(self, X: pd.DataFrame) -> np.ndarray: X_scaled = self.scaler.transform(X) return self.model.predict_proba(X_scaled)[:, 1]
def predict(self, X: pd.DataFrame) -> np.ndarray: X_scaled = self.scaler.transform(X) return self.model.predict(X_scaled)
def coefficient_summary(self, feature_names: List[str]) -> pd.DataFrame: return pd.DataFrame({ "feature": feature_names, "coefficient": self.model.coef_[0], "abs_coefficient": np.abs(self.model.coef_[0]), "odds_ratio": np.exp(self.model.coef_[0]) }).sort_values("abs_coefficient", ascending=False)Usage Example
# Build crypto factor modelfactor_model = CryptoFactorModel( universe=["ETHUSDT", "SOLUSDT", "AVAXUSDT", "LINKUSDT", "DOTUSDT", "MATICUSDT", "AAVEUSDT", "UNIUSDT"], market_symbol="BTCUSDT")factor_model.construct_factors()betas = factor_model.estimate_betas(window=60)
# Fama-Macbeth risk premiarisk_premia = factor_model.fama_macbeth()print("Risk Premia Estimates:")print(risk_premia)
# Lasso feature selectionregressor = RegularizedCryptoRegression()# ... (with features and targets prepared)Section 6: Implementation in Rust
Project Structure
ch07_linear_methods_crypto/├── Cargo.toml├── src/│ ├── lib.rs│ ├── regression/│ │ ├── mod.rs│ │ ├── ols.rs│ │ └── regularized.rs│ ├── classification/│ │ ├── mod.rs│ │ └── logistic.rs│ └── factor/│ ├── mod.rs│ └── model.rs└── examples/ ├── crypto_factor_model.rs ├── lasso_selection.rs └── rolling_regression.rsCore Library (src/lib.rs)
pub mod regression;pub mod classification;pub mod factor;
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone)]pub struct RegressionResult { pub coefficients: Vec<f64>, pub intercept: f64, pub r_squared: f64, pub residuals: Vec<f64>, pub feature_names: Vec<String>,}
impl RegressionResult { pub fn display(&self) { println!("Regression Results (R² = {:.4}):", self.r_squared); println!(" Intercept: {:.6}", self.intercept); for (name, coef) in self.feature_names.iter().zip(self.coefficients.iter()) { println!(" {}: {:.6}", name, coef); } }}
#[derive(Debug, Clone, Serialize, Deserialize)]pub struct FactorLoading { pub timestamp: i64, pub alpha: f64, pub betas: Vec<f64>, pub r_squared: f64,}OLS Regression (src/regression/ols.rs)
use crate::RegressionResult;
pub struct OLSRegression;
impl OLSRegression { /// Solve OLS: beta = (X^T X)^{-1} X^T y pub fn fit( x: &[Vec<f64>], y: &[f64], feature_names: &[String], fit_intercept: bool, ) -> RegressionResult { let n = y.len(); let p = x[0].len(); let p_total = if fit_intercept { p + 1 } else { p };
// Build design matrix let mut xtx = vec![vec![0.0; p_total]; p_total]; let mut xty = vec![0.0; p_total];
for i in 0..n { let row = Self::build_row(&x[i], fit_intercept); for j in 0..p_total { xty[j] += row[j] * y[i]; for k in 0..=j { xtx[j][k] += row[j] * row[k]; if j != k { xtx[k][j] = xtx[j][k]; } } } }
// Solve using Cholesky-like decomposition (simplified) let beta = Self::solve_linear_system(&xtx, &xty);
// Compute predictions and R² let mut ss_res = 0.0; let mut ss_tot = 0.0; let y_mean = y.iter().sum::<f64>() / n as f64; let mut residuals = Vec::with_capacity(n);
for i in 0..n { let row = Self::build_row(&x[i], fit_intercept); let y_pred: f64 = row.iter().zip(beta.iter()).map(|(a, b)| a * b).sum(); let res = y[i] - y_pred; residuals.push(res); ss_res += res * res; ss_tot += (y[i] - y_mean) * (y[i] - y_mean); }
let r_squared = if ss_tot > 0.0 { 1.0 - ss_res / ss_tot } else { 0.0 };
let (intercept, coefficients) = if fit_intercept { (beta[0], beta[1..].to_vec()) } else { (0.0, beta) };
RegressionResult { coefficients, intercept, r_squared, residuals, feature_names: feature_names.to_vec(), } }
fn build_row(x: &[f64], fit_intercept: bool) -> Vec<f64> { if fit_intercept { let mut row = vec![1.0]; row.extend_from_slice(x); row } else { x.to_vec() } }
fn solve_linear_system(a: &[Vec<f64>], b: &[f64]) -> Vec<f64> { let n = b.len(); let mut aug = vec![vec![0.0; n + 1]; n]; for i in 0..n { for j in 0..n { aug[i][j] = a[i][j]; } aug[i][n] = b[i]; }
// Gaussian elimination with partial pivoting for i in 0..n { let mut max_row = i; for k in (i + 1)..n { if aug[k][i].abs() > aug[max_row][i].abs() { max_row = k; } } aug.swap(i, max_row);
let pivot = aug[i][i]; if pivot.abs() < 1e-12 { continue; }
for j in i..=n { aug[i][j] /= pivot; }
for k in 0..n { if k != i { let factor = aug[k][i]; for j in i..=n { aug[k][j] -= factor * aug[i][j]; } } } }
(0..n).map(|i| aug[i][n]).collect() }
/// Rolling OLS regression pub fn rolling( x: &[Vec<f64>], y: &[f64], feature_names: &[String], window: usize, ) -> Vec<RegressionResult> { let n = y.len(); let mut results = Vec::new();
for i in window..n { let x_window: Vec<Vec<f64>> = x[i - window..i].to_vec(); let y_window: Vec<f64> = y[i - window..i].to_vec(); let result = Self::fit(&x_window, &y_window, feature_names, true); results.push(result); }
results }}Regularized Regression (src/regression/regularized.rs)
pub struct RidgeRegression { pub alpha: f64, pub coefficients: Vec<f64>, pub intercept: f64,}
impl RidgeRegression { pub fn new(alpha: f64) -> Self { Self { alpha, coefficients: Vec::new(), intercept: 0.0, } }
/// Fit Ridge: beta = (X^T X + alpha * I)^{-1} X^T y pub fn fit(&mut self, x: &[Vec<f64>], y: &[f64]) { let n = y.len(); let p = x[0].len();
// Center data let y_mean = y.iter().sum::<f64>() / n as f64; let x_means: Vec<f64> = (0..p).map(|j| { x.iter().map(|row| row[j]).sum::<f64>() / n as f64 }).collect();
// Build X^T X + alpha * I let mut xtx = vec![vec![0.0; p]; p]; let mut xty = vec![0.0; p];
for i in 0..n { for j in 0..p { let xj = x[i][j] - x_means[j]; xty[j] += xj * (y[i] - y_mean); for k in 0..=j { let xk = x[i][k] - x_means[k]; xtx[j][k] += xj * xk; if j != k { xtx[k][j] = xtx[j][k]; } } } }
// Add ridge penalty for j in 0..p { xtx[j][j] += self.alpha; }
// Solve self.coefficients = Self::solve(&xtx, &xty); self.intercept = y_mean - x_means.iter() .zip(self.coefficients.iter()) .map(|(m, c)| m * c) .sum::<f64>(); }
pub fn predict(&self, x: &[Vec<f64>]) -> Vec<f64> { x.iter().map(|row| { self.intercept + row.iter() .zip(self.coefficients.iter()) .map(|(xi, ci)| xi * ci) .sum::<f64>() }).collect() }
fn solve(a: &[Vec<f64>], b: &[f64]) -> Vec<f64> { let n = b.len(); let mut aug = vec![vec![0.0; n + 1]; n]; for i in 0..n { for j in 0..n { aug[i][j] = a[i][j]; } aug[i][n] = b[i]; }
for i in 0..n { let mut max_row = i; for k in (i + 1)..n { if aug[k][i].abs() > aug[max_row][i].abs() { max_row = k; } } aug.swap(i, max_row);
let pivot = aug[i][i]; if pivot.abs() < 1e-12 { continue; }
for j in i..=n { aug[i][j] /= pivot; }
for k in 0..n { if k != i { let factor = aug[k][i]; for j in i..=n { aug[k][j] -= factor * aug[i][j]; } } } }
(0..n).map(|i| aug[i][n]).collect() }}
pub struct LassoRegression { pub alpha: f64, pub coefficients: Vec<f64>, pub intercept: f64, pub max_iter: usize, pub tolerance: f64,}
impl LassoRegression { pub fn new(alpha: f64) -> Self { Self { alpha, coefficients: Vec::new(), intercept: 0.0, max_iter: 10000, tolerance: 1e-6, } }
/// Fit Lasso using coordinate descent pub fn fit(&mut self, x: &[Vec<f64>], y: &[f64]) { let n = y.len(); let p = x[0].len();
let y_mean = y.iter().sum::<f64>() / n as f64; self.coefficients = vec![0.0; p];
let mut residuals: Vec<f64> = y.iter().map(|yi| yi - y_mean).collect();
for _ in 0..self.max_iter { let mut max_change = 0.0_f64;
for j in 0..p { // Add back contribution of feature j for i in 0..n { residuals[i] += self.coefficients[j] * x[i][j]; }
// Compute partial residual correlation let rho: f64 = (0..n) .map(|i| x[i][j] * residuals[i]) .sum::<f64>() / n as f64;
let x_sq: f64 = (0..n) .map(|i| x[i][j] * x[i][j]) .sum::<f64>() / n as f64;
// Soft thresholding let new_coef = Self::soft_threshold(rho, self.alpha) / x_sq; max_change = max_change.max((new_coef - self.coefficients[j]).abs()); self.coefficients[j] = new_coef;
// Update residuals for i in 0..n { residuals[i] -= self.coefficients[j] * x[i][j]; } }
if max_change < self.tolerance { break; } }
self.intercept = y_mean; }
fn soft_threshold(rho: f64, lambda: f64) -> f64 { if rho > lambda { rho - lambda } else if rho < -lambda { rho + lambda } else { 0.0 } }
pub fn selected_features(&self) -> Vec<usize> { self.coefficients.iter() .enumerate() .filter(|(_, c)| c.abs() > 1e-10) .map(|(i, _)| i) .collect() }}Bybit Data Fetcher
use reqwest;use serde::Deserialize;use anyhow::Result;
#[derive(Deserialize)]struct BybitResponse { result: BybitResult,}
#[derive(Deserialize)]struct BybitResult { list: Vec<Vec<String>>,}
pub async fn fetch_bybit_returns( symbol: &str, interval: &str, limit: u32,) -> Result<Vec<f64>> { let client = reqwest::Client::new(); let resp = client .get("https://api.bybit.com/v5/market/kline") .query(&[ ("category", "linear"), ("symbol", symbol), ("interval", interval), ("limit", &limit.to_string()), ]) .send() .await? .json::<BybitResponse>() .await?;
let closes: Vec<f64> = resp.result.list .iter() .map(|row| row[4].parse::<f64>().unwrap_or(0.0)) .rev() .collect();
let returns: Vec<f64> = closes.windows(2) .map(|w| (w[1] - w[0]) / w[0]) .collect();
Ok(returns)}Section 7: Practical Examples
Example 1: Building a Crypto Factor Model
factor_model = CryptoFactorModel( universe=["ETHUSDT", "SOLUSDT", "AVAXUSDT", "LINKUSDT", "DOTUSDT", "MATICUSDT", "AAVEUSDT", "UNIUSDT"], market_symbol="BTCUSDT")factors = factor_model.construct_factors()betas = factor_model.estimate_betas(window=60)
print("Factor Loadings for ETHUSDT (latest):")print(betas["ETHUSDT"].tail(1).T)
# Expected output:# alpha 0.000234# beta_MKT 0.892341# beta_SIZE -0.045123# beta_MOM 0.123456# r_squared 0.723456
risk_premia = factor_model.fama_macbeth()print("\nFama-Macbeth Risk Premia:")print(risk_premia)
# Expected output:# mean std t_stat annualized# gamma_0 0.0003 0.012 0.354 0.1095# gamma_MKT 0.0008 0.008 1.414 0.2920# gamma_SIZE 0.0002 0.006 0.471 0.0730# gamma_MOM 0.0005 0.009 0.786 0.1825Example 2: Lasso Feature Selection for Return Prediction
# Create comprehensive feature setfeatures = pd.DataFrame(index=factor_model.returns.index)for sym in factor_model.universe[:4]: ret = factor_model.returns[sym] features[f"{sym}_ret1"] = ret features[f"{sym}_ret5"] = ret.rolling(5).sum() features[f"{sym}_vol"] = ret.rolling(20).std() features[f"{sym}_mom"] = ret.rolling(30).sum()features = features.dropna()
target = factor_model.returns["ETHUSDT"].loc[features.index].shift(-1).dropna()features = features.loc[target.index]
regressor = RegularizedCryptoRegression()regressor.fit_lasso(features, target, alpha=0.001)
selected = regressor.selected_features(features.columns.tolist())print(f"Lasso selected {len(selected)} / {len(features.columns)} features:")for feat in selected: coef = regressor.model.coef_[features.columns.tolist().index(feat)] print(f" {feat}: {coef:.6f}")
# Expected output:# Lasso selected 5 / 16 features:# ETHUSDT_ret1: 0.034521# SOLUSDT_ret1: 0.012345# ETHUSDT_vol: -0.087654# LINKUSDT_mom: 0.005432# SOLUSDT_vol: -0.023456Example 3: Rolling Regression Regime Detection
# Track BTC beta of ETH over timeeth_returns = factor_model.returns["ETHUSDT"]btc_returns = factor_model.returns["BTCUSDT"]common = eth_returns.index.intersection(btc_returns.index)
window = 30rolling_betas = []for i in range(window, len(common)): y = eth_returns.loc[common[i-window:i]].values X = btc_returns.loc[common[i-window:i]].values.reshape(-1, 1) model = LinearRegression() model.fit(X, y) rolling_betas.append({ "date": common[i], "beta": model.coef_[0], "alpha": model.intercept_, "r_squared": model.score(X, y) })
df_betas = pd.DataFrame(rolling_betas).set_index("date")print("Rolling Beta Statistics:")print(f" Mean beta: {df_betas['beta'].mean():.4f}")print(f" Std beta: {df_betas['beta'].std():.4f}")print(f" Min beta: {df_betas['beta'].min():.4f} (regime: divergence)")print(f" Max beta: {df_betas['beta'].max():.4f} (regime: high correlation)")print(f" Mean R²: {df_betas['r_squared'].mean():.4f}")
# Expected output:# Rolling Beta Statistics:# Mean beta: 0.9234# Std beta: 0.2156# Min beta: 0.4523 (regime: divergence)# Max beta: 1.4567 (regime: high correlation)# Mean R²: 0.6789Section 8: Backtesting Framework
Framework Components
The linear methods backtesting framework includes:
- Factor Data Pipeline: Constructs factor returns from Bybit/yfinance data
- Rolling Regression Engine: Estimates time-varying factor loadings
- Signal Generator: Translates alpha estimates and factor views into signals
- Position Sizer: Uses coefficient significance for position scaling
- Performance Tracker: Computes strategy-level and factor-level attribution
Metrics Dashboard
| Metric | Description | Computation |
|---|---|---|
| Factor R² | Explanatory power of factors | 1 - SS_res / SS_tot |
| Alpha (annualized) | Excess return beyond factors | intercept * 365 |
| Alpha t-stat | Statistical significance of alpha | alpha / se(alpha) |
| Beta stability | Coefficient of variation of rolling beta | std(beta) / mean(beta) |
| Information Coefficient | Correlation of predictions with outcomes | corr(y_hat, y) |
| Factor Sharpe | Risk-adjusted factor return | mean(F) / std(F) * sqrt(365) |
| Lasso sparsity | Fraction of zero coefficients | count( |
Sample Results
=== Linear Methods Backtest: Crypto Factor Model ===
Period: 2024-01-01 to 2024-12-31Universe: 8 altcoins | Benchmark: BTCUSDTFactor Model: MKT + SIZE + MOM (3 factors)
Factor Performance: Factor | Ann.Return | Sharpe | Significance (t-stat) --------|------------|--------|--------------------- MKT | 62.3% | 1.45 | 3.21 *** SIZE | 8.7% | 0.42 | 1.12 MOM | 15.2% | 0.78 | 1.89 *
Prediction Model (Ridge, alpha=1.0): In-sample R²: 0.0423 Out-of-sample R²: 0.0187 Information Coeff: 0.137 Direction accuracy: 0.534
Lasso Feature Selection (alpha=0.001): Features selected: 5 / 16 OOS R² (selected): 0.0201 OOS R² (all feat): 0.0145
Rolling Regression (30-day window): Mean BTC beta (ETH): 0.923 +/- 0.216 Beta range: [0.452, 1.457] R² range: [0.312, 0.891] Regime shifts detected: 4Section 9: Performance Evaluation
Comparison of Linear Methods on Crypto Data
| Method | In-Sample R² | OOS R² | Direction Acc. | Sparsity | Stability |
|---|---|---|---|---|---|
| OLS (all features) | 0.052 | 0.008 | 0.512 | 0% | Low |
| OLS (5 features) | 0.031 | 0.018 | 0.528 | 69% | Medium |
| Ridge (CV alpha) | 0.048 | 0.021 | 0.531 | 0% | High |
| Lasso (CV alpha) | 0.035 | 0.019 | 0.529 | 65% | Medium |
| Elastic Net (0.5) | 0.041 | 0.020 | 0.530 | 50% | High |
| Logistic (L1) | N/A | N/A | 0.534 | 60% | Medium |
| Rolling OLS (30d) | 0.067 | 0.015 | 0.523 | 0% | Low |
Key Findings
-
Ridge regression consistently achieves the best out-of-sample R² on crypto data, because the L2 penalty stabilizes coefficient estimates without forcing them to zero, which is important when many features carry small but non-zero signal.
-
Lasso’s feature selection is valuable but can be unstable: the set of selected features changes significantly across time periods, suggesting that feature importance in crypto is regime-dependent.
-
Factor models explain 30-70% of altcoin variance through BTC beta alone. Adding size and momentum factors improves explanatory power by 5-10%, but on-chain factors remain weak due to noisy data.
-
Rolling regression reveals clear regime shifts: BTC-ETH beta ranges from 0.45 during divergence periods to 1.45 during high-correlation crashes. Monitoring beta trends provides early warning of regime changes.
-
Direction accuracy above 53% is achievable with regularized logistic regression on carefully selected features, translating to positive expected trading PnL after Bybit fees.
Limitations
- Linear models cannot capture interaction effects or non-linear factor relationships
- Factor construction depends on asset universe selection, introducing survivorship bias
- Fama-Macbeth standard errors may be understated due to cross-sectional dependence
- Rolling regression estimates are lagged and may miss rapid regime transitions
- OLS coefficient estimates are biased when features contain measurement error (errors-in-variables problem)
Section 10: Future Directions
-
Non-Linear Factor Models: Extending linear factor models with kernel methods or polynomial features to capture non-linear relationships between factors and returns, while maintaining the interpretability advantages of the factor model framework.
-
High-Frequency Factor Models: Adapting factor models to intraday frequencies (1-minute, tick-level) using Bybit order book data, capturing microstructure factors like order flow imbalance and queue position.
-
Dynamic Factor Loading Models: Implementing state-space models (Kalman filter) for continuous tracking of factor loadings, replacing the discontinuous rolling window approach with smooth, real-time estimates.
-
On-Chain Factor Innovation: Developing novel crypto-specific factors from blockchain data, including MEV (Maximal Extractable Value), validator behavior, and cross-chain bridge flows.
-
Bayesian Linear Regression: Placing informative priors on factor loadings based on economic theory (e.g., BTC beta should be positive for most altcoins), improving estimation in small samples.
-
Instrumental Variables for Crypto: Addressing endogeneity in crypto factor models using instrumental variables, such as using mining difficulty as an instrument for BTC supply shocks.
References
-
Fama, E. F., & French, K. R. (1993). “Common Risk Factors in the Returns on Stocks and Bonds.” Journal of Financial Economics, 33(1), 3-56.
-
Tibshirani, R. (1996). “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society Series B, 58(1), 267-288.
-
Hoerl, A. E., & Kennard, R. W. (1970). “Ridge Regression: Biased Estimation for Nonorthogonal Problems.” Technometrics, 12(1), 55-67.
-
Zou, H., & Hastie, T. (2005). “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society Series B, 67(2), 301-320.
-
Fama, E. F., & MacBeth, J. D. (1973). “Risk, Return, and Equilibrium: Empirical Tests.” Journal of Political Economy, 81(3), 607-636.
-
Liu, Y., Tsyvinski, A., & Wu, X. (2022). “Common Risk Factors in Cryptocurrency.” The Journal of Finance, 77(2), 1133-1177.
-
Newey, W. K., & West, K. D. (1987). “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica, 55(3), 703-708.