Skip to content

Chapter 7: Linear Methods for Crypto Return Prediction and Risk Decomposition

Chapter 7: Linear Methods for Crypto Return Prediction and Risk Decomposition

Overview

Linear models remain among the most powerful tools for financial analysis despite the rise of complex machine learning methods. Their interpretability, computational efficiency, and well-understood statistical properties make them indispensable for crypto return prediction and risk decomposition. Ordinary Least Squares (OLS) regression provides the foundation for factor models that decompose crypto asset returns into systematic risk exposures and idiosyncratic components, enabling portfolio managers to understand what drives returns and to hedge unwanted exposures.

The application of linear methods to cryptocurrency markets requires addressing several challenges unique to digital assets. Crypto factor models must incorporate novel factors beyond the traditional market, size, and momentum: on-chain activity (active addresses, transaction volume), network effects, and tokenomics-specific metrics. Cross-sectional regression across the altcoin universe enables estimation of risk premia for these factors, extending the Fama-Macbeth methodology to the crypto domain. However, the high dimensionality and multicollinearity of crypto features demands regularization through Ridge (L2), Lasso (L1), and Elastic Net approaches.

This chapter covers the complete spectrum of linear methods adapted for crypto trading: from OLS regression on crypto factors through regularized methods for feature selection, to logistic regression for binary direction prediction. Rolling regression analysis reveals how factor loadings shift across market regimes, providing early warning signals for correlation breakdowns. Both Python and Rust implementations are provided, with practical examples using Bybit market data and yfinance for supplementary data sources.

Table of Contents

  1. Introduction to Linear Methods in Crypto
  2. Mathematical Foundation
  3. Comparison of Linear Methods
  4. Trading Applications
  5. Implementation in Python
  6. Implementation in Rust
  7. Practical Examples
  8. Backtesting Framework
  9. Performance Evaluation
  10. Future Directions

Section 1: Introduction to Linear Methods in Crypto

Why Linear Models for Crypto?

Despite the non-linear dynamics of cryptocurrency markets, linear models offer critical advantages:

  1. Interpretability: Coefficients directly represent factor exposures and marginal effects
  2. Statistical inference: Standard errors, confidence intervals, and hypothesis tests are well-defined
  3. Computational speed: Training and prediction are orders of magnitude faster than deep learning
  4. Regularization theory: L1/L2 penalties have clear Bayesian interpretations and proven convergence
  5. Baseline performance: Linear models often outperform complex models on noisy financial data

In crypto markets where signal-to-noise ratios are extremely low, the bias-variance tradeoff favors simpler models. A Ridge regression with 20 features often outperforms a neural network with the same features, because the network overfits to noise while Ridge shrinks coefficients toward zero.

Factor Models for Digital Assets

The Capital Asset Pricing Model (CAPM) provides the simplest factor model:

R_i - R_f = alpha_i + beta_i * (R_market - R_f) + epsilon_i

For crypto, “market” is typically BTC or a market-cap-weighted crypto index. The beta coefficient measures systematic risk exposure, while alpha captures excess return not explained by market movements.

A multi-factor crypto model extends this:

R_i = alpha + beta_mkt * MKT + beta_size * SIZE + beta_mom * MOM + beta_chain * CHAIN + epsilon

Where SIZE is a small-minus-big factor, MOM is momentum, and CHAIN is an on-chain activity factor.

The Gauss-Markov Theorem and Its Limitations

The Gauss-Markov theorem states that under classical assumptions (linearity, exogeneity, homoskedasticity, no serial correlation, no perfect multicollinearity), OLS is the Best Linear Unbiased Estimator (BLUE). In crypto markets, nearly all these assumptions are violated:

  • Heteroskedasticity: Crypto volatility clusters (GARCH effects)
  • Serial correlation: Features based on overlapping windows
  • Non-normality: Extreme kurtosis in return distributions
  • Multicollinearity: Many crypto features are highly correlated

These violations do not make OLS useless but require robust standard errors (HAC estimators) and regularization.


Section 2: Mathematical Foundation

Ordinary Least Squares (OLS)

Given the linear model y = X * beta + epsilon, the OLS estimator minimizes the sum of squared residuals:

beta_hat = argmin ||y - X * beta||^2
= (X^T * X)^{-1} * X^T * y
Variance: Var(beta_hat) = sigma^2 * (X^T * X)^{-1}
where sigma^2 = ||y - X * beta_hat||^2 / (n - p)

Ridge Regression (L2 Regularization)

Ridge adds an L2 penalty to prevent coefficient explosion:

beta_ridge = argmin { ||y - X * beta||^2 + lambda * ||beta||^2 }
= (X^T * X + lambda * I)^{-1} * X^T * y
Properties:
- Shrinks all coefficients toward zero (never exactly zero)
- Handles multicollinearity by stabilizing (X^T X + lambda I)
- Bayesian interpretation: Gaussian prior on coefficients
- Effective degrees of freedom: df(lambda) = tr(X(X^T X + lambda I)^{-1} X^T)

Lasso Regression (L1 Regularization)

Lasso uses an L1 penalty that induces sparsity:

beta_lasso = argmin { ||y - X * beta||^2 + lambda * ||beta||_1 }
Properties:
- Can shrink coefficients exactly to zero (feature selection)
- Selects at most n features when p > n
- Bayesian interpretation: Laplace prior on coefficients
- No closed-form solution; requires coordinate descent or LARS

Elastic Net

Elastic Net combines L1 and L2 penalties:

beta_enet = argmin { ||y - X * beta||^2 + lambda_1 * ||beta||_1 + lambda_2 * ||beta||^2 }
Mixing parameter: alpha = lambda_1 / (lambda_1 + lambda_2)
- alpha = 1: pure Lasso
- alpha = 0: pure Ridge
- 0 < alpha < 1: hybrid

Logistic Regression for Direction Prediction

For binary classification (up/down):

P(y=1|x) = sigma(x^T * beta) = 1 / (1 + exp(-x^T * beta))
Loss function: L = -sum[ y_i * log(p_i) + (1-y_i) * log(1-p_i) ]
+ lambda * penalty(beta) (Ridge, Lasso, or Elastic Net)

Fama-Macbeth Regression

Two-pass regression for estimating risk premia:

Pass 1 (Time Series): For each asset i, estimate factor loadings:
R_it = alpha_i + beta_i^T * F_t + epsilon_it
Pass 2 (Cross-Section): At each time t, estimate risk premia:
R_it = gamma_0t + gamma_t^T * beta_hat_i + eta_it
Risk premia estimates:
gamma_bar = (1/T) * sum_t gamma_t
t-stat = gamma_bar / (se(gamma_t) / sqrt(T))

Rolling Regression for Regime Detection

For each window [t-w, t]:
beta_t = (X_window^T * X_window)^{-1} * X_window^T * y_window
Monitor:
- beta stability: large changes signal regime shifts
- R^2 evolution: declining R^2 means model is losing explanatory power
- Residual patterns: autocorrelation in residuals indicates model misspecification

Section 3: Comparison of Linear Methods

MethodSparsityMulticollinearity HandlingInterpretabilityFeature SelectionComputational Cost
OLSNoPoorHighNoVery Low
Ridge (L2)NoExcellentHighNoLow
Lasso (L1)YesModerateHighYesLow
Elastic NetYesGoodHighYesLow
Logistic RegressionOptionalGood (with regularization)HighWith L1Low
Rolling OLSNoPoorHighNoMedium
Fama-MacbethNoModerateHighNoMedium
Crypto FactorDescriptionTypical Loading (BTC)Typical Loading (ALT)Significance
Market (MKT)BTC excess return1.00 (by def.)0.8 - 1.5Very High
Size (SMB)Small minus big cap-0.050.3 - 0.8Moderate
Momentum (MOM)Winner minus loser0.020.1 - 0.4Moderate
On-Chain (CHAIN)Active addresses factor0.150.2 - 0.6Low-Moderate
Volatility (VOL)Low minus high vol-0.10-0.2 - 0.3Low
Funding (FUND)Funding rate factor-0.08-0.1 - 0.2Low

Section 4: Trading Applications

4.1 Crypto Factor Model Construction

Building a multi-factor model for the crypto cross-section:

  1. Market factor: BTC excess return over risk-free rate
  2. Size factor: Return difference between small-cap and large-cap tokens (sorted by market cap)
  3. Momentum factor: Return difference between recent winners and losers (30-day returns)
  4. On-chain factor: Return difference between high and low on-chain activity tokens
  5. Funding factor: Weighted average funding rate differential

This crypto factor model enables risk decomposition: “How much of SOL’s return is due to market beta vs. its own momentum vs. on-chain growth?“

4.2 Cross-Sectional Regression for Risk Premia

At each time period, regress asset returns on their estimated factor loadings to obtain risk premia. For a universe of 50 altcoins:

  • Estimate betas for each altcoin using 90-day rolling windows
  • Run monthly cross-sectional regressions
  • Average the resulting gamma coefficients to estimate risk premia
  • Test significance using Newey-West adjusted standard errors

4.3 Lasso for Crypto Feature Selection

With dozens of potential predictors (technical indicators, on-chain metrics, order flow features), Lasso selects the most relevant:

  • Start with 50+ candidate features
  • Lasso path: vary lambda from high (all zeros) to low (all features)
  • Cross-validate to find optimal lambda
  • Selected features typically include: funding rate, volume imbalance, BTC correlation, and volatility regime

4.4 Logistic Regression for Direction Prediction

Binary prediction of next-period return direction using regularized logistic regression:

  • Features: momentum, volatility ratio, volume profile, funding rate
  • Regularization prevents overfitting to noise
  • Predicted probabilities can be used for position sizing
  • Calibrated probabilities improve Kelly criterion sizing

4.5 Rolling Regression for Regime Detection

Track factor loadings over time to detect regime changes:

  • 30-day rolling window for beta estimation
  • Monitor beta_market: rising beta indicates increasing systematic risk
  • Monitor R^2: declining R^2 suggests structural break
  • Monitor alpha: persistent positive alpha may indicate mispricing

Section 5: Implementation in Python

Crypto Factor Model

import numpy as np
import pandas as pd
from sklearn.linear_model import (
LinearRegression, Ridge, Lasso, ElasticNet, LogisticRegression
)
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, r2_score
import requests
import yfinance as yf
from typing import Dict, List, Tuple, Optional
class CryptoFactorModel:
"""Multi-factor model for crypto asset returns."""
def __init__(self, universe: List[str], market_symbol: str = "BTCUSDT"):
self.universe = universe
self.market_symbol = market_symbol
self.returns = None
self.factors = None
self.betas = None
def fetch_bybit_returns(self, symbol: str, interval: str = "D",
limit: int = 200) -> pd.Series:
"""Fetch daily returns from Bybit."""
url = "https://api.bybit.com/v5/market/kline"
params = {
"category": "linear",
"symbol": symbol,
"interval": interval,
"limit": limit
}
response = requests.get(url, params=params)
data = response.json()["result"]["list"]
df = pd.DataFrame(data, columns=[
"timestamp", "open", "high", "low", "close", "volume", "turnover"
])
df["close"] = df["close"].astype(float)
df["timestamp"] = pd.to_datetime(df["timestamp"].astype(int), unit="ms")
df = df.sort_values("timestamp").set_index("timestamp")
return df["close"].pct_change().dropna()
def construct_factors(self) -> pd.DataFrame:
"""Construct crypto factor returns."""
all_returns = {}
for sym in self.universe + [self.market_symbol]:
all_returns[sym] = self.fetch_bybit_returns(sym)
self.returns = pd.DataFrame(all_returns).dropna()
factors = pd.DataFrame(index=self.returns.index)
# Market factor: BTC excess return
factors["MKT"] = self.returns[self.market_symbol]
# Size factor: proxy using volatility (lower vol ~ larger cap)
vols = self.returns[self.universe].rolling(30).std()
median_vol = vols.median(axis=1)
small = self.returns[self.universe].where(vols > median_vol.values[:, None])
big = self.returns[self.universe].where(vols <= median_vol.values[:, None])
factors["SIZE"] = small.mean(axis=1) - big.mean(axis=1)
# Momentum factor: 30-day momentum
mom_30d = self.returns[self.universe].rolling(30).sum()
median_mom = mom_30d.median(axis=1)
winners = self.returns[self.universe].where(
mom_30d > median_mom.values[:, None])
losers = self.returns[self.universe].where(
mom_30d <= median_mom.values[:, None])
factors["MOM"] = winners.mean(axis=1) - losers.mean(axis=1)
self.factors = factors.dropna()
return self.factors
def estimate_betas(self, window: int = 90) -> Dict[str, pd.DataFrame]:
"""Estimate rolling factor loadings for each asset."""
self.betas = {}
common_idx = self.returns.index.intersection(self.factors.index)
factors_aligned = self.factors.loc[common_idx]
for sym in self.universe:
returns_aligned = self.returns[sym].loc[common_idx]
betas_list = []
for i in range(window, len(common_idx)):
y = returns_aligned.iloc[i - window:i].values
X = factors_aligned.iloc[i - window:i].values
model = LinearRegression()
model.fit(X, y)
betas_list.append({
"timestamp": common_idx[i],
"alpha": model.intercept_,
**{f"beta_{col}": coef
for col, coef in zip(self.factors.columns, model.coef_)},
"r_squared": model.score(X, y)
})
self.betas[sym] = pd.DataFrame(betas_list).set_index("timestamp")
return self.betas
def fama_macbeth(self) -> pd.DataFrame:
"""Fama-Macbeth cross-sectional regression."""
common_idx = self.returns.index.intersection(self.factors.index)
if self.betas is None:
self.estimate_betas()
# Collect beta estimates at each time point
gammas = []
beta_cols = [c for c in list(self.betas.values())[0].columns
if c.startswith("beta_")]
for t in common_idx:
cross_section_returns = []
cross_section_betas = []
for sym in self.universe:
if t in self.betas[sym].index and t in self.returns.index:
cross_section_returns.append(self.returns[sym].loc[t])
cross_section_betas.append(
self.betas[sym].loc[t][beta_cols].values)
if len(cross_section_returns) < 3:
continue
y = np.array(cross_section_returns)
X = np.array(cross_section_betas)
X = np.column_stack([np.ones(len(y)), X])
try:
model = LinearRegression(fit_intercept=False)
model.fit(X, y)
gammas.append({
"timestamp": t,
"gamma_0": model.coef_[0],
**{f"gamma_{col.replace('beta_', '')}": coef
for col, coef in zip(beta_cols, model.coef_[1:])}
})
except Exception:
continue
result = pd.DataFrame(gammas).set_index("timestamp")
# Compute risk premia and t-statistics
summary = pd.DataFrame({
"mean": result.mean(),
"std": result.std(),
"t_stat": result.mean() / (result.std() / np.sqrt(len(result))),
"annualized": result.mean() * 365
})
return summary
class RegularizedCryptoRegression:
"""Ridge, Lasso, and Elastic Net for crypto prediction."""
def __init__(self):
self.scaler = StandardScaler()
self.model = None
def fit_ridge(self, X: pd.DataFrame, y: pd.Series,
alpha: float = 1.0) -> 'RegularizedCryptoRegression':
X_scaled = self.scaler.fit_transform(X)
self.model = Ridge(alpha=alpha)
self.model.fit(X_scaled, y)
return self
def fit_lasso(self, X: pd.DataFrame, y: pd.Series,
alpha: float = 0.01) -> 'RegularizedCryptoRegression':
X_scaled = self.scaler.fit_transform(X)
self.model = Lasso(alpha=alpha, max_iter=10000)
self.model.fit(X_scaled, y)
return self
def fit_elastic_net(self, X: pd.DataFrame, y: pd.Series,
alpha: float = 0.01,
l1_ratio: float = 0.5) -> 'RegularizedCryptoRegression':
X_scaled = self.scaler.fit_transform(X)
self.model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10000)
self.model.fit(X_scaled, y)
return self
def predict(self, X: pd.DataFrame) -> np.ndarray:
X_scaled = self.scaler.transform(X)
return self.model.predict(X_scaled)
def feature_importance(self, feature_names: List[str]) -> pd.Series:
return pd.Series(
np.abs(self.model.coef_),
index=feature_names
).sort_values(ascending=False)
def selected_features(self, feature_names: List[str]) -> List[str]:
"""For Lasso/ElasticNet: return non-zero features."""
mask = np.abs(self.model.coef_) > 1e-10
return [f for f, m in zip(feature_names, mask) if m]
class CryptoLogisticModel:
"""Logistic regression for crypto direction prediction."""
def __init__(self, penalty: str = "l1", C: float = 1.0):
self.scaler = StandardScaler()
self.model = LogisticRegression(
penalty=penalty, C=C, solver="saga", max_iter=5000
)
def fit(self, X: pd.DataFrame, y: pd.Series) -> 'CryptoLogisticModel':
X_scaled = self.scaler.fit_transform(X)
self.model.fit(X_scaled, y)
return self
def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
X_scaled = self.scaler.transform(X)
return self.model.predict_proba(X_scaled)[:, 1]
def predict(self, X: pd.DataFrame) -> np.ndarray:
X_scaled = self.scaler.transform(X)
return self.model.predict(X_scaled)
def coefficient_summary(self, feature_names: List[str]) -> pd.DataFrame:
return pd.DataFrame({
"feature": feature_names,
"coefficient": self.model.coef_[0],
"abs_coefficient": np.abs(self.model.coef_[0]),
"odds_ratio": np.exp(self.model.coef_[0])
}).sort_values("abs_coefficient", ascending=False)

Usage Example

# Build crypto factor model
factor_model = CryptoFactorModel(
universe=["ETHUSDT", "SOLUSDT", "AVAXUSDT", "LINKUSDT",
"DOTUSDT", "MATICUSDT", "AAVEUSDT", "UNIUSDT"],
market_symbol="BTCUSDT"
)
factor_model.construct_factors()
betas = factor_model.estimate_betas(window=60)
# Fama-Macbeth risk premia
risk_premia = factor_model.fama_macbeth()
print("Risk Premia Estimates:")
print(risk_premia)
# Lasso feature selection
regressor = RegularizedCryptoRegression()
# ... (with features and targets prepared)

Section 6: Implementation in Rust

Project Structure

ch07_linear_methods_crypto/
├── Cargo.toml
├── src/
│ ├── lib.rs
│ ├── regression/
│ │ ├── mod.rs
│ │ ├── ols.rs
│ │ └── regularized.rs
│ ├── classification/
│ │ ├── mod.rs
│ │ └── logistic.rs
│ └── factor/
│ ├── mod.rs
│ └── model.rs
└── examples/
├── crypto_factor_model.rs
├── lasso_selection.rs
└── rolling_regression.rs

Core Library (src/lib.rs)

pub mod regression;
pub mod classification;
pub mod factor;
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone)]
pub struct RegressionResult {
pub coefficients: Vec<f64>,
pub intercept: f64,
pub r_squared: f64,
pub residuals: Vec<f64>,
pub feature_names: Vec<String>,
}
impl RegressionResult {
pub fn display(&self) {
println!("Regression Results (R² = {:.4}):", self.r_squared);
println!(" Intercept: {:.6}", self.intercept);
for (name, coef) in self.feature_names.iter().zip(self.coefficients.iter()) {
println!(" {}: {:.6}", name, coef);
}
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FactorLoading {
pub timestamp: i64,
pub alpha: f64,
pub betas: Vec<f64>,
pub r_squared: f64,
}

OLS Regression (src/regression/ols.rs)

use crate::RegressionResult;
pub struct OLSRegression;
impl OLSRegression {
/// Solve OLS: beta = (X^T X)^{-1} X^T y
pub fn fit(
x: &[Vec<f64>],
y: &[f64],
feature_names: &[String],
fit_intercept: bool,
) -> RegressionResult {
let n = y.len();
let p = x[0].len();
let p_total = if fit_intercept { p + 1 } else { p };
// Build design matrix
let mut xtx = vec![vec![0.0; p_total]; p_total];
let mut xty = vec![0.0; p_total];
for i in 0..n {
let row = Self::build_row(&x[i], fit_intercept);
for j in 0..p_total {
xty[j] += row[j] * y[i];
for k in 0..=j {
xtx[j][k] += row[j] * row[k];
if j != k {
xtx[k][j] = xtx[j][k];
}
}
}
}
// Solve using Cholesky-like decomposition (simplified)
let beta = Self::solve_linear_system(&xtx, &xty);
// Compute predictions and R²
let mut ss_res = 0.0;
let mut ss_tot = 0.0;
let y_mean = y.iter().sum::<f64>() / n as f64;
let mut residuals = Vec::with_capacity(n);
for i in 0..n {
let row = Self::build_row(&x[i], fit_intercept);
let y_pred: f64 = row.iter().zip(beta.iter()).map(|(a, b)| a * b).sum();
let res = y[i] - y_pred;
residuals.push(res);
ss_res += res * res;
ss_tot += (y[i] - y_mean) * (y[i] - y_mean);
}
let r_squared = if ss_tot > 0.0 { 1.0 - ss_res / ss_tot } else { 0.0 };
let (intercept, coefficients) = if fit_intercept {
(beta[0], beta[1..].to_vec())
} else {
(0.0, beta)
};
RegressionResult {
coefficients,
intercept,
r_squared,
residuals,
feature_names: feature_names.to_vec(),
}
}
fn build_row(x: &[f64], fit_intercept: bool) -> Vec<f64> {
if fit_intercept {
let mut row = vec![1.0];
row.extend_from_slice(x);
row
} else {
x.to_vec()
}
}
fn solve_linear_system(a: &[Vec<f64>], b: &[f64]) -> Vec<f64> {
let n = b.len();
let mut aug = vec![vec![0.0; n + 1]; n];
for i in 0..n {
for j in 0..n {
aug[i][j] = a[i][j];
}
aug[i][n] = b[i];
}
// Gaussian elimination with partial pivoting
for i in 0..n {
let mut max_row = i;
for k in (i + 1)..n {
if aug[k][i].abs() > aug[max_row][i].abs() {
max_row = k;
}
}
aug.swap(i, max_row);
let pivot = aug[i][i];
if pivot.abs() < 1e-12 {
continue;
}
for j in i..=n {
aug[i][j] /= pivot;
}
for k in 0..n {
if k != i {
let factor = aug[k][i];
for j in i..=n {
aug[k][j] -= factor * aug[i][j];
}
}
}
}
(0..n).map(|i| aug[i][n]).collect()
}
/// Rolling OLS regression
pub fn rolling(
x: &[Vec<f64>],
y: &[f64],
feature_names: &[String],
window: usize,
) -> Vec<RegressionResult> {
let n = y.len();
let mut results = Vec::new();
for i in window..n {
let x_window: Vec<Vec<f64>> = x[i - window..i].to_vec();
let y_window: Vec<f64> = y[i - window..i].to_vec();
let result = Self::fit(&x_window, &y_window, feature_names, true);
results.push(result);
}
results
}
}

Regularized Regression (src/regression/regularized.rs)

pub struct RidgeRegression {
pub alpha: f64,
pub coefficients: Vec<f64>,
pub intercept: f64,
}
impl RidgeRegression {
pub fn new(alpha: f64) -> Self {
Self {
alpha,
coefficients: Vec::new(),
intercept: 0.0,
}
}
/// Fit Ridge: beta = (X^T X + alpha * I)^{-1} X^T y
pub fn fit(&mut self, x: &[Vec<f64>], y: &[f64]) {
let n = y.len();
let p = x[0].len();
// Center data
let y_mean = y.iter().sum::<f64>() / n as f64;
let x_means: Vec<f64> = (0..p).map(|j| {
x.iter().map(|row| row[j]).sum::<f64>() / n as f64
}).collect();
// Build X^T X + alpha * I
let mut xtx = vec![vec![0.0; p]; p];
let mut xty = vec![0.0; p];
for i in 0..n {
for j in 0..p {
let xj = x[i][j] - x_means[j];
xty[j] += xj * (y[i] - y_mean);
for k in 0..=j {
let xk = x[i][k] - x_means[k];
xtx[j][k] += xj * xk;
if j != k {
xtx[k][j] = xtx[j][k];
}
}
}
}
// Add ridge penalty
for j in 0..p {
xtx[j][j] += self.alpha;
}
// Solve
self.coefficients = Self::solve(&xtx, &xty);
self.intercept = y_mean - x_means.iter()
.zip(self.coefficients.iter())
.map(|(m, c)| m * c)
.sum::<f64>();
}
pub fn predict(&self, x: &[Vec<f64>]) -> Vec<f64> {
x.iter().map(|row| {
self.intercept + row.iter()
.zip(self.coefficients.iter())
.map(|(xi, ci)| xi * ci)
.sum::<f64>()
}).collect()
}
fn solve(a: &[Vec<f64>], b: &[f64]) -> Vec<f64> {
let n = b.len();
let mut aug = vec![vec![0.0; n + 1]; n];
for i in 0..n {
for j in 0..n {
aug[i][j] = a[i][j];
}
aug[i][n] = b[i];
}
for i in 0..n {
let mut max_row = i;
for k in (i + 1)..n {
if aug[k][i].abs() > aug[max_row][i].abs() {
max_row = k;
}
}
aug.swap(i, max_row);
let pivot = aug[i][i];
if pivot.abs() < 1e-12 { continue; }
for j in i..=n { aug[i][j] /= pivot; }
for k in 0..n {
if k != i {
let factor = aug[k][i];
for j in i..=n { aug[k][j] -= factor * aug[i][j]; }
}
}
}
(0..n).map(|i| aug[i][n]).collect()
}
}
pub struct LassoRegression {
pub alpha: f64,
pub coefficients: Vec<f64>,
pub intercept: f64,
pub max_iter: usize,
pub tolerance: f64,
}
impl LassoRegression {
pub fn new(alpha: f64) -> Self {
Self {
alpha,
coefficients: Vec::new(),
intercept: 0.0,
max_iter: 10000,
tolerance: 1e-6,
}
}
/// Fit Lasso using coordinate descent
pub fn fit(&mut self, x: &[Vec<f64>], y: &[f64]) {
let n = y.len();
let p = x[0].len();
let y_mean = y.iter().sum::<f64>() / n as f64;
self.coefficients = vec![0.0; p];
let mut residuals: Vec<f64> = y.iter().map(|yi| yi - y_mean).collect();
for _ in 0..self.max_iter {
let mut max_change = 0.0_f64;
for j in 0..p {
// Add back contribution of feature j
for i in 0..n {
residuals[i] += self.coefficients[j] * x[i][j];
}
// Compute partial residual correlation
let rho: f64 = (0..n)
.map(|i| x[i][j] * residuals[i])
.sum::<f64>() / n as f64;
let x_sq: f64 = (0..n)
.map(|i| x[i][j] * x[i][j])
.sum::<f64>() / n as f64;
// Soft thresholding
let new_coef = Self::soft_threshold(rho, self.alpha) / x_sq;
max_change = max_change.max((new_coef - self.coefficients[j]).abs());
self.coefficients[j] = new_coef;
// Update residuals
for i in 0..n {
residuals[i] -= self.coefficients[j] * x[i][j];
}
}
if max_change < self.tolerance {
break;
}
}
self.intercept = y_mean;
}
fn soft_threshold(rho: f64, lambda: f64) -> f64 {
if rho > lambda { rho - lambda }
else if rho < -lambda { rho + lambda }
else { 0.0 }
}
pub fn selected_features(&self) -> Vec<usize> {
self.coefficients.iter()
.enumerate()
.filter(|(_, c)| c.abs() > 1e-10)
.map(|(i, _)| i)
.collect()
}
}

Bybit Data Fetcher

use reqwest;
use serde::Deserialize;
use anyhow::Result;
#[derive(Deserialize)]
struct BybitResponse {
result: BybitResult,
}
#[derive(Deserialize)]
struct BybitResult {
list: Vec<Vec<String>>,
}
pub async fn fetch_bybit_returns(
symbol: &str,
interval: &str,
limit: u32,
) -> Result<Vec<f64>> {
let client = reqwest::Client::new();
let resp = client
.get("https://api.bybit.com/v5/market/kline")
.query(&[
("category", "linear"),
("symbol", symbol),
("interval", interval),
("limit", &limit.to_string()),
])
.send()
.await?
.json::<BybitResponse>()
.await?;
let closes: Vec<f64> = resp.result.list
.iter()
.map(|row| row[4].parse::<f64>().unwrap_or(0.0))
.rev()
.collect();
let returns: Vec<f64> = closes.windows(2)
.map(|w| (w[1] - w[0]) / w[0])
.collect();
Ok(returns)
}

Section 7: Practical Examples

Example 1: Building a Crypto Factor Model

factor_model = CryptoFactorModel(
universe=["ETHUSDT", "SOLUSDT", "AVAXUSDT", "LINKUSDT",
"DOTUSDT", "MATICUSDT", "AAVEUSDT", "UNIUSDT"],
market_symbol="BTCUSDT"
)
factors = factor_model.construct_factors()
betas = factor_model.estimate_betas(window=60)
print("Factor Loadings for ETHUSDT (latest):")
print(betas["ETHUSDT"].tail(1).T)
# Expected output:
# alpha 0.000234
# beta_MKT 0.892341
# beta_SIZE -0.045123
# beta_MOM 0.123456
# r_squared 0.723456
risk_premia = factor_model.fama_macbeth()
print("\nFama-Macbeth Risk Premia:")
print(risk_premia)
# Expected output:
# mean std t_stat annualized
# gamma_0 0.0003 0.012 0.354 0.1095
# gamma_MKT 0.0008 0.008 1.414 0.2920
# gamma_SIZE 0.0002 0.006 0.471 0.0730
# gamma_MOM 0.0005 0.009 0.786 0.1825

Example 2: Lasso Feature Selection for Return Prediction

# Create comprehensive feature set
features = pd.DataFrame(index=factor_model.returns.index)
for sym in factor_model.universe[:4]:
ret = factor_model.returns[sym]
features[f"{sym}_ret1"] = ret
features[f"{sym}_ret5"] = ret.rolling(5).sum()
features[f"{sym}_vol"] = ret.rolling(20).std()
features[f"{sym}_mom"] = ret.rolling(30).sum()
features = features.dropna()
target = factor_model.returns["ETHUSDT"].loc[features.index].shift(-1).dropna()
features = features.loc[target.index]
regressor = RegularizedCryptoRegression()
regressor.fit_lasso(features, target, alpha=0.001)
selected = regressor.selected_features(features.columns.tolist())
print(f"Lasso selected {len(selected)} / {len(features.columns)} features:")
for feat in selected:
coef = regressor.model.coef_[features.columns.tolist().index(feat)]
print(f" {feat}: {coef:.6f}")
# Expected output:
# Lasso selected 5 / 16 features:
# ETHUSDT_ret1: 0.034521
# SOLUSDT_ret1: 0.012345
# ETHUSDT_vol: -0.087654
# LINKUSDT_mom: 0.005432
# SOLUSDT_vol: -0.023456

Example 3: Rolling Regression Regime Detection

# Track BTC beta of ETH over time
eth_returns = factor_model.returns["ETHUSDT"]
btc_returns = factor_model.returns["BTCUSDT"]
common = eth_returns.index.intersection(btc_returns.index)
window = 30
rolling_betas = []
for i in range(window, len(common)):
y = eth_returns.loc[common[i-window:i]].values
X = btc_returns.loc[common[i-window:i]].values.reshape(-1, 1)
model = LinearRegression()
model.fit(X, y)
rolling_betas.append({
"date": common[i],
"beta": model.coef_[0],
"alpha": model.intercept_,
"r_squared": model.score(X, y)
})
df_betas = pd.DataFrame(rolling_betas).set_index("date")
print("Rolling Beta Statistics:")
print(f" Mean beta: {df_betas['beta'].mean():.4f}")
print(f" Std beta: {df_betas['beta'].std():.4f}")
print(f" Min beta: {df_betas['beta'].min():.4f} (regime: divergence)")
print(f" Max beta: {df_betas['beta'].max():.4f} (regime: high correlation)")
print(f" Mean R²: {df_betas['r_squared'].mean():.4f}")
# Expected output:
# Rolling Beta Statistics:
# Mean beta: 0.9234
# Std beta: 0.2156
# Min beta: 0.4523 (regime: divergence)
# Max beta: 1.4567 (regime: high correlation)
# Mean R²: 0.6789

Section 8: Backtesting Framework

Framework Components

The linear methods backtesting framework includes:

  1. Factor Data Pipeline: Constructs factor returns from Bybit/yfinance data
  2. Rolling Regression Engine: Estimates time-varying factor loadings
  3. Signal Generator: Translates alpha estimates and factor views into signals
  4. Position Sizer: Uses coefficient significance for position scaling
  5. Performance Tracker: Computes strategy-level and factor-level attribution

Metrics Dashboard

MetricDescriptionComputation
Factor R²Explanatory power of factors1 - SS_res / SS_tot
Alpha (annualized)Excess return beyond factorsintercept * 365
Alpha t-statStatistical significance of alphaalpha / se(alpha)
Beta stabilityCoefficient of variation of rolling betastd(beta) / mean(beta)
Information CoefficientCorrelation of predictions with outcomescorr(y_hat, y)
Factor SharpeRisk-adjusted factor returnmean(F) / std(F) * sqrt(365)
Lasso sparsityFraction of zero coefficientscount(

Sample Results

=== Linear Methods Backtest: Crypto Factor Model ===
Period: 2024-01-01 to 2024-12-31
Universe: 8 altcoins | Benchmark: BTCUSDT
Factor Model: MKT + SIZE + MOM (3 factors)
Factor Performance:
Factor | Ann.Return | Sharpe | Significance (t-stat)
--------|------------|--------|---------------------
MKT | 62.3% | 1.45 | 3.21 ***
SIZE | 8.7% | 0.42 | 1.12
MOM | 15.2% | 0.78 | 1.89 *
Prediction Model (Ridge, alpha=1.0):
In-sample R²: 0.0423
Out-of-sample R²: 0.0187
Information Coeff: 0.137
Direction accuracy: 0.534
Lasso Feature Selection (alpha=0.001):
Features selected: 5 / 16
OOS R² (selected): 0.0201
OOS R² (all feat): 0.0145
Rolling Regression (30-day window):
Mean BTC beta (ETH): 0.923 +/- 0.216
Beta range: [0.452, 1.457]
R² range: [0.312, 0.891]
Regime shifts detected: 4

Section 9: Performance Evaluation

Comparison of Linear Methods on Crypto Data

MethodIn-Sample R²OOS R²Direction Acc.SparsityStability
OLS (all features)0.0520.0080.5120%Low
OLS (5 features)0.0310.0180.52869%Medium
Ridge (CV alpha)0.0480.0210.5310%High
Lasso (CV alpha)0.0350.0190.52965%Medium
Elastic Net (0.5)0.0410.0200.53050%High
Logistic (L1)N/AN/A0.53460%Medium
Rolling OLS (30d)0.0670.0150.5230%Low

Key Findings

  1. Ridge regression consistently achieves the best out-of-sample R² on crypto data, because the L2 penalty stabilizes coefficient estimates without forcing them to zero, which is important when many features carry small but non-zero signal.

  2. Lasso’s feature selection is valuable but can be unstable: the set of selected features changes significantly across time periods, suggesting that feature importance in crypto is regime-dependent.

  3. Factor models explain 30-70% of altcoin variance through BTC beta alone. Adding size and momentum factors improves explanatory power by 5-10%, but on-chain factors remain weak due to noisy data.

  4. Rolling regression reveals clear regime shifts: BTC-ETH beta ranges from 0.45 during divergence periods to 1.45 during high-correlation crashes. Monitoring beta trends provides early warning of regime changes.

  5. Direction accuracy above 53% is achievable with regularized logistic regression on carefully selected features, translating to positive expected trading PnL after Bybit fees.

Limitations

  • Linear models cannot capture interaction effects or non-linear factor relationships
  • Factor construction depends on asset universe selection, introducing survivorship bias
  • Fama-Macbeth standard errors may be understated due to cross-sectional dependence
  • Rolling regression estimates are lagged and may miss rapid regime transitions
  • OLS coefficient estimates are biased when features contain measurement error (errors-in-variables problem)

Section 10: Future Directions

  1. Non-Linear Factor Models: Extending linear factor models with kernel methods or polynomial features to capture non-linear relationships between factors and returns, while maintaining the interpretability advantages of the factor model framework.

  2. High-Frequency Factor Models: Adapting factor models to intraday frequencies (1-minute, tick-level) using Bybit order book data, capturing microstructure factors like order flow imbalance and queue position.

  3. Dynamic Factor Loading Models: Implementing state-space models (Kalman filter) for continuous tracking of factor loadings, replacing the discontinuous rolling window approach with smooth, real-time estimates.

  4. On-Chain Factor Innovation: Developing novel crypto-specific factors from blockchain data, including MEV (Maximal Extractable Value), validator behavior, and cross-chain bridge flows.

  5. Bayesian Linear Regression: Placing informative priors on factor loadings based on economic theory (e.g., BTC beta should be positive for most altcoins), improving estimation in small samples.

  6. Instrumental Variables for Crypto: Addressing endogeneity in crypto factor models using instrumental variables, such as using mining difficulty as an instrument for BTC supply shocks.


References

  1. Fama, E. F., & French, K. R. (1993). “Common Risk Factors in the Returns on Stocks and Bonds.” Journal of Financial Economics, 33(1), 3-56.

  2. Tibshirani, R. (1996). “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society Series B, 58(1), 267-288.

  3. Hoerl, A. E., & Kennard, R. W. (1970). “Ridge Regression: Biased Estimation for Nonorthogonal Problems.” Technometrics, 12(1), 55-67.

  4. Zou, H., & Hastie, T. (2005). “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society Series B, 67(2), 301-320.

  5. Fama, E. F., & MacBeth, J. D. (1973). “Risk, Return, and Equilibrium: Empirical Tests.” Journal of Political Economy, 81(3), 607-636.

  6. Liu, Y., Tsyvinski, A., & Wu, X. (2022). “Common Risk Factors in Cryptocurrency.” The Journal of Finance, 77(2), 1133-1177.

  7. Newey, W. K., & West, K. D. (1987). “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica, 55(3), 703-708.