Chapter 332: Normalizing Flows for Finance

Overview

Normalizing Flows are a class of deep generative models that learn complex probability distributions by transforming a simple base distribution (like Gaussian) through a sequence of invertible, differentiable transformations. Unlike other generative models (VAEs, GANs), normalizing flows provide exact likelihood computation, making them ideal for financial applications where accurate density estimation is crucial for risk management.

Why Normalizing Flows for Finance?

The Problem with Traditional Approaches

Financial returns are notoriously non-Gaussian:

  • Fat tails: Extreme events happen more frequently than normal distributions predict
  • Skewness: Returns are often asymmetric (larger drops than gains)
  • Time-varying volatility: The distribution shape changes over time
  • Multimodality: Multiple market regimes create complex distributions

Parametric risk models often assume Gaussian returns when computing VaR and CVaR, leading to:

  • Underestimation of tail risk
  • Poor hedging decisions
  • Unexpected losses during market stress

Normalizing Flow Solution

Normalizing flows learn the return distribution from data instead of assuming a parametric shape:

Traditional: Assume X ~ N(μ, σ²) → Underestimate tail risk
Normalizing Flow: Learn p(X) directly → Accurate density for any shape
z ~ N(0, I) [Simple base distribution]
x = f(z) [Invertible transformation]
p(x) = p(z)|det(∂f⁻¹/∂x)| [Exact likelihood via change of variables]

Mathematical Foundation

Change of Variables Formula

The core principle of normalizing flows is the change of variables formula:

Given:

  • Base distribution: z ~ p_z(z) (typically standard normal)
  • Invertible transformation: x = f(z), so z = f⁻¹(x)
  • Target distribution: p_x(x)

The density transformation is:

p_x(x) = p_z(f⁻¹(x)) |det(J_{f⁻¹}(x))|
where J_{f⁻¹}(x) = ∂f⁻¹(x)/∂x is the Jacobian matrix

For a sequence of K transformations:

z₀ → f₁ → z₁ → f₂ → z₂ → ... → f_K → x
log p(x) = log p(z₀) - Σᵢ log|det(J_{fᵢ})|
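
As a minimal worked sketch of the composed formula, the snippet below builds a two-step, one-dimensional flow by hand: an affine map followed by sinh, which fattens the tails. The parameters A and B and the choice of sinh are illustrative assumptions, not part of this chapter's model. It evaluates log p(x) by inverting the steps and subtracting each forward log-derivative, then checks numerically that the resulting density integrates to one.

```python
import math

A, B = 0.5, 0.1  # hypothetical parameters of the affine step f1(z) = A*z + B


def forward(z0):
    """Map z0 to x through f1 (affine) and then f2 (sinh)."""
    return math.sinh(A * z0 + B)


def log_prob(x):
    """log p(x) = log p(z0) - sum_i log|det J_{f_i}| for the two-step flow."""
    z1 = math.asinh(x)   # undo f2
    z0 = (z1 - B) / A    # undo f1
    log_pz0 = -0.5 * z0 ** 2 - 0.5 * math.log(2.0 * math.pi)  # standard normal
    log_det_f1 = math.log(abs(A))          # d(A*z + B)/dz = A
    log_det_f2 = math.log(math.cosh(z1))   # d sinh(u)/du = cosh(u)
    return log_pz0 - (log_det_f1 + log_det_f2)


def integrate_density(lo=-50.0, hi=50.0, n=200_000):
    """Trapezoidal check that exp(log_prob) is a normalized density."""
    h = (hi - lo) / n
    total = 0.5 * (math.exp(log_prob(lo)) + math.exp(log_prob(hi)))
    for i in range(1, n):
        total += math.exp(log_prob(lo + i * h))
    return total * h


total_mass = integrate_density()
print(f"integral of p(x) dx = {total_mass:.6f}")
```

The same bookkeeping carries over to K learned layers: each layer contributes one log-determinant term, and the base density is evaluated only once, at z₀.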

Why Jacobian Matters

The Jacobian determinant accounts for how the transformation stretches or compresses space:

┌────────────────────────────────────────────────────────────┐
│ JACOBIAN INTUITION │
├────────────────────────────────────────────────────────────┤
│ │
│ Base Distribution (z) Target Distribution (x) │
│ │
│ ┌─────────┐ ┌──────────────┐ │
│ │ *** │ f(z) │ *** │ │
│ │ ***** │ ──────────► │ *** ** │ │
│ │ *** │ │ **** *** │ │
│ └─────────┘ └──────────────┘ │
│ │
│ Gaussian blob Complex distribution │
│ (easy to sample) (hard to model directly) │
│ │
│ Jacobian = How much volume changes at each point │
│ |det(J)| > 1 → Space expands → Density decreases │
│ |det(J)| < 1 → Space contracts → Density increases │
│ │
└────────────────────────────────────────────────────────────┘

Types of Normalizing Flows

1. Affine Coupling Flows (RealNVP)

Key Idea: Split the input and apply simple transformations that have tractable Jacobians.

┌─────────────────────────────────────────────────────────────┐
│ AFFINE COUPLING LAYER │
├─────────────────────────────────────────────────────────────┤
│ │
│ Input: x = [x₁, x₂] (split into two parts) │
│ │
│ Transformation: │
│ y₁ = x₁ (unchanged) │
│ y₂ = x₂ ⊙ exp(s(x₁)) + t(x₁) (affine transform) │
│ │
│ where s() and t() are neural networks │
│ │
│ Jacobian is TRIANGULAR → det = ∏ exp(s(x₁)) = exp(Σs) │
│ Very efficient! O(D) instead of O(D³) │
│ │
│ ┌──────────┐ │
│ x₁ ──┤ Identity ├──────────────────────────────► y₁ │
│ └──────────┘ │
│ │ │
│ ▼ │
│ ┌────────┐ ┌─────────────────────┐ │
│ x₂ ──┤ Neural ├──s,t──►│ y₂ = x₂·exp(s) + t │──► y₂ │
│ │Networks│ └─────────────────────┘ │
│ └────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
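
The coupling layer above can be sketched in two dimensions without any deep learning library. In this minimal example the functions s and t are hand-written stand-ins for the neural networks (an assumption made purely for illustration); the finite-difference check confirms that the log-determinant of the full Jacobian equals s(x₁).

```python
import math


def s(x1):
    return math.tanh(x1)   # hypothetical scale function (stand-in for a network)


def t(x1):
    return 0.5 * x1        # hypothetical translation function


def coupling_forward(x1, x2):
    y1 = x1
    y2 = x2 * math.exp(s(x1)) + t(x1)
    log_det = s(x1)        # triangular Jacobian with diagonal (1, exp(s(x1)))
    return y1, y2, log_det


def coupling_inverse(y1, y2):
    x1 = y1
    x2 = (y2 - t(y1)) * math.exp(-s(y1))   # no need to invert s or t themselves
    return x1, x2


def numerical_log_det(x1, x2, eps=1e-6):
    """Central-difference determinant of the 2x2 Jacobian, for comparison."""
    def f(a, b):
        return coupling_forward(a, b)[:2]
    up, dn = f(x1 + eps, x2), f(x1 - eps, x2)
    j11, j21 = (up[0] - dn[0]) / (2 * eps), (up[1] - dn[1]) / (2 * eps)
    up, dn = f(x1, x2 + eps), f(x1, x2 - eps)
    j12, j22 = (up[0] - dn[0]) / (2 * eps), (up[1] - dn[1]) / (2 * eps)
    return math.log(abs(j11 * j22 - j12 * j21))


y1, y2, log_det = coupling_forward(0.4, -1.3)
x1_rec, x2_rec = coupling_inverse(y1, y2)
```

Note that the inverse only evaluates s and t in the forward direction, which is why these functions can be arbitrary (non-invertible) networks.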

2. Autoregressive Flows (MAF/IAF)

Masked Autoregressive Flow (MAF): Each dimension depends on previous dimensions.

x₁ = z₁ · σ₁ + μ₁
x₂ = z₂ · σ₂(x₁) + μ₂(x₁)
x₃ = z₃ · σ₃(x₁,x₂) + μ₃(x₁,x₂)
...
Jacobian is LOWER TRIANGULAR → det = ∏ σᵢ

Inverse Autoregressive Flow (IAF): Opposite direction for faster sampling.

MAF: Fast density, slow sampling
IAF: Fast sampling, slow density
┌────────────────────────────────────────┐
│ MAF vs IAF │
├────────────────────────────────────────┤
│ Operation │ MAF │ IAF │
├───────────────┼─────────┼─────────────┤
│ log p(x) │ O(1) │ O(D) │
│ Sampling │ O(D) │ O(1) │
│ Training │ Fast │ Slow │
│ Generation │ Slow │ Fast │
└────────────────────────────────────────┘
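
To make the asymmetry in the table concrete, here is a small pure-Python sketch of a MAF-style transform. The conditioner mu_sigma is a hypothetical hand-written function standing in for a masked network. Sampling has to proceed one dimension at a time, while the density direction only reads observed values, so a masked network could compute all D terms in a single pass.

```python
import math


def mu_sigma(prefix):
    """Hypothetical conditioner: (mu_i, sigma_i) from the known x_1..x_{i-1}."""
    total = sum(prefix)
    mu = 0.3 * total
    sigma = math.exp(0.2 * math.tanh(total))   # always positive
    return mu, sigma


def maf_sample(z):
    """Sampling is sequential: x_i needs x_1..x_{i-1} to be generated first."""
    x = []
    for z_i in z:
        mu, sigma = mu_sigma(x)
        x.append(z_i * sigma + mu)
    return x


def maf_to_latent(x):
    """Density direction: each z_i depends only on observed x values."""
    z, log_det = [], 0.0
    for i, x_i in enumerate(x):
        mu, sigma = mu_sigma(x[:i])
        z.append((x_i - mu) / sigma)
        log_det -= math.log(sigma)   # log|det dz/dx| = -sum_i log sigma_i
    return z, log_det


def maf_log_prob(x):
    z, log_det = maf_to_latent(x)
    log_pz = sum(-0.5 * v * v - 0.5 * math.log(2 * math.pi) for v in z)
    return log_pz + log_det


z_in = [0.5, -1.2, 0.3]
x_out = maf_sample(z_in)
z_back, _ = maf_to_latent(x_out)
```

The O(1) and O(D) entries in the table count sequential network passes, not arithmetic operations.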

3. Continuous Normalizing Flows (Neural ODE)

Key Idea: Instead of discrete transformations, define a continuous flow via an ODE:

dz/dt = f(z, t; θ)
Log-likelihood change:
d log p(z)/dt = -tr(∂f/∂z)
Both z(t) and log p(z(t)) are obtained by numerical integration; training gradients are typically computed with the adjoint method
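
A minimal sketch of these two coupled ODEs, assuming a hypothetical linear vector field f(z, t) = A·z so that the result can be compared with a closed form (the flow maps N(0, 1) to N(0, e^{2A})). A fixed-step Euler scheme is used here for brevity; practical implementations use adaptive solvers.

```python
import math

A = 0.5  # hypothetical linear field f(z, t) = A * z, so tr(df/dz) = A


def vector_field(z, t):
    return A * z


def trace_jacobian(z, t):
    return A


def cnf_forward(z0, steps=1000):
    """Euler-integrate z and log p(z) jointly from t = 0 to t = 1."""
    z = z0
    log_p = -0.5 * z0 ** 2 - 0.5 * math.log(2 * math.pi)   # log N(z0; 0, 1)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        log_p -= trace_jacobian(z, t) * dt   # d log p / dt = -tr(df/dz)
        z += vector_field(z, t) * dt         # dz/dt = f(z, t)
    return z, log_p


x, log_px = cnf_forward(1.2)
# Closed form for comparison: x is distributed as N(0, e^{2A})
analytic = -0.5 * (x / math.exp(A)) ** 2 - A - 0.5 * math.log(2 * math.pi)
```

With 1000 Euler steps the integrated and analytic log-densities agree to about three decimal places; the gap shrinks as the step size decreases.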

Model Architecture

┌─────────────────────────────────────────────────────────────────┐
│ NORMALIZING FLOW FOR FINANCIAL RETURNS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ INPUT LAYER │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Financial Returns Data: │ │
│ │ - Daily/hourly returns │ │
│ │ - Multi-asset returns (portfolio) │ │
│ │ - Conditional features (volatility, volume, etc.) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ↓ │
│ PREPROCESSING │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Standardization: (x - μ) / σ │ │
│ │ Winsorization: clip extreme values │ │
│ │ Optional: Add context features for conditional flow │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ↓ │
│ FLOW BLOCKS (×N) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ ┌────────────────────────────────────────────────────┐ │ │
│ │ │ Affine Coupling Layer 1 │ │ │
│ │ │ x₁ unchanged, x₂ transformed via s(x₁), t(x₁) │ │ │
│ │ └────────────────────────────────────────────────────┘ │ │
│ │ ↓ │ │
│ │ ┌────────────────────────────────────────────────────┐ │ │
│ │ │ Permutation / Shuffling │ │ │
│ │ │ Ensure all dimensions get transformed │ │ │
│ │ └────────────────────────────────────────────────────┘ │ │
│ │ ↓ │ │
│ │ ┌────────────────────────────────────────────────────┐ │ │
│ │ │ Affine Coupling Layer 2 │ │ │
│ │ │ Opposite split (x₂ unchanged, x₁ transformed) │ │ │
│ │ └────────────────────────────────────────────────────┘ │ │
│ │ ↓ │ │
│ │ ┌────────────────────────────────────────────────────┐ │ │
│ │ │ Batch Normalization (optional) │ │ │
│ │ │ Stabilize training │ │ │
│ │ └────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ↓ │
│ BASE DISTRIBUTION │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Standard Gaussian: z ~ N(0, I) │ │
│ │ Or Student-t for heavier tails: z ~ t(ν, 0, I) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ↓ │
│ OUTPUT │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ log p(x) = log p(z) + Σ log|det(J_k)| │ │
│ │ Samples: z ~ p(z) → x = f(z) │ │
│ │ Density: x → z = f⁻¹(x) → p(x) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘

Financial Applications

1. Density Estimation for Returns

import math

def estimate_return_density(model, returns):
    """
    Estimate the probability density of returns.

    Args:
        model: Trained normalizing flow
        returns: Tensor of return values, shape [n, dim]
    Returns:
        log_prob: Log probability density at each point
    """
    # Transform returns to latent space
    z, log_det = model.inverse(returns)
    # Compute base distribution log probability (standard normal)
    log_pz = -0.5 * (z**2 + math.log(2 * math.pi)).sum(dim=-1)
    # Total log probability via change of variables
    log_prob = log_pz + log_det
    return log_prob

2. Value at Risk (VaR) with Learned Densities

Traditional VaR assumes Gaussian returns. With normalizing flows:

┌─────────────────────────────────────────────────────────────┐
│ VaR COMPARISON │
├─────────────────────────────────────────────────────────────┤
│ │
│ Gaussian VaR (underestimates tail risk): │
│ VaR_α = μ + σ · Φ⁻¹(α) │
│ │
│ Normalizing Flow VaR (accurate): │
│ VaR_α = quantile from learned distribution p(x) │
│ Found via: ∫_{-∞}^{VaR} p(x)dx = α │
│ │
│ Monte Carlo approach: │
│ 1. Sample N points from flow: xᵢ ~ p(x) │
│ 2. Sort samples │
│ 3. VaR_α = x_{⌊αN⌋} │
│ │
└─────────────────────────────────────────────────────────────┘
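
The Monte Carlo recipe in the box can be sketched with the standard library alone. Because no trained flow is available here, sample_returns uses a sinh-transformed Gaussian as a fat-tailed stand-in for flow samples; that sampler, its 0.01 scale, and the seed are assumptions for illustration only. The Gaussian VaR is fitted to the same samples so the two numbers are directly comparable.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def sample_returns(n, seed=0):
    """Stand-in for flow samples: a sinh-transformed Gaussian with fat tails."""
    rng = random.Random(seed)
    return [0.01 * math.sinh(rng.gauss(0.0, 1.0)) for _ in range(n)]


def monte_carlo_var_cvar(samples, alpha):
    ordered = sorted(samples)                       # step 2: sort samples
    k = max(1, math.floor(alpha * len(ordered)))    # number of tail samples
    var = ordered[k - 1]                            # step 3: x_{floor(alpha*N)}
    cvar = fmean(ordered[:k])                       # mean of the worst k outcomes
    return var, cvar


def gaussian_var(samples, alpha):
    """VaR_alpha = mu + sigma * Phi^{-1}(alpha) with moments fitted to the samples."""
    mu, sigma = fmean(samples), pstdev(samples)
    return mu + sigma * NormalDist().inv_cdf(alpha)


samples = sample_returns(100_000)                   # step 1: draw N samples
mc_var, mc_cvar = monte_carlo_var_cvar(samples, alpha=0.01)
g_var = gaussian_var(samples, alpha=0.01)
print(f"MC VaR 1%: {mc_var:.4f}  Gaussian VaR 1%: {g_var:.4f}  CVaR: {mc_cvar:.4f}")
```

For this stand-in distribution the 1% Gaussian VaR is noticeably less negative than the empirical quantile, which is the underestimation the box refers to. With a real model, the samples would come from the flow's sampling routine instead.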

3. Conditional Value at Risk (CVaR / Expected Shortfall)

import numpy as np
import torch

def compute_cvar(model, alpha=0.05, n_samples=100000):
    """
    Compute VaR and CVaR using normalizing flow samples.
    CVaR_α = E[X | X ≤ VaR_α]
    """
    # Generate samples from learned distribution (no gradients needed)
    with torch.no_grad():
        z = torch.randn(n_samples, model.dim)
        samples = model.forward(z).cpu().numpy().flatten()
    # Find VaR threshold
    var = np.percentile(samples, alpha * 100)
    # Average of samples at or below VaR
    cvar = samples[samples <= var].mean()
    return var, cvar

4. Synthetic Data Generation

Generate realistic return scenarios for:

  • Stress testing
  • Backtesting on more data
  • Training other models
  • Monte Carlo simulations

def generate_synthetic_returns(model, n_scenarios, conditioning=None):
    """
    Generate synthetic return scenarios from learned distribution.
    """
    # Sample from base distribution
    z = torch.randn(n_scenarios, model.dim)
    # Transform through flow
    if conditioning is not None:
        # Conditional generation (e.g., high volatility regime)
        synthetic_returns = model.forward(z, conditioning)
    else:
        synthetic_returns = model.forward(z)
    return synthetic_returns

5. Tail Risk Modeling

┌─────────────────────────────────────────────────────────────┐
│ TAIL RISK COMPARISON │
├─────────────────────────────────────────────────────────────┤
│ │
│ Probability of -10% daily return: │
│ │
│ Gaussian (σ=2%): P(X < -10%) = 3 × 10⁻⁷ (very rare) │
│ Historical: P(X < -10%) ≈ 0.1% (happens!) │
│ Normalizing Flow: P(X < -10%) ≈ 0.08% (accurate!) │
│ │
│ The flow learns the TRUE tail behavior! │
│ │
│ Gaussian │
│ │ *** vs Learned Flow │
│ │ ******* *** │
│ P(x) │ ********* ****** │
│ │*********** ******** │
│ │ * ********** │
│ └──────────────┘ ──────────┘ │
│ thin tails fat tails │
│ │
└─────────────────────────────────────────────────────────────┘
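
The Gaussian figure in the comparison above is a five-sigma event (−10% with σ = 2%) and can be reproduced with the standard library. This short check only covers the Gaussian row; the historical and flow figures depend on data and a trained model and are not recomputed here.

```python
from statistics import NormalDist

sigma = 0.02        # daily volatility used in the comparison above
threshold = -0.10   # a -10% daily return, i.e. 5 standard deviations

p_gauss = NormalDist(mu=0.0, sigma=sigma).cdf(threshold)   # Phi(-5)
days_between = 1.0 / p_gauss    # expected number of trading days per event

print(f"P(X < -10%) under the Gaussian: {p_gauss:.2e}")
print(f"Expected wait: about {days_between / 252:,.0f} years of trading days")
```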

Additional Flow Architectures

NICE (Non-linear Independent Components Estimation)

The simplest flow architecture using additive coupling:

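# Additive coupling layer (neural_net is any network mapping [n, dim] -> [n, dim])
def nice_forward(x, mask):
    x1, x2 = x * mask, x * (1 - mask)
    y1 = x1
    # Additive transformation; mask the shift so it only touches the x2 positions
    y2 = x2 + (1 - mask) * neural_net(x1)
    return y1 + y2

# Inverse is trivial!
def nice_inverse(y, mask):
    y1, y2 = y * mask, y * (1 - mask)
    x1 = y1
    # Simply subtract the same masked shift
    x2 = y2 - (1 - mask) * neural_net(y1)
    return x1 + x2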

Glow (Generative Flow with Invertible 1x1 Convolutions)

A more expressive architecture combining three components:

Glow Block:
├── ActNorm: Learned activation normalization
├── 1x1 Convolution: Learnable permutation
└── Affine Coupling: RealNVP-style transformation

Multi-scale architecture:
    Level 1: [Flow Block x K] → Split
    Level 2: [Flow Block x K] → Split
    Level L: [Flow Block x K] → Final z

ActNorm (Activation Normalization)

Data-dependent initialization that stabilizes training:

import torch
import torch.nn as nn

class ActNorm(nn.Module):
    """Activation normalization with data-dependent initialization"""
    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        self.scale = nn.Parameter(torch.ones(1, dim))
        self.bias = nn.Parameter(torch.zeros(1, dim))
        self.initialized = False

    def initialize(self, x):
        """Data-dependent initialization: zero mean, unit variance on the first batch"""
        with torch.no_grad():
            mean = x.mean(dim=0, keepdim=True)
            std = x.std(dim=0, keepdim=True)
            self.bias.data = -mean
            self.scale.data = 1.0 / (std + 1e-6)
        self.initialized = True

    def forward(self, x):
        if not self.initialized:
            self.initialize(x)
        y = (x + self.bias) * self.scale
        # Per-sample log|det J|: the same value for every row of the batch
        log_det = torch.log(torch.abs(self.scale)).sum().expand(x.shape[0])
        return y, log_det

    def inverse(self, y):
        x = y / self.scale - self.bias
        return x

Flow Matching (Modern Approach)

A newer, simpler training paradigm for continuous normalizing flows:

import torch

class FlowMatchingTrader:
    """Modern flow matching approach for trading signals"""
    def __init__(self, vector_field_net, dim):
        self.v_net = vector_field_net  # Neural network for vector field
        self.dim = dim  # Data dimension (needed when sampling)

    def flow_matching_loss(self, x0, x1):
        """
        Flow matching training objective
        x0: noise samples (base distribution)
        x1: data samples (market features)
        """
        # Random time in [0, 1), one per sample
        t = torch.rand(x0.shape[0], 1)
        # Interpolate between noise and data
        xt = (1 - t) * x0 + t * x1
        # Target velocity along the straight (optimal transport) path
        ut = x1 - x0
        # Predicted velocity
        vt = self.v_net(xt, t)
        # MSE loss
        loss = ((vt - ut) ** 2).mean()
        return loss

    @torch.no_grad()
    def sample(self, num_samples, steps=100):
        """Generate samples using Euler integration of the learned ODE"""
        x = torch.randn(num_samples, self.dim)
        dt = 1.0 / steps
        for i in range(steps):
            # Evaluate the field at the start of each step: t = 0, dt, ..., 1 - dt
            t = torch.full((num_samples, 1), i * dt)
            v = self.v_net(x, t)
            x = x + v * dt
        return x

Trading Applications: Order Flow and Microstructure

Order Flow Prediction

import torch

class OrderFlowPredictor:
    """Predict order flow using conditional flow model"""
    def __init__(self, flow_model, context_encoder):
        self.flow = flow_model
        self.encoder = context_encoder

    @torch.no_grad()
    def predict(self, market_context, num_samples=1000):
        # Encode market context and repeat it for every sample
        context = self.encoder(market_context)
        context = context.expand(num_samples, -1)
        # Sample from latent space
        z = torch.randn(num_samples, self.flow.latent_dim)
        # Generate order flow predictions (here inverse() maps latent -> data)
        predictions = self.flow.inverse(z, context)
        return {
            'expected_flow': predictions.mean(dim=0),
            'uncertainty': predictions.std(dim=0),
            'samples': predictions
        }

Market Microstructure Modeling

import torch

class MicrostructureFlow:
    """Model order book dynamics with normalizing flows"""
    def __init__(self, flow_model, base_dist):
        # Convention in this class: forward() maps data -> latent and returns
        # (z, log_det); inverse() maps latent -> data and returns the state only
        self.flow = flow_model
        self.base_dist = base_dist  # e.g. torch.distributions.Normal(0.0, 1.0)

    def compute_likelihood(self, order_book_state):
        """Compute log-likelihood of order book configuration"""
        z, log_det = self.flow.forward(order_book_state)
        log_pz = self.base_dist.log_prob(z).sum(dim=-1)
        return log_pz + log_det

    def detect_anomaly(self, order_book_state, threshold=-10.0):
        """Detect unusual order book configurations"""
        log_px = self.compute_likelihood(order_book_state)
        return log_px < threshold

    def simulate_book_evolution(self, initial_state, steps=100):
        """Simulate future order book states via a small random walk in latent space"""
        states = [initial_state]
        for _ in range(steps):
            z, _ = self.flow.forward(states[-1])
            z_next = z + 0.01 * torch.randn_like(z)
            next_state = self.flow.inverse(z_next)
            states.append(next_state)
        return torch.stack(states)

Latent Space Regime Detection

from sklearn.mixture import GaussianMixture

class RegimeDetector:
    """Detect market regimes using flow latent space"""
    def __init__(self, flow_model, n_regimes=4):
        self.flow = flow_model
        self.n_regimes = n_regimes
        self.clusterer = GaussianMixture(n_components=n_regimes)

    def fit_regimes(self, historical_data):
        """Fit regime clusters on latent representations"""
        z_latent, _ = self.flow.forward(historical_data)
        self.clusterer.fit(z_latent.detach().numpy())
        # _analyze_regimes (not shown) should return one label per cluster index
        self.regime_labels = self._analyze_regimes(historical_data, z_latent)

    def detect_current_regime(self, current_data):
        """Identify current market regime for a single observation"""
        z, _ = self.flow.forward(current_data)
        z_np = z.detach().numpy()
        regime = self.clusterer.predict(z_np)
        probs = self.clusterer.predict_proba(z_np)
        return {
            'regime': regime[0],
            'label': self.regime_labels[regime[0]],
            'confidence': probs[0].max(),
            'regime_probs': dict(zip(self.regime_labels, probs[0]))
        }

Stress Testing with Flows

import numpy as np
import torch

class FlowStressTester:
    """Generate stress scenarios from low-likelihood regions"""
    def __init__(self, flow_model):
        self.flow = flow_model

    def stress_test(self, portfolio, scenario_likelihood_threshold=-20.0):
        # Sample far from the latent mean to reach low-likelihood regions
        with torch.no_grad():
            z_extreme = torch.randn(1000, self.flow.latent_dim) * 3
            extreme_scenarios = self.flow.inverse(z_extreme)
            log_probs = self.flow.log_prob(extreme_scenarios)
        # Select extreme but plausible scenarios
        mask = log_probs > scenario_likelihood_threshold
        stress_scenarios = extreme_scenarios[mask]
        if len(stress_scenarios) == 0:
            raise ValueError("No scenario passed the likelihood threshold")
        impacts = [(scenario * portfolio.weights).sum().item()
                   for scenario in stress_scenarios]
        # Keep at least one scenario in the 5% tail so the mean is defined
        n_tail = max(1, int(len(impacts) * 0.05))
        return {
            'scenarios': stress_scenarios,
            'impacts': impacts,
            'worst_case': min(impacts),
            'expected_shortfall': np.mean(sorted(impacts)[:n_tail])
        }

Microstructure Data Requirements

For high-frequency trading applications, flow models benefit from rich microstructure features:

Market Data for Flow Models:
├── High-frequency data (tick-level preferred)
│ └── Order flow, trades, quotes
├── Order book snapshots
│ └── Multi-level bid/ask with sizes
├── Volume data
│ └── Buy/sell decomposition
└── Derived features
├── Order flow imbalance (OFI)
├── Volume-weighted price deviation
├── Spread dynamics
├── Depth imbalance
├── VPIN (Volume-synchronized PIN)
└── Kyle's lambda estimates

Implementation Details

Network Architecture for Scale/Translation Networks

import torch
import torch.nn as nn

class CouplingNetwork(nn.Module):
    """
    Neural network for computing scale and translation in coupling layers.
    """
    def __init__(self, input_dim, hidden_dim=256, n_layers=3):
        super().__init__()
        layers = [nn.Linear(input_dim, hidden_dim), nn.ReLU()]
        for _ in range(n_layers - 1):
            layers.extend([nn.Linear(hidden_dim, hidden_dim), nn.ReLU()])
        # Shared trunk, then separate heads for scale and translation
        self.net = nn.Sequential(*layers)
        self.scale_net = nn.Linear(hidden_dim, input_dim)
        self.translation_net = nn.Linear(hidden_dim, input_dim)
        # Initialize to identity transform
        nn.init.zeros_(self.scale_net.weight)
        nn.init.zeros_(self.scale_net.bias)
        nn.init.zeros_(self.translation_net.weight)
        nn.init.zeros_(self.translation_net.bias)

    def forward(self, x):
        h = self.net(x)
        s = self.scale_net(h)
        t = self.translation_net(h)
        return s, t

Affine Coupling Layer

class AffineCouplingLayer(nn.Module):
    """
    Affine coupling layer as used in RealNVP.
    """
    def __init__(self, dim, mask):
        super().__init__()
        self.dim = dim
        self.register_buffer('mask', mask)
        # The network receives the full masked vector, so its input size is dim
        self.coupling_net = CouplingNetwork(dim, hidden_dim=256)

    def forward(self, x):
        """Forward pass: data space -> latent space"""
        x_masked = x * self.mask
        s, t = self.coupling_net(x_masked)
        # Apply transformation to unmasked part
        y = x_masked + (1 - self.mask) * (x * torch.exp(s) + t)
        # Log determinant of Jacobian (only transformed dimensions contribute)
        log_det = (s * (1 - self.mask)).sum(dim=-1)
        return y, log_det

    def inverse(self, y):
        """Inverse pass: latent space -> data space"""
        y_masked = y * self.mask
        s, t = self.coupling_net(y_masked)
        # Inverse transformation
        x = y_masked + (1 - self.mask) * (y - t) * torch.exp(-s)
        # Log determinant (negative for inverse)
        log_det = -(s * (1 - self.mask)).sum(dim=-1)
        return x, log_det

Complete Normalizing Flow Model

import math
import torch
import torch.nn as nn

class NormalizingFlow(nn.Module):
    """
    Complete normalizing flow for density estimation.
    """
    def __init__(self, dim, n_layers=8, hidden_dim=256):
        super().__init__()
        self.dim = dim
        # Create alternating masks
        masks = []
        for i in range(n_layers):
            mask = torch.zeros(dim)
            mask[:dim//2] = 1.0 if i % 2 == 0 else 0.0
            mask[dim//2:] = 0.0 if i % 2 == 0 else 1.0
            masks.append(mask)
        # Stack coupling layers
        self.layers = nn.ModuleList([
            AffineCouplingLayer(dim, masks[i])
            for i in range(n_layers)
        ])
        # Base distribution
        self.register_buffer('base_mean', torch.zeros(dim))
        self.register_buffer('base_std', torch.ones(dim))

    def forward(self, z):
        """Transform from latent space to data space"""
        x = z
        # Undo the layers in the opposite order to inverse()
        for layer in self.layers:
            x, _ = layer.inverse(x)
        return x

    def inverse(self, x):
        """Transform from data space to latent space"""
        z = x
        total_log_det = 0
        for layer in reversed(self.layers):
            z, log_det = layer(z)
            total_log_det += log_det
        return z, total_log_det

    def log_prob(self, x):
        """Compute log probability of data"""
        z, log_det = self.inverse(x)
        log_pz = -0.5 * (z**2 + math.log(2 * math.pi)).sum(dim=-1)
        return log_pz + log_det

    def sample(self, n_samples):
        """Generate samples from learned distribution"""
        # Draw on the same device as the model's buffers
        z = torch.randn(n_samples, self.dim, device=self.base_mean.device)
        return self.forward(z)

Training Configuration

model:
  dim: 2 # Input dimension, e.g. number of assets; affine coupling needs dim >= 2 (with dim = 1 the flow reduces to an affine map)
  n_layers: 8
  hidden_dim: 256
  activation: "relu"
  use_batch_norm: true
training:
  batch_size: 256
  learning_rate: 0.0001
  weight_decay: 0.0001
  max_epochs: 500
  early_stopping_patience: 20
  gradient_clip: 1.0
data:
  train_split: 0.7
  val_split: 0.15
  test_split: 0.15
  lookback_window: 252 # 1 year of daily data
  returns_type: "log" # log returns
  standardize: true

Risk Metrics with Normalizing Flows

VaR Calculation

def compute_var_flow(model, alpha_levels=(0.01, 0.05, 0.10), n_samples=100000):
    """
    Compute Value at Risk at multiple confidence levels.
    """
    # Generate samples
    samples = model.sample(n_samples).detach().numpy().flatten()
    var_results = {}
    for alpha in alpha_levels:
        var = np.percentile(samples, alpha * 100)
        # round() avoids truncation from floating-point error in (1 - alpha) * 100
        var_results[f'VaR_{round((1 - alpha) * 100)}'] = var
    return var_results

CVaR/Expected Shortfall

def compute_cvar_flow(model, alpha=0.05, n_samples=100000):
    """
    Compute Conditional VaR (Expected Shortfall).
    """
    samples = model.sample(n_samples).detach().numpy().flatten()
    var = np.percentile(samples, alpha * 100)
    cvar = samples[samples <= var].mean()
    return var, cvar

Tail Probability

def compute_tail_probability(model, threshold, n_samples=100000):
    """
    Compute probability of returns below threshold.
    P(X < threshold)
    """
    samples = model.sample(n_samples).detach().numpy().flatten()
    tail_prob = (samples < threshold).mean()
    return tail_prob

Trading Strategy Integration

Signal Generation Based on Density

from dataclasses import dataclass
import torch

@dataclass
class Signal:
    """Minimal stand-in container; the original does not show how Signal is defined."""
    direction: str
    confidence: float
    reason: str = ""

def generate_density_signals(model, current_return, historical_returns):
    """
    Generate trading signals based on return density position.
    If current return is in low-probability region, expect mean reversion.
    """
    # Compute log probability of current return
    log_prob = model.log_prob(torch.tensor([[current_return]])).item()
    # Compute percentile of current return (detach before converting to NumPy)
    samples = model.sample(100000).detach().numpy().flatten()
    percentile = (samples < current_return).mean()
    # Signal logic
    if percentile < 0.05:  # Extreme low return
        return Signal("LONG", confidence=1 - percentile,
                      reason="Extreme negative return, expect bounce")
    elif percentile > 0.95:  # Extreme high return
        return Signal("SHORT", confidence=percentile,
                      reason="Extreme positive return, expect pullback")
    else:
        return Signal("NEUTRAL", confidence=0.5)

Portfolio Risk Management

class FlowBasedRiskManager:
    """
    Risk manager using normalizing flow for position sizing.
    """
    def __init__(self, flow_model, max_var_pct=0.02):
        self.model = flow_model
        self.max_var = max_var_pct

    def compute_position_size(self, capital, confidence=0.99):
        """
        Size position so that 99% VaR doesn't exceed max_var_pct.
        """
        # Get VaR from flow
        var_99, _ = compute_cvar_flow(self.model, alpha=1-confidence)
        # Position size such that loss at VaR = max_var_pct of capital
        position_size = (self.max_var * capital) / abs(var_99)
        return position_size
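
The sizing rule is plain arithmetic once a VaR figure is available, so it can be checked without a trained model. The sketch below uses illustrative numbers (100,000 of capital, a 2% VaR budget, a 99% VaR of −5%); the helper name position_size_from_var is hypothetical and not part of the class above.

```python
def position_size_from_var(capital, max_var_pct, var_return):
    """Position such that the loss at VaR equals max_var_pct of capital.

    var_return is the VaR expressed as a (negative) return, e.g. -0.05.
    """
    if var_return >= 0:
        raise ValueError("VaR return must be negative to size a long position")
    return (max_var_pct * capital) / abs(var_return)


capital = 100_000
size = position_size_from_var(capital, max_var_pct=0.02, var_return=-0.05)
loss_at_var = size * 0.05   # loss if the VaR-level return is realized

print(f"position: {size:,.0f}  loss at VaR: {loss_at_var:,.0f}")
```

With these inputs the position is about 40,000, so a −5% move loses about 2,000, which is the 2% budget. A fatter-tailed model yields a more negative VaR and therefore a smaller position.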

Key Metrics

Model Performance

  • Negative Log-Likelihood (NLL): Lower is better (measures density fit)
  • Bits per Dimension (BPD): NLL / (dim * log(2))
  • Kolmogorov-Smirnov Test: Compare learned vs empirical distribution
  • QQ Plot: Visual check of distribution fit

Risk Metric Accuracy

  • VaR Backtesting: Count violations (the violation rate should be close to α = 1 − confidence level, e.g. about 1% of days for a 99% VaR)
  • CVaR Accuracy: Compare predicted vs realized tail losses
  • Kupiec Test: Statistical test for VaR accuracy
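
As a sketch of the Kupiec proportion-of-failures test mentioned above: it compares the observed violation rate with the nominal α through a likelihood ratio that is asymptotically chi-square with one degree of freedom. The function name and the example counts are assumptions for illustration; the p-value uses the identity P(χ²₁ > c) = erfc(√(c/2)), so no SciPy is needed.

```python
import math


def kupiec_pof(n_obs, n_violations, alpha):
    """Kupiec proportion-of-failures test. Returns (LR statistic, p-value)."""
    def loglik(p):
        # Binomial log-likelihood, treating 0 * log(0) as 0
        ll = 0.0
        if n_violations > 0:
            ll += n_violations * math.log(p)
        if n_obs - n_violations > 0:
            ll += (n_obs - n_violations) * math.log(1.0 - p)
        return ll

    p_hat = n_violations / n_obs
    lr = -2.0 * (loglik(alpha) - loglik(p_hat))
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))   # chi-square(1) tail
    return lr, p_value


# One year of daily 99% VaR forecasts: about 2.5 violations are expected
lr_ok, p_ok = kupiec_pof(n_obs=250, n_violations=2, alpha=0.01)
lr_bad, p_bad = kupiec_pof(n_obs=250, n_violations=10, alpha=0.01)
print(f"2 violations:  LR={lr_ok:.2f}, p={p_ok:.3f}")
print(f"10 violations: LR={lr_bad:.2f}, p={p_bad:.4f}")
```

A small p-value means the violation count is inconsistent with the stated confidence level. The test does not check whether violations cluster in time.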

Trading Performance

  • Sharpe Ratio: Risk-adjusted returns (target > 1.5)
  • Sortino Ratio: Downside risk-adjusted returns
  • Maximum Drawdown: Largest peak-to-trough decline
  • Calmar Ratio: Return / Max Drawdown

Comparison with Other Methods

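| Aspect             | Gaussian | GARCH         | Historical Sim | Normalizing Flow |
|--------------------|----------|---------------|----------------|------------------|
| Fat tails          | No       | Partial       | Yes            | Yes              |
| Skewness           | No       | No            | Yes            | Yes              |
| Multimodality      | No       | No            | Limited        | Yes              |
| Generalization     | Poor     | Moderate      | Poor           | Good             |
| Exact density      | Yes      | Approximation | No             | Yes              |
| Synthetic data     | Easy     | Moderate      | Limited        | Easy             |
| Computational cost | Low      | Low           | Low            | Medium           |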

Comparison with Other Generative Models

vs. VAEs

  • VAE: Approximate posterior, ELBO training, reconstruction loss
  • Flow: Exact likelihood, perfect reconstruction, no separate encoder

vs. GANs

  • GAN: No density, mode collapse, adversarial training
  • Flow: Exact density, stable training, no discriminator needed

vs. Diffusion Models

  • Diffusion: Slow iterative sampling, likelihood available only through a variational bound or a probability-flow ODE, strong generation quality
  • Flow: Fast sampling, exact likelihood, simpler architecture

| Aspect            | Traditional Models              | Flow Models                   |
|-------------------|---------------------------------|-------------------------------|
| Likelihood        | Approximate (VAE) or none (GAN) | Exact computation             |
| Reconstruction    | Lossy                           | Perfect (invertible)          |
| Anomaly detection | Threshold on features           | Principled density estimation |
| Uncertainty       | Often missing                   | Natural from density          |
| Interpretability  | Black box                       | Latent space structure        |
| Sample quality    | Mode collapse (GAN)             | Stable training               |

Advanced Topics

1. Conditional Normalizing Flows

Condition the flow on external factors (volatility regime, market conditions):

class ConditionalFlow(nn.Module):
    def __init__(self, dim, cond_dim, n_layers=8):
        super().__init__()  # required before registering submodules
        # Conditioning network
        self.cond_net = nn.Sequential(
            nn.Linear(cond_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 64)
        )
        # Coupling networks take conditioning as input
        # (ConditionalCouplingLayer is not shown in this chapter)
        self.layers = nn.ModuleList([
            ConditionalCouplingLayer(dim, cond_dim=64)
            for _ in range(n_layers)
        ])

2. Multivariate Flows for Portfolio

Model joint distribution of multiple assets:

# Instead of modeling each asset separately,
# model the full joint distribution
flow = NormalizingFlow(dim=10)  # 10 assets
# Joint samples capture correlations
joint_samples = flow.sample(1000).detach().numpy()  # [1000, 10]
# Portfolio VaR accounts for diversification
weights = np.full(10, 0.1)  # example: equal-weight portfolio
portfolio_returns = joint_samples @ weights
portfolio_var = np.percentile(portfolio_returns, 5)

3. Time-Varying Flows

Update flow parameters as market conditions change:

class AdaptiveFlow:
    def __init__(self, base_flow, adaptation_rate=0.01):
        self.flow = base_flow
        self.rate = adaptation_rate

    def update(self, new_data):
        """Online update with new observations (one plain gradient step)"""
        loss = -self.flow.log_prob(new_data).mean()
        loss.backward()
        with torch.no_grad():
            for param in self.flow.parameters():
                if param.grad is None:  # parameter not used in this loss
                    continue
                param -= self.rate * param.grad
                param.grad.zero_()

Production Considerations

Inference Pipeline:
├── Data Collection (Bybit via CCXT)
│   └── Real-time OHLCV data
├── Return Computation
│   └── Log returns with rolling statistics
├── Model Inference
│   └── Density evaluation / sample generation
├── Risk Calculation
│   └── VaR, CVaR, tail probabilities
├── Signal Generation
│   └── Based on density position
└── Execution
    └── Position sizing from risk model

Latency Budget:
├── Data fetch: ~50ms (REST API)
├── Preprocessing: ~1ms
├── Flow inference: ~5ms (GPU)
├── Risk calculation: ~10ms (MC sampling)
├── Signal generation: ~1ms
└── Total: ~70ms

Directory Structure

332_normalizing_flows_finance/
├── README.md                    # This file
├── README.ru.md                 # Russian translation
├── readme.simple.md             # Beginner-friendly explanation
├── readme.simple.ru.md          # Russian beginner version
├── python/                      # Python implementation
│   ├── __init__.py
│   ├── flows.py                 # Normalizing flow models
│   ├── layers.py                # Coupling layers
│   ├── risk_metrics.py          # VaR, CVaR calculations
│   ├── data_fetcher.py          # Bybit data via CCXT
│   ├── training.py              # Training loop
│   └── examples/
│       ├── density_estimation.py
│       ├── var_calculation.py
│       └── synthetic_generation.py
└── rust_normalizing_flows/      # Rust implementation
    ├── Cargo.toml
    ├── src/
    │   ├── lib.rs
    │   ├── api/                 # Bybit API client
    │   ├── flows/               # Flow implementations
    │   ├── risk/                # Risk metrics
    │   └── utils/               # Utilities
    └── examples/
        ├── fetch_data.rs
        ├── train_flow.rs
        └── compute_var.rs

References

  1. NICE: Non-linear Independent Components Estimation (Dinh et al., 2014)

  2. Variational Inference with Normalizing Flows (Rezende & Mohamed, 2015)

  3. Density Estimation using Real-NVP (Dinh et al., 2016)

  4. Masked Autoregressive Flow for Density Estimation (Papamakarios et al., 2017)

  5. Glow: Generative Flow with Invertible 1x1 Convolutions (Kingma & Dhariwal, 2018)

  6. Neural Ordinary Differential Equations (Chen et al., 2018)

  7. Neural Spline Flows (Durkan et al., 2019)

  8. Normalizing Flows for Probabilistic Modeling and Inference (Papamakarios et al., 2019)

  9. Flow Matching for Generative Modeling (Lipman et al., 2022)

Difficulty Level

Advanced - Requires understanding of:

  • Probability theory and density estimation
  • Change of variables formula
  • Jacobian determinants
  • Deep learning fundamentals
  • Financial risk metrics (VaR, CVaR)

Disclaimer

This chapter is for educational purposes only. Cryptocurrency trading involves substantial risk. The strategies and risk models described here should be thoroughly validated before any real-world application. Past performance does not guarantee future results. Always consult with financial professionals before making investment decisions.