Chapter 332: Normalizing Flows for Finance

Overview

Normalizing Flows are a class of deep generative models that learn complex probability distributions by transforming a simple base distribution (like Gaussian) through a sequence of invertible, differentiable transformations. Unlike other generative models (VAEs, GANs), normalizing flows provide exact likelihood computation, making them ideal for financial applications where accurate density estimation is crucial for risk management.

Why Normalizing Flows for Finance?

The Problem with Traditional Approaches

Financial returns are notoriously non-Gaussian:

Fat tails: Extreme events happen more frequently than normal distributions predict
Skewness: Returns are often asymmetric (larger drops than gains)
Time-varying volatility: The distribution shape changes over time
Multimodality: Multiple market regimes create complex distributions

Traditional risk models (VaR, CVaR) assume Gaussian returns, leading to:

Underestimation of tail risk
Poor hedging decisions
Unexpected losses during market stress

Normalizing Flow Solution

Normalizing Flows learn the true distribution of returns:

Traditional: Assume X ~ N(μ, σ²) → Underestimate tail risk

Normalizing Flow: Learn p(X) directly → Accurate density for any shape
  z ~ N(0, I)     [Simple base distribution]
  x = f(z)        [Invertible transformation]
  p(x) = p(z)|det(∂f⁻¹/∂x)|  [Exact likelihood via change of variables]

Mathematical Foundation

Change of Variables Formula

The core principle of normalizing flows is the change of variables formula:

Given:

Base distribution: z ~ p_z(z) (typically standard normal)
Invertible transformation: x = f(z), so z = f⁻¹(x)
Target distribution: p_x(x)

The density transformation is:

p_x(x) = p_z(f⁻¹(x)) |det(J_{f⁻¹}(x))|

where J_{f⁻¹}(x) = ∂f⁻¹(x)/∂x is the Jacobian matrix

For a sequence of K transformations:

z₀ → f₁ → z₁ → f₂ → z₂ → ... → f_K → x

log p(x) = log p(z₀) - Σᵢ log|det(J_{fᵢ})|

Why Jacobian Matters

The Jacobian determinant accounts for how the transformation stretches or compresses space:

┌────────────────────────────────────────────────────────────┐
│                     JACOBIAN INTUITION                      │
├────────────────────────────────────────────────────────────┤
│                                                             │
│   Base Distribution (z)       Target Distribution (x)      │
│                                                             │
│      ┌─────────┐                   ┌──────────────┐        │
│      │  ***    │      f(z)        │    ***       │        │
│      │ *****   │  ──────────►     │ ***    **    │        │
│      │  ***    │                   │  ****  ***   │        │
│      └─────────┘                   └──────────────┘        │
│                                                             │
│   Gaussian blob               Complex distribution         │
│   (easy to sample)            (hard to model directly)     │
│                                                             │
│   Jacobian = How much volume changes at each point         │
│   |det(J)| > 1 → Space expands → Density decreases        │
│   |det(J)| < 1 → Space contracts → Density increases      │
│                                                             │
└────────────────────────────────────────────────────────────┘

Types of Normalizing Flows

1. Affine Coupling Flows (RealNVP)

Key Idea: Split the input and apply simple transformations that have tractable Jacobians.

┌─────────────────────────────────────────────────────────────┐
│                    AFFINE COUPLING LAYER                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Input: x = [x₁, x₂]  (split into two parts)              │
│                                                             │
│   Transformation:                                           │
│     y₁ = x₁                     (unchanged)                 │
│     y₂ = x₂ ⊙ exp(s(x₁)) + t(x₁)  (affine transform)       │
│                                                             │
│   where s() and t() are neural networks                     │
│                                                             │
│   Jacobian is TRIANGULAR → det = ∏ exp(s(x₁)) = exp(Σs)   │
│   Very efficient! O(D) instead of O(D³)                     │
│                                                             │
│        ┌──────────┐                                         │
│   x₁ ──┤ Identity ├──────────────────────────────► y₁      │
│        └──────────┘                                         │
│             │                                               │
│             ▼                                               │
│        ┌────────┐        ┌─────────────────────┐           │
│   x₂ ──┤ Neural ├──s,t──►│ y₂ = x₂·exp(s) + t │──► y₂     │
│        │Networks│        └─────────────────────┘           │
│        └────────┘                                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

2. Autoregressive Flows (MAF/IAF)

Masked Autoregressive Flow (MAF): Each dimension depends on previous dimensions.

x₁ = z₁ · σ₁ + μ₁
x₂ = z₂ · σ₂(x₁) + μ₂(x₁)
x₃ = z₃ · σ₃(x₁,x₂) + μ₃(x₁,x₂)
...

Jacobian is LOWER TRIANGULAR → det = ∏ σᵢ

Inverse Autoregressive Flow (IAF): Opposite direction for faster sampling.

MAF: Fast density, slow sampling
IAF: Fast sampling, slow density

┌────────────────────────────────────────┐
│              MAF vs IAF                │
├────────────────────────────────────────┤
│ Operation     │  MAF    │   IAF       │
├───────────────┼─────────┼─────────────┤
│ log p(x)      │ O(1)    │ O(D)        │
│ Sampling      │ O(D)    │ O(1)        │
│ Training      │ Fast    │ Slow        │
│ Generation    │ Slow    │ Fast        │
└────────────────────────────────────────┘

3. Continuous Normalizing Flows (Neural ODE)

Key Idea: Instead of discrete transformations, define a continuous flow via an ODE:

dz/dt = f(z, t; θ)

Log-likelihood change:
d log p(z)/dt = -tr(∂f/∂z)

Solved via numerical integration (adjoint method)

Model Architecture

┌─────────────────────────────────────────────────────────────────┐
│              NORMALIZING FLOW FOR FINANCIAL RETURNS             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  INPUT LAYER                                                     │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ Financial Returns Data:                                   │   │
│  │   - Daily/hourly returns                                  │   │
│  │   - Multi-asset returns (portfolio)                       │   │
│  │   - Conditional features (volatility, volume, etc.)       │   │
│  └──────────────────────────────────────────────────────────┘   │
│                              ↓                                   │
│  PREPROCESSING                                                   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ Standardization: (x - μ) / σ                              │   │
│  │ Winsorization: clip extreme values                        │   │
│  │ Optional: Add context features for conditional flow       │   │
│  └──────────────────────────────────────────────────────────┘   │
│                              ↓                                   │
│  FLOW BLOCKS (×N)                                                │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ ┌────────────────────────────────────────────────────┐   │   │
│  │ │ Affine Coupling Layer 1                             │   │   │
│  │ │   x₁ unchanged, x₂ transformed via s(x₁), t(x₁)    │   │   │
│  │ └────────────────────────────────────────────────────┘   │   │
│  │                         ↓                                │   │
│  │ ┌────────────────────────────────────────────────────┐   │   │
│  │ │ Permutation / Shuffling                             │   │   │
│  │ │   Ensure all dimensions get transformed             │   │   │
│  │ └────────────────────────────────────────────────────┘   │   │
│  │                         ↓                                │   │
│  │ ┌────────────────────────────────────────────────────┐   │   │
│  │ │ Affine Coupling Layer 2                             │   │   │
│  │ │   Opposite split (x₂ unchanged, x₁ transformed)    │   │   │
│  │ └────────────────────────────────────────────────────┘   │   │
│  │                         ↓                                │   │
│  │ ┌────────────────────────────────────────────────────┐   │   │
│  │ │ Batch Normalization (optional)                      │   │   │
│  │ │   Stabilize training                                │   │   │
│  │ └────────────────────────────────────────────────────┘   │   │
│  └──────────────────────────────────────────────────────────┘   │
│                              ↓                                   │
│  BASE DISTRIBUTION                                               │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ Standard Gaussian: z ~ N(0, I)                            │   │
│  │ Or Student-t for heavier tails: z ~ t(ν, 0, I)           │   │
│  └──────────────────────────────────────────────────────────┘   │
│                              ↓                                   │
│  OUTPUT                                                          │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ log p(x) = log p(z) + Σ log|det(J_k)|                    │   │
│  │ Samples: z ~ p(z) → x = f(z)                             │   │
│  │ Density: x → z = f⁻¹(x) → p(x)                           │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Financial Applications

1. Density Estimation for Returns

def estimate_return_density(model, returns):
    """
    Estimate the probability density of returns.

    Args:
        model: Trained normalizing flow
        returns: Array of return values

    Returns:
        log_prob: Log probability density at each point
    """
    # Transform returns to latent space
    z, log_det = model.inverse(returns)

    # Compute base distribution log probability
    log_pz = -0.5 * (z**2 + np.log(2*np.pi)).sum(dim=-1)

    # Total log probability via change of variables
    log_prob = log_pz + log_det

    return log_prob

2. Value at Risk (VaR) with Learned Densities

Traditional VaR assumes Gaussian returns. With normalizing flows:

┌─────────────────────────────────────────────────────────────┐
│                    VaR COMPARISON                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Gaussian VaR (underestimates tail risk):                  │
│   VaR_α = μ + σ · Φ⁻¹(α)                                   │
│                                                             │
│   Normalizing Flow VaR (accurate):                          │
│   VaR_α = quantile from learned distribution p(x)           │
│   Found via: ∫_{-∞}^{VaR} p(x)dx = α                       │
│                                                             │
│   Monte Carlo approach:                                     │
│   1. Sample N points from flow: xᵢ ~ p(x)                   │
│   2. Sort samples                                           │
│   3. VaR_α = x_{⌊αN⌋}                                       │
│                                                             │
└─────────────────────────────────────────────────────────────┘

3. Conditional Value at Risk (CVaR / Expected Shortfall)

def compute_cvar(model, alpha=0.05, n_samples=100000):
    """
    Compute CVaR using normalizing flow samples.

    CVaR_α = E[X | X ≤ VaR_α]
    """
    # Generate samples from learned distribution
    z = torch.randn(n_samples, model.dim)
    samples = model.forward(z)

    # Find VaR threshold
    var = np.percentile(samples, alpha * 100)

    # Average of samples below VaR
    cvar = samples[samples <= var].mean()

    return var, cvar

4. Synthetic Data Generation

Generate realistic return scenarios for:

Stress testing
Backtesting on more data
Training other models
Monte Carlo simulations

def generate_synthetic_returns(model, n_scenarios, conditioning=None):
    """
    Generate synthetic return scenarios from learned distribution.
    """
    # Sample from base distribution
    z = torch.randn(n_scenarios, model.dim)

    # Transform through flow
    if conditioning is not None:
        # Conditional generation (e.g., high volatility regime)
        synthetic_returns = model.forward(z, conditioning)
    else:
        synthetic_returns = model.forward(z)

    return synthetic_returns

5. Tail Risk Modeling

┌─────────────────────────────────────────────────────────────┐
│               TAIL RISK COMPARISON                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Probability of -10% daily return:                         │
│                                                             │
│   Gaussian (σ=2%):  P(X < -10%) = 3 × 10⁻⁷  (very rare)   │
│   Historical:       P(X < -10%) ≈ 0.1%      (happens!)     │
│   Normalizing Flow: P(X < -10%) ≈ 0.08%     (accurate!)    │
│                                                             │
│   The flow learns the TRUE tail behavior!                   │
│                                                             │
│                    Gaussian                                  │
│          │    ***       vs        Learned Flow              │
│          │  *******                   ***                   │
│     P(x) │ *********                ******                  │
│          │***********              ********                 │
│          │           *            **********                │
│          └──────────────┘         ──────────┘              │
│             thin tails              fat tails               │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Additional Flow Architectures

NICE (Non-linear Independent Components Estimation)

The simplest flow architecture using additive coupling:

# Additive coupling layer
def nice_forward(x, mask):
    x1, x2 = x * mask, x * (1 - mask)
    y1 = x1
    y2 = x2 + neural_net(x1)  # Additive transformation
    return y1 + y2

# Inverse is trivial!
def nice_inverse(y, mask):
    y1, y2 = y * mask, y * (1 - mask)
    x1 = y1
    x2 = y2 - neural_net(y1)  # Simply subtract
    return x1 + x2

Glow (Generative Flow with Invertible 1x1 Convolutions)

A more expressive architecture combining three components:

Glow Block:
├── ActNorm: Learned activation normalization
├── 1x1 Convolution: Learnable permutation
└── Affine Coupling: RealNVP-style transformation

Multi-scale architecture:
Level 1: [Flow Block x K] → Split
Level 2: [Flow Block x K] → Split
Level L: [Flow Block x K] → Final z

ActNorm (Activation Normalization)

Data-dependent initialization that stabilizes training:

class ActNorm(nn.Module):
    """Activation normalization with data-dependent initialization"""

    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        self.scale = nn.Parameter(torch.ones(1, dim))
        self.bias = nn.Parameter(torch.zeros(1, dim))
        self.initialized = False

    def initialize(self, x):
        """Data-dependent initialization"""
        with torch.no_grad():
            mean = x.mean(dim=0, keepdim=True)
            std = x.std(dim=0, keepdim=True)
            self.bias.data = -mean
            self.scale.data = 1.0 / (std + 1e-6)
            self.initialized = True

    def forward(self, x):
        if not self.initialized:
            self.initialize(x)
        y = (x + self.bias) * self.scale
        log_det = torch.log(torch.abs(self.scale)).sum() * x.shape[0]
        return y, log_det

    def inverse(self, y):
        x = y / self.scale - self.bias
        return x

Flow Matching (Modern Approach)

A newer, simpler training paradigm for continuous normalizing flows:

class FlowMatchingTrader:
    """Modern flow matching approach for trading signals"""

    def __init__(self, vector_field_net):
        self.v_net = vector_field_net  # Neural network for vector field

    def flow_matching_loss(self, x0, x1):
        """
        Flow matching training objective
        x0: noise samples (base distribution)
        x1: data samples (market features)
        """
        # Random time
        t = torch.rand(x0.shape[0], 1)

        # Interpolate between noise and data
        xt = (1 - t) * x0 + t * x1

        # Target velocity (optimal transport)
        ut = x1 - x0

        # Predicted velocity
        vt = self.v_net(xt, t)

        # MSE loss
        loss = ((vt - ut) ** 2).mean()
        return loss

    def sample(self, num_samples, steps=100):
        """Generate samples using ODE integration"""
        x = torch.randn(num_samples, self.dim)
        dt = 1.0 / steps
        for t in torch.linspace(0, 1, steps):
            v = self.v_net(x, t.expand(num_samples, 1))
            x = x + v * dt
        return x

Trading Applications: Order Flow and Microstructure

Order Flow Prediction

class OrderFlowPredictor:
    """Predict order flow using conditional flow model"""

    def __init__(self, flow_model, context_encoder):
        self.flow = flow_model
        self.encoder = context_encoder

    def predict(self, market_context, num_samples=1000):
        # Encode market context
        context = self.encoder(market_context)

        # Sample from latent space
        z = torch.randn(num_samples, self.flow.latent_dim)

        # Generate order flow predictions
        predictions = self.flow.inverse(z, context)

        return {
            'expected_flow': predictions.mean(dim=0),
            'uncertainty': predictions.std(dim=0),
            'samples': predictions
        }

Market Microstructure Modeling

class MicrostructureFlow:
    """Model order book dynamics with normalizing flows"""

    def compute_likelihood(self, order_book_state):
        """Compute log-likelihood of order book configuration"""
        z, log_det = self.flow.forward(order_book_state)
        log_pz = self.base_dist.log_prob(z).sum(dim=-1)
        return log_pz + log_det

    def detect_anomaly(self, order_book_state, threshold=-10.0):
        """Detect unusual order book configurations"""
        log_px = self.compute_likelihood(order_book_state)
        return log_px < threshold

    def simulate_book_evolution(self, initial_state, steps=100):
        """Simulate future order book states"""
        states = [initial_state]
        for _ in range(steps):
            z, _ = self.flow.forward(states[-1])
            z_next = z + 0.01 * torch.randn_like(z)
            next_state = self.flow.inverse(z_next)
            states.append(next_state)
        return torch.stack(states)

Latent Space Regime Detection

class RegimeDetector:
    """Detect market regimes using flow latent space"""

    def __init__(self, flow_model, n_regimes=4):
        self.flow = flow_model
        self.n_regimes = n_regimes
        self.clusterer = GaussianMixture(n_components=n_regimes)

    def fit_regimes(self, historical_data):
        """Fit regime clusters on latent representations"""
        z_latent, _ = self.flow.forward(historical_data)
        self.clusterer.fit(z_latent.detach().numpy())
        self.regime_labels = self._analyze_regimes(historical_data, z_latent)

    def detect_current_regime(self, current_data):
        """Identify current market regime"""
        z, _ = self.flow.forward(current_data)
        regime = self.clusterer.predict(z.detach().numpy())
        probs = self.clusterer.predict_proba(z.detach().numpy())
        return {
            'regime': regime[0],
            'label': self.regime_labels[regime[0]],
            'confidence': probs.max(),
            'regime_probs': dict(zip(self.regime_labels, probs[0]))
        }

Stress Testing with Flows

class FlowStressTester:
    """Generate stress scenarios from low-likelihood regions"""

    def __init__(self, flow_model):
        self.flow = flow_model

    def stress_test(self, portfolio, scenario_likelihood_threshold=-20.0):
        # Find low-likelihood regions in latent space
        z_extreme = torch.randn(1000, self.flow.latent_dim) * 3  # Far from mean
        extreme_scenarios = self.flow.inverse(z_extreme)
        log_probs = self.flow.log_prob(extreme_scenarios)

        # Select extreme but plausible scenarios
        mask = log_probs > scenario_likelihood_threshold
        stress_scenarios = extreme_scenarios[mask]

        impacts = [(scenario * portfolio.weights).sum().item()
                   for scenario in stress_scenarios]

        return {
            'scenarios': stress_scenarios,
            'impacts': impacts,
            'worst_case': min(impacts),
            'expected_shortfall': np.mean(sorted(impacts)[:int(len(impacts)*0.05)])
        }

Microstructure Data Requirements

For high-frequency trading applications, flow models benefit from rich microstructure features:

Market Data for Flow Models:
├── High-frequency data (tick-level preferred)
│   └── Order flow, trades, quotes
├── Order book snapshots
│   └── Multi-level bid/ask with sizes
├── Volume data
│   └── Buy/sell decomposition
└── Derived features
    ├── Order flow imbalance (OFI)
    ├── Volume-weighted price deviation
    ├── Spread dynamics
    ├── Depth imbalance
    ├── VPIN (Volume-synchronized PIN)
    └── Kyle's lambda estimates

Implementation Details

Network Architecture for Scale/Translation Networks

class CouplingNetwork(nn.Module):
    """
    Neural network for computing scale and translation in coupling layers.
    """
    def __init__(self, input_dim, hidden_dim=256, n_layers=3):
        super().__init__()

        layers = [nn.Linear(input_dim, hidden_dim), nn.ReLU()]
        for _ in range(n_layers - 1):
            layers.extend([nn.Linear(hidden_dim, hidden_dim), nn.ReLU()])

        # Output scale and translation
        self.net = nn.Sequential(*layers)
        self.scale_net = nn.Linear(hidden_dim, input_dim)
        self.translation_net = nn.Linear(hidden_dim, input_dim)

        # Initialize to identity transform
        nn.init.zeros_(self.scale_net.weight)
        nn.init.zeros_(self.scale_net.bias)
        nn.init.zeros_(self.translation_net.weight)
        nn.init.zeros_(self.translation_net.bias)

    def forward(self, x):
        h = self.net(x)
        s = self.scale_net(h)
        t = self.translation_net(h)
        return s, t

Affine Coupling Layer

class AffineCouplingLayer(nn.Module):
    """
    Affine coupling layer as used in RealNVP.
    """
    def __init__(self, dim, mask):
        super().__init__()
        self.dim = dim
        self.register_buffer('mask', mask)
        self.coupling_net = CouplingNetwork(dim // 2, hidden_dim=256)

    def forward(self, x):
        """Forward pass: data space -> latent space"""
        x_masked = x * self.mask
        s, t = self.coupling_net(x_masked)

        # Apply transformation to unmasked part
        y = x_masked + (1 - self.mask) * (x * torch.exp(s) + t)

        # Log determinant of Jacobian
        log_det = (s * (1 - self.mask)).sum(dim=-1)

        return y, log_det

    def inverse(self, y):
        """Inverse pass: latent space -> data space"""
        y_masked = y * self.mask
        s, t = self.coupling_net(y_masked)

        # Inverse transformation
        x = y_masked + (1 - self.mask) * (y - t) * torch.exp(-s)

        # Log determinant (negative for inverse)
        log_det = -(s * (1 - self.mask)).sum(dim=-1)

        return x, log_det

Complete Normalizing Flow Model

class NormalizingFlow(nn.Module):
    """
    Complete normalizing flow for density estimation.
    """
    def __init__(self, dim, n_layers=8, hidden_dim=256):
        super().__init__()
        self.dim = dim

        # Create alternating masks
        masks = []
        for i in range(n_layers):
            mask = torch.zeros(dim)
            mask[:dim//2] = 1.0 if i % 2 == 0 else 0.0
            mask[dim//2:] = 0.0 if i % 2 == 0 else 1.0
            masks.append(mask)

        # Stack coupling layers
        self.layers = nn.ModuleList([
            AffineCouplingLayer(dim, masks[i])
            for i in range(n_layers)
        ])

        # Base distribution
        self.register_buffer('base_mean', torch.zeros(dim))
        self.register_buffer('base_std', torch.ones(dim))

    def forward(self, z):
        """Transform from latent space to data space"""
        x = z
        for layer in self.layers:
            x, _ = layer.inverse(x)
        return x

    def inverse(self, x):
        """Transform from data space to latent space"""
        z = x
        total_log_det = 0
        for layer in reversed(self.layers):
            z, log_det = layer(z)
            total_log_det += log_det
        return z, total_log_det

    def log_prob(self, x):
        """Compute log probability of data"""
        z, log_det = self.inverse(x)
        log_pz = -0.5 * (z**2 + np.log(2*np.pi)).sum(dim=-1)
        return log_pz + log_det

    def sample(self, n_samples):
        """Generate samples from learned distribution"""
        z = torch.randn(n_samples, self.dim)
        return self.forward(z)

Training Configuration

model:
  dim: 1  # Univariate returns (or portfolio dimension)
  n_layers: 8
  hidden_dim: 256
  activation: "relu"
  use_batch_norm: true

training:
  batch_size: 256
  learning_rate: 0.0001
  weight_decay: 0.0001
  max_epochs: 500
  early_stopping_patience: 20
  gradient_clip: 1.0

data:
  train_split: 0.7
  val_split: 0.15
  test_split: 0.15
  lookback_window: 252  # 1 year of daily data
  returns_type: "log"  # log returns
  standardize: true

Risk Metrics with Normalizing Flows

VaR Calculation

def compute_var_flow(model, alpha_levels=[0.01, 0.05, 0.10], n_samples=100000):
    """
    Compute Value at Risk at multiple confidence levels.
    """
    # Generate samples
    samples = model.sample(n_samples).detach().numpy().flatten()

    var_results = {}
    for alpha in alpha_levels:
        var = np.percentile(samples, alpha * 100)
        var_results[f'VaR_{int((1-alpha)*100)}'] = var

    return var_results

CVaR/Expected Shortfall

def compute_cvar_flow(model, alpha=0.05, n_samples=100000):
    """
    Compute Conditional VaR (Expected Shortfall).
    """
    samples = model.sample(n_samples).detach().numpy().flatten()
    var = np.percentile(samples, alpha * 100)
    cvar = samples[samples <= var].mean()
    return var, cvar

Tail Probability

def compute_tail_probability(model, threshold, n_samples=100000):
    """
    Compute probability of returns below threshold.
    P(X < threshold)
    """
    samples = model.sample(n_samples).detach().numpy().flatten()
    tail_prob = (samples < threshold).mean()
    return tail_prob

Trading Strategy Integration

Signal Generation Based on Density

def generate_density_signals(model, current_return, historical_returns):
    """
    Generate trading signals based on return density position.

    If current return is in low-probability region, expect mean reversion.
    """
    # Compute log probability of current return
    log_prob = model.log_prob(torch.tensor([[current_return]])).item()

    # Compute percentile of current return
    samples = model.sample(100000).numpy().flatten()
    percentile = (samples < current_return).mean()

    # Signal logic
    if percentile < 0.05:  # Extreme low return
        return Signal("LONG", confidence=1 - percentile,
                      reason="Extreme negative return, expect bounce")
    elif percentile > 0.95:  # Extreme high return
        return Signal("SHORT", confidence=percentile,
                      reason="Extreme positive return, expect pullback")
    else:
        return Signal("NEUTRAL", confidence=0.5)

Portfolio Risk Management

class FlowBasedRiskManager:
    """
    Risk manager using normalizing flow for position sizing.
    """
    def __init__(self, flow_model, max_var_pct=0.02):
        self.model = flow_model
        self.max_var = max_var_pct

    def compute_position_size(self, capital, confidence=0.99):
        """
        Size position so that 99% VaR doesn't exceed max_var_pct.
        """
        # Get VaR from flow
        var_99, _ = compute_cvar_flow(self.model, alpha=1-confidence)

        # Position size such that loss at VaR = max_var_pct of capital
        position_size = (self.max_var * capital) / abs(var_99)

        return position_size

Key Metrics

Model Performance

Negative Log-Likelihood (NLL): Lower is better (measures density fit)
Bits per Dimension (BPD): NLL / (dim * log(2))
Kolmogorov-Smirnov Test: Compare learned vs empirical distribution
QQ Plot: Visual check of distribution fit

Risk Metric Accuracy

VaR Backtesting: Count violations (should match confidence level)
CVaR Accuracy: Compare predicted vs realized tail losses
Kupiec Test: Statistical test for VaR accuracy

Trading Performance

Sharpe Ratio: Risk-adjusted returns (target > 1.5)
Sortino Ratio: Downside risk-adjusted returns
Maximum Drawdown: Largest peak-to-trough decline
Calmar Ratio: Return / Max Drawdown

Comparison with Other Methods

Aspect	Gaussian	GARCH	Historical Sim	Normalizing Flow
Fat tails	No	Partial	Yes	Yes
Skewness	No	No	Yes	Yes
Multimodality	No	No	Limited	Yes
Generalization	Poor	Moderate	Poor	Good
Exact density	Yes	Approximation	No	Yes
Synthetic data	Easy	Moderate	Limited	Easy
Computational cost	Low	Low	Low	Medium

Comparison with Other Generative Models

vs. VAEs

VAE: Approximate posterior, ELBO training, reconstruction loss
Flow: Exact likelihood, perfect reconstruction, no separate encoder

vs. GANs

GAN: No density, mode collapse, adversarial training
Flow: Exact density, stable training, no discriminator needed

vs. Diffusion Models

Diffusion: Slow sampling, no exact likelihood, strong generation quality
Flow: Fast sampling, exact likelihood, simpler architecture

Aspect	Traditional Models	Flow Models
Likelihood	Approximate (VAE) or none (GAN)	Exact computation
Reconstruction	Lossy	Perfect (invertible)
Anomaly detection	Threshold on features	Principled density estimation
Uncertainty	Often missing	Natural from density
Interpretability	Black box	Latent space structure
Sample quality	Mode collapse (GAN)	Stable training

Advanced Topics

1. Conditional Normalizing Flows

Condition the flow on external factors (volatility regime, market conditions):

class ConditionalFlow(nn.Module):
    def __init__(self, dim, cond_dim, n_layers=8):
        # Conditioning network
        self.cond_net = nn.Sequential(
            nn.Linear(cond_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 64)
        )

        # Coupling networks take conditioning as input
        self.layers = nn.ModuleList([
            ConditionalCouplingLayer(dim, cond_dim=64)
            for _ in range(n_layers)
        ])

2. Multivariate Flows for Portfolio

Model joint distribution of multiple assets:

# Instead of modeling each asset separately
# Model full covariance structure
flow = NormalizingFlow(dim=10)  # 10 assets

# Joint samples capture correlations
joint_samples = flow.sample(1000)  # [1000, 10]

# Portfolio VaR accounts for diversification
portfolio_returns = joint_samples @ weights
portfolio_var = np.percentile(portfolio_returns, 5)

3. Time-Varying Flows

Update flow parameters as market conditions change:

class AdaptiveFlow:
    def __init__(self, base_flow, adaptation_rate=0.01):
        self.flow = base_flow
        self.rate = adaptation_rate

    def update(self, new_data):
        """Online update with new observations"""
        loss = -self.flow.log_prob(new_data).mean()
        loss.backward()

        with torch.no_grad():
            for param in self.flow.parameters():
                param -= self.rate * param.grad
                param.grad.zero_()

Production Considerations

Inference Pipeline:
├── Data Collection (Bybit via CCXT)
│   └── Real-time OHLCV data
├── Return Computation
│   └── Log returns with rolling statistics
├── Model Inference
│   └── Density evaluation / sample generation
├── Risk Calculation
│   └── VaR, CVaR, tail probabilities
├── Signal Generation
│   └── Based on density position
└── Execution
    └── Position sizing from risk model

Latency Budget:
├── Data fetch: ~50ms (REST API)
├── Preprocessing: ~1ms
├── Flow inference: ~5ms (GPU)
├── Risk calculation: ~10ms (MC sampling)
├── Signal generation: ~1ms
└── Total: ~70ms

Directory Structure

332_normalizing_flows_finance/
├── README.md                    # This file
├── README.ru.md                 # Russian translation
├── readme.simple.md             # Beginner-friendly explanation
├── readme.simple.ru.md          # Russian beginner version
├── python/                      # Python implementation
│   ├── __init__.py
│   ├── flows.py                 # Normalizing flow models
│   ├── layers.py                # Coupling layers
│   ├── risk_metrics.py          # VaR, CVaR calculations
│   ├── data_fetcher.py          # Bybit data via CCXT
│   ├── training.py              # Training loop
│   └── examples/
│       ├── density_estimation.py
│       ├── var_calculation.py
│       └── synthetic_generation.py
└── rust_normalizing_flows/      # Rust implementation
    ├── Cargo.toml
    ├── src/
    │   ├── lib.rs
    │   ├── api/                 # Bybit API client
    │   ├── flows/               # Flow implementations
    │   ├── risk/                # Risk metrics
    │   └── utils/               # Utilities
    └── examples/
        ├── fetch_data.rs
        ├── train_flow.rs
        └── compute_var.rs

References

NICE: Non-linear Independent Components Estimation (Dinh et al., 2014)
- https://arxiv.org/abs/1410.8516
Variational Inference with Normalizing Flows (Rezende & Mohamed, 2015)
- https://arxiv.org/abs/1505.05770
Density Estimation using Real-NVP (Dinh et al., 2016)
- https://arxiv.org/abs/1605.08803
Masked Autoregressive Flow for Density Estimation (Papamakarios et al., 2017)
- https://arxiv.org/abs/1705.07057
Glow: Generative Flow with Invertible 1x1 Convolutions (Kingma & Dhariwal, 2018)
- https://arxiv.org/abs/1807.03039
Neural Ordinary Differential Equations (Chen et al., 2018)
- https://arxiv.org/abs/1806.07366
Neural Spline Flows (Durkan et al., 2019)
- https://arxiv.org/abs/1906.04032
Normalizing Flows for Probabilistic Modeling and Inference (Papamakarios et al., 2019)
- https://arxiv.org/abs/1912.02762
Flow Matching for Generative Modeling (Lipman et al., 2022)
- https://arxiv.org/abs/2210.02747

Difficulty Level

Advanced - Requires understanding of:

Probability theory and density estimation
Change of variables formula
Jacobian determinants
Deep learning fundamentals
Financial risk metrics (VaR, CVaR)

Disclaimer

This chapter is for educational purposes only. Cryptocurrency trading involves substantial risk. The strategies and risk models described here should be thoroughly validated before any real-world application. Past performance does not guarantee future results. Always consult with financial professionals before making investment decisions.