Chapter 144: Physics-Informed Neural Networks for the Heston Stochastic Volatility Model
Chapter 144: Physics-Informed Neural Networks for the Heston Stochastic Volatility Model
Overview
The Black-Scholes model assumes constant volatility — a simplification that collapses under the weight of real market data. Implied volatility surfaces exhibit smiles, skews, and term structures that scream: volatility itself is stochastic. The Heston model (1993) captures this by letting variance follow its own mean-reverting stochastic process. However, solving the Heston PDE across the full (S, v, t) domain is computationally expensive, especially for real-time calibration.
Physics-Informed Neural Networks (PINNs) offer an elegant solution: train a neural network to approximate V(S, v, t) while enforcing the Heston PDE as a soft constraint in the loss function. The result is a mesh-free, differentiable solver that produces option prices, Greeks, and entire volatility surfaces in milliseconds.
Key Insight: By embedding the Heston PDE directly into the neural network’s loss function, we get a solver that respects no-arbitrage constraints, handles complex boundary conditions, and generalizes across the entire (S, v, t) domain — all without a finite-difference grid.
Trading Strategy
Core Strategy: Use a PINN-based Heston solver for real-time option pricing, volatility surface construction, and Greeks computation. Identify mispriced options by comparing PINN-calibrated theoretical prices against market quotes.
Edge Factors:
- Sub-millisecond pricing enables real-time arbitrage detection across strikes and expiries
- Smooth, differentiable Greeks (including cross-Greeks Vanna, Volga) improve hedging
- The Heston model captures volatility clustering and mean-reversion observed in crypto markets
- Calibration to the entire volatility surface simultaneously (not strike-by-strike)
Target Assets: Equity index options (SPX) and cryptocurrency options on Bybit (BTC, ETH)
The Heston Stochastic Volatility Model
Why Constant Volatility Fails
The Black-Scholes model assumes:
dS = μS dt + σS dW₁where σ is constant. This produces flat implied volatility across strikes — contradicted by every options market ever observed. Real markets show:
- Volatility smile: OTM puts and calls are more expensive than ATM
- Volatility skew: OTM puts are more expensive than OTM calls (especially equities)
- Term structure: Short-dated options show steeper smiles than long-dated
- Volatility clustering: Periods of high vol beget high vol (GARCH effects)
The Heston Dynamics
Heston (1993) models the asset price and its variance as a coupled system:
dS(t) = μ S(t) dt + √v(t) S(t) dW₁(t)
dv(t) = κ(θ - v(t)) dt + σ √v(t) dW₂(t)where:
- S(t): Spot price
- v(t): Instantaneous variance (v = σ²_local)
- κ: Mean-reversion speed of variance
- θ: Long-run variance level
- σ: Volatility of variance (vol-of-vol)
- ρ: Correlation between dW₁ and dW₂ (typically ρ < 0 for equities)
- μ: Drift (under risk-neutral measure, μ = r - q)
Correlation structure:
E[dW₁ dW₂] = ρ dtNegative correlation (ρ < 0) creates the leverage effect: when the stock drops, volatility rises — producing the characteristic volatility skew.
The Feller Condition
For the variance process to remain strictly positive:
2κθ > σ² (Feller condition)When this holds, v(t) never touches zero. When violated, the variance can reach zero but is immediately reflected — the CIR process remains well-defined but requires careful numerical treatment.
Typical parameter ranges:
| Parameter | Equities | Crypto (BTC) |
|---|---|---|
| κ | 1-5 | 2-10 |
| θ | 0.02-0.10 | 0.30-1.50 |
| σ (vol-of-vol) | 0.2-0.8 | 0.5-3.0 |
| ρ | -0.8 to -0.3 | -0.5 to 0.2 |
| v₀ | 0.01-0.10 | 0.20-1.00 |
Note: Crypto markets exhibit much higher vol-of-vol and weaker (sometimes positive) correlation.
The Heston PDE
Under the risk-neutral measure, the European option price V(S, v, t) satisfies:
∂V 1 ∂²V ∂²V 1 ∂²V── + ─ v S² ──── + ρσvS ──── + ─ σ²v ────∂t 2 ∂S² ∂S∂v 2 ∂v²
∂V ∂V+ rS ── + κ(θ - v) ── - rV = 0 ∂S ∂vThis is a 2D parabolic PDE in the variables (S, v) with time t. Let us label each term:
| Term | Expression | Meaning |
|---|---|---|
| Time decay | ∂V/∂t | Option value changes with time |
| Spot diffusion | ½vS²∂²V/∂S² | Gamma effect from spot movements |
| Mixed diffusion | ρσvS∂²V/∂S∂v | Cross-effect: spot-variance correlation |
| Variance diffusion | ½σ²v∂²V/∂v² | Volga effect from variance movements |
| Spot drift | rS∂V/∂S | Risk-neutral drift of spot |
| Variance drift | κ(θ-v)∂V/∂v | Mean-reversion pull on variance |
| Discounting | -rV | Risk-free discounting |
Boundary Conditions
The PDE requires boundary conditions on the computational domain [0, S_max] x [0, v_max] x [0, T]:
Terminal condition (at t = T):
V(S, v, T) = max(S - K, 0) (call)V(S, v, T) = max(K - S, 0) (put)At S = 0:
V(0, v, t) = K e^{-r(T-t)} (put: discounted strike)V(0, v, t) = 0 (call)As S → ∞:
V(S, v, t) ≈ S e^{-q(T-t)} (call: approaches forward)V(S, v, t) → 0 (put)At v = 0 (Feller boundary): The PDE degenerates. The boundary condition becomes:
∂V ∂V ∂V── + rS── + κθ── - rV = 0∂t ∂S ∂vThis is the reduced PDE with diffusion terms dropped.
As v → ∞:
V(S, v, t) → S e^{-q(T-t)} (call: dominated by spot)∂V/∂v → 0 (Neumann: value insensitive to further vol increase)PINN Architecture for the Heston PDE
Network Design
The PINN takes a 3D input (S, v, t) and outputs the option price V:
Input Layer: (S, v, t) ∈ ℝ³ │ ┌───────┴───────┐ │ Normalization │ ← Scale inputs to [0, 1] or [-1, 1] │ S̃ = S/S_max │ │ ṽ = v/v_max │ │ t̃ = t/T │ └───────┬───────┘ │ ┌───────┴───────┐ │ Dense(3, 128) │ │ Tanh │ └───────┬───────┘ │ ┌───────┴───────┐ │ Dense(128,128)│ │ Tanh │ × 4-6 hidden layers │ + Residual │ └───────┬───────┘ │ ┌───────┴───────┐ │ Dense(128, 1) │ │ Softplus │ ← Ensures V > 0 └───────┬───────┘ │ Output: V̂(S, v, t)Architecture choices:
- Activation: Tanh (smooth, infinitely differentiable — needed for second-order derivatives)
- Output activation: Softplus ensures non-negative prices
- Residual connections: Stabilize training for deeper networks
- Input normalization: Critical for balancing gradients across S, v, t scales
Input Feature Engineering
For better conditioning, we use log-spot and normalized inputs:
x = log(S / K) # Log-moneyness (centered at ATM)v_norm = v / v_max # Normalized variancetau = (T - t) / T # Normalized time-to-expiryThis naturally handles the wide range of spot prices and focuses the network around the important at-the-money region.
Loss Function Construction
The PINN loss has four components:
L_total = λ_pde · L_pde + λ_bc · L_bc + λ_ic · L_ic + λ_data · L_data1. PDE Residual Loss
Sample N_pde collocation points (S_i, v_i, t_i) in the interior domain and compute the PDE residual using automatic differentiation:
def heston_pde_residual(model, S, v, t, params): """Compute Heston PDE residual at collocation points.""" S.requires_grad_(True) v.requires_grad_(True) t.requires_grad_(True)
V = model(S, v, t)
# First-order derivatives V_t = grad(V, t) V_S = grad(V, S) V_v = grad(V, v)
# Second-order derivatives V_SS = grad(V_S, S) V_vv = grad(V_v, v) V_Sv = grad(V_S, v) # Mixed derivative
kappa, theta, sigma, rho, r = params
residual = (V_t + 0.5 * v * S**2 * V_SS + rho * sigma * v * S * V_Sv + 0.5 * sigma**2 * v * V_vv + r * S * V_S + kappa * (theta - v) * V_v - r * V)
return torch.mean(residual**2)2. Boundary Condition Losses
def boundary_loss(model, K, r, T, params): """Enforce boundary conditions.""" loss = 0.0
# Terminal condition: V(S, v, T) = max(S - K, 0) S_term = torch.linspace(0, S_max, N_bc) v_term = torch.linspace(0, v_max, N_bc) S_grid, v_grid = torch.meshgrid(S_term, v_term) t_T = torch.full_like(S_grid, T) V_pred = model(S_grid, v_grid, t_T) V_exact = torch.maximum(S_grid - K, torch.zeros_like(S_grid)) loss += torch.mean((V_pred - V_exact)**2)
# S = 0 boundary (call) v_bc = torch.linspace(0, v_max, N_bc) t_bc = torch.linspace(0, T, N_bc) v_g, t_g = torch.meshgrid(v_bc, t_bc) S_zero = torch.zeros_like(v_g) V_pred_S0 = model(S_zero, v_g, t_g) loss += torch.mean(V_pred_S0**2) # Call = 0 at S=0
# v = 0 boundary (degenerate PDE) S_bc = torch.linspace(0, S_max, N_bc) t_bc = torch.linspace(0, T, N_bc) S_g, t_g = torch.meshgrid(S_bc, t_bc) v_zero = torch.zeros_like(S_g) loss += feller_boundary_loss(model, S_g, v_zero, t_g, params)
return loss3. Data Loss (Calibration)
When market option prices are available:
def data_loss(model, S_market, v_market, t_market, V_market): """Fit to observed option prices.""" V_pred = model(S_market, v_market, t_market) return torch.mean((V_pred - V_market)**2)4. Handling the Mixed Derivative
The mixed derivative ∂²V/∂S∂v requires careful computation with autograd:
def mixed_derivative(V, S, v): """Compute ∂²V/∂S∂v using two sequential grad calls.""" # First: ∂V/∂S V_S = torch.autograd.grad( V, S, grad_outputs=torch.ones_like(V), create_graph=True, retain_graph=True )[0]
# Second: ∂(∂V/∂S)/∂v V_Sv = torch.autograd.grad( V_S, v, grad_outputs=torch.ones_like(V_S), create_graph=True, retain_graph=True )[0]
return V_SvThe key is create_graph=True — this builds a computation graph for the derivative itself, enabling higher-order differentiation.
Multi-Scale Collocation Strategy
Option prices vary dramatically across the (S, v, t) domain. Near the strike (S ≈ K) and at expiry (t ≈ T), gradients are steep. We use adaptive collocation:
Collocation Point Distribution:┌─────────────────────────────────────────────────┐│ v_max ││ ┌─────────────────────────────────────────┐ ││ │ Sparse points · · · │ ││ │ · · · · · · · │ ││ │ · · · · ··· · · · │ ││ │ · · ·····████····· · · │ ││ │ · · ····█████████████···· · │ ││ │ ·····██████ATM██████████··· │ │ Dense near│ │ ·····███████████████████████···· │ │ strike K│ │ ···██████████████████████··· │ │ and low v│ │ · ····████████████████···· · │ ││ │ · ·····████████····· · │ ││ │ · · · ····· · · · │ ││ └─────────────────────────────────────────┘ ││ S=0 S=K S_max │└─────────────────────────────────────────────────┘def generate_collocation_points(S_max, v_max, T, K, N_total): """Generate multi-scale collocation points.""" N_uniform = N_total // 2 N_atm = N_total // 4 N_low_v = N_total // 4
# Uniform background S_unif = torch.rand(N_uniform) * S_max v_unif = torch.rand(N_uniform) * v_max t_unif = torch.rand(N_uniform) * T
# Dense near ATM (S ≈ K) S_atm = K + 0.2 * K * torch.randn(N_atm) S_atm = torch.clamp(S_atm, 0, S_max) v_atm = torch.rand(N_atm) * v_max t_atm = torch.rand(N_atm) * T
# Dense near low variance S_low = torch.rand(N_low_v) * S_max v_low = torch.rand(N_low_v) * v_max * 0.3 # Focus on low v t_low = torch.rand(N_low_v) * T
S = torch.cat([S_unif, S_atm, S_low]) v = torch.cat([v_unif, v_atm, v_low]) t = torch.cat([t_unif, t_atm, t_low])
return S, v, tSemi-Analytical Heston Pricing
Characteristic Function Approach
The Heston model has a semi-closed-form solution via the characteristic function. The call price is:
C(S, v, t) = S·P₁ - K·e^{-rτ}·P₂where P₁ and P₂ are computed via inverse Fourier transforms:
1 1 ∞ e^{-iφ ln K} f_j(φ)P_j = ─ + ─ ∫ Re[ ───────────────── ] dφ, j = 1, 2 2 π 0 iφThe characteristic function f_j(φ) is:
f_j(φ) = exp(C_j(τ, φ) + D_j(τ, φ)·v₀ + iφ·ln S)with:
C_j = r·iφ·τ + (κθ/σ²)·[(b_j - ρσiφ + d_j)·τ - 2·ln((1 - g_j·e^{d_j·τ})/(1 - g_j))]
D_j = ((b_j - ρσiφ + d_j)/σ²)·((1 - e^{d_j·τ})/(1 - g_j·e^{d_j·τ}))where:
d_j = √((ρσiφ - b_j)² - σ²(2u_j·iφ - φ²))
g_j = (b_j - ρσiφ + d_j)/(b_j - ρσiφ - d_j)
u₁ = 0.5, u₂ = -0.5b₁ = κ + λ - ρσ, b₂ = κ + λHere λ is the market price of variance risk (often set to 0 for simplicity).
Numerical Integration
def heston_call_price(S, K, T, r, v0, kappa, theta, sigma, rho, N=256): """Semi-analytical Heston call price via characteristic function.""" def integrand(phi, j): if j == 1: u, b = 0.5, kappa - rho * sigma else: u, b = -0.5, kappa
d = np.sqrt((rho * sigma * 1j * phi - b)**2 - sigma**2 * (2 * u * 1j * phi - phi**2)) g = (b - rho * sigma * 1j * phi + d) / (b - rho * sigma * 1j * phi - d)
C = r * 1j * phi * T + (kappa * theta / sigma**2) * ( (b - rho * sigma * 1j * phi + d) * T - 2 * np.log((1 - g * np.exp(d * T)) / (1 - g)) ) D = ((b - rho * sigma * 1j * phi + d) / sigma**2) * ( (1 - np.exp(d * T)) / (1 - g * np.exp(d * T)) )
f = np.exp(C + D * v0 + 1j * phi * np.log(S)) return np.real(np.exp(-1j * phi * np.log(K)) * f / (1j * phi))
# Gauss-Laguerre quadrature phi_values = np.linspace(1e-8, 100, N) dphi = phi_values[1] - phi_values[0]
P1 = 0.5 + (1/np.pi) * np.sum([integrand(phi, 1) * dphi for phi in phi_values]) P2 = 0.5 + (1/np.pi) * np.sum([integrand(phi, 2) * dphi for phi in phi_values])
return S * P1 - K * np.exp(-r * T) * P2Greeks via Automatic Differentiation
One of the most powerful features of PINNs: Greeks come for free via autograd.
Standard Greeks
def compute_greeks(model, S, v, t): """Compute all Greeks using autograd.""" S = S.requires_grad_(True) v = v.requires_grad_(True) t = t.requires_grad_(True)
V = model(S, v, t)
# First-order Greeks Delta = torch.autograd.grad(V, S, create_graph=True)[0] Vega = torch.autograd.grad(V, v, create_graph=True)[0] # ∂V/∂v Theta = torch.autograd.grad(V, t, create_graph=True)[0] # ∂V/∂t
# Second-order Greeks Gamma = torch.autograd.grad(Delta, S, create_graph=True)[0] # ∂²V/∂S² Vanna = torch.autograd.grad(Delta, v, create_graph=True)[0] # ∂²V/∂S∂v Volga = torch.autograd.grad(Vega, v, create_graph=True)[0] # ∂²V/∂v²
return { 'delta': Delta, 'gamma': Gamma, 'theta': Theta, 'vega': Vega, 'vanna': Vanna, 'volga': Volga }Why Vanna and Volga Matter
Vanna (∂²V/∂S∂v): Measures how Delta changes with volatility. Critical for:
- Hedging in stochastic volatility environments
- Understanding how spot moves affect vol exposure
- Risk management of vol-spot correlation
Volga (∂²V/∂v²): Measures convexity of option price with respect to variance. Important for:
- Volatility smile dynamics
- Pricing exotic options with vol-of-vol exposure
- Understanding the “smile premium”
Greek Sensitivities in (S, v) Space:
Delta ∂V/∂S Gamma ∂²V/∂S² Vanna ∂²V/∂S∂v┌─────────────┐ ┌─────────────┐ ┌─────────────┐│ ╱─────│ │ ╱╲ │ │ ╱╲ ││ ╱ │ │ ╱ ╲ │ │ ╱ ╲ ││ ╱ │ │ ╱ ╲ │ │╱ ╲ ││ ╱ │ │╱ ╲│ │ ╲ ││─────────────│ │─────────────│ │──────╳──────│└─────────────┘ └─────────────┘ └─────────────┘ 0 K S_max 0 K S_max 0 K S_maxVolatility Smile and Skew Generation
From PINN Prices to Implied Volatility
Given PINN-predicted prices V(S, v, t) for various strikes K, we can invert the Black-Scholes formula to get implied volatility:
def implied_vol_from_pinn(model, S, v0, t, strikes, r, T): """Generate implied volatility smile from PINN prices.""" ivs = [] for K in strikes: V_pinn = model(S, v0, t).item() iv = black_scholes_implied_vol(V_pinn, S, K, T - t, r) ivs.append(iv) return np.array(ivs)Smile Dynamics
The Heston model produces characteristic smile patterns:
Implied Volatility vs Strike (Smile/Skew):
IV(%) 40 │ │╲ ← OTM puts 35 │ ╲ (high IV) │ ╲ 30 │ ╲ │ ╲_____ 25 │ ╲___ │ ╲___ ← ATM 20 │ ╲___ │ ╲___ 15 │ ╲── OTM calls │ (lower IV) 10 │ └──────────┬──────────┬──────────┬──────────────── 0.8K K 1.2K Strike
Impact of Heston Parameters on Smile:─── ρ = -0.7 (steep skew, equity-like)--- ρ = -0.3 (moderate skew)··· ρ = 0.0 (symmetric smile)─·─ ρ = +0.3 (reverse skew, some crypto)Calibration to Market Data
Calibration Objective
Given N market implied volatilities σ_mkt(K_i, T_j), find Heston parameters Θ = (κ, θ, σ, ρ, v₀) that minimize:
min_Θ Σᵢⱼ wᵢⱼ · (σ_model(K_i, T_j; Θ) - σ_mkt(K_i, T_j))²subject to:
κ > 0, θ > 0, σ > 0, -1 < ρ < 1, v₀ > 02κθ > σ² (Feller condition)PINN-Accelerated Calibration
Traditional calibration requires repeatedly solving the Heston PDE (or evaluating the characteristic function) for each parameter guess. With a PINN:
- Train once on a parametric family: V(S, v, t; κ, θ, σ, ρ)
- Calibrate instantly by gradient descent on the parameters through the trained network
def calibrate_heston_pinn(model, market_data, initial_params): """Calibrate Heston parameters using PINN as the forward solver.""" params = torch.tensor(initial_params, requires_grad=True) optimizer = torch.optim.Adam([params], lr=0.01)
for epoch in range(1000): optimizer.zero_grad()
kappa, theta, sigma, rho, v0 = params
# Enforce constraints kappa_c = torch.abs(kappa) theta_c = torch.abs(theta) sigma_c = torch.abs(sigma) rho_c = torch.tanh(rho) # Maps to (-1, 1) v0_c = torch.abs(v0)
total_loss = 0 for K, T, iv_market in market_data: V_model = model(S, v0_c, 0, kappa_c, theta_c, sigma_c, rho_c) iv_model = bs_implied_vol(V_model, S, K, T, r) total_loss += (iv_model - iv_market)**2
total_loss.backward() optimizer.step()
return params.detach()Application to Crypto (Bybit)
Why Heston for Crypto?
Cryptocurrency options markets exhibit extreme volatility characteristics:
- Higher vol-of-vol (σ): BTC variance can spike 10x in hours
- Weaker leverage effect: ρ is closer to 0 or even positive for some assets
- Faster mean-reversion (κ): Vol spikes revert quickly
- Higher baseline variance (θ): BTC annualized vol ≈ 60-80% vs SPX ≈ 15-20%
- Jump components: Heston alone may not capture flash crashes (Bates extension)
Bybit Data Integration
import requests
def fetch_bybit_options(symbol="BTC"): """Fetch option chain from Bybit.""" url = "https://api.bybit.com/v5/market/tickers" params = {"category": "option", "baseCoin": symbol} response = requests.get(url, params=params) data = response.json()
options = [] for item in data['result']['list']: options.append({ 'symbol': item['symbol'], 'bid': float(item['bid1Price']), 'ask': float(item['ask1Price']), 'mark_price': float(item['markPrice']), 'implied_vol': float(item['markIv']), 'delta': float(item['delta']), 'gamma': float(item['gamma']), 'vega': float(item['vega']), 'theta': float(item['theta']), }) return optionsComparison: PINN vs Semi-Analytical Heston
┌─────────────────────┬───────────────┬──────────────┬──────────────┐│ Feature │ Characteristic│ Finite │ PINN ││ │ Function │ Difference │ │├─────────────────────┼───────────────┼──────────────┼──────────────┤│ Single price eval │ ~1 ms │ ~100 ms │ ~0.01 ms ││ Full surface (100²) │ ~10 s │ ~10 s │ ~1 ms ││ Greeks │ Finite diff │ Grid interp │ Autograd ││ Accuracy │ Machine eps │ Grid-dep │ ~1e-4 ││ Training cost │ N/A │ N/A │ Minutes-hours││ Mesh-free │ Yes │ No │ Yes ││ Handles exotics │ Limited │ Yes │ Yes ││ Parameter sensitivity│ Recompute │ Recompute │ Backprop │└─────────────────────┴───────────────┴──────────────┴──────────────┘Training Protocol
Full Training Loop
def train_heston_pinn(model, heston_params, domain, epochs=10000): """Complete training loop for Heston PINN.""" optimizer = torch.optim.Adam(model.parameters(), lr=1e-3) scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
kappa, theta, sigma, rho, r = heston_params S_max, v_max, T = domain K = S_max / 2 # ATM strike
# Loss weights (adaptive) lambda_pde = 1.0 lambda_bc = 10.0 lambda_ic = 10.0
history = {'loss': [], 'pde': [], 'bc': [], 'ic': []}
for epoch in range(epochs): optimizer.zero_grad()
# Generate fresh collocation points each epoch S_col, v_col, t_col = generate_collocation_points( S_max, v_max, T, K, N_total=4096 )
# PDE residual loss L_pde = heston_pde_residual(model, S_col, v_col, t_col, heston_params)
# Boundary condition loss L_bc = boundary_loss(model, K, r, T, heston_params)
# Initial/terminal condition loss L_ic = terminal_condition_loss(model, K, S_max, v_max, T)
# Total loss loss = lambda_pde * L_pde + lambda_bc * L_bc + lambda_ic * L_ic
loss.backward() torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0) optimizer.step() scheduler.step()
# Adaptive weight adjustment if epoch % 100 == 0: with torch.no_grad(): grad_pde = compute_grad_norm(model, L_pde) grad_bc = compute_grad_norm(model, L_bc) lambda_bc = grad_pde / (grad_bc + 1e-8)
print(f"Epoch {epoch}: Loss={loss:.6f}, " f"PDE={L_pde:.6f}, BC={L_bc:.6f}")
return model, historyTraining Tips
- Start with simple cases: Train on Black-Scholes first (σ = 0 in Heston), then increase vol-of-vol
- Curriculum learning: Begin with long-dated options, add shorter expiries progressively
- Gradient balancing: Monitor gradient norms of each loss component
- Residual-based adaptive refinement: Add collocation points where residual is large
- Transfer learning: Use a trained PINN from one parameter set as initialization for nearby parameters
Backtesting Volatility Trading Strategy
Strategy Logic
def vol_trading_strategy(pinn_model, market_data, heston_params): """ Strategy: Compare PINN-calibrated fair value to market price. If PINN price > market ask → BUY (underpriced vol) If PINN price < market bid → SELL (overpriced vol) Delta-hedge to isolate vol exposure. """ signals = [] for option in market_data: S, K, T, r = option['spot'], option['strike'], option['expiry'], option['rate'] v0 = heston_params['v0']
V_model = pinn_model(S, v0, 0).item()
# Compare to market if V_model > option['ask'] * 1.02: # 2% threshold signals.append({'action': 'BUY', 'option': option, 'edge': V_model - option['ask']}) elif V_model < option['bid'] * 0.98: signals.append({'action': 'SELL', 'option': option, 'edge': option['bid'] - V_model})
return signalsProject Structure
144_pinn_heston_model/├── README.md ← This file├── README.ru.md ← Russian translation├── readme.simple.md ← Simple explanation (English)├── readme.simple.ru.md ← Simple explanation (Russian)├── python/│ ├── __init__.py│ ├── requirements.txt│ ├── heston_pinn.py ← PINN model architecture│ ├── train.py ← Training loop│ ├── data_loader.py ← Market data (stocks + Bybit)│ ├── heston_analytical.py ← Semi-analytical Heston pricing│ ├── calibration.py ← Calibrate Heston to market IV│ ├── greeks.py ← Greeks via autograd│ ├── visualize.py ← Volatility surfaces and plots│ └── backtest.py ← Volatility trading backtest└── rust_pinn_heston/ ├── Cargo.toml ├── src/ │ ├── lib.rs ← Core PINN for Heston │ └── bin/ │ ├── train.rs ← Training binary │ ├── price_options.rs ← Price options via PINN │ ├── fetch_data.rs ← Fetch Bybit data │ └── calibrate.rs ← Calibration binary └── examples/ └── basic_pricing.rs ← Simple pricing exampleRunning the Code
Python
cd 144_pinn_heston_model/pythonpip install -r requirements.txt
# Train the PINNpython train.py --epochs 10000 --lr 1e-3
# Calibrate to market datapython calibration.py --source bybit --symbol BTC
# Compute Greekspython greeks.py --spot 100 --strike 100 --expiry 0.25
# Visualize volatility surfacepython visualize.py --show-smile --show-surface
# Run backtestpython backtest.py --strategy vol_arb --period 30dRust
cd 144_pinn_heston_model/rust_pinn_heston
# Fetch market datacargo run --bin fetch_data -- --symbol BTCUSDT --exchange bybit
# Train the PINNcargo run --bin train -- --epochs 5000 --hidden-dim 128
# Price optionscargo run --bin price_options -- --spot 50000 --strike 50000 --expiry 0.25
# Calibrate to marketcargo run --bin calibrate -- --source bybit --symbol BTCKey Takeaways
-
The Heston model captures stochastic volatility with five parameters (κ, θ, σ, ρ, v₀) and produces realistic volatility smiles/skews
-
PINNs solve the Heston PDE without a mesh, producing a differentiable approximation V(S, v, t) that respects the physics
-
The mixed derivative ∂²V/∂S∂v is the key technical challenge — handled naturally by automatic differentiation
-
Greeks come for free via backpropagation, including challenging cross-Greeks (Vanna, Volga)
-
Crypto markets demand higher vol-of-vol parameters and faster mean-reversion speeds
-
Calibration can be accelerated by differentiating through the PINN with respect to model parameters
-
Real-time applications become feasible: once trained, the PINN evaluates in microseconds
References
-
Heston, S. L. (1993). “A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options.” The Review of Financial Studies, 6(2), 327-343.
-
Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). “Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations.” Journal of Computational Physics, 378, 686-707.
-
Bayer, C., & Stemper, B. (2018). “Deep Calibration of Rough Stochastic Volatility Models.” arXiv:1810.03399.
-
Salvador, B., Oosterlee, C. W., & van der Meer, R. (2020). “Financial Option Valuation by Unsupervised Learning with Artificial Neural Networks.” Mathematics, 9(1), 46.
-
Alfonsi, A. (2015). Affine Diffusions and Related Processes: Simulation, Theory and Applications. Springer.
-
Gatheral, J. (2006). The Volatility Surface: A Practitioner’s Guide. Wiley.