Chapter 351: WaveNet for Trading
Overview
WaveNet is a deep generative model originally developed by DeepMind for raw audio generation. Its key innovation - dilated causal convolutions - makes it exceptionally well-suited for time series prediction in financial markets. This chapter explores how to adapt WaveNet architecture for cryptocurrency trading using data from the Bybit exchange.
Table of Contents
- Introduction
- WaveNet Architecture
- Dilated Causal Convolutions
- Adapting WaveNet for Trading
- Implementation
- Trading Strategy
- Backtesting Results
- References
Introduction
Why WaveNet for Trading?
Traditional recurrent neural networks (RNNs, LSTMs, GRUs) process sequences step-by-step, which can be:
- Computationally slow
- Difficult to parallelize
- Prone to vanishing gradients for long sequences
WaveNet addresses these issues with dilated causal convolutions, which:
- Process sequences in parallel (faster training)
- Capture long-term dependencies efficiently
- Maintain causality (no future information leakage)
Key Concepts
| Concept | Description |
|---|---|
| Causal Convolution | Convolution that only looks at past data |
| Dilated Convolution | Convolution with gaps between inputs |
| Receptive Field | How far back the network can “see” |
| Skip Connections | Direct paths for multi-scale feature extraction |
| Residual Connections | Help gradient flow in deep networks |
WaveNet Architecture
Original Architecture
```
Input → Causal Conv → [Dilated Conv Block] × N → Skip Connections → Output
                                 ↓
              Gated Activation Unit (tanh ⊗ sigmoid)
```
Core Components
1. Causal Convolution Layer
   - Ensures no future information leakage
   - First layer that processes raw input
2. Dilated Convolution Blocks
   - Multiple layers with increasing dilation rates
   - Exponentially growing receptive field: 1, 2, 4, 8, 16, 32…
3. Gated Activation Units
   - `z = tanh(Wf * x) ⊙ sigmoid(Wg * x)`
   - Wf: filter weights (which features to extract)
   - Wg: gate weights (how much information to pass through)
4. Skip and Residual Connections
   - Skip: aggregate information from all layers
   - Residual: enable gradient flow in deep networks
Receptive Field Calculation
For dilated convolutions with dilation rates [1, 2, 4, 8, 16, …]:
Receptive Field = (kernel_size - 1) × Σ(dilation_rates) + 1

Example: with kernel_size=2 and 10 layers:
- Dilation rates: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
- Receptive field = (2-1) × 1023 + 1 = 1024 timesteps
This means with hourly data, the model can “see” ~42 days of history!
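The calculation above can be sanity-checked with a short helper. This is a sketch; `receptive_field` is an illustrative name, not a function from the repository:

```rust
/// Receptive field of a stack of dilated causal convolutions with
/// dilation doubling each layer: (kernel_size - 1) * sum(dilations) + 1.
fn receptive_field(kernel_size: usize, num_layers: usize) -> usize {
    // Dilations are 1, 2, 4, ..., 2^(num_layers - 1)
    let dilation_sum: usize = (0..num_layers).map(|i| 1 << i).sum();
    (kernel_size - 1) * dilation_sum + 1
}

fn main() {
    // Ten layers with kernel size 2 → 1024 timesteps (≈ 42 days of hourly bars)
    assert_eq!(receptive_field(2, 10), 1024);
    println!("{}", receptive_field(2, 10));
}
```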
Dilated Causal Convolutions
Why Dilated?
Standard convolutions have a linear relationship between receptive field and depth:
- Receptive field of N requires N layers
- 1000 timesteps would need 1000 layers!
Dilated convolutions provide exponential growth:
- Receptive field of N requires only log₂(N) layers
- 1024 timesteps need only 10 layers!
Visual Representation
```
Dilation = 1:  ●─●─●─●
               │ │ │ │
              [Conv Layer]

Dilation = 2:  ●───●───●───●
               │   │   │   │
              [Conv Layer]

Dilation = 4:  ●───────●───────●───────●
               │       │       │       │
              [Conv Layer]
```
Causal Padding
To maintain causality, we use asymmetric padding:
- Pad only on the left side
- Ensures output[t] depends only on input[0:t]
```rust
// Causal padding for a 1D convolution: pad only on the left
fn causal_pad(input: &[f64], padding: usize) -> Vec<f64> {
    let mut padded = vec![0.0; padding];
    padded.extend_from_slice(input);
    padded
}
```
Adapting WaveNet for Trading
Modifications for Financial Time Series
1. Input Features
   - OHLCV data (Open, High, Low, Close, Volume)
   - Technical indicators (RSI, MACD, Bollinger Bands)
   - Market microstructure features
2. Output Heads
   - Regression: predict the next price/return
   - Classification: predict direction (up/down/neutral)
   - Probabilistic: predict distribution parameters
3. Loss Functions
   - MSE/MAE for regression
   - Cross-entropy for classification
   - Custom trading losses (Sharpe-aware)
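To make the Sharpe-aware loss concrete: one common formulation minimizes the negative Sharpe ratio of per-period strategy returns (position × realized return). The sketch below illustrates that idea; `neg_sharpe_loss` is a hypothetical name, not a function from this repository:

```rust
/// Negative Sharpe ratio of per-period strategy returns, usable as a
/// training loss (lower is better). Sketch only: `positions` are model
/// outputs, `returns` are realized per-period asset returns.
fn neg_sharpe_loss(positions: &[f64], returns: &[f64]) -> f64 {
    let strat: Vec<f64> = positions
        .iter()
        .zip(returns.iter())
        .map(|(p, r)| p * r) // per-period strategy return
        .collect();
    let n = strat.len() as f64;
    let mean = strat.iter().sum::<f64>() / n;
    let var = strat.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt().max(1e-12); // guard against zero volatility
    -(mean / std) // negative per-period Sharpe (unannualized)
}

fn main() {
    // A profitable long-only sequence gives a negative loss
    println!("loss = {:.3}", neg_sharpe_loss(&[1.0, 1.0, 1.0], &[0.01, 0.02, 0.03]));
}
```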
Architecture for Trading
```
Input Features (OHLCV + Indicators)
        ↓
Input Projection (Linear)
        ↓
┌─────────────────────────┐
│  WaveNet Blocks (×N)    │
│  ├─ Dilated Conv        │
│  ├─ Gated Activation    │
│  ├─ Residual Connection │
│  └─ Skip Connection     │
└─────────────────────────┘
        ↓
Skip Aggregation (Sum)
        ↓
Output Layers (Dense)
        ↓
Predictions (Price/Direction)
```
Implementation
Rust Implementation Structure
```
351_wavenet_trading/
├── README.md
├── README.ru.md
├── readme.simple.md
├── readme.simple.ru.md
└── rust/
    ├── Cargo.toml
    └── src/
        ├── lib.rs
        ├── api/            # Bybit API client
        │   ├── mod.rs
        │   ├── bybit.rs
        │   └── storage.rs
        ├── models/         # WaveNet model
        │   ├── mod.rs
        │   ├── wavenet.rs
        │   ├── layers.rs
        │   └── activations.rs
        ├── trading/        # Trading logic
        │   ├── mod.rs
        │   ├── strategy.rs
        │   ├── signals.rs
        │   └── backtest.rs
        ├── analysis/       # Data analysis
        │   ├── mod.rs
        │   ├── features.rs
        │   └── indicators.rs
        └── bin/            # Executables
            ├── fetch_data.rs
            ├── train_wavenet.rs
            ├── predict.rs
            └── backtest.rs
```
Key Implementation Details
1. Dilated Convolution
```rust
pub struct DilatedConv1D {
    weights: Vec<Vec<f64>>,
    bias: Vec<f64>,
    kernel_size: usize,
    dilation: usize,
    channels_in: usize,
    channels_out: usize,
}

impl DilatedConv1D {
    pub fn forward(&self, input: &[Vec<f64>]) -> Vec<Vec<f64>> {
        let seq_len = input[0].len();
        let mut output = vec![vec![0.0; seq_len]; self.channels_out];

        for t in 0..seq_len {
            for c_out in 0..self.channels_out {
                let mut sum = self.bias[c_out];

                for k in 0..self.kernel_size {
                    // Causality: tap k looks back k * dilation steps
                    let idx = t as i64 - (k * self.dilation) as i64;
                    if idx >= 0 {
                        for c_in in 0..self.channels_in {
                            sum += self.weights[c_out][c_in * self.kernel_size + k]
                                * input[c_in][idx as usize];
                        }
                    }
                }

                output[c_out][t] = sum;
            }
        }

        output
    }
}
```
2. Gated Activation
```rust
pub fn gated_activation(filter: &[f64], gate: &[f64]) -> Vec<f64> {
    filter
        .iter()
        .zip(gate.iter())
        .map(|(f, g)| f.tanh() * sigmoid(*g))
        .collect()
}

fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}
```
3. WaveNet Block
```rust
pub struct WaveNetBlock {
    filter_conv: DilatedConv1D,
    gate_conv: DilatedConv1D,
    residual_conv: Conv1D,
    skip_conv: Conv1D,
}

impl WaveNetBlock {
    pub fn forward(&self, input: &[Vec<f64>]) -> (Vec<Vec<f64>>, Vec<Vec<f64>>) {
        // Dilated convolutions
        let filter = self.filter_conv.forward(input);
        let gate = self.gate_conv.forward(input);

        // Gated activation
        let activated: Vec<Vec<f64>> = filter
            .iter()
            .zip(gate.iter())
            .map(|(f, g)| gated_activation(f, g))
            .collect();

        // Residual connection
        let residual = self.residual_conv.forward(&activated);
        let residual_out: Vec<Vec<f64>> = input
            .iter()
            .zip(residual.iter())
            .map(|(i, r)| i.iter().zip(r.iter()).map(|(a, b)| a + b).collect())
            .collect();

        // Skip connection
        let skip = self.skip_conv.forward(&activated);

        (residual_out, skip)
    }
}
```
Trading Strategy
Signal Generation
The WaveNet model outputs are converted to trading signals:
1. Regression-based Strategy

   ```rust
   pub fn generate_signal(predicted_return: f64, threshold: f64) -> Signal {
       if predicted_return > threshold {
           Signal::Buy
       } else if predicted_return < -threshold {
           Signal::Sell
       } else {
           Signal::Hold
       }
   }
   ```

2. Classification-based Strategy

   ```rust
   pub fn generate_signal(probabilities: [f64; 3]) -> Signal {
       let (max_idx, _) = probabilities
           .iter()
           .enumerate()
           .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
           .unwrap();
       match max_idx {
           0 => Signal::Sell,
           1 => Signal::Hold,
           2 => Signal::Buy,
           _ => unreachable!(),
       }
   }
   ```
Position Sizing
Using volatility-adjusted position sizing:
```rust
pub fn calculate_position_size(
    capital: f64,
    volatility: f64,
    risk_per_trade: f64,
    max_position: f64,
) -> f64 {
    let vol_adjusted = risk_per_trade / volatility;
    (capital * vol_adjusted).min(capital * max_position)
}
```
Risk Management
- Stop Loss: Dynamic based on ATR (Average True Range)
- Take Profit: Risk-reward ratio of 1:2 or 1:3
- Maximum Drawdown: Position reduction at 10% drawdown
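As an illustration of the ATR-based stop described above, a long stop can be placed a fixed multiple of ATR below the entry price. The helpers below are a sketch with illustrative names (`atr`, `atr_stop_long`), using a simple average of true ranges rather than Wilder's smoothing:

```rust
/// ATR over the last `period` bars (simple average of true ranges).
fn atr(highs: &[f64], lows: &[f64], closes: &[f64], period: usize) -> f64 {
    let n = highs.len();
    let start = n.saturating_sub(period);
    let mut sum = 0.0;
    let mut count = 0.0;
    for i in start..n {
        let prev_close = if i == 0 { closes[0] } else { closes[i - 1] };
        // True range: max of high-low, |high - prev close|, |low - prev close|
        let tr = (highs[i] - lows[i])
            .max((highs[i] - prev_close).abs())
            .max((lows[i] - prev_close).abs());
        sum += tr;
        count += 1.0;
    }
    sum / count
}

/// Long stop-loss a fixed ATR multiple below the entry price.
fn atr_stop_long(entry: f64, atr_value: f64, multiple: f64) -> f64 {
    entry - multiple * atr_value
}

fn main() {
    let a = atr(&[10.0, 11.0], &[9.0, 10.0], &[9.5, 10.5], 2);
    println!("ATR = {:.2}, stop = {:.2}", a, atr_stop_long(10.5, a, 2.0));
}
```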
Backtesting Results
Performance Metrics
| Metric | Value |
|---|---|
| Total Return | - |
| Sharpe Ratio | - |
| Sortino Ratio | - |
| Maximum Drawdown | - |
| Win Rate | - |
| Profit Factor | - |
Note: Run the backtesting examples to see actual results with current data.
Running Backtests
```bash
# Fetch data from Bybit
cargo run --bin fetch_data -- --symbol BTCUSDT --interval 1h --days 365

# Train WaveNet model
cargo run --bin train_wavenet -- --data ./data/BTCUSDT_1h.csv --epochs 100

# Run backtest
cargo run --bin backtest -- --model ./models/wavenet.bin --data ./data/BTCUSDT_1h.csv
```
Advantages and Limitations
Advantages
- Parallelizable Training: Unlike RNNs, can process all timesteps simultaneously
- Large Receptive Field: Captures long-term patterns efficiently
- No Vanishing Gradients: Thanks to residual connections
- Flexible Architecture: Easy to adjust depth and receptive field
Limitations
- Memory Intensive: Stores all intermediate activations
- Fixed Receptive Field: Cannot dynamically adjust lookback
- Inference Speed: Full convolution needed for each prediction
- No Attention: Cannot focus on specific historical events
When to Use WaveNet
✅ Good for:
- Capturing cyclical patterns (hourly, daily, weekly)
- Markets with strong momentum/mean-reversion
- When parallelization is important
❌ Not ideal for:
- Very long sequences (>10,000 timesteps)
- When specific event attention is crucial
- Limited computational resources
References
Papers
- van den Oord, A., et al. (2016). "WaveNet: A Generative Model for Raw Audio."
- Borovykh, A., Bohte, S., & Oosterlee, C. W. (2017). "Conditional Time Series Forecasting with Convolutional Neural Networks."
- Chen, W., et al. (2020). "WaveNet-based Deep Learning for Financial Time Series." Applied to stock price prediction.
Related Architectures
- Temporal Convolutional Networks (TCN): Simplified WaveNet for sequence modeling
- Temporal Fusion Transformers: Combines attention with temporal patterns
- N-BEATS: Interpretable time series forecasting
Usage
Prerequisites
- Rust 1.70+
- Internet connection for Bybit API
Quick Start
```bash
cd 351_wavenet_trading/rust

# Build the project
cargo build --release

# Fetch data
cargo run --bin fetch_data -- --symbol BTCUSDT --interval 1h --days 30

# Train model (demo)
cargo run --bin train_wavenet

# Generate predictions
cargo run --bin predict

# Run backtest
cargo run --bin backtest
```
File Structure
| File | Description |
|---|---|
| README.md | This file - main documentation |
| README.ru.md | Russian translation |
| readme.simple.md | Simple explanation for beginners |
| readme.simple.ru.md | Simple explanation in Russian |
| rust/ | Rust implementation |
WaveNet brings the power of dilated causal convolutions to financial time series, offering a unique blend of long-term pattern recognition and computational efficiency.