Diffusion Models for Synthetic Time Series and Forecasting
This chapter introduces diffusion models for financial time series applications. Originally developed for image generation (e.g., Stable Diffusion, DALL-E), diffusion models have emerged as a powerful alternative to GANs for generating synthetic financial data and for probabilistic forecasting.
Content
- Diffusion Models: From Images to Time Series
- Key Architectures for Time Series
- Code Examples
- Rust Implementation
- Practical Considerations
- Resources
Diffusion Models: From Images to Time Series
Diffusion models are a class of generative models that learn to generate data by reversing a gradual noising process. Unlike GANs, which learn through adversarial training, diffusion models learn to denoise data step by step.
The Intuition Behind Diffusion
The key insight is that it’s easier to learn small denoising steps than to generate data in one shot:
- Forward Process: Gradually add Gaussian noise to data until it becomes pure noise
- Reverse Process: Learn to undo each noise step, recovering the original data
This approach offers several advantages over GANs:
- Training stability: No adversarial dynamics, mode collapse is rare
- Quality: Often produces higher-quality samples
- Uncertainty quantification: Natural probabilistic interpretation
Forward Process: Adding Noise
The forward diffusion process is a Markov chain that gradually adds Gaussian noise:
$$q(x_t | x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t} x_{t-1}, \beta_t I)$$
Where:
- $x_0$ is the original data
- $x_t$ is the noised data at step $t$
- $\beta_t$ is the noise schedule (typically $\beta_t \in [0.0001, 0.02]$)
- $T$ is the total number of diffusion steps (typically 1000)
A key property allows direct sampling at any timestep:
$$q(x_t | x_0) = \mathcal{N}(x_t; \sqrt{\bar{\alpha}_t} x_0, (1-\bar{\alpha}_t) I)$$
where $\bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)$
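The closed-form property above means we never have to iterate the forward chain; a minimal numpy sketch (the linear schedule values are the ones quoted in the text):

```python
import numpy as np

def forward_sample(x0, t, betas, rng=None):
    """Sample x_t ~ q(x_t | x_0) directly via the closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    rng = rng or np.random.default_rng(0)
    alpha_bar = np.cumprod(1.0 - betas)        # \bar{alpha}_t for t = 1..T
    noise = rng.standard_normal(x0.shape)      # eps ~ N(0, I)
    a = alpha_bar[t - 1]                       # t is 1-indexed as in the text
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise, noise

# Toy example: noise a sine "price path" at step 500 of T = 1000
T = 1000
betas = np.linspace(1e-4, 0.02, T)             # linear schedule from the text
x0 = np.sin(np.linspace(0, 4 * np.pi, 64))
xt, eps = forward_sample(x0, 500, betas)
```

By step $T$, $\bar{\alpha}_T \approx e^{-\sum_t \beta_t}$ is tiny, so $x_T$ is effectively pure noise.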
Reverse Process: Denoising
The reverse process learns to denoise:
$$p_\theta(x_{t-1} | x_t) = \mathcal{N}(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t))$$
In practice, we train a neural network $\epsilon_\theta(x_t, t)$ to predict the noise added at each step. The training objective is:
$$\mathcal{L} = \mathbb{E}_{t, x_0, \epsilon} \left[ \| \epsilon - \epsilon_\theta(x_t, t) \|^2 \right]$$
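One training step of this objective can be sketched in PyTorch. `EpsNet` here is a deliberately tiny, hypothetical noise predictor; any network $\epsilon_\theta(x_t, t)$ with matching input/output shapes would slot in:

```python
import torch
import torch.nn as nn

class EpsNet(nn.Module):
    """Hypothetical minimal eps_theta(x_t, t) for 1-D sequences of length 64."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, x_t, t):
        # Condition on t via a simple normalized-timestep feature
        t_feat = t.float().unsqueeze(-1) / 1000.0
        return self.net(torch.cat([x_t, t_feat], dim=-1))

def ddpm_loss(model, x0, alpha_bar):
    """Simple loss: L = E_{t, x0, eps} || eps - eps_theta(x_t, t) ||^2."""
    b = x0.shape[0]
    t = torch.randint(1, len(alpha_bar) + 1, (b,))  # uniform timestep per sample
    eps = torch.randn_like(x0)
    a = alpha_bar[t - 1].unsqueeze(-1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps      # closed-form forward sample
    return ((eps - model(x_t, t)) ** 2).mean()

T = 1000
alpha_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, T), dim=0)
model = EpsNet()
loss = ddpm_loss(model, torch.randn(8, 64), alpha_bar)
```

In a real training loop `loss.backward()` plus an optimizer step would follow; this sketch only shows how the objective is assembled.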
Noise Schedules
The choice of noise schedule $\beta_t$ significantly affects performance:
| Schedule | Formula | Characteristics |
|---|---|---|
| Linear | $\beta_t = \beta_1 + \frac{t-1}{T-1}(\beta_T - \beta_1)$ | Simple, works well for images |
| Cosine | $\bar{\alpha}_t = \frac{f(t)}{f(0)}$, $f(t) = \cos^2(\frac{t/T + s}{1+s} \cdot \frac{\pi}{2})$ | Better for smaller images/sequences |
| Sigmoid | $\beta_t = \sigma(-6 + 12\frac{t}{T})$ | Smooth transitions |
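The three schedules in the table can be written directly from their formulas; note that the cosine schedule is defined through $\bar{\alpha}_t$ rather than $\beta_t$ (in practice implementations also clip the implied $\beta_t$ to avoid the singularity at $t = T$):

```python
import numpy as np

def linear_betas(T, beta1=1e-4, betaT=0.02):
    # beta_t = beta_1 + (t-1)/(T-1) * (beta_T - beta_1)
    return np.linspace(beta1, betaT, T)

def cosine_alpha_bar(T, s=0.008):
    # alpha_bar_t = f(t)/f(0), f(t) = cos^2(((t/T + s)/(1 + s)) * pi/2)
    t = np.arange(T + 1)
    f = np.cos(((t / T + s) / (1 + s)) * np.pi / 2) ** 2
    return f[1:] / f[0]

def sigmoid_betas(T):
    # beta_t = sigmoid(-6 + 12 t/T), per the table; often rescaled in practice
    x = -6 + 12 * np.arange(1, T + 1) / T
    return 1.0 / (1.0 + np.exp(-x))

ab_cos = cosine_alpha_bar(1000)
ab_lin = np.cumprod(1 - linear_betas(1000))
```

Plotting `ab_cos` against `ab_lin` shows the practical difference: the cosine schedule destroys signal more slowly early on, which is why it tends to work better for short sequences.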
Key Architectures for Time Series
TimeGrad: Autoregressive Diffusion
TimeGrad (Rasul et al., 2021) combines autoregressive modeling with diffusion:
```
Input: x_{1:t} (historical observations)
├── RNN Encoder → Hidden state h_t
├── Diffusion Process conditioned on h_t
└── Output: p(x_{t+1:t+τ} | x_{1:t})
```
Key features:
- Uses RNN/LSTM to encode history
- Diffusion generates future conditioned on hidden state
- Autoregressive: generates one step at a time
Limitations: Cumulative errors in autoregressive generation, slow inference.
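The conditioning pattern can be sketched as follows. This is a hypothetical minimal skeleton, not the paper's architecture: a GRU summarizes the history into $h_t$, and the noise predictor receives $h_t$ alongside the noised target and timestep:

```python
import torch
import torch.nn as nn

class TimeGradSketch(nn.Module):
    """Sketch of TimeGrad-style conditioning: eps_theta(x_t, t, h) where h
    is an RNN summary of the observed history."""
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.eps_net = nn.Sequential(
            nn.Linear(n_features + hidden + 1, 64), nn.SiLU(),
            nn.Linear(64, n_features),
        )

    def forward(self, history, x_t, t):
        # history: (batch, time, features); x_t: noised next step (batch, features)
        _, h = self.rnn(history)
        h = h[-1]                                   # final hidden state (batch, hidden)
        t_feat = t.float().unsqueeze(-1) / 1000.0   # simple timestep feature
        return self.eps_net(torch.cat([x_t, h, t_feat], dim=-1))

model = TimeGradSketch()
eps_hat = model(torch.randn(4, 24, 1), torch.randn(4, 1), torch.randint(1, 1000, (4,)))
```

Forecasting then proceeds autoregressively: sample $x_{t+1}$ by reverse diffusion, feed it back through the RNN, and repeat, which is exactly where the cumulative-error and latency limitations come from.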
CSDI: Conditional Score-based Diffusion
CSDI (Tashiro et al., NeurIPS 2021) uses attention-based conditioning:
```
Input: Partially observed time series with mask
├── Temporal Attention (across time)
├── Feature Attention (across variables)
├── Score-based diffusion conditioned on observed values
└── Output: Imputed/forecasted values with uncertainty
```
Key features:
- Self-supervised training with random masking
- Handles both imputation and forecasting
- 40-65% CRPS improvement over prior probabilistic imputation methods
- Generates probabilistic forecasts (multiple samples)
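The self-supervised masking trick is simple to sketch: hide a random subset of the *observed* entries and train the model to reconstruct them, conditioning only on the rest. The function below is an illustrative stand-in, not CSDI's exact masking code:

```python
import numpy as np

def random_mask(x, obs_mask, miss_ratio=0.1, rng=None):
    """Split observed entries into a conditioning set and a hidden target set,
    CSDI-style: target entries are treated as missing during training."""
    rng = rng or np.random.default_rng(0)
    target_mask = obs_mask & (rng.random(x.shape) < miss_ratio)
    cond_mask = obs_mask & ~target_mask   # entries the model may condition on
    return cond_mask, target_mask

# Hypothetical OHLCV-like window: (time, features), with some values truly missing
x = np.random.default_rng(1).standard_normal((24, 5))
obs = np.ones_like(x, dtype=bool)
obs[5:8, 2] = False                       # genuinely missing values
cond, target = random_mask(x, obs)
```

Genuinely missing entries never end up in either set, so at inference time the same model handles both imputation (mask interior points) and forecasting (mask the future block).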
Diffusion-TS: Decomposed Representations
Diffusion-TS (ICLR 2024) introduces interpretable decomposition:
```
Input: Time series x
├── Encoder-Decoder Transformer
├── Decomposition:
│   ├── Trend: Polynomial regression
│   └── Seasonal: Fourier series
├── Diffusion in decomposed space
└── Output: Interpretable synthetic series
```
Key features:
- Reconstructs samples directly (not noise)
- Fourier-based loss for spectral accuracy
- Same architecture for generation, forecasting, imputation
- State-of-the-art on Stocks, Energy, ETTh datasets
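The Fourier-based loss idea can be illustrated with a few lines of numpy. This is a simplified sketch, not the paper's exact objective (which combines a frequency-domain term with a time-domain reconstruction term):

```python
import numpy as np

def fourier_loss(x, x_hat):
    """Compare spectra of target and reconstruction, so missed seasonal or
    periodic structure is penalized directly in frequency space."""
    fx = np.fft.rfft(x, axis=-1)
    fx_hat = np.fft.rfft(x_hat, axis=-1)
    return np.mean(np.abs(fx - fx_hat) ** 2)

# Toy check: a noisy reconstruction of a seasonal signal has a spectral gap
t = np.linspace(0, 4 * np.pi, 128)
clean = np.sin(t)
noisy = clean + 0.1 * np.random.default_rng(0).standard_normal(128)
spectral_gap = fourier_loss(clean, noisy)
```

A time-domain MSE can look small while the spectrum is badly wrong (e.g., a flat forecast of a seasonal series); the spectral term closes that loophole.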
TimeDiff and Recent Advances (2024-2025)
Recent developments have addressed key limitations:
| Method | Innovation | Performance |
|---|---|---|
| TimeDiff | Conditional diffusion at past-future boundary | 9-47% improvement over baselines |
| ARMD | Auto-regressive moving diffusion | Best MSE on Exchange Rates |
| SimDiff (2025) | Simplified architecture, faster inference | Competitive with 10x fewer params |
| MG-TSD | Multi-granularity temporal structures | SOTA on long-term forecasting |
| S²DBM | Brownian Bridge dynamics | Natural boundary conditions |
Code Examples
01: Diffusion Fundamentals
The notebook 01_diffusion_fundamentals.ipynb covers:
- Forward and reverse diffusion processes
- Noise schedules visualization
- ELBO derivation and loss functions
- Simple 1D diffusion example
02: DDPM Implementation from Scratch
The notebook 02_ddpm_from_scratch.ipynb implements:
- Complete DDPM training loop
- U-Net architecture for time series
- Sampling algorithms (DDPM, DDIM)
- Visualization of denoising process
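For orientation before the notebook, here is a minimal numpy sketch of the ancestral (DDPM) sampling loop, with a zero-returning stub standing in for a trained noise predictor:

```python
import numpy as np

def ddpm_sample(eps_theta, shape, betas, rng=None):
    """Ancestral DDPM sampling: start from pure noise, denoise step by step.
    eps_theta(x_t, t) is any noise predictor (here a stub, normally a network)."""
    rng = rng or np.random.default_rng(0)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)                 # x_T ~ N(0, I)
    for t in range(len(betas), 0, -1):
        a, ab = alphas[t - 1], alpha_bar[t - 1]
        eps = eps_theta(x, t)
        # Posterior mean of x_{t-1} given the predicted noise
        mean = (x - (1 - a) / np.sqrt(1 - ab) * eps) / np.sqrt(a)
        noise = rng.standard_normal(shape) if t > 1 else 0.0  # no noise at t = 1
        x = mean + np.sqrt(betas[t - 1]) * noise
    return x

betas = np.linspace(1e-4, 0.02, 50)
sample = ddpm_sample(lambda x, t: np.zeros_like(x), (64,), betas)
```

The stub makes the loop runnable in isolation; swapping in the trained U-Net from the notebook is the only change needed for real samples.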
03: TimeGrad for Cryptocurrency Forecasting
The notebook 03_timegrad_crypto.ipynb demonstrates:
- TimeGrad architecture with RNN encoder
- Training on Bitcoin/Ethereum hourly data
- Probabilistic forecasting with confidence intervals
- Comparison with LSTM baselines
04: CSDI for Imputation and Forecasting
The notebook 04_csdi_imputation_forecasting.ipynb shows:
- CSDI implementation with attention
- Missing data imputation in OHLCV series
- Probabilistic forecasting
- Evaluation metrics (CRPS, calibration)
05: Diffusion-TS for Synthetic Data Generation
The notebook 05_diffusion_ts_synthetic.ipynb covers:
- Generating synthetic cryptocurrency data
- Trend-seasonal decomposition
- Quality evaluation (discriminative score, FID)
- Comparison with TimeGAN (Chapter 21)
06: Diffusion vs GANs Comparison
The notebook 06_diffusion_vs_gans.ipynb compares:
- Training stability (diffusion vs GAN)
- Sample quality metrics
- Diversity vs fidelity trade-offs
- Computational requirements
07: Complete Bitcoin Forecasting Pipeline
The notebook 07_bitcoin_pipeline.ipynb provides:
- End-to-end production pipeline
- Technical indicator features (RSI, volatility)
- Monte Carlo uncertainty estimation
- Backtesting with realistic constraints
Rust Implementation
The rust_diffusion_crypto directory contains a Rust implementation using the tch-rs (PyTorch bindings) and burn frameworks:
```
rust_diffusion_crypto/
├── Cargo.toml
├── README.md
├── src/
│   ├── lib.rs
│   ├── main.rs
│   ├── data/      # Bybit API client, preprocessing
│   ├── model/     # DDPM, U-Net, noise schedules
│   ├── training/  # Training loop, losses
│   └── utils/     # Config, checkpoints
└── examples/
    ├── fetch_data.rs
    ├── train_ddpm.rs
    └── forecast.rs
```
See rust_diffusion_crypto/README.md for detailed usage.
Practical Considerations
When to Use Diffusion Models
Good use cases:
- Synthetic data generation for backtesting
- Probabilistic forecasting with uncertainty
- Missing data imputation
- Scenario generation for risk analysis
- When you need diversity in generated samples
Not ideal for:
- Real-time/low-latency predictions (slow inference)
- Limited computational resources
- When model interpretability is critical
- Very short sequences (<24 timesteps)
Computational Requirements
| Task | GPU Memory | Training Time | Inference Time |
|---|---|---|---|
| Simple DDPM | 4GB | 2-4 hours | 100ms/sample |
| TimeGrad | 8GB | 8-12 hours | 500ms/sample |
| CSDI | 12GB | 12-24 hours | 200ms/sample |
| Diffusion-TS | 8GB | 6-10 hours | 150ms/sample |
Optimization Techniques
- DDIM Sampling: Reduce steps from 1000 to 50-100 with minimal quality loss
- Token Merging: tools such as tomesd merge redundant tokens, giving roughly a 1.24x speedup
- Distillation: Train smaller student models
- Caching: Cache attention computations across steps
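The DDIM trick is the easiest of these to show concretely: deterministic sampling over a strided subset of timesteps. A minimal numpy sketch with a stub predictor (eta = 0, i.e., fully deterministic):

```python
import numpy as np

def ddim_sample(eps_theta, shape, alpha_bar, n_steps=50, rng=None):
    """Deterministic DDIM sampling: visit only n_steps strided timesteps
    instead of all T, which is how 1000 steps shrink to 50-100."""
    rng = rng or np.random.default_rng(0)
    T = len(alpha_bar)
    ts = np.linspace(T, 1, n_steps).astype(int)    # strided timesteps T..1
    x = rng.standard_normal(shape)                 # x_T ~ N(0, I)
    for i, t in enumerate(ts):
        ab_t = alpha_bar[t - 1]
        ab_prev = alpha_bar[ts[i + 1] - 1] if i + 1 < len(ts) else 1.0
        eps = eps_theta(x, t)
        x0_hat = (x - np.sqrt(1 - ab_t) * eps) / np.sqrt(ab_t)  # predicted x_0
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1 - ab_prev) * eps
    return x

alpha_bar = np.cumprod(1 - np.linspace(1e-4, 0.02, 1000))
sample = ddim_sample(lambda x, t: np.zeros_like(x), (64,), alpha_bar)
```

Because each update re-derives $\hat{x}_0$ and jumps directly to the previous strided step, the number of network evaluations (the expensive part) drops by 10-20x for a modest quality cost.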
Resources
Papers
- Denoising Diffusion Probabilistic Models (DDPM), Ho et al., 2020
- TimeGrad: Autoregressive Denoising Diffusion for Time Series, Rasul et al., 2021
- CSDI: Conditional Score-based Diffusion for Imputation, Tashiro et al., NeurIPS 2021
- Diffusion-TS: Interpretable Diffusion for Time Series, ICLR 2024
- Diffusion Models for Time Series Forecasting: A Survey, Meijer et al., 2024
- Generation of Synthetic Financial Time Series by Diffusion, 2025
Implementations
- ermongroup/CSDI - Official CSDI
- Y-debug-sys/Diffusion-TS - Diffusion-TS
- amazon-science/unconditional-time-series-diffusion - TSDiff
- GavinKerrworking/TimeGrad - TimeGrad implementation
Tutorials & Guides
- The Annotated Diffusion Model - Hugging Face
- What are Diffusion Models? - Lilian Weng
- Diffusion Models in Time-Series Forecasting
Related Chapters
- Chapter 19: RNN for Multivariate Time Series - LSTM/GRU foundations
- Chapter 20: Autoencoders for Risk Factors - Latent space models
- Chapter 21: GANs for Synthetic Time Series - Alternative generative approach