Saliency Maps for Trading: Visualizing Model Interpretability in Financial Predictions
Saliency maps are powerful visualization techniques that reveal which input features most strongly influence a neural network’s predictions. Originally developed for computer vision to highlight important regions in images, saliency maps have been adapted for time series analysis in trading, helping traders and researchers understand why models make specific predictions about market movements.
In algorithmic trading, model interpretability is crucial for several reasons:
- Risk Management: Understanding which factors drive predictions helps identify potential model failures
- Regulatory Compliance: Financial regulators increasingly require explainability in automated trading systems
- Strategy Development: Insights from saliency maps can inform new trading hypotheses
- Model Debugging: Visualizing feature importance helps detect overfitting or spurious correlations
This chapter covers the theory and implementation of saliency maps for financial time series, with practical applications using both stock market and cryptocurrency data.
Contents
- Introduction to Saliency Maps
- Types of Saliency Methods
- Saliency Maps for Time Series
- Implementation with PyTorch
- Trading Strategy Based on Saliency
- Backtesting the Strategy
- Rust Implementation
- References
Introduction to Saliency Maps
A saliency map is a visualization that highlights which parts of the input have the greatest impact on the model’s output. For a neural network with output y and input x, the saliency is computed as the gradient of the output with respect to the input:
S(x) = ∂y/∂x
In the context of trading, if our model predicts price direction based on historical OHLCV (Open, High, Low, Close, Volume) data, the saliency map tells us which time steps and which features (price, volume, technical indicators) the model considers most important for its prediction.
Why Saliency Maps Matter for Trading
Traditional feature importance methods such as permutation importance, or SHAP values averaged over a dataset, are mostly used for global insights and can miss temporal dynamics. Gradient-based saliency maps offer:
- Instance-level explanations: Understand specific trade signals
- Temporal resolution: See which time steps matter most
- Feature interactions: Identify when combinations of features trigger signals
- Real-time computation: Gradients are fast to compute during live trading
Types of Saliency Methods
Vanilla Gradients
The simplest saliency method computes the gradient of the output class score with respect to the input:
```python
saliency = torch.autograd.grad(output, input)[0]
```
The absolute value of the gradient indicates the sensitivity of the prediction to each input feature. Larger gradients suggest more important features.
Limitations:
- Can be noisy due to sharp gradients in ReLU networks
- May highlight irrelevant features that happen to have large local gradients
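As a minimal sketch of the vanilla gradient computation, the example below uses a toy model (a fixed linear scorer followed by a sigmoid) in place of a trained network. The `weights` values and the `toy_model` function are illustrative assumptions, not part of the chapter's notebooks.

```python
import torch

# Hypothetical weights, one per input feature (stand-in for a trained model).
weights = torch.tensor([0.5, -2.0, 0.1])


def toy_model(x: torch.Tensor) -> torch.Tensor:
    """Return a scalar 'probability of an up move' for one feature vector."""
    return torch.sigmoid((x * weights).sum())


x = torch.tensor([1.0, 0.2, 3.0], requires_grad=True)
output = toy_model(x)

# Gradient of the scalar output with respect to the input.
(grad,) = torch.autograd.grad(output, x)
saliency = grad.abs()

# For this model the gradient is sigmoid'(z) * weights, so ranking features
# by saliency gives the same order as ranking them by |weight|.
print(saliency)
```

Note that `torch.autograd.grad` needs a scalar output here; for batched outputs, pass `grad_outputs` or reduce the output with `.sum()` first.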
Gradient × Input
This method multiplies the gradient by the input value itself:
S(x) = x × ∂y/∂x
Because the gradient is multiplied elementwise by the input, features with zero input values receive zero saliency. The product combines the sensitivity (gradient) with the input magnitude; for a linear model without a bias term it equals each feature's exact contribution to the output.
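The difference from vanilla gradients is easiest to see on a bias-free linear model, where the signed gradient × input values add up to the output exactly. The sketch below assumes such a model; `weights` and `linear_model` are illustrative names.

```python
import torch

weights = torch.tensor([0.5, -2.0, 0.1])  # hypothetical model weights


def linear_model(x: torch.Tensor) -> torch.Tensor:
    """Bias-free linear score, so the output decomposes exactly over features."""
    return (x * weights).sum()


x = torch.tensor([1.0, 0.0, 3.0], requires_grad=True)
y = linear_model(x)
(grad,) = torch.autograd.grad(y, x)

attribution = x.detach() * grad  # signed gradient × input

# The second feature is zero, so its attribution is zero even though the
# model is most sensitive to it (largest |gradient|).
print(grad)         # sensitivity per feature
print(attribution)  # contribution per feature
```

For nonlinear networks the sum of attributions generally no longer matches the output, which is one motivation for path-based methods.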
Integrated Gradients
Integrated Gradients (IG) addresses the gradient saturation problem by integrating gradients along a path from a baseline (typically zero) to the actual input:
IG(x) = (x - x') × ∫₀¹ ∂F(x' + α(x - x'))/∂x dα
Here x' is the baseline input and F is the model function, so y = F(x). This method satisfies several important axioms:
- Sensitivity: If the input and the baseline differ in a single feature and yield different predictions, that feature receives non-zero attribution
- Implementation Invariance: Attributions are the same for functionally equivalent networks
- Completeness: Attributions sum to the difference between the model output at x and the output at the baseline
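The completeness axiom can be checked numerically. The sketch below approximates the integral with a midpoint Riemann sum on a toy nonlinear function; `model`, the step count, and the input values are assumptions chosen for illustration.

```python
import torch


def model(x: torch.Tensor) -> torch.Tensor:
    """Toy nonlinear scorer standing in for a trained network (one output per row)."""
    return torch.sigmoid(2.0 * x[..., 0] - x[..., 1] + 0.5 * x[..., 0] * x[..., 2])


def integrated_gradients(f, x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of IG for a single input vector."""
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (torch.arange(steps, dtype=x.dtype) + 0.5) / steps
    path = baseline + alphas.unsqueeze(1) * (x - baseline)  # (steps, features)
    path.requires_grad_(True)
    f(path).sum().backward()
    avg_grad = path.grad.mean(dim=0)
    return (x - baseline) * avg_grad


x = torch.tensor([0.8, -0.3, 1.5])
baseline = torch.zeros_like(x)
ig = integrated_gradients(model, x, baseline)

# Completeness: the attributions should sum to F(x) - F(baseline).
gap = (model(x) - model(baseline)).item()
print(ig, ig.sum().item(), gap)
```

If the two printed sums disagree noticeably, increasing `steps` is usually the first thing to try.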
SmoothGrad
SmoothGrad reduces noise in gradient-based saliency by averaging gradients over multiple noisy versions of the input:
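SmoothGrad(x) = (1/n) × Σᵢ S(x + εᵢ), with εᵢ ~ N(0, σ²) for i = 1, …, n
Here S is the gradient saliency defined earlier, evaluated at each noisy copy of the input. Averaging produces visually cleaner saliency maps while preserving the most important attributions.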
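A minimal SmoothGrad sketch is shown below. The function name `smoothgrad`, the defaults `n_samples=25` and `sigma=0.1`, and the small feed-forward network are assumptions for illustration; in practice the noise scale is often set relative to the range of the (normalized) inputs.

```python
import torch
import torch.nn as nn


def smoothgrad(model, x, n_samples=25, sigma=0.1):
    """Average input gradients over `n_samples` Gaussian-perturbed copies of x."""
    total = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = (x.detach() + sigma * torch.randn_like(x)).requires_grad_(True)
        output = model(noisy)
        (grad,) = torch.autograd.grad(output.sum(), noisy)
        total += grad
    return total / n_samples


torch.manual_seed(0)
# Small ReLU network as a placeholder for a trained model.
net = nn.Sequential(nn.Flatten(), nn.Linear(60 * 5, 16), nn.ReLU(), nn.Linear(16, 1))
net.eval()

x = torch.randn(1, 60, 5)  # (batch, sequence_length, features)
saliency = smoothgrad(net, x).abs()
print(saliency.shape)  # torch.Size([1, 60, 5])
```

The cost is `n_samples` forward and backward passes per explanation, which matters if saliency is computed during live trading.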
Saliency Maps for Time Series
Adapting saliency maps to financial time series requires considering the temporal structure:
Input Representation
For a trading model, the input typically has shape (batch, sequence_length, features):
- sequence_length: Number of historical time steps (e.g., 60 days)
- features: OHLCV data, technical indicators, fundamental data
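One common way to obtain this shape is to cut a long feature matrix into overlapping windows. The sketch below assumes a `(days, features)` array of already-normalized OHLCV values; `make_windows` and the synthetic data are illustrative, not taken from the notebooks.

```python
import numpy as np


def make_windows(data: np.ndarray, sequence_length: int) -> np.ndarray:
    """Stack overlapping windows of shape (sequence_length, features)."""
    n_windows = data.shape[0] - sequence_length + 1
    if n_windows <= 0:
        raise ValueError("not enough rows for a single window")
    return np.stack([data[i : i + sequence_length] for i in range(n_windows)])


# Synthetic stand-in for 250 days of OHLCV rows (5 features per day).
rng = np.random.default_rng(0)
ohlcv = rng.normal(size=(250, 5)).astype(np.float32)

windows = make_windows(ohlcv, sequence_length=60)
print(windows.shape)  # (191, 60, 5)
```

When pairing windows with labels, the label for each window should come from data after the window's last row to avoid look-ahead bias.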
Temporal Saliency Visualization
The saliency map has the same shape as the input. We can visualize it as:
- Heatmap: Time steps × Features showing importance
- Aggregated time importance: Sum absolute saliency over features for each time step
- Aggregated feature importance: Sum absolute saliency over time for each feature
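The two aggregations are simple reductions over the saliency array. The sketch below uses a random placeholder saliency map for a single sample and assumes the five features are the OHLCV columns in that order.

```python
import numpy as np

feature_names = ["open", "high", "low", "close", "volume"]

# Placeholder saliency map for one sample: (sequence_length, features).
rng = np.random.default_rng(1)
saliency = rng.normal(size=(60, 5))

abs_saliency = np.abs(saliency)
time_importance = abs_saliency.sum(axis=1)     # one value per time step
feature_importance = abs_saliency.sum(axis=0)  # one value per feature

most_important_step = int(time_importance.argmax())
ranking = [feature_names[i] for i in np.argsort(feature_importance)[::-1]]
print(most_important_step, ranking)
```

The `abs_saliency` array itself is what a time × feature heatmap would display.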
Important Considerations
- Normalization: Scale saliency values for visualization (e.g., min-max or percentile)
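- Sign: For a model that outputs the probability of an upward move, a positive gradient means that increasing the input pushes the prediction toward bullish, and a negative gradient toward bearish. Taking absolute values, as the implementation in this chapter does, discards this direction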
- Baseline selection: For integrated gradients, choose appropriate baselines (zero, historical mean, or market-neutral state)
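For the normalization point, a percentile cap is often more robust than plain min-max scaling, because a single extreme gradient would otherwise compress everything else toward zero. The sketch below is one possible implementation; `normalize_saliency` and the 99th-percentile default are assumptions.

```python
import numpy as np


def normalize_saliency(saliency: np.ndarray, upper_percentile: float = 99.0) -> np.ndarray:
    """Scale absolute saliency to [0, 1], clipping at an upper percentile."""
    magnitude = np.abs(saliency)
    cap = np.percentile(magnitude, upper_percentile)
    if cap == 0:
        return np.zeros_like(magnitude)
    return np.clip(magnitude / cap, 0.0, 1.0)


rng = np.random.default_rng(2)
saliency = rng.normal(size=(60, 5))
saliency[10, 3] = 50.0  # one extreme value that would wash out a min-max scale

scaled = normalize_saliency(saliency)
print(scaled.min(), scaled.max())
```

Because this version works on magnitudes, it should only be used for visualizations where the sign of the gradient is not needed.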
Implementation with PyTorch
Code Example: Building a Trading Model
The notebook 01_saliency_trading_model.ipynb demonstrates building a neural network for price direction prediction:
```python
import torch
import torch.nn as nn


class TradingLSTM(nn.Module):
    """LSTM model for predicting price direction."""

    def __init__(self, input_size, hidden_size, num_layers, dropout=0.2):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0,
        )
        self.fc = nn.Linear(hidden_size, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        lstm_out, _ = self.lstm(x)
        last_hidden = lstm_out[:, -1, :]
        output = self.sigmoid(self.fc(last_hidden))
        return output
```
Code Example: Computing Saliency Maps
The notebook 02_saliency_computation.ipynb shows how to compute various saliency maps:
```python
import torch


class SaliencyComputer:
    """Compute saliency maps for trading models."""

    def __init__(self, model):
        self.model = model
        # eval() disables dropout so that gradients are deterministic.
        # Note: on GPU, cuDNN-backed LSTMs may raise an error when
        # backward() is called in eval mode.
        self.model.eval()

    def vanilla_gradient(self, x):
        """Compute vanilla gradient saliency."""
        # detach() makes x a fresh leaf tensor so that x.grad is populated
        x = x.detach().clone().requires_grad_(True)
        output = self.model(x)
        output.backward(torch.ones_like(output))
        return x.grad.abs()

    def gradient_x_input(self, x):
        """Compute gradient × input saliency."""
        x = x.detach().clone().requires_grad_(True)
        output = self.model(x)
        output.backward(torch.ones_like(output))
        return (x.grad * x.detach()).abs()

    def integrated_gradients(self, x, baseline=None, steps=50):
        """Compute integrated gradients (Riemann-sum approximation)."""
        x = x.detach()
        if baseline is None:
            baseline = torch.zeros_like(x)

        # Generate interpolated inputs: (steps, batch, seq_len, features)
        alphas = torch.linspace(0, 1, steps, device=x.device, dtype=x.dtype).view(-1, 1, 1, 1)
        interpolated = baseline + alphas * (x - baseline)
        # Fold the steps into the batch dimension for a single forward pass
        interpolated = interpolated.reshape(-1, x.shape[1], x.shape[2])
        interpolated.requires_grad_(True)

        # Compute gradients at every interpolation point
        outputs = self.model(interpolated)
        outputs.sum().backward()

        # Average gradients over the path and multiply by (x - baseline)
        avg_grads = interpolated.grad.view(steps, -1, x.shape[1], x.shape[2]).mean(0)
        ig = (x - baseline) * avg_grads
        return ig.abs()
```
Trading Strategy Based on Saliency
Feature Importance Analysis
Saliency maps enable dynamic feature importance analysis:
- Identify regime changes: When the model suddenly starts focusing on different features, it may signal a market regime change
- Validate trading signals: High-confidence predictions should have clear, interpretable saliency patterns
- Filter noise: Ignore signals where saliency is diffuse or focuses on irrelevant features
Adaptive Feature Selection
The notebook 03_adaptive_strategy.ipynb implements a strategy that:
- Computes saliency maps for each prediction
- Identifies the top-k most important features for that prediction
- Only trades when the important features align with domain knowledge
- Adjusts position size based on saliency concentration
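"Saliency concentration" can be quantified in several ways. One option, sketched below, is one minus the normalized Shannon entropy of the per-feature importances, which is 0 when importance is spread evenly and approaches 1 when a single feature dominates. The helper names are hypothetical, and this sketch rescales importances to sum to one rather than applying a softmax, which is a choice made here and not necessarily what the notebook does.

```python
import math
import torch


def normalized_entropy(p: torch.Tensor, eps: float = 1e-12) -> float:
    """Shannon entropy of a non-negative vector, divided by log(n) so it lies in [0, 1]."""
    p = p / p.sum()
    h = -(p * (p + eps).log()).sum().item()
    return h / math.log(p.numel())  # assumes at least two features


def concentration(feature_importance: torch.Tensor) -> float:
    """1 - normalized entropy of the absolute importances."""
    return 1.0 - normalized_entropy(feature_importance.abs())


diffuse = torch.tensor([1.0, 1.0, 1.0, 1.0, 1.0])
focused = torch.tensor([10.0, 0.1, 0.1, 0.1, 0.1])
print(concentration(diffuse), concentration(focused))
```

Dividing by `log(n)` keeps the score comparable across models with different numbers of features, so a single threshold such as 0.7 has a stable meaning.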
```python
def saliency_weighted_signal(model, x, feature_names, threshold=0.7):
    """Generate trading signal weighted by saliency interpretability.

    Relies on helpers that are not defined in this chapter:
    compute_saliency, check_feature_interpretability, and entropy.
    entropy is assumed to be normalized to [0, 1] so that the
    concentration score is comparable with `threshold`.
    """
    # Get prediction and saliency (x is assumed to hold a single sample)
    saliency = compute_saliency(model, x)
    prediction = model(x).item()

    # Aggregate saliency by feature
    feature_importance = saliency.mean(dim=1).squeeze()  # Average over time

    # Check if top features are interpretable
    top_features = feature_importance.topk(3).indices
    interpretable = check_feature_interpretability(top_features, feature_names)

    # Compute concentration (entropy-based)
    concentration = 1 - entropy(feature_importance.softmax(dim=0))

    if interpretable and concentration > threshold:
        return prediction, concentration
    else:
        return 0.5, 0  # No signal
```
Backtesting the Strategy
The notebook 04_backtest.ipynb demonstrates backtesting with performance metrics:
Key Metrics
| Metric | Description |
|---|---|
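| Sharpe Ratio | Risk-adjusted return: (mean return - risk-free rate) / standard deviation of returns |
| Sortino Ratio | Downside risk-adjusted return: (mean return - risk-free rate) / downside deviation |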
| Maximum Drawdown | Largest peak-to-trough decline |
| Win Rate | Percentage of profitable trades |
| Profit Factor | Gross profit / Gross loss |
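The metrics in the table can be computed with a few lines of standard-library Python. The sketch below makes several assumptions that the notebook may not share: returns are per-period simple returns, annualization uses 252 periods, the population standard deviation is used, and the Sortino denominator is the root-mean-square of negative excess returns.

```python
import math
from statistics import mean, pstdev


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns."""
    excess = [r - risk_free for r in returns]
    sd = pstdev(excess)
    return math.sqrt(periods_per_year) * mean(excess) / sd if sd > 0 else float("nan")


def sortino_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Like Sharpe, but divides by the root-mean-square of negative excess returns."""
    excess = [r - risk_free for r in returns]
    downside = math.sqrt(mean([min(e, 0.0) ** 2 for e in excess]))
    return math.sqrt(periods_per_year) * mean(excess) / downside if downside > 0 else float("nan")


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve (as a fraction)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def win_rate(trade_pnls):
    """Fraction of trades with positive profit."""
    return sum(p > 0 for p in trade_pnls) / len(trade_pnls)


def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (infinite if there are no losing trades)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return gross_profit / gross_loss if gross_loss > 0 else float("inf")


returns = [0.10, -0.50, 0.20]        # illustrative per-period returns
pnls = [100.0, -50.0, 30.0, -20.0]   # illustrative per-trade profit and loss
print(max_drawdown(returns), win_rate(pnls), profit_factor(pnls))
```

Sharpe, Sortino, and drawdown operate on the per-period return series, while win rate and profit factor operate on per-trade profit and loss, so a backtest needs to record both.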
Comparison with Baseline
We compare:
- Baseline Model: Trade on all signals
- Saliency-Filtered: Only trade when saliency is interpretable
- Saliency-Weighted: Position size based on saliency concentration
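The three variants differ only in how a model output is turned into a position. The sketch below is one hypothetical mapping: the `position` function, the variant names as strings, the 0.5 decision boundary, and the choice not to apply the interpretability filter in the weighted variant are all assumptions for illustration.

```python
def position(prediction, interpretable, concentration, variant, threshold=0.7):
    """Signed position in [-1, 1] for one signal under one of the three variants."""
    # Long above 0.5, short below 0.5, flat exactly at 0.5.
    direction = 1.0 if prediction > 0.5 else -1.0 if prediction < 0.5 else 0.0
    if variant == "baseline":
        return direction
    if variant == "filtered":
        return direction if interpretable and concentration > threshold else 0.0
    if variant == "weighted":
        # Scale the position by concentration, clamped to [0, 1].
        return direction * max(0.0, min(1.0, concentration))
    raise ValueError(f"unknown variant: {variant}")


print(position(0.8, True, 0.9, "baseline"))
print(position(0.8, False, 0.9, "filtered"))
print(position(0.2, True, 0.4, "weighted"))
```

Running all three variants on the same signal stream keeps the comparison fair, since any difference in performance then comes from the filtering or sizing rule alone.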
Rust Implementation
The rust_examples/ directory contains a production-ready Rust implementation with:
- High-performance inference: Optimized for low-latency trading
- Bybit API integration: Real-time cryptocurrency data
- Saliency computation: Gradient-based attribution in Rust
See rust_examples/README.md for details.
Running Rust Examples
```bash
cd rust_examples

# Fetch data from Bybit
cargo run --example fetch_data

# Train model and compute saliency
cargo run --example saliency_trading

# Run backtest
cargo run --example backtest
```
References
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
  - Simonyan, K., Vedaldi, A., & Zisserman, A. (2013)
  - URL: https://arxiv.org/abs/1312.6034
  - Introduced vanilla gradient saliency maps
- Axiomatic Attribution for Deep Networks (Integrated Gradients)
  - Sundararajan, M., Taly, A., & Yan, Q. (2017)
  - URL: https://arxiv.org/abs/1703.01365
  - Proposed integrated gradients method
- SmoothGrad: Removing Noise by Adding Noise
  - Smilkov, D., Thorat, N., Kim, B., Viégas, F., & Wattenberg, M. (2017)
  - URL: https://arxiv.org/abs/1706.03825
  - Introduced SmoothGrad for cleaner attributions
- Interpretable Machine Learning: A Guide for Making Black Box Models Explainable
  - Molnar, C. (2022)
  - URL: https://christophm.github.io/interpretable-ml-book/
  - Comprehensive guide to ML interpretability
- Explainable AI for Algorithmic Trading
  - Chen, J., et al. (2021)
  - Application of interpretability methods to trading