Grad-CAM for Financial Markets: Visual Explanations from Deep Networks
Gradient-weighted Class Activation Mapping (Grad-CAM) is a powerful interpretability technique that produces visual explanations for decisions made by Convolutional Neural Networks (CNNs). When applied to financial markets, Grad-CAM helps traders and quantitative analysts understand which parts of price charts, candlestick patterns, or technical indicators are most influential in the model’s trading decisions.
Deep learning models are often criticized as “black boxes” in finance, where regulatory requirements and risk management demand transparency. Grad-CAM addresses this by highlighting the regions of input data (such as time series patterns or chart images) that contribute most to predictions like “buy,” “sell,” or “hold” signals.
This chapter covers the theoretical foundations of Grad-CAM, its adaptation for financial time series and chart pattern recognition, and practical implementations in both Python and Rust. We demonstrate applications using both traditional stock market data and cryptocurrency data from the Bybit exchange.
Contents
- Introduction to Explainable AI in Finance
- Understanding Grad-CAM
- CNNs for Financial Data
- Implementation
- Backtesting with Interpretable Signals
- Rust Implementation
- References
Introduction to Explainable AI in Finance
Explainable Artificial Intelligence (XAI) has become crucial in financial applications for several reasons:
- Regulatory Compliance: Financial regulators increasingly require that algorithmic trading decisions be explainable and auditable
- Risk Management: Understanding why a model makes certain predictions helps identify potential failure modes
- Trust Building: Portfolio managers and clients need to understand the rationale behind AI-driven decisions
- Model Debugging: Interpretability tools help identify when models learn spurious correlations
Traditional interpretability methods like feature importance scores work well for tabular data but struggle with the spatial and temporal patterns that CNNs learn from financial charts. Grad-CAM fills this gap by providing intuitive visual explanations that highlight which parts of a chart pattern influenced the model’s decision.
Understanding Grad-CAM
Mathematical Foundations
Grad-CAM uses the gradient information flowing into the final convolutional layer of a CNN to produce a coarse localization map highlighting the important regions in the input for predicting a target concept.
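For a given class $c$, Grad-CAM first computes the gradient of the score $y^c$ (before softmax) with respect to the feature maps $A^k$ of a convolutional layer, then global-average-pools these gradients over all positions to obtain one importance weight per feature map: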
$$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}$$
where $Z$ is the number of positions in the feature map (pixels for 2D images, time steps for 1D series), and $\alpha_k^c$ is the importance weight of feature map $k$ for class $c$.
The Grad-CAM heatmap is then computed as:
$$L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right)$$
The ReLU is applied because we are only interested in features that have a positive influence on the class of interest. Negative influences likely belong to other classes.
How Grad-CAM Works
The Grad-CAM algorithm proceeds in the following steps:
- Forward Pass: Pass the input (e.g., a candlestick chart image) through the CNN to obtain the class scores
- Backward Pass: Compute the gradient of the target class score with respect to the feature maps of the last convolutional layer
- Weight Computation: Global average pool the gradients to obtain importance weights for each feature map
- Weighted Combination: Compute a weighted combination of feature maps using these importance weights
- ReLU Activation: Apply ReLU to focus only on positive influences
- Upsampling: Upsample the resulting heatmap to the input image size for visualization
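Steps 3 to 6 can be sketched in a few lines of NumPy once the last convolutional layer's activations and gradients are available (steps 1 and 2 depend on the deep learning framework). This is a minimal illustration for a single 1D example, assuming arrays of shape (K, L) for K feature maps over L positions; the function name `grad_cam_from_arrays` is hypothetical, and linear interpolation is one assumed choice of upsampling.

```python
import numpy as np


def grad_cam_from_arrays(A, dA, input_length):
    """Grad-CAM for a single 1D example (illustrative sketch).

    A, dA: arrays of shape (K, L) holding the feature maps A^k and the
    gradients dy^c/dA^k of the last convolutional layer.
    input_length: length of the original input sequence (for upsampling).
    """
    A = np.asarray(A, dtype=float)
    dA = np.asarray(dA, dtype=float)
    # Step 3: global-average-pool the gradients -> one weight alpha_k per map
    alpha = dA.mean(axis=1)                      # shape (K,)
    # Step 4: weighted combination of the feature maps
    cam = (alpha[:, None] * A).sum(axis=0)       # shape (L,)
    # Step 5: ReLU keeps only positive evidence for the class
    cam = np.maximum(cam, 0.0)
    # Step 6: linearly upsample from L positions to input_length positions
    src = np.linspace(0.0, 1.0, num=cam.shape[0])
    dst = np.linspace(0.0, 1.0, num=input_length)
    return np.interp(dst, src, cam)
```

For example, with two feature maps `A = [[1, 2], [3, 0]]` and gradients `dA = [[1, 1], [-1, -1]]`, the weights are (1, -1), the weighted sum is (-2, 2), ReLU gives (0, 2), and upsampling to length 3 yields (0, 1, 2).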
Variants: Grad-CAM++, Score-CAM, and LayerCAM
Several improvements to the original Grad-CAM have been proposed:
- Grad-CAM++: Uses a weighted average of pixel-wise gradients, providing better localization for multiple instances of the same class
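- Score-CAM: Removes the dependency on gradients entirely; each activation map is used to mask the input, and the model's forward-pass score on the masked input becomes that map's weight
- LayerCAM: Replaces the single weight per feature map with a weight per position, equal to the positive part of the gradient at that position, which yields finer-grained maps and also works for earlier convolutional layers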
For financial applications, the original Grad-CAM often suffices, but Grad-CAM++ can be useful when multiple patterns contribute to a single prediction.
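The difference between Grad-CAM's per-map weights and LayerCAM's per-position weights is easiest to see side by side. The NumPy sketch below assumes activations `A` and gradients `dA` of shape (K, L) for one 1D example; both function names are illustrative and not taken from any library.

```python
import numpy as np


def grad_cam_map(A, dA):
    # Grad-CAM: one scalar weight per feature map (average gradient over positions)
    alpha = np.asarray(dA, dtype=float).mean(axis=1, keepdims=True)  # (K, 1)
    return np.maximum((alpha * np.asarray(A, dtype=float)).sum(axis=0), 0.0)


def layer_cam_map(A, dA):
    # LayerCAM: one weight per feature map AND position (positive part of the gradient)
    w = np.maximum(np.asarray(dA, dtype=float), 0.0)                 # (K, L)
    return np.maximum((w * np.asarray(A, dtype=float)).sum(axis=0), 0.0)
```

With a single map `A = [[1, 1, 1]]` whose gradient is `[[2, -2, 0]]`, the averaged Grad-CAM weight is zero and the whole map vanishes, while LayerCAM still highlights the first position. Gradients of mixed sign cancelling out under averaging is one reason per-position weighting can give sharper maps.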
CNNs for Financial Data
Converting Time Series to Images
Financial time series can be converted to images in several ways for CNN processing:
- OHLCV Heatmaps: Stack multiple time series (Open, High, Low, Close, Volume) as channels
- Gramian Angular Fields (GAF): Rescale the series, map each value to an angle in polar coordinates, and encode pairwise angular sums (GASF) or differences (GADF)
- Markov Transition Fields (MTF): Encode transition probabilities between discretized states
- Recurrence Plots: Visualize the recurrence of states in phase space
- Candlestick Charts: Render actual candlestick chart images
Each representation captures different aspects of the data:
- OHLCV heatmaps preserve raw price information
- GAF and MTF capture temporal correlations
- Candlestick charts leverage patterns that traders have used for centuries
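As an illustration of the GAF encoding, the following NumPy sketch builds the summation (GASF) and difference (GADF) fields for one window of prices. It follows the usual recipe of min-max scaling to [-1, 1] and mapping each value to an angle with arccos; the function name and the choice to scale each window independently are assumptions.

```python
import numpy as np


def gramian_angular_field(series, method="summation"):
    """Return an (N, N) Gramian Angular Field for a 1D series (sketch)."""
    x = np.asarray(series, dtype=float)
    # Min-max scale into [-1, 1] so that arccos is defined
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        scaled = np.zeros_like(x)
    else:
        scaled = 2.0 * (x - x_min) / (x_max - x_min) - 1.0
    scaled = np.clip(scaled, -1.0, 1.0)  # guard against rounding just outside [-1, 1]
    # Polar encoding: each value becomes an angle phi in [0, pi]
    phi = np.arccos(scaled)
    if method == "summation":      # GASF: cos(phi_i + phi_j)
        return np.cos(phi[:, None] + phi[None, :])
    if method == "difference":     # GADF: sin(phi_i - phi_j)
        return np.sin(phi[:, None] - phi[None, :])
    raise ValueError("method must be 'summation' or 'difference'")
```

A 60-bar window produces a 60 by 60 image that a 2D CNN can consume; GASF and GADF can also be stacked as two channels of the same image.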
Candlestick Chart Recognition
Candlestick patterns are used to anticipate market movements. Common patterns include:
- Doji: Indicates indecision (open and close nearly equal, so the body is very small relative to the wicks)
- Hammer/Hanging Man: Reversal signals (small body, long lower shadow)
- Engulfing Patterns: Strong reversal signals (large candle engulfs previous)
- Morning/Evening Star: Three-candle reversal patterns
CNNs can learn to recognize these patterns directly from chart images, and Grad-CAM reveals which specific patterns the model focuses on.
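One way to sanity-check Grad-CAM output is to compare the highlighted bars against simple rule-based detectors for the patterns above. The sketch below works on NumPy arrays of open, high, low and close prices; the function name and the thresholds (a body under 10% of the bar's range for a doji, a lower shadow at least twice the body for a hammer shape) are illustrative assumptions rather than standard definitions.

```python
import numpy as np


def detect_patterns(o, h, l, c, doji_ratio=0.1, shadow_ratio=2.0):
    """Flag doji, hammer-shaped and bullish-engulfing bars (hypothetical helper).

    Returns a dict of boolean arrays aligned with the input bars.
    """
    o, h, l, c = (np.asarray(a, dtype=float) for a in (o, h, l, c))
    body = np.abs(c - o)
    bar_range = np.maximum(h - l, 1e-12)          # avoid division by zero
    lower_shadow = np.minimum(o, c) - l
    upper_shadow = h - np.maximum(o, c)

    # Doji: body is a small fraction of the full high-low range
    doji = body / bar_range < doji_ratio
    # Hammer / hanging-man shape: long lower shadow, short upper shadow
    hammer = (lower_shadow >= shadow_ratio * body) & (upper_shadow <= body)

    # Bullish engulfing: previous bar bearish, current bullish body covers it
    engulf = np.zeros_like(c, dtype=bool)
    prev_bear = c[:-1] < o[:-1]
    curr_bull = c[1:] > o[1:]
    covers = (o[1:] <= c[:-1]) & (c[1:] >= o[:-1])
    engulf[1:] = prev_bear & curr_bull & covers

    return {"doji": doji, "hammer": hammer, "bullish_engulfing": engulf}
```

If the model's Grad-CAM peaks rarely coincide with any detected pattern, it may be relying on other structure in the chart, which is worth investigating before trusting its signals.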
Technical Indicator Heatmaps
Multiple technical indicators can be combined into multi-channel images:
- Channel 1: Price relative to moving averages
- Channel 2: RSI (Relative Strength Index)
- Channel 3: MACD histogram
- Channel 4: Bollinger Band position
- Channel 5: Volume relative to average
This representation allows CNNs to learn complex interactions between indicators that would be difficult to capture with traditional feature engineering.
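The five channels listed above can be computed with pandas roughly as follows. This is a sketch under assumptions: a DataFrame with `close` and `volume` columns, 20-period windows for the moving average, Bollinger Bands and volume average, Wilder-style smoothing for a 14-period RSI, and the common 12/26/9 MACD settings. The function name `indicator_channels` is hypothetical.

```python
import numpy as np
import pandas as pd


def indicator_channels(df: pd.DataFrame, window: int = 20, rsi_period: int = 14) -> np.ndarray:
    """Return a (5, T) float array of indicator channels; warm-up entries are NaN."""
    close, volume = df["close"], df["volume"]

    # Channel 1: price relative to its simple moving average
    sma = close.rolling(window).mean()
    price_vs_ma = close / sma - 1.0

    # Channel 2: RSI with Wilder smoothing (EWM with alpha = 1 / period)
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / rsi_period, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / rsi_period, adjust=False).mean()
    rsi = 100 - 100 / (1 + gain / loss)

    # Channel 3: MACD histogram = MACD line minus its 9-period signal line
    macd = close.ewm(span=12, adjust=False).mean() - close.ewm(span=26, adjust=False).mean()
    macd_hist = macd - macd.ewm(span=9, adjust=False).mean()

    # Channel 4: position inside 2-sigma Bollinger Bands (0 = lower, 1 = upper)
    std = close.rolling(window).std()
    bb_pos = (close - (sma - 2 * std)) / (4 * std)

    # Channel 5: volume relative to its moving average
    vol_ratio = volume / volume.rolling(window).mean()

    return np.vstack([price_vs_ma, rsi, macd_hist, bb_pos, vol_ratio]).astype(float)
```

The channels live on very different scales (RSI runs from 0 to 100, the others are near 0 or 1), so some per-channel normalization is usually applied before feeding them to a CNN.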
Implementation
Code Example: Building a CNN for Price Movement Prediction
The Python implementation in python/model.py provides a CNN architecture specifically designed for financial time series classification, along with Grad-CAM functionality:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FinancialCNN(nn.Module):
    """CNN for financial time series classification with Grad-CAM support."""

    def __init__(self, input_channels=5, num_classes=3, sequence_length=60):
        super().__init__()
        # Convolutional layers; padding keeps the sequence length unchanged
        self.conv1 = nn.Conv1d(input_channels, 32, kernel_size=5, padding=2)
        self.bn1 = nn.BatchNorm1d(32)
        self.conv2 = nn.Conv1d(32, 64, kernel_size=5, padding=2)
        self.bn2 = nn.BatchNorm1d(64)
        self.conv3 = nn.Conv1d(64, 128, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm1d(128)

        # Global average pooling and classifier
        # (GAP accepts any length, so sequence_length is not used directly)
        self.gap = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Linear(128, num_classes)

        # For Grad-CAM: filled in during the forward and backward passes
        self.gradients = None
        self.activations = None

    def save_gradient(self, grad):
        # Tensor hook: stores d(score)/d(activations) of the last conv layer
        self.gradients = grad

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))

        # Store activations for Grad-CAM and hook their gradients
        self.activations = x
        if x.requires_grad:
            x.register_hook(self.save_gradient)

        x = self.gap(x).squeeze(-1)
        x = self.fc(x)
        return x
```

See the full implementation in python/model.py and the training pipeline in python/train.py.
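Training a model like `FinancialCNN` requires fixed-length windows shaped (batch, channels, sequence_length) and one label per window. The helper below is a hypothetical sketch, not the code in python/train.py: it z-scores each window per channel and labels it sell (0), hold (1) or buy (2) from the forward return over `horizon` bars, using an assumed 0.5% threshold and an assumed class mapping.

```python
import numpy as np


def make_windows(features, close, seq_len=60, horizon=5, threshold=0.005):
    """Slice a (C, T) feature array into normalized windows with labels.

    Returns X of shape (N, C, seq_len) and y of shape (N,) where
    0 = sell, 1 = hold, 2 = buy (an assumed class mapping).
    """
    features = np.asarray(features, dtype=float)
    close = np.asarray(close, dtype=float)
    X, y = [], []
    # Each window ends at bar end-1; its label needs close[end-1+horizon]
    for end in range(seq_len, features.shape[1] - horizon + 1):
        window = features[:, end - seq_len:end]
        # Per-window, per-channel z-score so scale differences don't dominate
        mu = window.mean(axis=1, keepdims=True)
        sd = window.std(axis=1, keepdims=True) + 1e-8
        X.append((window - mu) / sd)
        # Forward return from the last bar in the window to `horizon` bars later
        fwd = close[end - 1 + horizon] / close[end - 1] - 1.0
        y.append(2 if fwd > threshold else 0 if fwd < -threshold else 1)
    return np.stack(X), np.array(y)
```

The window only uses data up to its last bar, while the label looks `horizon` bars ahead; the resulting arrays can be converted with `torch.from_numpy(X).float()` and `torch.from_numpy(y).long()` for training.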
Code Example: Applying Grad-CAM to Trading Signals
The notebook example.ipynb demonstrates:
- Loading and preprocessing cryptocurrency data from Bybit
- Training a CNN to predict price movements
- Generating Grad-CAM visualizations for trading signals
- Interpreting which time periods and features drive predictions
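For the data-loading step, a minimal sketch using only the Python standard library might look like this. It assumes Bybit's public v5 market endpoint `/v5/market/kline` and its response layout (`retCode`, and `result.list` rows of `[startTime, open, high, low, close, volume, turnover]` strings, newest first); verify both against the current Bybit API documentation before relying on them. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BYBIT_KLINE_URL = "https://api.bybit.com/v5/market/kline"  # assumed public endpoint


def fetch_klines(symbol="BTCUSDT", interval="60", limit=200, category="linear"):
    """Download raw kline JSON from Bybit (requires network access)."""
    query = urllib.parse.urlencode(
        {"category": category, "symbol": symbol, "interval": interval, "limit": limit}
    )
    with urllib.request.urlopen(f"{BYBIT_KLINE_URL}?{query}", timeout=10) as resp:
        return json.load(resp)


def parse_klines(payload):
    """Convert a kline response (assumed layout) into oldest-first OHLCV lists."""
    if payload.get("retCode") != 0:
        raise RuntimeError(f"Bybit error: {payload.get('retMsg')}")
    # Rows arrive newest first (assumption); sort by start time in milliseconds
    rows = sorted(payload["result"]["list"], key=lambda r: int(r[0]))
    return {
        "timestamp_ms": [int(r[0]) for r in rows],
        "open": [float(r[1]) for r in rows],
        "high": [float(r[2]) for r in rows],
        "low": [float(r[3]) for r in rows],
        "close": [float(r[4]) for r in rows],
        "volume": [float(r[5]) for r in rows],
    }
```

Keeping the network call separate from parsing makes the parser easy to test offline; `parse_klines(fetch_klines())` returns columns that can be wrapped in a pandas DataFrame for preprocessing.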
Code Example: Visualizing Important Chart Patterns
The Grad-CAM implementation highlights which parts of the input time series most strongly influenced the model’s decision:
```python
import torch
import torch.nn.functional as F


class GradCAM:
    """Grad-CAM implementation for 1D financial time series."""

    def __init__(self, model, target_layer):
        self.model = model
        # FinancialCNN records its last conv activations and gradients itself,
        # so target_layer is kept for reference only
        self.target_layer = target_layer

    def __call__(self, input_tensor, target_class=None):
        # Eval mode so BatchNorm uses running statistics; expects batch size 1
        self.model.eval()

        # Forward pass
        output = self.model(input_tensor)

        if target_class is None:
            target_class = output.argmax(dim=1).item()

        # Backward pass: gradient of the pre-softmax score of target_class only
        self.model.zero_grad()
        one_hot = torch.zeros_like(output)
        one_hot[0, target_class] = 1
        output.backward(gradient=one_hot, retain_graph=True)

        # Get gradients and activations captured by the model
        gradients = self.model.gradients
        activations = self.model.activations

        # Weight by global average pooled gradients (alpha_k^c), then combine maps
        weights = gradients.mean(dim=2, keepdim=True)
        cam = (weights * activations).sum(dim=1)

        # Apply ReLU to keep only positive evidence for the class
        cam = F.relu(cam)

        # Normalize to [0, 1]
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

        return cam.squeeze().detach().cpu().numpy()
```

Backtesting with Interpretable Signals
Code Example: Building an Interpretable Trading Strategy
The backtesting module in python/backtest.py implements a trading strategy that:
- Uses the CNN model to generate buy/sell signals
- Applies Grad-CAM to explain each trading decision
- Filters trades based on confidence and interpretability
- Tracks performance metrics including Sharpe ratio, maximum drawdown, and win rate
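The confidence-and-interpretability filter can be expressed as a simple gate on each signal. The sketch below is an assumption about how such a filter might work, not the logic in python/backtest.py: it trades only when the softmax probability of the predicted class reaches `min_prob` and when at least `min_recent_mass` of the Grad-CAM mass falls on the most recent `recent_bars` bars, on the reasoning that a short-horizon signal should rest mainly on recent price action. All parameter names and defaults are hypothetical.

```python
import numpy as np


def filter_signal(probs, cam, min_prob=0.6, recent_bars=10, min_recent_mass=0.4):
    """Return the predicted class (0 = sell, 1 = hold, 2 = buy), or 1 if gated out.

    probs: softmax probabilities for one window, shape (3,).
    cam:   Grad-CAM heatmap for that window, shape (seq_len,), values >= 0.
    """
    probs = np.asarray(probs, dtype=float)
    cam = np.asarray(cam, dtype=float)
    pred = int(probs.argmax())

    # Gate 1: model confidence
    if probs[pred] < min_prob:
        return 1
    # Gate 2: share of the explanation mass on the latest bars
    total = cam.sum()
    if total <= 0:
        return 1
    recent_share = cam[-recent_bars:].sum() / total
    return pred if recent_share >= min_recent_mass else 1
```

Falling back to "hold" rather than the opposite signal keeps the filter conservative: an explanation that looks implausible suppresses a trade instead of reversing it.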
Key metrics tracked:
- Sharpe Ratio: Risk-adjusted return measurement
- Sortino Ratio: Downside risk-adjusted return
- Maximum Drawdown: Largest peak-to-trough decline
- Win Rate: Percentage of profitable trades
- Profit Factor: Gross profits divided by gross losses
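These metrics can be computed from a series of per-period strategy returns and a list of per-trade profits and losses. The sketch assumes simple (not log) returns, a zero risk-free rate, and 365 periods per year for daily crypto data (252 is the usual choice for equities); Sortino uses the root-mean-square of negative returns (zero target) as downside deviation. The function name is hypothetical.

```python
import numpy as np


def performance_metrics(returns, trade_pnls, periods_per_year=365):
    """Compute Sharpe, Sortino, max drawdown, win rate and profit factor."""
    r = np.asarray(returns, dtype=float)
    pnl = np.asarray(trade_pnls, dtype=float)
    ann = np.sqrt(periods_per_year)

    # Sharpe: mean over standard deviation of returns, annualized (risk-free = 0)
    sd = r.std(ddof=1)
    sharpe = ann * r.mean() / sd if sd > 0 else np.nan
    # Sortino: like Sharpe, but divides by downside deviation only
    downside = np.sqrt(np.mean(np.minimum(r, 0.0) ** 2))
    sortino = ann * r.mean() / downside if downside > 0 else np.nan

    # Maximum drawdown on the compounded equity curve, starting from 1.0
    equity = np.concatenate([[1.0], np.cumprod(1.0 + r)])
    peaks = np.maximum.accumulate(equity)
    max_drawdown = (equity / peaks - 1.0).min()   # reported as a negative number

    # Trade-level statistics
    wins, losses = pnl[pnl > 0], pnl[pnl < 0]
    win_rate = len(wins) / len(pnl) if len(pnl) else np.nan
    gross_loss = -losses.sum()
    profit_factor = wins.sum() / gross_loss if gross_loss > 0 else np.inf

    return {"sharpe": sharpe, "sortino": sortino, "max_drawdown": max_drawdown,
            "win_rate": win_rate, "profit_factor": profit_factor}
```

For returns of 10%, -20% and 5%, the equity curve goes 1.0, 1.1, 0.88, 0.924, so the maximum drawdown is 0.88 / 1.1 - 1 = -20%.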
Rust Implementation
Production-Ready Grad-CAM with Bybit Integration
The Rust implementation in rust/ provides a production-ready system with:
- High-Performance Inference: Optimized for low-latency trading
- Bybit API Integration: Real-time data fetching and order placement
- Memory Safety: Rust’s ownership model rules out common bugs such as use-after-free and data races at compile time
- Concurrent Processing: Async runtime for handling multiple symbols
The Rust implementation focuses on efficient inference rather than training, as model training is typically done offline in Python.
See rust/src/lib.rs for the main library and rust/examples/ for usage examples.
References
- Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2016). Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. https://arxiv.org/abs/1610.02391
- Chattopadhay, A., Sarkar, A., Howlader, P., & Balasubramanian, V.N. (2018). Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks. https://arxiv.org/abs/1710.11063
- Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S., Mardziel, P., & Hu, X. (2020). Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks. https://arxiv.org/abs/1910.01279
- Wang, Z., & Oates, T. (2015). Encoding Time Series as Images for Visual Inspection and Classification Using Tiled Convolutional Neural Networks. AAAI Workshop on Artificial Intelligence for Smart Grids and Buildings.
- Ozbayoglu, A.M., Gudelek, M.U., & Sezer, O.B. (2020). Deep Learning for Financial Applications: A Survey. Applied Soft Computing.
- Arrieta, A.B., et al. (2020). Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI. Information Fusion.