
Chapter 91: Transfer Learning for Trading

Overview

Transfer Learning is a machine learning technique where a model trained on one task (source domain) is adapted to perform a different but related task (target domain). In trading, this means leveraging knowledge learned from one market, asset class, or time period to improve predictions on another. This approach is particularly valuable when labeled financial data is scarce, expensive to obtain, or when market conditions shift.

Table of Contents

  1. Introduction
  2. Theoretical Foundation
  3. Types of Transfer Learning
  4. Domain Adaptation Methods
  5. Application to Financial Markets
  6. Cross-Market Transfer
  7. Implementation Strategy
  8. Bybit Integration
  9. Risk Management
  10. Performance Metrics
  11. Comparison with Traditional Approaches
  12. References

Introduction

Traditional machine learning models for trading face several fundamental challenges:

  • Data scarcity: New assets, markets, or instruments lack sufficient historical data
  • Regime changes: Models trained on one market regime fail when conditions shift
  • High labeling cost: Creating accurate labels for trading signals requires domain expertise
  • Non-stationarity: Financial time series distributions evolve over time

Why Transfer Learning for Trading?

┌─────────────────────────────────────────────────────────────────────────┐
│ The Transfer Learning Trading Problem │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Traditional Approach: Transfer Learning: │
│ ───────────────────── ────────────────── │
│ Train from scratch on Pre-train on data-rich source, │
│ each new market/asset then adapt to target domain │
│ │
│ Problem: Insufficient data Solution: Leverage knowledge │
│ for new or niche markets from related domains │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Traditional: Transfer Learning: │ │
│ │ Source: [S&P 500] Source: [S&P 500] ──┐ │ │
│ │ Target: [New Crypto] ✗ Pre-train │ │ │
│ │ (not enough data!) ↓ │ │
│ │ Target: [New Crypto] ✓ │ │
│ │ (fine-tune with little data)│ │
│ │ │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘

Key Advantages

| Aspect | Traditional ML | Transfer Learning |
| --- | --- | --- |
| Data requirement | Large per-task dataset | Small target dataset |
| Training time | Full training each time | Fast fine-tuning |
| Cold-start problem | Cannot handle | Handles well |
| Market regime adaptation | Retrain from scratch | Fine-tune existing model |
| Cross-market knowledge | No sharing | Knowledge reuse |
| New asset coverage | Needs extensive history | Works with limited history |

Theoretical Foundation

The Transfer Learning Framework

Transfer learning operates on the premise that knowledge gained from a source domain $D_S$ with task $T_S$ can improve learning in a target domain $D_T$ with task $T_T$, where $D_S \neq D_T$ or $T_S \neq T_T$.

A domain $D = \{X, P(X)\}$ consists of a feature space $X$ and a marginal probability distribution $P(X)$. A task $T = \{Y, f(\cdot)\}$ consists of a label space $Y$ and a predictive function $f(\cdot)$.

Mathematical Formulation

Objective: Given source domain data $D_S$ and learning task $T_S$, target domain data $D_T$ and learning task $T_T$, transfer learning aims to improve the target predictive function $f_T(\cdot)$ using knowledge from $D_S$ and $T_S$.

Domain Divergence: The discrepancy between source and target domains can be measured using:

$$d_A(D_S, D_T) = 2 \left(1 - 2\epsilon(h)\right)$$

where $\epsilon(h)$ is the error of a classifier $h$ trained to distinguish source from target samples. This is the A-distance (proxy A-distance).
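In practice, the proxy A-distance is estimated by training a domain classifier to separate source from target samples and reading off its error. A minimal sketch with scikit-learn (the `proxy_a_distance` helper and the cross-validation setup are illustrative, not part of this chapter's codebase):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def proxy_a_distance(source_X, target_X, seed=0):
    """Estimate d_A = 2 * (1 - 2 * err) with a domain classifier."""
    X = np.vstack([source_X, target_X])
    y = np.r_[np.zeros(len(source_X)), np.ones(len(target_X))]
    clf = LogisticRegression(max_iter=1000, random_state=seed)
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    err = 1.0 - acc
    return 2.0 * (1.0 - 2.0 * err)

# Synthetic example: matching vs. clearly shifted distributions
rng = np.random.default_rng(0)
src = rng.normal(0, 1, (500, 5))
tgt_same = rng.normal(0, 1, (500, 5))
tgt_far = rng.normal(3, 1, (500, 5))

d_same = proxy_a_distance(src, tgt_same)  # near 0: domains indistinguishable
d_far = proxy_a_distance(src, tgt_far)    # near 2: trivially separable
```

A value near 0 means a classifier cannot tell the domains apart; a value near 2 means transfer is likely to be hard.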

Generalization Bound: For a hypothesis $h$ trained on source domain:

$$\epsilon_T(h) \leq \epsilon_S(h) + \frac{1}{2}d_A(D_S, D_T) + \lambda$$

where $\epsilon_T$ and $\epsilon_S$ are the target and source errors, and $\lambda$ is the error of the ideal joint hypothesis.

Key Components

┌────────────────────────────────────────────────────────────────────────┐
│ Transfer Learning Architecture │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ SOURCE DOMAIN (Data-rich) TARGET DOMAIN (Data-scarce) │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ Large Dataset │ │ Small Dataset │ │
│ │ (e.g., S&P 500 │ │ (e.g., new crypto │ │
│ │ 5 years daily) │ │ 3 months daily) │ │
│ └──────────┬──────────┘ └──────────┬──────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ Feature Extractor │──── TRANSFER ──→│ Feature Extractor │ │
│ │ (shared layers) │ (weights) │ (frozen/fine-tuned)│ │
│ └──────────┬──────────┘ └──────────┬──────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ Source Classifier │ │ Target Classifier │ │
│ │ (source task head) │ │ (new task head) │ │
│ └─────────────────────┘ └─────────────────────┘ │
│ │
│ Training Phase: Adaptation Phase: │
│ - Full training on source - Freeze lower layers │
│ - Learn general features - Fine-tune upper layers │
│ - Extract patterns - Train new classifier │
│ │
└────────────────────────────────────────────────────────────────────────┘

Types of Transfer Learning

1. Inductive Transfer Learning

The source and target tasks are different, but related. The source domain data is used to improve the target task.

Trading Application: Train a model to predict volatility in equity markets, then transfer to predict volatility in crypto markets.

Source Task:      Predict S&P 500 volatility (classification)
Target Task:      Predict BTC/USDT volatility (classification)
Shared Knowledge: Volatility patterns, mean-reversion dynamics,
                  volume-price relationships

2. Transductive Transfer Learning

The source and target tasks are the same, but the domains differ. The marginal distributions $P(X_S) \neq P(X_T)$.

Trading Application: A model trained on US equity data is adapted for emerging market equities where the feature distributions differ.

Source Domain: US Equities (high liquidity, tight spreads)
Target Domain: Emerging Market Equities (low liquidity, wide spreads)
Same Task: Price direction prediction
Different Distribution: Market microstructure features differ

3. Unsupervised Transfer Learning

No labeled data is available in either domain. The focus is on learning representations.

Trading Application: Learn general market representations from unlabeled price data across many assets, then use these representations for downstream tasks.

Pre-training: Autoencoder on 10,000+ time series (no labels)
Transfer: Use learned features for anomaly detection on new assets

Domain Adaptation Methods

Maximum Mean Discrepancy (MMD)

MMD measures the distance between source and target distributions in a reproducing kernel Hilbert space (RKHS):

$$\mathrm{MMD}(D_S, D_T) = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s}\phi(x_i^s) - \frac{1}{n_t}\sum_{j=1}^{n_t}\phi(x_j^t) \right\|_{\mathcal{H}}$$

By minimizing MMD during training, the model learns domain-invariant features.
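A small NumPy sketch of the empirical (biased) squared-MMD estimate with an RBF kernel — the `rbf_mmd2` helper and the bandwidth `sigma` are illustrative choices:

```python
import numpy as np

def rbf_mmd2(X, Y, sigma=1.0):
    """Biased empirical squared MMD with an RBF kernel."""
    def k(A, B):
        # Pairwise squared distances, then Gaussian kernel
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(1)
X = rng.normal(0, 1, (200, 3))
Y_close = rng.normal(0, 1, (200, 3))   # same distribution
Y_shift = rng.normal(2, 1, (200, 3))   # shifted distribution

mmd_close = rbf_mmd2(X, Y_close)  # small
mmd_shift = rbf_mmd2(X, Y_shift)  # much larger
```

In an adaptation setup, this quantity (computed on extracted features) is added to the task loss so that gradient descent shrinks the domain gap.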

Correlation Alignment (CORAL)

CORAL aligns the second-order statistics (covariance) of source and target features:

$$\mathcal{L}_{CORAL} = \frac{1}{4d^2}\left\| C_S - C_T \right\|_F^2$$

where $C_S$ and $C_T$ are the feature covariance matrices of source and target domains.
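The CORAL loss is straightforward to compute from two feature batches; a NumPy sketch (the `coral_loss` helper name is assumed):

```python
import numpy as np

def coral_loss(source, target):
    """Squared Frobenius distance between feature covariances, scaled by 1/(4 d^2)."""
    d = source.shape[1]
    cs = np.cov(source, rowvar=False)  # C_S
    ct = np.cov(target, rowvar=False)  # C_T
    return ((cs - ct) ** 2).sum() / (4.0 * d * d)

rng = np.random.default_rng(2)
src = rng.normal(0, 1, (500, 4))
tgt_same = rng.normal(0, 1, (500, 4))    # same second-order statistics
tgt_scaled = rng.normal(0, 3, (500, 4))  # variance mismatch

loss_same = coral_loss(src, tgt_same)      # near 0
loss_scaled = coral_loss(src, tgt_scaled)  # clearly larger
```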

Adversarial Domain Adaptation

Uses a domain discriminator trained adversarially to create domain-invariant features:

┌─────────────────────────────────────────────────────────────────────────┐
│ Adversarial Domain Adaptation │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Source Data ──┐ │
│ ├──→ Feature Extractor ──┬──→ Task Classifier │
│ Target Data ──┘ (G) │ (predicts labels) │
│ │ │
│ └──→ Domain Discriminator │
│ (source vs target?) │
│ │
│ Training: │
│ - Task Classifier: minimize task loss │
│ - Domain Discriminator: maximize domain classification accuracy │
│ - Feature Extractor: minimize task loss + maximize domain confusion │
│ │
│ Result: Features that are useful for the task but │
│ indistinguishable between domains │
│ │
└─────────────────────────────────────────────────────────────────────────┘

Fine-Tuning Strategies

Strategy 1: Feature Extraction (Freeze All)
┌──────────────────────────────────────────┐
│ Layer 1: [FROZEN] ← Pre-trained weights │
│ Layer 2: [FROZEN] ← Pre-trained weights │
│ Layer 3: [FROZEN] ← Pre-trained weights │
│ New Head: [TRAINABLE] ← Random init │
└──────────────────────────────────────────┘
Strategy 2: Partial Fine-Tuning
┌──────────────────────────────────────────┐
│ Layer 1: [FROZEN] ← Pre-trained weights │
│ Layer 2: [FINE-TUNED] ← Small LR │
│ Layer 3: [FINE-TUNED] ← Medium LR │
│ New Head: [TRAINABLE] ← Large LR │
└──────────────────────────────────────────┘
Strategy 3: Full Fine-Tuning
┌──────────────────────────────────────────┐
│ Layer 1: [FINE-TUNED] ← Very small LR │
│ Layer 2: [FINE-TUNED] ← Small LR │
│ Layer 3: [FINE-TUNED] ← Medium LR │
│ New Head: [TRAINABLE] ← Large LR │
└──────────────────────────────────────────┘
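In PyTorch, these strategies reduce to toggling `requires_grad` and assigning per-layer learning rates via optimizer parameter groups. A sketch of Strategy 2 with assumed layer sizes (the three `Linear` layers stand in for "Layer 1-3" above):

```python
import torch
import torch.nn as nn

# Hypothetical 3-layer extractor mirroring the diagrams above
model = nn.Sequential(
    nn.Linear(20, 128), nn.ReLU(),   # "Layer 1"
    nn.Linear(128, 128), nn.ReLU(),  # "Layer 2"
    nn.Linear(128, 64), nn.ReLU(),   # "Layer 3"
)
head = nn.Linear(64, 3)  # new task head, randomly initialised

# Strategy 2: freeze Layer 1, fine-tune the rest with tiered learning rates
for p in model[0].parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam([
    {"params": model[2].parameters(), "lr": 1e-5},  # small LR
    {"params": model[4].parameters(), "lr": 1e-4},  # medium LR
    {"params": head.parameters(),     "lr": 1e-3},  # large LR
])
```

Frozen parameters are simply left out of the optimizer, so no stale gradients can update them.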

Application to Financial Markets

Cross-Market Transfer

Transfer knowledge between different markets (e.g., stocks to crypto):

  1. Feature Alignment: Map features from both markets to a common space
  2. Pattern Transfer: Recognize similar patterns (momentum, mean-reversion) across markets
  3. Regime Detection: Transfer regime detection models across markets

Cross-Asset Transfer

Transfer within the same market across asset classes:

Source Assets (Data-rich):         Target Assets (Data-scarce):
├── BTC/USDT (years of data)   →   ├── New DeFi Token (weeks of data)
├── ETH/USDT (years of data)   →   ├── Recently Listed Token
└── Major Forex Pairs          →   └── Exotic Currency Pair

Temporal Transfer

Adapt models across different time periods or market regimes:

Pre-COVID Model   ──→ Fine-tune ──→ Post-COVID Model
(trained 2015-2019)                 (adapted to 2020+)

Bull Market Model ──→ Fine-tune ──→ Bear Market Model
(trained on uptrend)                (adapted to downtrend)

Feature Extraction Pipeline

# Python example: Transfer Learning Pipeline
import torch
import torch.nn as nn


class TransferFeatureExtractor(nn.Module):
    """Feature extractor pre-trained on the source domain."""

    def __init__(self, input_dim, hidden_dim, feature_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, feature_dim),
        )

    def forward(self, x):
        return self.layers(x)


class DomainAdaptiveTrader(nn.Module):
    """Trading model with domain adaptation via MMD."""

    def __init__(self, feature_extractor, feature_dim, num_classes):
        super().__init__()
        self.feature_extractor = feature_extractor
        self.classifier = nn.Sequential(
            nn.Linear(feature_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        features = self.feature_extractor(x)
        return self.classifier(features), features

    def compute_mmd(self, source_features, target_features):
        """Linear MMD: squared distance between feature means."""
        source_mean = source_features.mean(dim=0)
        target_mean = target_features.mean(dim=0)
        return ((source_mean - target_mean) ** 2).sum()
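The joint adaptation objective — source classification loss plus a weighted MMD penalty — can be sketched as a short training loop. The architecture, batch data, and `mmd_weight` below are all illustrative:

```python
import torch
import torch.nn as nn

# Minimal joint-objective sketch: task loss on labelled source batches
# plus a linear-MMD penalty pulling source/target features together.
torch.manual_seed(0)
extractor = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 16))
classifier = nn.Linear(16, 2)
opt = torch.optim.Adam(
    list(extractor.parameters()) + list(classifier.parameters()), lr=1e-3
)
mmd_weight = 0.5  # assumed trade-off coefficient

def linear_mmd(a, b):
    # Squared distance between batch feature means
    return ((a.mean(0) - b.mean(0)) ** 2).sum()

src_x, src_y = torch.randn(64, 10), torch.randint(0, 2, (64,))
tgt_x = torch.randn(64, 10) + 1.0  # unlabelled, shifted target batch

for _ in range(5):
    fs, ft = extractor(src_x), extractor(tgt_x)
    task_loss = nn.functional.cross_entropy(classifier(fs), src_y)
    loss = task_loss + mmd_weight * linear_mmd(fs, ft)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Only the source batch needs labels; the target batch contributes solely through the alignment term.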

Implementation Strategy

Python Implementation

The Python implementation uses PyTorch for neural networks and provides:

  • TransferFeatureExtractor: Pre-trainable feature extraction network
  • DomainAdaptiveTrader: Trading model with domain adaptation
  • TransferLearningPipeline: End-to-end transfer learning pipeline
  • BacktestEngine: Strategy backtesting with transfer learning

Rust Implementation

The Rust implementation provides high-performance transfer learning:

use transfer_learning_trading::{
    TransferNetwork, DomainAdapter, MarketDomain,
    FeatureExtractor, TradingStrategy,
};

// Create a transfer learning network
let mut network = TransferNetwork::new(
    20,   // input dimension
    128,  // hidden dimension
    64,   // feature dimension
    true, // use domain adaptation
);

// Pre-train on source domain (e.g., major crypto pairs)
let source_domain = MarketDomain::crypto("BTC/USDT", "ETH/USDT");
network.pretrain(&source_data, &source_labels, &pretrain_config);

// Adapt to target domain (e.g., new token)
let target_domain = MarketDomain::crypto("NEW/USDT");
let mut adapter = DomainAdapter::mmd(network.feature_extractor());
adapter.adapt(&target_data, &adapt_config);

// Generate trading signals
let strategy = TradingStrategy::new(network, adapter);
let signals = strategy.predict(&new_data);

Quick Start

Python:

cd 91_transfer_learning_trading/python
pip install torch numpy pandas scikit-learn
python train.py --source BTC/USDT --target ETH/USDT --method fine_tune
python backtest.py --model saved_model.pt --data target_data.csv

Rust:

cd 91_transfer_learning_trading
cargo run --example basic_transfer
cargo run --example domain_adaptation
cargo run --example bybit_live

Bybit Integration

Real-Time Data Pipeline

use transfer_learning_trading::data::{BybitClient, BybitConfig};

// Initialize client
let client = BybitClient::new(BybitConfig::default());

// Fetch source domain data (established pairs)
let btc_data = client.fetch_klines("BTCUSDT", "1h", 1000).await?;
let eth_data = client.fetch_klines("ETHUSDT", "1h", 1000).await?;

// Fetch target domain data (newer pairs)
let target_data = client.fetch_klines("NEWUSDT", "1h", 100).await?;

// Pre-train on source, adapt to target
let mut model = TransferNetwork::new(20, 128, 64, true);
model.pretrain_on_klines(&[btc_data, eth_data], &config);
model.adapt_to_klines(&target_data, &adapt_config);

Supported Endpoints

| Endpoint | Description | Use Case |
| --- | --- | --- |
| /v5/market/kline | Historical klines | Source/target data |
| /v5/market/tickers | Current tickers | Live signals |
| /v5/market/orderbook | Order book depth | Microstructure features |
| /v5/market/recent-trade | Recent trades | Volume analysis |

Feature Engineering from Bybit Data

OHLCV Data → Feature Extraction:
├── Price features: returns, log returns, volatility
├── Volume features: VWAP, volume ratio, OBV
├── Technical indicators: RSI, MACD, Bollinger Bands
├── Microstructure: bid-ask spread, order imbalance
└── Cross-asset: correlation, beta, relative strength
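Assuming standard OHLCV columns, a few of the features above can be sketched with pandas. The window lengths and the `make_features` helper are illustrative choices, not this project's API:

```python
import numpy as np
import pandas as pd

def make_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive price, volume and indicator features from OHLCV bars."""
    out = pd.DataFrame(index=df.index)
    out["log_ret"] = np.log(df["close"]).diff()             # log returns
    out["volatility"] = out["log_ret"].rolling(20).std()    # rolling volatility
    pv = (df["close"] * df["volume"]).rolling(20).sum()
    out["vwap"] = pv / df["volume"].rolling(20).sum()       # rolling VWAP
    out["volume_ratio"] = df["volume"] / df["volume"].rolling(20).mean()
    delta = df["close"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["rsi"] = 100 - 100 / (1 + gain / loss)              # RSI(14)
    return out.dropna()

# Synthetic bars for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "close": 100 + np.cumsum(rng.normal(0, 0.5, 200)),
    "volume": rng.uniform(10, 100, 200),
})
features = make_features(df)
```

For transfer, the same feature pipeline should be applied to both source and target symbols so the domains share a feature space.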

Risk Management

Transfer-Specific Risks

  1. Negative Transfer: When source domain knowledge hurts target performance

    • Mitigation: Monitor target validation loss; stop adaptation if diverging
  2. Domain Shift: When target domain drifts from source over time

    • Mitigation: Continuous adaptation with sliding window
  3. Overfitting to Source: Model too specialized to source domain

    • Mitigation: Regularization, early stopping, domain adversarial training
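The first mitigation — stop adapting once the target validation loss diverges — amounts to early stopping on the adaptation loop. A minimal sketch (the `AdaptationGuard` class is illustrative):

```python
class AdaptationGuard:
    """Stop adaptation when target validation loss stops improving."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss       # new best: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1       # no improvement this epoch
        return self.bad_epochs >= self.patience

# Usage: feed in per-epoch target validation losses
guard = AdaptationGuard(patience=2)
losses = [0.9, 0.8, 0.85, 0.9, 0.95]
stop_at = next(i for i, l in enumerate(losses) if guard.should_stop(l))
```

If adaptation is halted, falling back to the frozen source model (or reducing position size) is a reasonable guard against negative transfer.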

Risk Controls

┌─────────────────────────────────────────────────────────────────────────┐
│ Transfer Learning Risk Framework │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Pre-Transfer Checks: │
│ ├── Domain similarity score > threshold (0.7) │
│ ├── Source model validation accuracy > minimum (60%) │
│ └── Sufficient target data for validation (>100 samples) │
│ │
│ During Adaptation: │
│ ├── Monitor MMD between source and target features │
│ ├── Track target validation loss (stop if increasing) │
│ ├── Limit fine-tuning epochs (prevent overfitting) │
│ └── Gradient clipping during adaptation │
│ │
│ Post-Transfer Trading: │
│ ├── Maximum position size: 2% of portfolio │
│ ├── Stop-loss: 1.5% per trade │
│ ├── Maximum daily drawdown: 3% │
│ ├── Confidence threshold for signals: 0.65 │
│ └── Reduce position size for low-similarity domains │
│ │
└─────────────────────────────────────────────────────────────────────────┘

Performance Metrics

Model Evaluation

| Metric | Description | Target |
| --- | --- | --- |
| Target Accuracy | Classification accuracy on target domain | > 55% |
| Transfer Gain | Improvement over training from scratch | > 5% |
| A-Distance | Domain divergence measure | < 1.5 |
| MMD | Feature distribution alignment | < 0.1 |
| Adaptation Speed | Epochs to converge on target | < 50 |

Trading Metrics

| Metric | Description | Target |
| --- | --- | --- |
| Sharpe Ratio | Risk-adjusted return | > 1.5 |
| Sortino Ratio | Downside risk-adjusted return | > 2.0 |
| Maximum Drawdown | Largest peak-to-trough decline | < 15% |
| Win Rate | Percentage of profitable trades | > 52% |
| Profit Factor | Gross profit / Gross loss | > 1.3 |
| Calmar Ratio | Annual return / Max drawdown | > 1.0 |
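Most of these trading metrics follow directly from a per-period return series. A NumPy sketch using one common set of conventions (e.g., Sortino computed with the standard deviation of negative returns):

```python
import numpy as np

def trading_metrics(returns, periods_per_year=252):
    """Sharpe, Sortino, max drawdown and profit factor from per-period returns."""
    r = np.asarray(returns, dtype=float)
    ann = np.sqrt(periods_per_year)                     # annualisation factor
    sharpe = ann * r.mean() / r.std(ddof=1)
    downside = r[r < 0]
    sortino = ann * r.mean() / downside.std(ddof=1)     # one common convention
    equity = np.cumprod(1.0 + r)                        # equity curve
    drawdown = 1.0 - equity / np.maximum.accumulate(equity)
    profit_factor = r[r > 0].sum() / -r[r < 0].sum()
    return {
        "sharpe": sharpe,
        "sortino": sortino,
        "max_drawdown": drawdown.max(),
        "profit_factor": profit_factor,
    }

metrics = trading_metrics([0.01, -0.005, 0.02, -0.01, 0.015, 0.005, -0.02, 0.01])
```

When comparing transfer learning against training from scratch, these metrics should be computed on the same out-of-sample target window for both models.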

Comparison with Traditional Approaches

| Aspect | Train from Scratch | Transfer Learning | Domain Adaptation |
| --- | --- | --- | --- |
| Data requirement | High (>1000 samples) | Low (>100 samples) | Medium (>200 samples) |
| Training time | Hours | Minutes (fine-tune) | Minutes to hours |
| New market entry | Slow | Fast | Medium |
| Regime adaptation | Poor | Good | Excellent |
| Implementation complexity | Low | Medium | High |
| Risk of negative transfer | N/A | Medium | Low |
| Cross-market generalization | None | Good | Excellent |

Project Structure

91_transfer_learning_trading/
├── README.md # This file
├── README.ru.md # Russian translation
├── readme.simple.md # Simplified English explanation
├── readme.simple.ru.md # Simplified Russian explanation
├── README.specify.md # Technical specification
├── Cargo.toml # Rust project manifest
├── src/
│ ├── lib.rs # Library root
│ ├── network/
│ │ ├── mod.rs # Network module
│ │ ├── feature_extractor.rs # Feature extraction layers
│ │ ├── domain_adapter.rs # Domain adaptation methods
│ │ └── transfer.rs # Transfer learning network
│ ├── data/
│ │ ├── mod.rs # Data module
│ │ ├── features.rs # Feature engineering
│ │ ├── bybit.rs # Bybit API integration
│ │ └── stock.rs # Stock data loader
│ ├── strategy/
│ │ ├── mod.rs # Strategy module
│ │ └── transfer_strategy.rs # Transfer-based trading strategy
│ ├── training/
│ │ ├── mod.rs # Training module
│ │ └── trainer.rs # Transfer learning trainer
│ └── utils/
│ ├── mod.rs # Utils module
│ └── metrics.rs # Performance metrics
├── examples/
│ ├── basic_transfer.rs # Basic transfer learning example
│ ├── domain_adaptation.rs # Domain adaptation example
│ └── bybit_live.rs # Live Bybit trading example
├── tests/
│ └── integration_tests.rs # Integration tests
└── python/
├── model.py # PyTorch model definitions
├── train.py # Training script
└── backtest.py # Backtesting script

References

  1. Pan, S. J. & Yang, Q. (2010). "A Survey on Transfer Learning." IEEE Transactions on Knowledge and Data Engineering.

  2. "Transfer Learning for Financial Time Series." arXiv:2102.09873 (2021).

  3. "Domain Adaptation for Financial Trading": learning to adapt across financial domains using adversarial methods.

  4. Yosinski, J., et al. (2014). "How Transferable Are Features in Deep Neural Networks?" NeurIPS.

  5. Tzeng, E., et al. (2014). "Deep Domain Confusion": maximizing domain confusion for domain adaptation.

  6. Sun, B. & Saenko, K. (2016). "CORAL: Correlation Alignment for Domain Adaptation." ECCV.