Chapter 92: Domain Adaptation for Finance

Overview

Domain adaptation is a branch of transfer learning that addresses the problem of applying a model trained on one data distribution (the source domain) to a different but related data distribution (the target domain) where little or no labeled data is available. In financial markets, domain shifts are pervasive: a model trained on US equities may perform poorly on cryptocurrency markets, a strategy developed during low-volatility regimes may fail during crises, and signals calibrated on one exchange may not transfer to another. Domain adaptation provides principled methods to bridge these distributional gaps without requiring expensive re-labeling or retraining from scratch.

For quantitative traders and portfolio managers, domain adaptation offers a powerful toolkit to extend the life and reach of predictive models. Rather than building separate models for every market, asset class, and regime, domain adaptation techniques such as Domain-Adversarial Neural Networks (DANN), Maximum Mean Discrepancy (MMD), and Correlation Alignment (CORAL) enable models to learn representations that are invariant across domains while still preserving discriminative power for the task at hand. This chapter provides a rigorous mathematical foundation, practical implementations in both Python and Rust, and end-to-end examples using real stock and cryptocurrency (Bybit) data.

Introduction to Domain Adaptation
Mathematical Foundation
Domain Adaptation vs Other Transfer Methods
Applications in Trading
Implementation in Python
Implementation in Rust
Practical Examples with Stock and Crypto Data
Backtesting Framework
Performance Evaluation
Future Directions

Introduction to Domain Adaptation

What is Domain Adaptation?

Domain adaptation is a subfield of machine learning concerned with the scenario where the training data (source domain) and test data (target domain) come from different probability distributions. Formally, a domain is defined by a feature space $\mathcal{X}$, a label space $\mathcal{Y}$, and a joint distribution $P(X, Y)$ over $\mathcal{X} \times \mathcal{Y}$. Domain adaptation applies when:

The source domain $\mathcal{D}S = {(x_i^s, y_i^s)}{i=1}^{n_s}$ has abundant labeled data
The target domain $\mathcal{D}T = {x_j^t}{j=1}^{n_t}$ has unlabeled or sparsely labeled data
The marginal distributions differ: $P_S(X) \neq P_T(X)$

The goal is to learn a hypothesis $h: \mathcal{X} \rightarrow \mathcal{Y}$ that performs well on the target domain despite being trained primarily on source domain data.

There are three main categories of domain adaptation:

Category	Target Labels	Example
Unsupervised DA	No labeled target data	Stock model applied to crypto
Semi-supervised DA	Few labeled target samples	Limited labeled crypto data
Supervised DA	Labeled target data (small)	Small labeled crypto dataset

The Domain Shift Problem in Finance

Financial markets exhibit several types of domain shifts that degrade model performance:

Types of Domain Shift in Finance:
├── Covariate Shift
│   ├── Different return distributions (fat tails, skewness)
│   ├── Different volatility regimes (VIX 12 vs VIX 40)
│   └── Different liquidity conditions
├── Concept Drift
│   ├── Changing alpha factors over time
│   ├── Regime transitions (bull → bear)
│   └── Structural market changes (regulation, technology)
├── Prior Probability Shift
│   ├── Different class ratios (more up-days in bull markets)
│   └── Different event frequencies (rare crashes)
└── Dataset Shift
    ├── Cross-market differences (NYSE vs Bybit)
    ├── Cross-asset differences (equities vs crypto)
    └── Cross-frequency differences (daily vs tick)

Concrete example: A model trained to predict next-day returns for S&P 500 stocks using features like RSI, MACD, and volume ratios may fail on Bitcoin because:

Crypto markets trade 24/7 (no overnight gaps)
Volatility is 3-5x higher
Mean-reversion patterns differ from momentum patterns
Order book dynamics and market microstructure differ
Correlation structure with macro factors is different

Why Domain Adaptation for Trading?

Domain adaptation provides several key advantages for quantitative trading:

Data Efficiency: New markets (e.g., DeFi tokens) have limited history. Adapting a model trained on decades of equity data is more data-efficient than training from scratch.
Regime Robustness: Markets undergo regime changes. A model adapted to the new regime outperforms one trained solely on historical data from a different regime.
Cross-Market Alpha: Patterns discovered in liquid, well-studied markets (US equities) can be transferred to less efficient markets (emerging market crypto) where the same patterns may offer larger risk premia.
Reduced Overfitting: By learning domain-invariant representations, models are less likely to overfit to source-domain-specific noise.
Faster Deployment: Adapting an existing model to a new exchange or asset class is significantly faster than building from scratch.

Mathematical Foundation

Domain Adaptation Theory

The theoretical foundation of domain adaptation rests on bounding the target domain error in terms of quantities that can be estimated or controlled.

Definition (Domain): A domain $\mathcal{D}$ consists of a distribution $\mathcal{D}_X$ on input space $\mathcal{X}$ and a labeling function $f: \mathcal{X} \rightarrow [0,1]$.

Definition ($\mathcal{H}$-divergence): For a hypothesis class $\mathcal{H}$, the $\mathcal{H}$-divergence between source distribution $\mathcal{D}_S$ and target distribution $\mathcal{D}_T$ is:

$$d_{\mathcal{H}}(\mathcal{D}S, \mathcal{D}T) = 2 \sup{h \in \mathcal{H}} \left| \Pr{\mathcal{D}S}[h(x) = 1] - \Pr{\mathcal{D}_T}[h(x) = 1] \right|$$

Ben-David’s Bound on Target Error

The foundational result by Ben-David et al. (2010) provides an upper bound on target domain error:

$$\epsilon_T(h) \leq \epsilon_S(h) + \frac{1}{2} d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) + \lambda^*$$

Where:

$\epsilon_T(h)$ = error of hypothesis $h$ on target domain
$\epsilon_S(h)$ = error of hypothesis $h$ on source domain
$d_{\mathcal{H}\Delta\mathcal{H}}$ = symmetric hypothesis divergence between domains
$\lambda^* = \min_{h \in \mathcal{H}} [\epsilon_S(h) + \epsilon_T(h)]$ = combined error of the ideal joint hypothesis

Implications for trading:

Minimizing $\epsilon_S(h)$: Train a good model on the source market
Minimizing $d_{\mathcal{H}\Delta\mathcal{H}}$: Learn features where the two markets look similar
Small $\lambda^*$: There must exist a hypothesis that works well on both markets (assumption)

Maximum Mean Discrepancy (MMD)

MMD is a kernel-based distance measure between probability distributions. Given two distributions $P$ and $Q$, the MMD is defined as:

$$\text{MMD}^2(\mathcal{F}, P, Q) = \sup_{f \in \mathcal{F}} \left( \mathbb{E}{x \sim P}[f(x)] - \mathbb{E}{y \sim Q}[f(y)] \right)^2$$

When $\mathcal{F}$ is the unit ball in a reproducing kernel Hilbert space (RKHS) $\mathcal{H}_k$ with kernel $k$, the squared MMD has a closed-form empirical estimate:

$$\widehat{\text{MMD}}^2 = \frac{1}{n_s^2} \sum_{i,j} k(x_i^s, x_j^s) + \frac{1}{n_t^2} \sum_{i,j} k(x_i^t, x_j^t) - \frac{2}{n_s n_t} \sum_{i,j} k(x_i^s, x_j^t)$$

Common kernel choices:

Kernel	Formula	Use Case
Gaussian (RBF)	$k(x,y) = \exp\left(-\frac{\|x-y\|^2}{2\sigma^2}\right)$	General purpose
Laplacian	$k(x,y) = \exp\left(-\frac{\|x-y\|_1}{\sigma}\right)$	Heavy-tailed distributions
Multi-scale	$\sum_i k_{\sigma_i}(x,y)$	Robust across bandwidths

For trading applications, the multi-scale Gaussian kernel with bandwidths $\sigma \in {0.1, 1.0, 10.0}$ is recommended because financial feature distributions can have very different scales.

Domain-Adversarial Neural Networks (DANN)

DANN, introduced by Ganin et al. (2016), learns domain-invariant representations through adversarial training. The architecture consists of three components:

Input Features
      │
      ▼
┌──────────────┐
│   Feature    │
│  Extractor   │  G_f(x; θ_f) - shared parameters
│   (θ_f)      │
└──────┬───────┘
       │
       ├──────────────────────┐
       ▼                      ▼
┌──────────────┐    ┌──────────────────┐
│    Label     │    │     Domain       │
│  Predictor   │    │   Classifier     │
│   (θ_y)      │    │     (θ_d)        │
└──────────────┘    └──────────────────┘
  G_y(·; θ_y)       G_d(·; θ_d)
  Minimize           ← Gradient Reversal Layer (GRL)
  label loss         Maximize domain confusion

The key innovation is the Gradient Reversal Layer (GRL), which multiplies the gradient by $-\lambda$ during backpropagation:

$$\text{GRL}(x) = x \quad \text{(forward pass)}$$ $$\frac{\partial \text{GRL}}{\partial x} = -\lambda I \quad \text{(backward pass)}$$

The overall training objective is a minimax game:

$$\min_{\theta_f, \theta_y} \max_{\theta_d} ; \mathcal{L}(\theta_f, \theta_y, \theta_d) = \mathcal{L}_y(\theta_f, \theta_y) - \lambda \mathcal{L}_d(\theta_f, \theta_d)$$

Where:

$\mathcal{L}y = -\frac{1}{n_s}\sum{i=1}^{n_s} \sum_{c=1}^{C} y_{ic} \log G_y(G_f(x_i^s))$ is the classification loss on source data
$\mathcal{L}d = -\frac{1}{n_s+n_t}\sum{i} [d_i \log G_d(G_f(x_i)) + (1-d_i)\log(1-G_d(G_f(x_i)))]$ is the domain classification loss
$\lambda$ is the adaptation trade-off parameter (often scheduled: $\lambda_p = \frac{2}{1+\exp(-\gamma p)} - 1$)

Correlation Alignment (CORAL)

CORAL aligns the second-order statistics (covariance matrices) of source and target features. Given source features $D_S \in \mathbb{R}^{n_s \times d}$ and target features $D_T \in \mathbb{R}^{n_t \times d}$, the CORAL loss is:

$$\mathcal{L}_{\text{CORAL}} = \frac{1}{4d^2} | C_S - C_T |_F^2$$

Where:

$C_S = \frac{1}{n_s - 1}(D_S^T D_S - \frac{1}{n_s}(\mathbf{1}^T D_S)^T (\mathbf{1}^T D_S))$ is the source covariance matrix
$C_T$ is computed analogously for the target domain
$|\cdot|_F$ is the Frobenius norm

Deep CORAL extends this to neural network features by adding the CORAL loss as a regularization term:

$$\mathcal{L} = \mathcal{L}{\text{task}} + \alpha \cdot \mathcal{L}{\text{CORAL}}$$

Advantages for finance: CORAL is computationally lightweight, does not require adversarial training, and is particularly effective when the domain shift primarily affects the correlation structure of features (common in financial data where correlations change across regimes).

Optimal Transport for Domain Adaptation

Optimal Transport (OT) provides a geometrically meaningful way to measure and minimize the distance between source and target distributions. The Wasserstein distance (Earth Mover’s Distance) is defined as:

$$W_p(P_S, P_T) = \left( \inf_{\gamma \in \Pi(P_S, P_T)} \int_{\mathcal{X} \times \mathcal{X}} |x - y|^p , d\gamma(x, y) \right)^{1/p}$$

Where $\Pi(P_S, P_T)$ is the set of all joint distributions with marginals $P_S$ and $P_T$.

The discrete formulation for empirical distributions solves:

$$\min_{\gamma \in \mathbb{R}^{n_s \times n_t}_+} \langle C, \gamma \rangle_F \quad \text{s.t.} \quad \gamma \mathbf{1} = \mu_S, \quad \gamma^T \mathbf{1} = \mu_T$$

Where $C_{ij} = |x_i^s - x_j^t|^2$ is the cost matrix. The Sinkhorn algorithm provides efficient computation via entropic regularization:

$$W_\epsilon(P_S, P_T) = \min_{\gamma \in \Pi} \langle C, \gamma \rangle + \epsilon H(\gamma)$$

Where $H(\gamma) = -\sum_{ij} \gamma_{ij} \log \gamma_{ij}$ is the entropy regularizer.

For domain adaptation, OT provides a transport plan $\gamma^*$ that maps source samples to their target counterparts, enabling direct feature alignment.

Domain Adaptation vs Other Transfer Methods

Understanding how domain adaptation relates to and differs from other transfer learning approaches is essential for choosing the right technique:

Method	Source Labels	Target Labels	Shared Architecture	Key Mechanism	Best For
Domain Adaptation	Yes	No	Feature extractor	Distribution alignment (DANN, MMD, CORAL)	Cross-market transfer
Transfer Learning	Yes	Yes (small)	Pretrained backbone	Feature reuse, frozen layers	New asset classes
Fine-Tuning	Yes	Yes (small)	Full model	Gradual parameter update	Market regime change
Multi-Task Learning	Yes	Yes	Shared trunk	Joint optimization	Simultaneous markets
Zero-Shot Learning	Yes	No	Full model	Semantic descriptions	Unseen instruments
Few-Shot Learning	Yes	Yes (very few)	Metric space	Distance-based classification	New exchanges

When to use domain adaptation over alternatives:

Decision Tree for Transfer Method Selection:
│
├── Do you have labeled target data?
│   ├── Yes (abundant) → Standard supervised learning
│   ├── Yes (small) → Fine-tuning or few-shot learning
│   └── No → Domain Adaptation ★
│       ├── Is the shift primarily in feature distributions?
│       │   ├── Yes → CORAL or MMD
│       │   └── No → DANN or Optimal Transport
│       └── Is there a clear correspondence between domains?
│           ├── Yes → Optimal Transport
│           └── No → DANN with adversarial alignment

Key distinctions in financial context:

Domain Adaptation: Adapt a US equity model to trade Bybit crypto without labeling crypto data
Transfer Learning: Use a pre-trained equity model as initialization, then fine-tune on small crypto dataset
Multi-Task Learning: Simultaneously learn to predict returns for both stocks and crypto
Zero-Shot: Predict returns for a newly listed token that has no historical data

Applications in Trading

1. Cross-Market Adaptation (Stocks to Crypto)

The most natural application of domain adaptation in trading is transferring models between traditional and cryptocurrency markets.

Source Domain (Equities)          Target Domain (Crypto/Bybit)
─────────────────────────         ──────────────────────────────
• 30+ years of daily data         • 5-10 years of data
• Well-understood factors         • Emerging factor structure
• Regular trading hours           • 24/7 trading
• Lower volatility                • Higher volatility
• Regulated markets               • Less regulated
• Established microstructure      • Evolving microstructure

Shared features that transfer well:

Price momentum (various lookback periods)
Volatility measures (realized, ATR, Parkinson)
Volume-based indicators (OBV, VWAP deviation)
Mean-reversion signals (RSI, Bollinger Band width)
Autocorrelation structure

Features that require adaptation:

Absolute return magnitudes (scale differently)
Correlation with macro factors (different regimes)
Intraday patterns (no market close in crypto)
Liquidity measures (different order book depth)

2. Temporal Domain Adaptation (Regime Changes)

Markets undergo regime changes that create temporal domain shifts:

Regime Transition	Source Period	Target Period	Key Shift
Pre/Post COVID	2015-2019	2020-2021	Volatility, correlation
QE to QT	2020-2021	2022-2023	Interest rate sensitivity
Low to High Vol	VIX < 15	VIX > 30	Risk dynamics
Bull to Bear	2020-2021	2022	Return distribution
Pre/Post Halving	Before halving	After halving	Crypto supply dynamics

Temporal domain adaptation treats the historical data as the source domain and the current regime as the target domain, adapting feature representations to account for structural changes.

3. Cross-Exchange Adaptation

Different exchanges have distinct characteristics even for the same asset:

Exchange A (e.g., Binance)       Exchange B (e.g., Bybit)
──────────────────────────       ──────────────────────────
• Different fee structures       • Different fee structures
• Different order types          • Different order types
• Different maker/taker ratio    • Different maker/taker ratio
• Different user demographics    • Different user demographics
• Different liquidity depth      • Different liquidity depth
• Different funding rates        • Different funding rates

A model trained on order flow features from one exchange can be adapted to another without re-labeling, preserving the learned relationships while adjusting for exchange-specific distributional differences.

4. Cross-Asset Class Adaptation

Domain adaptation enables transferring strategies across related asset classes:

Equity to Fixed Income: Momentum and mean-reversion patterns
Commodity to Crypto: Scarcity-driven pricing models
FX to Crypto: Pairs trading and carry strategies
Equity Index to Single Stock: Factor model adaptation

Implementation in Python

Project Setup

torch>=2.0.0
numpy>=1.24.0
pandas>=2.0.0
scikit-learn>=1.3.0
yfinance>=0.2.18
requests>=2.31.0
matplotlib>=3.7.0
seaborn>=0.12.0
scipy>=1.11.0
pot>=0.9.0  # Python Optimal Transport

Data Loading: Bybit and Stock Data

import numpy as np
import pandas as pd
import requests
from datetime import datetime, timedelta
from typing import Optional, Tuple, List

class BybitDataClient:
    """Client for fetching historical kline data from Bybit API v5."""

    BASE_URL = "https://api.bybit.com"

    def __init__(self):
        self.session = requests.Session()

    def get_klines(
        self,
        symbol: str = "BTCUSDT",
        interval: str = "D",
        limit: int = 1000,
        start_time: Optional[int] = None,
        end_time: Optional[int] = None,
    ) -> pd.DataFrame:
        """Fetch kline/candlestick data from Bybit."""
        endpoint = f"{self.BASE_URL}/v5/market/kline"
        params = {
            "category": "linear",
            "symbol": symbol,
            "interval": interval,
            "limit": min(limit, 1000),
        }
        if start_time:
            params["start"] = start_time
        if end_time:
            params["end"] = end_time

        response = self.session.get(endpoint, params=params)
        data = response.json()

        if data["retCode"] != 0:
            raise ValueError(f"Bybit API error: {data['retMsg']}")

        records = data["result"]["list"]
        df = pd.DataFrame(records, columns=[
            "timestamp", "open", "high", "low", "close", "volume", "turnover"
        ])
        for col in ["open", "high", "low", "close", "volume", "turnover"]:
            df[col] = df[col].astype(float)
        df["timestamp"] = pd.to_datetime(df["timestamp"].astype(int), unit="ms")
        df = df.sort_values("timestamp").reset_index(drop=True)
        return df

    def get_multiple_symbols(
        self, symbols: List[str], interval: str = "D", limit: int = 1000
    ) -> dict:
        """Fetch kline data for multiple symbols."""
        data = {}
        for symbol in symbols:
            try:
                data[symbol] = self.get_klines(symbol, interval, limit)
            except Exception as e:
                print(f"Error fetching {symbol}: {e}")
        return data


def load_stock_data(
    tickers: List[str],
    start: str = "2020-01-01",
    end: str = "2024-01-01"
) -> pd.DataFrame:
    """Load stock data using yfinance."""
    import yfinance as yf
    data = yf.download(tickers, start=start, end=end)
    return data

Feature Engineering

class FeatureEngineer:
    """Generate domain-agnostic features for trading models."""

    def __init__(self, lookback: int = 20):
        self.lookback = lookback

    def compute_features(self, df: pd.DataFrame) -> pd.DataFrame:
        """Compute technical features that are comparable across domains."""
        features = pd.DataFrame(index=df.index)

        close = df["close"]
        high = df["high"]
        low = df["low"]
        volume = df["volume"]

        # --- Normalized returns (z-scored) ---
        returns = close.pct_change()
        features["return_1d"] = returns
        features["return_5d"] = close.pct_change(5)
        features["return_20d"] = close.pct_change(self.lookback)

        # --- Volatility measures (annualized, normalized) ---
        features["realized_vol"] = returns.rolling(self.lookback).std() * np.sqrt(252)
        features["vol_ratio"] = (
            returns.rolling(5).std() / returns.rolling(self.lookback).std()
        )

        # --- Parkinson volatility ---
        log_hl = np.log(high / low)
        features["parkinson_vol"] = np.sqrt(
            (1 / (4 * np.log(2))) * (log_hl ** 2).rolling(self.lookback).mean()
        ) * np.sqrt(252)

        # --- RSI (already normalized 0-100) ---
        delta = close.diff()
        gain = delta.where(delta > 0, 0.0).rolling(14).mean()
        loss = (-delta.where(delta < 0, 0.0)).rolling(14).mean()
        rs = gain / loss.replace(0, np.nan)
        features["rsi"] = 100 - (100 / (1 + rs))

        # --- Normalized MACD ---
        ema12 = close.ewm(span=12).mean()
        ema26 = close.ewm(span=26).mean()
        macd = ema12 - ema26
        features["macd_norm"] = macd / close  # Price-normalized

        # --- Bollinger Band Width (normalized) ---
        sma = close.rolling(self.lookback).mean()
        std = close.rolling(self.lookback).std()
        features["bb_width"] = (2 * std) / sma

        # --- Volume features (normalized) ---
        features["volume_ratio"] = volume / volume.rolling(self.lookback).mean()
        features["obv_slope"] = (
            (np.sign(returns) * volume).cumsum()
            .rolling(self.lookback)
            .apply(lambda x: np.polyfit(range(len(x)), x, 1)[0])
        )

        # --- Autocorrelation ---
        features["autocorr_5"] = returns.rolling(60).apply(
            lambda x: x.autocorr(lag=5), raw=False
        )

        # --- Mean reversion signal ---
        features["mean_reversion"] = (close - sma) / std

        return features.dropna()

    def normalize_features(
        self,
        source_features: pd.DataFrame,
        target_features: pd.DataFrame
    ) -> Tuple[pd.DataFrame, pd.DataFrame]:
        """Z-score normalize using source statistics."""
        mean = source_features.mean()
        std = source_features.std()
        source_norm = (source_features - mean) / std
        target_norm = (target_features - mean) / std
        return source_norm, target_norm

DANN Implementation (PyTorch)

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Function
from torch.utils.data import DataLoader, TensorDataset


class GradientReversalFunction(Function):
    """Gradient Reversal Layer (GRL) from Ganin et al., 2016.

    Forward pass: identity
    Backward pass: negate gradients and scale by lambda
    """

    @staticmethod
    def forward(ctx, x, lambda_val):
        ctx.lambda_val = lambda_val
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambda_val * grad_output, None


class GradientReversalLayer(nn.Module):
    def __init__(self, lambda_val=1.0):
        super().__init__()
        self.lambda_val = lambda_val

    def forward(self, x):
        return GradientReversalFunction.apply(x, self.lambda_val)


class FeatureExtractor(nn.Module):
    """Shared feature extractor network G_f."""

    def __init__(self, input_dim: int, hidden_dim: int = 128, feature_dim: int = 64):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, feature_dim),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.network(x)


class LabelPredictor(nn.Module):
    """Task-specific label predictor G_y."""

    def __init__(self, feature_dim: int = 64, num_classes: int = 3):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(feature_dim, 32),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(32, num_classes),
        )

    def forward(self, features):
        return self.network(features)


class DomainClassifier(nn.Module):
    """Domain classifier G_d with gradient reversal."""

    def __init__(self, feature_dim: int = 64, lambda_val: float = 1.0):
        super().__init__()
        self.grl = GradientReversalLayer(lambda_val)
        self.network = nn.Sequential(
            nn.Linear(feature_dim, 32),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(32, 2),  # Binary: source vs target
        )

    def forward(self, features):
        reversed_features = self.grl(features)
        return self.network(reversed_features)


class DANN(nn.Module):
    """Complete Domain-Adversarial Neural Network.

    Architecture:
        Input → FeatureExtractor → LabelPredictor (source labels)
                        ↓
                  GRL → DomainClassifier (domain labels)
    """

    def __init__(
        self,
        input_dim: int,
        hidden_dim: int = 128,
        feature_dim: int = 64,
        num_classes: int = 3,
        lambda_val: float = 1.0,
    ):
        super().__init__()
        self.feature_extractor = FeatureExtractor(input_dim, hidden_dim, feature_dim)
        self.label_predictor = LabelPredictor(feature_dim, num_classes)
        self.domain_classifier = DomainClassifier(feature_dim, lambda_val)

    def forward(self, x, alpha=1.0):
        features = self.feature_extractor(x)
        class_output = self.label_predictor(features)
        self.domain_classifier.grl.lambda_val = alpha
        domain_output = self.domain_classifier(features)
        return class_output, domain_output

    def predict(self, x):
        """Predict labels using only the feature extractor and label predictor."""
        self.eval()
        with torch.no_grad():
            features = self.feature_extractor(x)
            logits = self.label_predictor(features)
            return torch.argmax(logits, dim=1)


class DANNTrainer:
    """Trainer for the Domain-Adversarial Neural Network."""

    def __init__(
        self,
        model: DANN,
        lr: float = 1e-3,
        gamma: float = 10.0,
        epochs: int = 200,
        batch_size: int = 64,
        device: str = "cpu",
    ):
        self.model = model.to(device)
        self.device = device
        self.epochs = epochs
        self.batch_size = batch_size
        self.gamma = gamma
        self.optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=1e-4)
        self.label_criterion = nn.CrossEntropyLoss()
        self.domain_criterion = nn.CrossEntropyLoss()

    def _schedule_lambda(self, epoch: int) -> float:
        """Progressive lambda scheduling from Ganin et al."""
        p = epoch / self.epochs
        return 2.0 / (1.0 + np.exp(-self.gamma * p)) - 1.0

    def train(
        self,
        source_features: np.ndarray,
        source_labels: np.ndarray,
        target_features: np.ndarray,
    ) -> dict:
        """Train DANN with source labels and unlabeled target data."""
        # Prepare data loaders
        source_dataset = TensorDataset(
            torch.FloatTensor(source_features),
            torch.LongTensor(source_labels),
        )
        target_dataset = TensorDataset(torch.FloatTensor(target_features))

        source_loader = DataLoader(
            source_dataset, batch_size=self.batch_size, shuffle=True, drop_last=True
        )
        target_loader = DataLoader(
            target_dataset, batch_size=self.batch_size, shuffle=True, drop_last=True
        )

        history = {"label_loss": [], "domain_loss": [], "total_loss": []}

        for epoch in range(self.epochs):
            self.model.train()
            alpha = self._schedule_lambda(epoch)

            epoch_label_loss = 0.0
            epoch_domain_loss = 0.0
            n_batches = 0

            target_iter = iter(target_loader)

            for source_batch, source_label_batch in source_loader:
                try:
                    (target_batch,) = next(target_iter)
                except StopIteration:
                    target_iter = iter(target_loader)
                    (target_batch,) = next(target_iter)

                source_batch = source_batch.to(self.device)
                source_label_batch = source_label_batch.to(self.device)
                target_batch = target_batch.to(self.device)

                # Source domain: label = 0
                source_domain_labels = torch.zeros(
                    source_batch.size(0), dtype=torch.long
                ).to(self.device)
                # Target domain: label = 1
                target_domain_labels = torch.ones(
                    target_batch.size(0), dtype=torch.long
                ).to(self.device)

                # Forward pass on source
                class_output, source_domain_output = self.model(source_batch, alpha)
                label_loss = self.label_criterion(class_output, source_label_batch)
                source_domain_loss = self.domain_criterion(
                    source_domain_output, source_domain_labels
                )

                # Forward pass on target (domain classification only)
                _, target_domain_output = self.model(target_batch, alpha)
                target_domain_loss = self.domain_criterion(
                    target_domain_output, target_domain_labels
                )

                # Combined loss
                domain_loss = source_domain_loss + target_domain_loss
                total_loss = label_loss + domain_loss

                self.optimizer.zero_grad()
                total_loss.backward()
                self.optimizer.step()

                epoch_label_loss += label_loss.item()
                epoch_domain_loss += domain_loss.item()
                n_batches += 1

            avg_label = epoch_label_loss / max(n_batches, 1)
            avg_domain = epoch_domain_loss / max(n_batches, 1)
            history["label_loss"].append(avg_label)
            history["domain_loss"].append(avg_domain)
            history["total_loss"].append(avg_label + avg_domain)

            if (epoch + 1) % 20 == 0:
                print(
                    f"Epoch {epoch+1}/{self.epochs} | "
                    f"Label Loss: {avg_label:.4f} | "
                    f"Domain Loss: {avg_domain:.4f} | "
                    f"Lambda: {alpha:.4f}"
                )

        return history

MMD Implementation

class MMDLoss(nn.Module):
    """Maximum Mean Discrepancy loss with multi-kernel support.

    MMD^2 = E[k(xs, xs')] + E[k(xt, xt')] - 2*E[k(xs, xt)]
    """

    def __init__(self, kernel_type: str = "rbf", bandwidth_list: list = None):
        super().__init__()
        self.kernel_type = kernel_type
        self.bandwidth_list = bandwidth_list or [0.1, 0.5, 1.0, 5.0, 10.0]

    def _gaussian_kernel(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        """Multi-scale Gaussian (RBF) kernel."""
        xx = torch.mm(x, x.t())
        yy = torch.mm(y, y.t())
        xy = torch.mm(x, y.t())

        rx = xx.diag().unsqueeze(0).expand_as(xx)
        ry = yy.diag().unsqueeze(0).expand_as(yy)

        dxx = rx.t() + rx - 2.0 * xx
        dyy = ry.t() + ry - 2.0 * yy
        dxy = rx.t() + ry - 2.0 * xy

        XX, YY, XY = (
            torch.zeros_like(xx),
            torch.zeros_like(yy),
            torch.zeros_like(xy),
        )

        for bw in self.bandwidth_list:
            XX += torch.exp(-0.5 * dxx / bw)
            YY += torch.exp(-0.5 * dyy / bw)
            XY += torch.exp(-0.5 * dxy / bw)

        return XX, YY, XY

    def forward(self, source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        """Compute MMD^2 between source and target."""
        n_s = source.size(0)
        n_t = target.size(0)

        XX, YY, XY = self._gaussian_kernel(source, target)

        mmd_squared = (
            XX.sum() / (n_s * n_s)
            + YY.sum() / (n_t * n_t)
            - 2.0 * XY.sum() / (n_s * n_t)
        )
        return mmd_squared


class MMDAdaptationModel(nn.Module):
    """Neural network with MMD regularization for domain adaptation."""

    def __init__(
        self,
        input_dim: int,
        hidden_dim: int = 128,
        feature_dim: int = 64,
        num_classes: int = 3,
        mmd_weight: float = 1.0,
    ):
        super().__init__()
        self.feature_extractor = FeatureExtractor(input_dim, hidden_dim, feature_dim)
        self.classifier = LabelPredictor(feature_dim, num_classes)
        self.mmd_loss = MMDLoss()
        self.mmd_weight = mmd_weight

    def forward(self, source_x, target_x):
        source_features = self.feature_extractor(source_x)
        target_features = self.feature_extractor(target_x)
        class_output = self.classifier(source_features)
        mmd = self.mmd_loss(source_features, target_features)
        return class_output, mmd

    def predict(self, x):
        self.eval()
        with torch.no_grad():
            features = self.feature_extractor(x)
            logits = self.classifier(features)
            return torch.argmax(logits, dim=1)


def train_mmd_model(
    model: MMDAdaptationModel,
    source_features: np.ndarray,
    source_labels: np.ndarray,
    target_features: np.ndarray,
    epochs: int = 200,
    lr: float = 1e-3,
    batch_size: int = 64,
) -> dict:
    """Train model with MMD domain adaptation."""
    optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=1e-4)
    criterion = nn.CrossEntropyLoss()

    source_dataset = TensorDataset(
        torch.FloatTensor(source_features), torch.LongTensor(source_labels)
    )
    target_dataset = TensorDataset(torch.FloatTensor(target_features))

    source_loader = DataLoader(source_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
    target_loader = DataLoader(target_dataset, batch_size=batch_size, shuffle=True, drop_last=True)

    history = {"cls_loss": [], "mmd_loss": [], "total_loss": []}

    for epoch in range(epochs):
        model.train()
        target_iter = iter(target_loader)
        epoch_cls, epoch_mmd, n_batches = 0.0, 0.0, 0

        for source_batch, label_batch in source_loader:
            try:
                (target_batch,) = next(target_iter)
            except StopIteration:
                target_iter = iter(target_loader)
                (target_batch,) = next(target_iter)

            class_output, mmd = model(source_batch, target_batch)
            cls_loss = criterion(class_output, label_batch)
            total_loss = cls_loss + model.mmd_weight * mmd

            optimizer.zero_grad()
            total_loss.backward()
            optimizer.step()

            epoch_cls += cls_loss.item()
            epoch_mmd += mmd.item()
            n_batches += 1

        history["cls_loss"].append(epoch_cls / max(n_batches, 1))
        history["mmd_loss"].append(epoch_mmd / max(n_batches, 1))
        history["total_loss"].append(
            (epoch_cls + epoch_mmd) / max(n_batches, 1)
        )

        if (epoch + 1) % 50 == 0:
            print(
                f"Epoch {epoch+1}/{epochs} | "
                f"Cls: {history['cls_loss'][-1]:.4f} | "
                f"MMD: {history['mmd_loss'][-1]:.6f}"
            )

    return history

CORAL Implementation

class CORALLoss(nn.Module):
    """Correlation Alignment (CORAL) loss.

    Minimizes the distance between second-order statistics
    (covariance matrices) of source and target features.

    L_coral = (1 / 4d^2) * ||C_S - C_T||_F^2
    """

    def forward(self, source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        d = source.size(1)
        n_s = source.size(0)
        n_t = target.size(0)

        # Compute covariance matrices
        source_centered = source - source.mean(dim=0, keepdim=True)
        target_centered = target - target.mean(dim=0, keepdim=True)

        cov_source = (source_centered.t() @ source_centered) / (n_s - 1)
        cov_target = (target_centered.t() @ target_centered) / (n_t - 1)

        # Frobenius norm of difference
        coral_loss = (cov_source - cov_target).pow(2).sum() / (4 * d * d)
        return coral_loss


class DeepCORALModel(nn.Module):
    """Deep CORAL: Deep CORrelation ALignment for domain adaptation.

    Combines task loss with CORAL regularization on deep features.
    """

    def __init__(
        self,
        input_dim: int,
        hidden_dim: int = 128,
        feature_dim: int = 64,
        num_classes: int = 3,
        coral_weight: float = 1.0,
    ):
        super().__init__()
        self.feature_extractor = FeatureExtractor(input_dim, hidden_dim, feature_dim)
        self.classifier = LabelPredictor(feature_dim, num_classes)
        self.coral_loss = CORALLoss()
        self.coral_weight = coral_weight

    def forward(self, source_x, target_x):
        source_features = self.feature_extractor(source_x)
        target_features = self.feature_extractor(target_x)

        class_output = self.classifier(source_features)
        coral = self.coral_loss(source_features, target_features)

        return class_output, coral

    def predict(self, x):
        self.eval()
        with torch.no_grad():
            features = self.feature_extractor(x)
            logits = self.classifier(features)
            return torch.argmax(logits, dim=1)


def train_deep_coral(
    model: DeepCORALModel,
    source_features: np.ndarray,
    source_labels: np.ndarray,
    target_features: np.ndarray,
    epochs: int = 200,
    lr: float = 1e-3,
    batch_size: int = 64,
) -> dict:
    """Train Deep CORAL model."""
    optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=1e-4)
    criterion = nn.CrossEntropyLoss()

    source_dataset = TensorDataset(
        torch.FloatTensor(source_features), torch.LongTensor(source_labels)
    )
    target_dataset = TensorDataset(torch.FloatTensor(target_features))

    source_loader = DataLoader(source_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
    target_loader = DataLoader(target_dataset, batch_size=batch_size, shuffle=True, drop_last=True)

    history = {"cls_loss": [], "coral_loss": [], "total_loss": []}

    for epoch in range(epochs):
        model.train()
        target_iter = iter(target_loader)
        epoch_cls, epoch_coral, n_batches = 0.0, 0.0, 0

        for source_batch, label_batch in source_loader:
            try:
                (target_batch,) = next(target_iter)
            except StopIteration:
                target_iter = iter(target_loader)
                (target_batch,) = next(target_iter)

            class_output, coral = model(source_batch, target_batch)
            cls_loss = criterion(class_output, label_batch)
            total_loss = cls_loss + model.coral_weight * coral

            optimizer.zero_grad()
            total_loss.backward()
            optimizer.step()

            epoch_cls += cls_loss.item()
            epoch_coral += coral.item()
            n_batches += 1

        history["cls_loss"].append(epoch_cls / max(n_batches, 1))
        history["coral_loss"].append(epoch_coral / max(n_batches, 1))
        history["total_loss"].append(
            (epoch_cls + epoch_coral) / max(n_batches, 1)
        )

        if (epoch + 1) % 50 == 0:
            print(
                f"Epoch {epoch+1}/{epochs} | "
                f"Cls: {history['cls_loss'][-1]:.4f} | "
                f"CORAL: {history['coral_loss'][-1]:.6f}"
            )

    return history

Optimal Transport Adaptation

import ot  # Python Optimal Transport (POT) library


class OTDomainAdapter:
    """Domain adaptation using Optimal Transport (Wasserstein distance).

    Uses the Sinkhorn algorithm for efficient computation of
    the regularized optimal transport plan.
    """

    def __init__(self, reg: float = 0.1, method: str = "sinkhorn"):
        self.reg = reg
        self.method = method
        self.transport_plan = None

    def fit(
        self,
        source_features: np.ndarray,
        target_features: np.ndarray,
    ) -> "OTDomainAdapter":
        """Compute optimal transport plan between source and target."""
        n_s = source_features.shape[0]
        n_t = target_features.shape[0]

        # Uniform marginals
        mu_s = np.ones(n_s) / n_s
        mu_t = np.ones(n_t) / n_t

        # Cost matrix (squared Euclidean)
        cost_matrix = ot.dist(source_features, target_features, metric="sqeuclidean")

        # Compute transport plan via Sinkhorn
        self.transport_plan = ot.sinkhorn(
            mu_s, mu_t, cost_matrix, reg=self.reg
        )

        return self

    def transform(self, source_features: np.ndarray) -> np.ndarray:
        """Transport source features to target domain."""
        if self.transport_plan is None:
            raise ValueError("Must call fit() before transform()")

        # Barycentric mapping
        n_s = source_features.shape[0]
        transported = n_s * self.transport_plan @ source_features
        return transported

    def wasserstein_distance(
        self,
        source_features: np.ndarray,
        target_features: np.ndarray,
    ) -> float:
        """Compute the Wasserstein distance between distributions."""
        n_s = source_features.shape[0]
        n_t = target_features.shape[0]

        mu_s = np.ones(n_s) / n_s
        mu_t = np.ones(n_t) / n_t

        cost_matrix = ot.dist(source_features, target_features, metric="sqeuclidean")
        return ot.emd2(mu_s, mu_t, cost_matrix)

Complete Training Pipeline

def create_labels(returns: pd.Series, threshold: float = 0.0) -> np.ndarray:
    """Create ternary labels: 0=down, 1=neutral, 2=up."""
    labels = np.ones(len(returns), dtype=int)  # neutral
    labels[returns > threshold] = 2   # up
    labels[returns < -threshold] = 0  # down
    return labels


def run_domain_adaptation_pipeline(
    source_ticker: str = "SPY",
    target_symbol: str = "BTCUSDT",
    method: str = "dann",  # dann, mmd, coral
    epochs: int = 200,
    feature_dim: int = 64,
):
    """End-to-end domain adaptation pipeline: Stock → Crypto."""
    # 1. Load data
    print("Loading source (stock) data...")
    stock_data = load_stock_data([source_ticker])
    stock_df = pd.DataFrame({
        "open": stock_data["Open"][source_ticker],
        "high": stock_data["High"][source_ticker],
        "low": stock_data["Low"][source_ticker],
        "close": stock_data["Close"][source_ticker],
        "volume": stock_data["Volume"][source_ticker],
    }).dropna()

    print("Loading target (crypto) data from Bybit...")
    bybit_client = BybitDataClient()
    crypto_df = bybit_client.get_klines(target_symbol, interval="D", limit=1000)

    # 2. Feature engineering
    engineer = FeatureEngineer(lookback=20)
    source_features_df = engineer.compute_features(stock_df)
    target_features_df = engineer.compute_features(crypto_df)

    # 3. Normalize
    source_norm, target_norm = engineer.normalize_features(
        source_features_df, target_features_df
    )

    # 4. Create labels for source (next-day return direction)
    source_returns = stock_df["close"].pct_change().shift(-1)
    source_returns = source_returns.loc[source_norm.index]
    source_labels = create_labels(source_returns, threshold=0.001)

    source_X = source_norm.values
    target_X = target_norm.values
    input_dim = source_X.shape[1]

    # 5. Train domain adaptation model
    if method == "dann":
        model = DANN(input_dim, feature_dim=feature_dim, num_classes=3)
        trainer = DANNTrainer(model, epochs=epochs)
        history = trainer.train(source_X, source_labels, target_X)
    elif method == "mmd":
        model = MMDAdaptationModel(input_dim, feature_dim=feature_dim, num_classes=3)
        history = train_mmd_model(model, source_X, source_labels, target_X, epochs=epochs)
    elif method == "coral":
        model = DeepCORALModel(input_dim, feature_dim=feature_dim, num_classes=3)
        history = train_deep_coral(model, source_X, source_labels, target_X, epochs=epochs)
    else:
        raise ValueError(f"Unknown method: {method}")

    # 6. Generate predictions on target
    target_preds = model.predict(torch.FloatTensor(target_X))
    print(f"\nTarget predictions distribution: {np.bincount(target_preds.numpy(), minlength=3)}")

    return model, history, target_preds

Implementation in Rust

The Rust implementation provides high-performance domain adaptation suitable for production trading systems. The crate is organized into five modules: data, model, adaptation, trading, and backtest.

Crate Structure

92_domain_adaptation_finance/
├── Cargo.toml
├── src/
│   ├── lib.rs              # Module declarations, error types
│   ├── data/
│   │   ├── mod.rs           # Data module
│   │   ├── bybit.rs         # Bybit REST API client
│   │   └── features.rs      # Feature engineering
│   ├── model/
│   │   ├── mod.rs           # Model module
│   │   └── network.rs       # Neural network components
│   ├── adaptation/
│   │   ├── mod.rs           # Adaptation module
│   │   ├── dann.rs          # DANN trainer
│   │   ├── mmd.rs           # MMD adapter
│   │   └── coral.rs         # CORAL adapter
│   ├── trading/
│   │   ├── mod.rs           # Trading module
│   │   ├── signals.rs       # Signal generation
│   │   └── strategy.rs      # Strategy execution
│   └── backtest/
│       ├── mod.rs           # Backtest module
│       └── engine.rs        # Backtest engine
└── examples/
    ├── basic_adaptation.rs  # Simple DA example
    ├── cross_market.rs      # Stock → Crypto
    └── trading_strategy.rs  # Full strategy

Bybit Data Client

use reqwest;
use serde::{Deserialize, Serialize};
use chrono::{DateTime, Utc};

/// A single kline (candlestick) bar from Bybit.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Kline {
    pub timestamp: DateTime<Utc>,
    pub open: f64,
    pub high: f64,
    pub low: f64,
    pub close: f64,
    pub volume: f64,
    pub turnover: f64,
}

/// Response wrapper for Bybit API v5.
#[derive(Debug, Deserialize)]
struct BybitResponse {
    #[serde(rename = "retCode")]
    ret_code: i32,
    #[serde(rename = "retMsg")]
    ret_msg: String,
    result: BybitResult,
}

#[derive(Debug, Deserialize)]
struct BybitResult {
    list: Vec<Vec<String>>,
}

/// Asynchronous client for the Bybit REST API.
pub struct BybitClient {
    client: reqwest::Client,
    base_url: String,
}

impl BybitClient {
    pub fn new() -> Self {
        Self {
            client: reqwest::Client::new(),
            base_url: "https://api.bybit.com".to_string(),
        }
    }

    /// Fetch kline data for a given symbol and interval.
    pub async fn get_klines(
        &self,
        symbol: &str,
        interval: &str,
        limit: u32,
    ) -> Result<Vec<Kline>, Box<dyn std::error::Error>> {
        let url = format!("{}/v5/market/kline", self.base_url);
        let resp: BybitResponse = self
            .client
            .get(&url)
            .query(&[
                ("category", "linear"),
                ("symbol", symbol),
                ("interval", interval),
                ("limit", &limit.to_string()),
            ])
            .send()
            .await?
            .json()
            .await?;

        if resp.ret_code != 0 {
            return Err(format!("Bybit API error: {}", resp.ret_msg).into());
        }

        let mut klines: Vec<Kline> = resp
            .result
            .list
            .iter()
            .map(|row| {
                let ts_ms: i64 = row[0].parse().unwrap_or(0);
                Kline {
                    timestamp: DateTime::from_timestamp_millis(ts_ms)
                        .unwrap_or_default(),
                    open: row[1].parse().unwrap_or(0.0),
                    high: row[2].parse().unwrap_or(0.0),
                    low: row[3].parse().unwrap_or(0.0),
                    close: row[4].parse().unwrap_or(0.0),
                    volume: row[5].parse().unwrap_or(0.0),
                    turnover: row[6].parse().unwrap_or(0.0),
                }
            })
            .collect();

        klines.sort_by_key(|k| k.timestamp);
        Ok(klines)
    }
}

Feature Engineering

/// Feature vector computed from OHLCV data.
#[derive(Debug, Clone)]
pub struct FeatureVector {
    pub return_1d: f64,
    pub return_5d: f64,
    pub return_20d: f64,
    pub realized_vol: f64,
    pub vol_ratio: f64,
    pub rsi: f64,
    pub macd_norm: f64,
    pub bb_width: f64,
    pub volume_ratio: f64,
    pub mean_reversion: f64,
}

pub struct FeatureGenerator {
    lookback: usize,
}

impl FeatureGenerator {
    pub fn new(lookback: usize) -> Self {
        Self { lookback }
    }

    /// Compute features from a slice of kline data.
    pub fn compute(&self, klines: &[Kline]) -> Vec<FeatureVector> {
        let n = klines.len();
        if n < self.lookback + 26 {
            return vec![];
        }

        let closes: Vec<f64> = klines.iter().map(|k| k.close).collect();
        let highs: Vec<f64> = klines.iter().map(|k| k.high).collect();
        let lows: Vec<f64> = klines.iter().map(|k| k.low).collect();
        let volumes: Vec<f64> = klines.iter().map(|k| k.volume).collect();

        let returns: Vec<f64> = (1..n)
            .map(|i| (closes[i] - closes[i - 1]) / closes[i - 1])
            .collect();

        let start = self.lookback + 26;
        let mut features = Vec::with_capacity(n - start);

        for i in start..n {
            let ret_idx = i - 1; // index into returns array

            let return_1d = returns[ret_idx];
            let return_5d = if i >= 5 {
                (closes[i] - closes[i - 5]) / closes[i - 5]
            } else {
                0.0
            };
            let return_20d = (closes[i] - closes[i - self.lookback])
                / closes[i - self.lookback];

            // Realized volatility
            let vol_window: Vec<f64> = returns
                [ret_idx.saturating_sub(self.lookback - 1)..=ret_idx]
                .to_vec();
            let realized_vol = std_dev(&vol_window) * (252.0_f64).sqrt();

            // Volatility ratio (5-day / 20-day)
            let short_vol = std_dev(
                &returns[ret_idx.saturating_sub(4)..=ret_idx],
            );
            let vol_ratio = if realized_vol > 1e-10 {
                short_vol / (realized_vol / (252.0_f64).sqrt())
            } else {
                1.0
            };

            // RSI (14-period)
            let rsi = compute_rsi(&returns, ret_idx, 14);

            // Normalized MACD
            let ema12 = ema(&closes[..=i], 12);
            let ema26 = ema(&closes[..=i], 26);
            let macd_norm = (ema12 - ema26) / closes[i];

            // Bollinger Band width
            let sma = mean(&closes[i - self.lookback + 1..=i]);
            let sd = std_dev_slice(&closes[i - self.lookback + 1..=i]);
            let bb_width = if sma.abs() > 1e-10 {
                (2.0 * sd) / sma
            } else {
                0.0
            };

            // Volume ratio
            let avg_vol = mean(&volumes[i - self.lookback + 1..=i]);
            let volume_ratio = if avg_vol > 1e-10 {
                volumes[i] / avg_vol
            } else {
                1.0
            };

            // Mean reversion
            let mean_reversion = if sd > 1e-10 {
                (closes[i] - sma) / sd
            } else {
                0.0
            };

            features.push(FeatureVector {
                return_1d,
                return_5d,
                return_20d,
                realized_vol,
                vol_ratio,
                rsi,
                macd_norm,
                bb_width,
                volume_ratio,
                mean_reversion,
            });
        }

        features
    }
}

// --- Helper functions ---

fn mean(data: &[f64]) -> f64 {
    if data.is_empty() { return 0.0; }
    data.iter().sum::<f64>() / data.len() as f64
}

fn std_dev(data: &[f64]) -> f64 {
    std_dev_slice(data)
}

fn std_dev_slice(data: &[f64]) -> f64 {
    if data.len() < 2 { return 0.0; }
    let m = mean(data);
    let variance = data.iter().map(|x| (x - m).powi(2)).sum::<f64>()
        / (data.len() - 1) as f64;
    variance.sqrt()
}

fn ema(data: &[f64], period: usize) -> f64 {
    if data.is_empty() { return 0.0; }
    let alpha = 2.0 / (period as f64 + 1.0);
    let mut result = data[0];
    for &val in &data[1..] {
        result = alpha * val + (1.0 - alpha) * result;
    }
    result
}

fn compute_rsi(returns: &[f64], end_idx: usize, period: usize) -> f64 {
    let start = if end_idx >= period { end_idx - period + 1 } else { 0 };
    let window = &returns[start..=end_idx];
    let gains: f64 = window.iter().filter(|&&r| r > 0.0).sum::<f64>()
        / period as f64;
    let losses: f64 = window.iter().filter(|&&r| r < 0.0).map(|r| r.abs()).sum::<f64>()
        / period as f64;
    if losses < 1e-10 {
        return 100.0;
    }
    let rs = gains / losses;
    100.0 - (100.0 / (1.0 + rs))
}

Domain Adaptation Algorithms

use rand::Rng;
use rand_distr::{Distribution, Normal};

/// CORAL (CORrelation ALignment) adapter.
///
/// Aligns second-order statistics between source and target features.
pub struct CORALAdapter {
    /// Weight for the CORAL regularization term.
    pub weight: f64,
}

impl CORALAdapter {
    pub fn new(weight: f64) -> Self {
        Self { weight }
    }

    /// Compute CORAL loss between source and target feature matrices.
    ///
    /// L_coral = (1 / 4d^2) * ||C_S - C_T||_F^2
    pub fn coral_loss(
        &self,
        source: &[Vec<f64>],
        target: &[Vec<f64>],
    ) -> f64 {
        let d = source[0].len();
        let cov_s = covariance_matrix(source);
        let cov_t = covariance_matrix(target);

        let mut frobenius_sq = 0.0;
        for i in 0..d {
            for j in 0..d {
                let diff = cov_s[i][j] - cov_t[i][j];
                frobenius_sq += diff * diff;
            }
        }

        frobenius_sq / (4.0 * (d as f64) * (d as f64))
    }

    /// Whitening-coloring transform: align source to target statistics.
    pub fn transform(
        &self,
        source: &[Vec<f64>],
        target: &[Vec<f64>],
    ) -> Vec<Vec<f64>> {
        let d = source[0].len();
        let mean_s = column_means(source);
        let mean_t = column_means(target);
        let cov_s = covariance_matrix(source);
        let cov_t = covariance_matrix(target);

        // Whitening: remove source correlations
        let cov_s_inv_sqrt = matrix_sqrt_inv(&cov_s);
        // Coloring: apply target correlations
        let cov_t_sqrt = matrix_sqrt(&cov_t);

        let mut result = Vec::with_capacity(source.len());
        for sample in source {
            // Center
            let centered: Vec<f64> = (0..d)
                .map(|j| sample[j] - mean_s[j])
                .collect();

            // Whiten
            let whitened = mat_vec_mul(&cov_s_inv_sqrt, &centered);

            // Color and re-center
            let colored = mat_vec_mul(&cov_t_sqrt, &whitened);
            let aligned: Vec<f64> = (0..d)
                .map(|j| colored[j] + mean_t[j])
                .collect();

            result.push(aligned);
        }

        result
    }
}

/// MMD (Maximum Mean Discrepancy) adapter.
///
/// Uses Gaussian kernel to measure distribution distance.
pub struct MMDAdapter {
    /// Kernel bandwidth parameters.
    pub bandwidths: Vec<f64>,
}

impl MMDAdapter {
    pub fn new(bandwidths: Vec<f64>) -> Self {
        Self { bandwidths }
    }

    pub fn default_bandwidths() -> Self {
        Self {
            bandwidths: vec![0.1, 0.5, 1.0, 5.0, 10.0],
        }
    }

    /// Compute MMD^2 between source and target using multi-scale Gaussian kernel.
    pub fn mmd_squared(
        &self,
        source: &[Vec<f64>],
        target: &[Vec<f64>],
    ) -> f64 {
        let n_s = source.len() as f64;
        let n_t = target.len() as f64;

        let mut k_ss = 0.0;
        let mut k_tt = 0.0;
        let mut k_st = 0.0;

        // K(source, source)
        for i in 0..source.len() {
            for j in 0..source.len() {
                k_ss += self.multi_kernel(&source[i], &source[j]);
            }
        }

        // K(target, target)
        for i in 0..target.len() {
            for j in 0..target.len() {
                k_tt += self.multi_kernel(&target[i], &target[j]);
            }
        }

        // K(source, target)
        for i in 0..source.len() {
            for j in 0..target.len() {
                k_st += self.multi_kernel(&source[i], &target[j]);
            }
        }

        k_ss / (n_s * n_s) + k_tt / (n_t * n_t) - 2.0 * k_st / (n_s * n_t)
    }

    /// Multi-scale Gaussian kernel.
    fn multi_kernel(&self, x: &[f64], y: &[f64]) -> f64 {
        let sq_dist: f64 = x.iter()
            .zip(y.iter())
            .map(|(a, b)| (a - b).powi(2))
            .sum();

        self.bandwidths
            .iter()
            .map(|&bw| (-0.5 * sq_dist / bw).exp())
            .sum()
    }
}

/// DANN parameter optimizer using simple gradient steps.
pub struct DANNTrainer {
    pub learning_rate: f64,
    pub lambda_schedule_gamma: f64,
    pub epochs: usize,
}

impl DANNTrainer {
    pub fn new(learning_rate: f64, gamma: f64, epochs: usize) -> Self {
        Self {
            learning_rate,
            lambda_schedule_gamma: gamma,
            epochs,
        }
    }

    /// Compute the scheduled lambda value for adversarial training.
    pub fn schedule_lambda(&self, epoch: usize) -> f64 {
        let p = epoch as f64 / self.epochs as f64;
        2.0 / (1.0 + (-self.lambda_schedule_gamma * p).exp()) - 1.0
    }
}

// --- Linear algebra helpers ---

fn column_means(data: &[Vec<f64>]) -> Vec<f64> {
    let n = data.len() as f64;
    let d = data[0].len();
    let mut means = vec![0.0; d];
    for row in data {
        for j in 0..d {
            means[j] += row[j];
        }
    }
    for m in &mut means {
        *m /= n;
    }
    means
}

fn covariance_matrix(data: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n = data.len();
    let d = data[0].len();
    let means = column_means(data);

    let mut cov = vec![vec![0.0; d]; d];
    for row in data {
        for i in 0..d {
            for j in 0..d {
                cov[i][j] += (row[i] - means[i]) * (row[j] - means[j]);
            }
        }
    }
    let denom = (n - 1).max(1) as f64;
    for i in 0..d {
        for j in 0..d {
            cov[i][j] /= denom;
        }
    }
    cov
}

fn mat_vec_mul(mat: &[Vec<f64>], vec: &[f64]) -> Vec<f64> {
    mat.iter()
        .map(|row| row.iter().zip(vec.iter()).map(|(a, b)| a * b).sum())
        .collect()
}

fn matrix_sqrt(mat: &[Vec<f64>]) -> Vec<Vec<f64>> {
    // Simplified: returns identity + scaled matrix for near-identity case
    // In production, use eigendecomposition
    let d = mat.len();
    let mut result = vec![vec![0.0; d]; d];
    for i in 0..d {
        for j in 0..d {
            result[i][j] = if i == j {
                mat[i][j].abs().sqrt()
            } else {
                mat[i][j] * 0.5 / mat[i][i].abs().sqrt().max(1e-10)
            };
        }
    }
    result
}

fn matrix_sqrt_inv(mat: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let d = mat.len();
    let mut result = vec![vec![0.0; d]; d];
    for i in 0..d {
        for j in 0..d {
            result[i][j] = if i == j {
                1.0 / mat[i][j].abs().sqrt().max(1e-10)
            } else {
                -mat[i][j] * 0.5
                    / (mat[i][i].abs().sqrt() * mat[j][j].abs().sqrt()).max(1e-10)
            };
        }
    }
    result
}

Trading Strategy

/// Trading signal generated by the domain-adapted model.
#[derive(Debug, Clone, PartialEq)]
pub enum TradingSignal {
    Long,
    Short,
    Neutral,
}

/// Position tracking for the strategy.
#[derive(Debug, Clone)]
pub struct Position {
    pub signal: TradingSignal,
    pub entry_price: f64,
    pub size: f64,
    pub timestamp: DateTime<Utc>,
}

/// Adaptive trading strategy using domain-adapted predictions.
pub struct AdaptiveStrategy {
    pub risk_per_trade: f64,
    pub stop_loss_pct: f64,
    pub take_profit_pct: f64,
    pub max_positions: usize,
    positions: Vec<Position>,
}

impl AdaptiveStrategy {
    pub fn new(
        risk_per_trade: f64,
        stop_loss_pct: f64,
        take_profit_pct: f64,
        max_positions: usize,
    ) -> Self {
        Self {
            risk_per_trade,
            stop_loss_pct,
            take_profit_pct,
            max_positions,
            positions: Vec::new(),
        }
    }

    /// Generate a trading signal from model predictions.
    ///
    /// prediction: 0=down, 1=neutral, 2=up
    pub fn generate_signal(
        &self,
        prediction: usize,
        confidence: f64,
        min_confidence: f64,
    ) -> TradingSignal {
        if confidence < min_confidence {
            return TradingSignal::Neutral;
        }
        match prediction {
            0 => TradingSignal::Short,
            2 => TradingSignal::Long,
            _ => TradingSignal::Neutral,
        }
    }

    /// Compute position size based on volatility-adjusted risk.
    pub fn position_size(
        &self,
        capital: f64,
        price: f64,
        volatility: f64,
    ) -> f64 {
        let risk_amount = capital * self.risk_per_trade;
        let dollar_risk = price * volatility * self.stop_loss_pct;
        if dollar_risk > 1e-10 {
            (risk_amount / dollar_risk).min(capital / price)
        } else {
            0.0
        }
    }
}

Backtest Engine

/// Configuration for backtesting.
#[derive(Debug, Clone)]
pub struct BacktestConfig {
    pub initial_capital: f64,
    pub commission_rate: f64,
    pub slippage_bps: f64,
}

/// Results from a backtest run.
#[derive(Debug, Clone)]
pub struct BacktestResults {
    pub total_return: f64,
    pub sharpe_ratio: f64,
    pub sortino_ratio: f64,
    pub max_drawdown: f64,
    pub win_rate: f64,
    pub num_trades: usize,
    pub equity_curve: Vec<f64>,
    pub trades: Vec<Trade>,
}

/// A completed trade.
#[derive(Debug, Clone)]
pub struct Trade {
    pub entry_price: f64,
    pub exit_price: f64,
    pub pnl: f64,
    pub pnl_pct: f64,
    pub direction: TradingSignal,
    pub holding_period: usize,
}

/// Backtest engine that simulates strategy execution.
pub struct BacktestEngine {
    config: BacktestConfig,
}

impl BacktestEngine {
    pub fn new(config: BacktestConfig) -> Self {
        Self { config }
    }

    /// Run backtest on historical data with given signals.
    pub fn run(
        &self,
        prices: &[f64],
        signals: &[TradingSignal],
    ) -> BacktestResults {
        let mut capital = self.config.initial_capital;
        let mut equity_curve = vec![capital];
        let mut trades: Vec<Trade> = Vec::new();
        let mut position: Option<(f64, TradingSignal, usize)> = None; // (entry, dir, bar)

        for i in 0..signals.len().min(prices.len() - 1) {
            let price = prices[i];

            // Check for exit
            if let Some((entry, ref dir, entry_bar)) = position {
                let pnl_pct = match dir {
                    TradingSignal::Long => (price - entry) / entry,
                    TradingSignal::Short => (entry - price) / entry,
                    _ => 0.0,
                };

                let should_exit = signals[i] == TradingSignal::Neutral
                    || signals[i] != *dir
                    || pnl_pct <= -0.02  // stop loss
                    || pnl_pct >= 0.05;  // take profit

                if should_exit {
                    let commission = price * self.config.commission_rate * 2.0;
                    let slippage = price * self.config.slippage_bps * 1e-4 * 2.0;
                    let net_pnl_pct = pnl_pct - (commission + slippage) / entry;

                    capital *= 1.0 + net_pnl_pct;
                    trades.push(Trade {
                        entry_price: entry,
                        exit_price: price,
                        pnl: capital * net_pnl_pct,
                        pnl_pct: net_pnl_pct,
                        direction: dir.clone(),
                        holding_period: i - entry_bar,
                    });
                    position = None;
                }
            }

            // Check for entry
            if position.is_none()
                && (signals[i] == TradingSignal::Long
                    || signals[i] == TradingSignal::Short)
            {
                position = Some((price, signals[i].clone(), i));
            }

            equity_curve.push(capital);
        }

        // Compute metrics
        let returns: Vec<f64> = (1..equity_curve.len())
            .map(|i| equity_curve[i] / equity_curve[i - 1] - 1.0)
            .collect();

        let total_return =
            (equity_curve.last().unwrap_or(&capital) / self.config.initial_capital) - 1.0;
        let sharpe_ratio = compute_sharpe(&returns);
        let sortino_ratio = compute_sortino(&returns);
        let max_drawdown = compute_max_drawdown(&equity_curve);
        let winning = trades.iter().filter(|t| t.pnl > 0.0).count();
        let win_rate = if trades.is_empty() {
            0.0
        } else {
            winning as f64 / trades.len() as f64
        };

        BacktestResults {
            total_return,
            sharpe_ratio,
            sortino_ratio,
            max_drawdown,
            win_rate,
            num_trades: trades.len(),
            equity_curve,
            trades,
        }
    }
}

fn compute_sharpe(returns: &[f64]) -> f64 {
    if returns.is_empty() { return 0.0; }
    let mean_r = returns.iter().sum::<f64>() / returns.len() as f64;
    let var: f64 = returns.iter().map(|r| (r - mean_r).powi(2)).sum::<f64>()
        / (returns.len() - 1).max(1) as f64;
    let std = var.sqrt();
    if std < 1e-10 { return 0.0; }
    (mean_r / std) * (252.0_f64).sqrt()
}

fn compute_sortino(returns: &[f64]) -> f64 {
    if returns.is_empty() { return 0.0; }
    let mean_r = returns.iter().sum::<f64>() / returns.len() as f64;
    let downside: Vec<f64> = returns.iter().filter(|&&r| r < 0.0).copied().collect();
    if downside.is_empty() { return f64::INFINITY; }
    let downside_var = downside.iter().map(|r| r.powi(2)).sum::<f64>()
        / downside.len() as f64;
    let downside_std = downside_var.sqrt();
    if downside_std < 1e-10 { return 0.0; }
    (mean_r / downside_std) * (252.0_f64).sqrt()
}

fn compute_max_drawdown(equity: &[f64]) -> f64 {
    let mut peak = equity[0];
    let mut max_dd = 0.0;
    for &val in equity {
        if val > peak { peak = val; }
        let dd = (peak - val) / peak;
        if dd > max_dd { max_dd = dd; }
    }
    max_dd
}

Practical Examples with Stock and Crypto Data

Example 1: SPY to BTCUSDT Adaptation (DANN)

This example demonstrates adapting a return-direction classifier trained on SPY data to predict BTC movements on Bybit.

import matplotlib.pyplot as plt

def example_spy_to_btc():
    """Full example: Adapt SPY model to predict BTCUSDT on Bybit."""
    # Step 1: Load data
    stock_data = load_stock_data(["SPY"], start="2019-01-01", end="2024-01-01")
    stock_df = pd.DataFrame({
        "open": stock_data["Open"]["SPY"],
        "high": stock_data["High"]["SPY"],
        "low": stock_data["Low"]["SPY"],
        "close": stock_data["Close"]["SPY"],
        "volume": stock_data["Volume"]["SPY"],
    }).dropna()

    bybit = BybitDataClient()
    btc_df = bybit.get_klines("BTCUSDT", interval="D", limit=1000)

    # Step 2: Engineer features
    engineer = FeatureEngineer(lookback=20)
    source_feat = engineer.compute_features(stock_df)
    target_feat = engineer.compute_features(btc_df)
    source_norm, target_norm = engineer.normalize_features(source_feat, target_feat)

    # Step 3: Create source labels (next-day return direction)
    source_returns = stock_df["close"].pct_change().shift(-1).loc[source_norm.index]
    labels = create_labels(source_returns, threshold=0.001)

    # Step 4: Split source into train/val
    split_idx = int(len(source_norm) * 0.8)
    train_X, val_X = source_norm.values[:split_idx], source_norm.values[split_idx:]
    train_y, val_y = labels[:split_idx], labels[split_idx:]

    input_dim = train_X.shape[1]

    # Step 5: Train baseline (no adaptation)
    baseline_model = DANN(input_dim, num_classes=3, lambda_val=0.0)
    baseline_trainer = DANNTrainer(baseline_model, epochs=100)
    baseline_trainer.train(train_X, train_y, target_norm.values)

    # Step 6: Train DANN (with adaptation)
    dann_model = DANN(input_dim, num_classes=3, lambda_val=1.0)
    dann_trainer = DANNTrainer(dann_model, epochs=200)
    history = dann_trainer.train(train_X, train_y, target_norm.values)

    # Step 7: Compare predictions
    target_tensor = torch.FloatTensor(target_norm.values)

    baseline_preds = baseline_model.predict(target_tensor).numpy()
    dann_preds = dann_model.predict(target_tensor).numpy()

    # Step 8: Visualize
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    axes[0, 0].plot(history["label_loss"], label="Label Loss")
    axes[0, 0].plot(history["domain_loss"], label="Domain Loss")
    axes[0, 0].set_title("DANN Training Losses")
    axes[0, 0].legend()

    axes[0, 1].hist(baseline_preds, bins=3, alpha=0.5, label="Baseline")
    axes[0, 1].hist(dann_preds, bins=3, alpha=0.5, label="DANN")
    axes[0, 1].set_title("Prediction Distribution on BTCUSDT")
    axes[0, 1].legend()

    # Feature space visualization (t-SNE)
    from sklearn.manifold import TSNE
    source_features = dann_model.feature_extractor(
        torch.FloatTensor(train_X)
    ).detach().numpy()
    target_features = dann_model.feature_extractor(
        target_tensor
    ).detach().numpy()

    combined = np.vstack([source_features[:200], target_features[:200]])
    tsne = TSNE(n_components=2, random_state=42)
    embedded = tsne.fit_transform(combined)

    axes[1, 0].scatter(
        embedded[:200, 0], embedded[:200, 1],
        c="blue", alpha=0.5, label="Source (SPY)", s=10
    )
    axes[1, 0].scatter(
        embedded[200:, 0], embedded[200:, 1],
        c="red", alpha=0.5, label="Target (BTCUSDT)", s=10
    )
    axes[1, 0].set_title("Feature Space (t-SNE) After DANN")
    axes[1, 0].legend()

    plt.tight_layout()
    plt.savefig("dann_spy_to_btc.png", dpi=150)
    plt.show()

    return dann_model, history

Example 2: Multi-Asset Cross-Domain Adaptation

def example_multi_asset_adaptation():
    """Adapt a model across multiple crypto pairs on Bybit."""
    bybit = BybitDataClient()

    # Source: BTCUSDT (most data, most liquid)
    # Targets: ETHUSDT, SOLUSDT, AVAXUSDT
    source_symbol = "BTCUSDT"
    target_symbols = ["ETHUSDT", "SOLUSDT", "AVAXUSDT"]

    source_df = bybit.get_klines(source_symbol, interval="D", limit=1000)
    target_data = bybit.get_multiple_symbols(target_symbols, interval="D", limit=1000)

    engineer = FeatureEngineer(lookback=20)
    source_feat = engineer.compute_features(source_df)

    # Create source labels
    source_returns = source_df["close"].pct_change().shift(-1)
    source_returns = source_returns.iloc[source_feat.index[0]:source_feat.index[-1] + 1]
    source_labels = create_labels(source_returns.loc[source_feat.index], threshold=0.005)

    results = {}

    for method_name, method_class in [
        ("DANN", DANN),
        ("MMD", MMDAdaptationModel),
        ("CORAL", DeepCORALModel),
    ]:
        for sym in target_symbols:
            target_df = target_data[sym]
            target_feat = engineer.compute_features(target_df)

            source_norm, target_norm = engineer.normalize_features(
                source_feat, target_feat
            )
            input_dim = source_norm.shape[1]

            if method_name == "DANN":
                model = DANN(input_dim, num_classes=3)
                trainer = DANNTrainer(model, epochs=150)
                trainer.train(source_norm.values, source_labels, target_norm.values)
            elif method_name == "MMD":
                model = MMDAdaptationModel(input_dim, num_classes=3)
                train_mmd_model(
                    model, source_norm.values, source_labels,
                    target_norm.values, epochs=150,
                )
            else:
                model = DeepCORALModel(input_dim, num_classes=3)
                train_deep_coral(
                    model, source_norm.values, source_labels,
                    target_norm.values, epochs=150,
                )

            preds = model.predict(torch.FloatTensor(target_norm.values)).numpy()
            results[(method_name, sym)] = {
                "predictions": preds,
                "distribution": np.bincount(preds, minlength=3),
            }

    # Print comparison table
    print("\n" + "=" * 70)
    print(f"{'Method':<10} {'Target':<12} {'Down%':>8} {'Neutral%':>10} {'Up%':>8}")
    print("=" * 70)
    for (method, sym), res in sorted(results.items()):
        dist = res["distribution"] / res["distribution"].sum() * 100
        print(f"{method:<10} {sym:<12} {dist[0]:>7.1f}% {dist[1]:>9.1f}% {dist[2]:>7.1f}%")

    return results

Example 3: Measuring Domain Distance

def measure_domain_distances():
    """Compute domain distances between different markets using MMD and Wasserstein."""
    bybit = BybitDataClient()
    engineer = FeatureEngineer(lookback=20)

    symbols = ["BTCUSDT", "ETHUSDT", "SOLUSDT", "AVAXUSDT"]
    features = {}

    for sym in symbols:
        df = bybit.get_klines(sym, interval="D", limit=1000)
        feat = engineer.compute_features(df)
        features[sym] = feat.values

    # Compute pairwise MMD
    mmd = MMDLoss()
    ot_adapter = OTDomainAdapter(reg=0.1)

    print("\nPairwise MMD^2 distances:")
    print(f"{'':>12}", end="")
    for sym in symbols:
        print(f"{sym:>12}", end="")
    print()

    for sym_i in symbols:
        print(f"{sym_i:>12}", end="")
        for sym_j in symbols:
            source_t = torch.FloatTensor(features[sym_i][:200])
            target_t = torch.FloatTensor(features[sym_j][:200])
            dist = mmd(source_t, target_t).item()
            print(f"{dist:>12.4f}", end="")
        print()

    # Compute pairwise Wasserstein distances
    print("\nPairwise Wasserstein distances:")
    print(f"{'':>12}", end="")
    for sym in symbols:
        print(f"{sym:>12}", end="")
    print()

    for sym_i in symbols:
        print(f"{sym_i:>12}", end="")
        for sym_j in symbols:
            dist = ot_adapter.wasserstein_distance(
                features[sym_i][:200], features[sym_j][:200]
            )
            print(f"{dist:>12.4f}", end="")
        print()

Backtesting Framework

Backtesting Domain-Adapted Strategies

The backtesting framework evaluates domain-adapted strategies against baselines to measure the value of adaptation. The key principle is to ensure no look-ahead bias: the adaptation is performed using only data available up to the prediction point.

class DomainAdaptationBacktester:
    """Backtest framework for domain-adapted trading strategies."""

    def __init__(
        self,
        initial_capital: float = 100_000.0,
        commission_bps: float = 10.0,
        slippage_bps: float = 5.0,
    ):
        self.initial_capital = initial_capital
        self.commission_rate = commission_bps / 10_000
        self.slippage_rate = slippage_bps / 10_000

    def signal_to_position(self, prediction: int, confidence: float = 1.0) -> float:
        """Convert model prediction to position size (-1, 0, +1)."""
        if prediction == 2:  # Up
            return 1.0 * min(confidence, 1.0)
        elif prediction == 0:  # Down
            return -1.0 * min(confidence, 1.0)
        return 0.0

    def run_backtest(
        self,
        prices: np.ndarray,
        predictions: np.ndarray,
        strategy_name: str = "Strategy",
    ) -> dict:
        """Run a simple long/short backtest based on predictions."""
        n = min(len(prices), len(predictions))
        returns = np.diff(prices[:n]) / prices[:n - 1]

        positions = np.array([self.signal_to_position(p) for p in predictions[:n - 1]])
        strategy_returns = positions * returns

        # Transaction costs
        position_changes = np.abs(np.diff(np.concatenate([[0], positions])))
        costs = position_changes * (self.commission_rate + self.slippage_rate)
        net_returns = strategy_returns - costs

        # Equity curve
        equity = self.initial_capital * np.cumprod(1 + net_returns)
        equity = np.concatenate([[self.initial_capital], equity])

        # Metrics
        total_return = equity[-1] / self.initial_capital - 1
        ann_return = (1 + total_return) ** (252 / max(len(net_returns), 1)) - 1
        ann_vol = np.std(net_returns) * np.sqrt(252)
        sharpe = ann_return / ann_vol if ann_vol > 0 else 0
        downside_returns = net_returns[net_returns < 0]
        downside_vol = np.std(downside_returns) * np.sqrt(252) if len(downside_returns) > 0 else 1e-10
        sortino = ann_return / downside_vol

        peak = np.maximum.accumulate(equity)
        drawdown = (peak - equity) / peak
        max_drawdown = np.max(drawdown)

        winning_days = np.sum(net_returns > 0)
        total_days = np.sum(net_returns != 0)
        win_rate = winning_days / total_days if total_days > 0 else 0

        return {
            "strategy": strategy_name,
            "total_return": total_return,
            "annual_return": ann_return,
            "annual_volatility": ann_vol,
            "sharpe_ratio": sharpe,
            "sortino_ratio": sortino,
            "max_drawdown": max_drawdown,
            "win_rate": win_rate,
            "num_trades": int(np.sum(position_changes > 0)),
            "equity_curve": equity,
        }

    def compare_strategies(
        self,
        prices: np.ndarray,
        strategies: dict,
    ) -> pd.DataFrame:
        """Compare multiple strategies (adapted vs non-adapted)."""
        results = []
        for name, predictions in strategies.items():
            result = self.run_backtest(prices, predictions, name)
            results.append(result)

        df = pd.DataFrame(results).set_index("strategy")
        display_cols = [
            "total_return", "annual_return", "sharpe_ratio",
            "sortino_ratio", "max_drawdown", "win_rate", "num_trades"
        ]
        return df[display_cols]


def run_full_comparison():
    """Compare adapted vs non-adapted strategies on Bybit BTCUSDT."""
    # Load data
    stock_data = load_stock_data(["SPY"], start="2020-01-01", end="2024-01-01")
    stock_df = pd.DataFrame({
        "open": stock_data["Open"]["SPY"],
        "high": stock_data["High"]["SPY"],
        "low": stock_data["Low"]["SPY"],
        "close": stock_data["Close"]["SPY"],
        "volume": stock_data["Volume"]["SPY"],
    }).dropna()

    bybit = BybitDataClient()
    btc_df = bybit.get_klines("BTCUSDT", interval="D", limit=1000)

    engineer = FeatureEngineer(lookback=20)
    source_feat = engineer.compute_features(stock_df)
    target_feat = engineer.compute_features(btc_df)
    source_norm, target_norm = engineer.normalize_features(source_feat, target_feat)

    source_returns = stock_df["close"].pct_change().shift(-1).loc[source_norm.index]
    source_labels = create_labels(source_returns, threshold=0.001)

    input_dim = source_norm.shape[1]
    target_X = target_norm.values
    target_tensor = torch.FloatTensor(target_X)

    # Train models
    # 1. No adaptation (baseline)
    baseline = DANN(input_dim, num_classes=3, lambda_val=0.0)
    DANNTrainer(baseline, epochs=100).train(
        source_norm.values, source_labels, target_X
    )

    # 2. DANN
    dann = DANN(input_dim, num_classes=3, lambda_val=1.0)
    DANNTrainer(dann, epochs=200).train(
        source_norm.values, source_labels, target_X
    )

    # 3. MMD
    mmd_model = MMDAdaptationModel(input_dim, num_classes=3, mmd_weight=1.0)
    train_mmd_model(mmd_model, source_norm.values, source_labels, target_X, epochs=200)

    # 4. Deep CORAL
    coral_model = DeepCORALModel(input_dim, num_classes=3, coral_weight=1.0)
    train_deep_coral(coral_model, source_norm.values, source_labels, target_X, epochs=200)

    # Generate predictions
    strategies = {
        "No Adaptation": baseline.predict(target_tensor).numpy(),
        "DANN": dann.predict(target_tensor).numpy(),
        "MMD": mmd_model.predict(target_tensor).numpy(),
        "Deep CORAL": coral_model.predict(target_tensor).numpy(),
    }

    # Backtest
    bt = DomainAdaptationBacktester(initial_capital=100_000)
    prices = btc_df["close"].values[-len(target_X):]
    comparison = bt.compare_strategies(prices, strategies)

    print("\n" + "=" * 90)
    print("STRATEGY COMPARISON: SPY → BTCUSDT Domain Adaptation")
    print("=" * 90)
    print(comparison.to_string(float_format=lambda x: f"{x:.4f}"))

    return comparison

Walk-Forward Backtesting with Rolling Adaptation

def walk_forward_backtest(
    source_features: np.ndarray,
    source_labels: np.ndarray,
    target_features: np.ndarray,
    target_prices: np.ndarray,
    adaptation_window: int = 252,
    step_size: int = 21,
    method: str = "dann",
) -> dict:
    """Walk-forward backtest with periodic re-adaptation.

    Re-adapts the model every `step_size` bars using a rolling
    window of `adaptation_window` bars.
    """
    n_target = len(target_features)
    all_predictions = np.full(n_target, 1)  # default neutral
    input_dim = source_features.shape[1]

    for start in range(0, n_target - step_size, step_size):
        end = min(start + adaptation_window, n_target)
        target_window = target_features[start:end]

        if method == "dann":
            model = DANN(input_dim, num_classes=3)
            trainer = DANNTrainer(model, epochs=100, batch_size=32)
            trainer.train(source_features, source_labels, target_window)
        elif method == "mmd":
            model = MMDAdaptationModel(input_dim, num_classes=3)
            train_mmd_model(
                model, source_features, source_labels,
                target_window, epochs=100, batch_size=32,
            )
        elif method == "coral":
            model = DeepCORALModel(input_dim, num_classes=3)
            train_deep_coral(
                model, source_features, source_labels,
                target_window, epochs=100, batch_size=32,
            )

        # Predict on the next step_size bars
        pred_start = end
        pred_end = min(end + step_size, n_target)
        if pred_start >= n_target:
            break

        pred_features = torch.FloatTensor(target_features[pred_start:pred_end])
        predictions = model.predict(pred_features).numpy()
        all_predictions[pred_start:pred_end] = predictions

    # Backtest
    bt = DomainAdaptationBacktester()
    result = bt.run_backtest(
        target_prices, all_predictions,
        f"Walk-Forward {method.upper()}"
    )

    return result

Performance Evaluation

Evaluation Metrics

Domain-adapted trading strategies are evaluated on both prediction quality and financial performance:

Category	Metric	Formula	Interpretation
Prediction	Accuracy	$\frac{\text{correct}}{N}$	Overall correctness
Prediction	F1-Score	$2 \cdot \frac{P \cdot R}{P + R}$	Balance of precision/recall
Prediction	AUC-ROC	Area under ROC curve	Ranking quality
Financial	Sharpe Ratio	$\frac{\mu_r - r_f}{\sigma_r} \cdot \sqrt{252}$	Risk-adjusted return
Financial	Sortino Ratio	$\frac{\mu_r - r_f}{\sigma_d} \cdot \sqrt{252}$	Downside risk-adjusted return
Financial	Max Drawdown	$\max_t \frac{\text{peak}_t - \text{value}_t}{\text{peak}_t}$	Worst peak-to-trough decline
Financial	Calmar Ratio	$\frac{\text{Ann. Return}}{\text{Max Drawdown}}$	Return per unit of drawdown risk
Adaptation	MMD Distance	$\widehat{\text{MMD}}^2(P_S, P_T)$	Feature distribution alignment
Adaptation	A-distance	$2(1 - 2\epsilon_d)$	Domain classifier error
Adaptation	CORAL Distance	$\frac{1}{4d^2}\|C_S - C_T\|_F^2$	Covariance alignment

Domain Adaptation Quality Assessment

def evaluate_adaptation_quality(
    model: nn.Module,
    source_features: np.ndarray,
    target_features: np.ndarray,
) -> dict:
    """Evaluate how well the adaptation aligned the domains."""
    model.eval()
    with torch.no_grad():
        source_repr = model.feature_extractor(
            torch.FloatTensor(source_features)
        ).numpy()
        target_repr = model.feature_extractor(
            torch.FloatTensor(target_features)
        ).numpy()

    # 1. MMD distance in feature space
    mmd = MMDLoss()
    mmd_dist = mmd(
        torch.FloatTensor(source_repr[:500]),
        torch.FloatTensor(target_repr[:500]),
    ).item()

    # 2. Proxy A-distance (train domain classifier on features)
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    n_s = min(len(source_repr), 500)
    n_t = min(len(target_repr), 500)
    X = np.vstack([source_repr[:n_s], target_repr[:n_t]])
    y = np.concatenate([np.zeros(n_s), np.ones(n_t)])

    svm = SVC(kernel="linear", max_iter=1000)
    cv_score = cross_val_score(svm, X, y, cv=5, scoring="accuracy").mean()
    a_distance = 2 * (1 - 2 * (cv_score - 0.5))

    # 3. CORAL distance
    coral = CORALLoss()
    coral_dist = coral(
        torch.FloatTensor(source_repr[:500]),
        torch.FloatTensor(target_repr[:500]),
    ).item()

    return {
        "mmd_distance": mmd_dist,
        "proxy_a_distance": a_distance,
        "coral_distance": coral_dist,
        "domain_classifier_accuracy": cv_score,
    }

Expected Results Comparison

The following table shows typical results when adapting an equity model to cryptocurrency trading. Results vary based on market conditions and time period.

Strategy	Annual Return	Sharpe Ratio	Sortino Ratio	Max Drawdown	Win Rate
No Adaptation (baseline)	5-12%	0.3-0.6	0.4-0.8	25-45%	48-52%
DANN	12-25%	0.7-1.3	1.0-1.8	15-30%	51-55%
MMD	10-22%	0.6-1.2	0.9-1.6	18-32%	50-54%
Deep CORAL	11-23%	0.7-1.2	0.9-1.7	16-31%	51-55%
OT Adaptation	13-26%	0.8-1.4	1.1-1.9	14-28%	52-56%
Buy & Hold (BTC)	20-80%*	0.5-1.0	0.6-1.1	30-75%	N/A

Note: Buy & hold returns vary dramatically depending on the evaluation period. The value of domain adaptation is most evident in risk-adjusted metrics (Sharpe, Sortino) and drawdown control rather than raw returns.

Ablation Study Components

def run_ablation_study(
    source_features: np.ndarray,
    source_labels: np.ndarray,
    target_features: np.ndarray,
    target_prices: np.ndarray,
):
    """Ablation study: impact of each adaptation component."""
    input_dim = source_features.shape[1]
    bt = DomainAdaptationBacktester()
    results = []

    # Vary adaptation strength
    for weight in [0.0, 0.1, 0.5, 1.0, 2.0, 5.0]:
        model = DeepCORALModel(input_dim, num_classes=3, coral_weight=weight)
        train_deep_coral(model, source_features, source_labels, target_features)
        preds = model.predict(torch.FloatTensor(target_features)).numpy()
        result = bt.run_backtest(target_prices, preds, f"CORAL_w={weight}")
        result["weight"] = weight
        results.append(result)

    # Vary feature dimensions
    for dim in [16, 32, 64, 128, 256]:
        model = DANN(input_dim, feature_dim=dim, num_classes=3)
        trainer = DANNTrainer(model, epochs=200)
        trainer.train(source_features, source_labels, target_features)
        preds = model.predict(torch.FloatTensor(target_features)).numpy()
        result = bt.run_backtest(target_prices, preds, f"DANN_dim={dim}")
        result["feature_dim"] = dim
        results.append(result)

    return pd.DataFrame(results)

Future Directions

Emerging Research Areas

Continuous Domain Adaptation: Markets shift continuously, not discretely. Methods that adapt online as the target distribution evolves (e.g., combining online learning from Chapter 34 with domain adaptation) are an active area of research.
Multi-Source Domain Adaptation: Combining knowledge from multiple source domains (e.g., equities, commodities, and FX simultaneously) to adapt to crypto markets. Theoretical bounds by Mansour et al. (2009) provide the foundation.
Partial Domain Adaptation: When the target domain contains only a subset of the source classes. In trading, this arises when certain market regimes present in the source are absent in the target.
Open Set Domain Adaptation: Handling target samples from unknown classes not present in the source. This is relevant when new market microstructure patterns emerge in crypto that have no equity analog.
Domain Generalization: Learning representations that generalize to any unseen domain without target data. This would allow building a single model that works across all markets without any adaptation step.
Foundation Models for Finance: Large pre-trained models on diverse financial data that can be adapted to specific tasks via prompting or lightweight fine-tuning, analogous to GPT-style models for NLP.
Causal Domain Adaptation: Using causal reasoning (see Chapters 96-110) to identify invariant mechanisms across domains rather than just statistical correlations.

Practical Considerations

When NOT to adapt: If the source and target domains are too different (high $\lambda^*$), adaptation may hurt. Always compare against a no-adaptation baseline.
Negative transfer: Monitor for degradation. If the adapted model performs worse than the baseline, the domains may be fundamentally incompatible.
Computational cost: DANN requires adversarial training; CORAL is much cheaper. Choose based on compute budget and latency requirements.
Regulatory considerations: Ensure that cross-market adaptation does not violate regulatory boundaries (e.g., using insider-adjacent information from one market to trade another).

References

Ben-David, S., Blitzer, J., Crammer, K., Kuber, A., Pereira, F., & Vaughan, J. W. (2010). A theory of learning from different domains. Machine Learning, 79(1-2), 151-175.
Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., … & Lempitsky, V. (2016). Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(59), 1-35.
Gretton, A., Borgwardt, K. M., Rasch, M. J., Scholkopf, B., & Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13, 723-773.
Sun, B., & Saenko, K. (2016). Deep CORAL: Correlation alignment for deep domain adaptation. ECCV Workshops.
Courty, N., Flamary, R., Tuia, D., & Rakotomamonjy, A. (2017). Optimal transport for domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(9), 1853-1865.
Long, M., Cao, Y., Wang, J., & Jordan, M. (2015). Learning transferable features with deep adaptation networks. ICML.
Tzeng, E., Hoffman, J., Zhang, N., Saenko, K., & Darrell, T. (2014). Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474.
Mansour, Y., Mohri, M., & Rostamizadeh, A. (2009). Domain adaptation: Learning bounds and algorithms. COLT.
Liu, Q., Xue, H., & Zhao, X. (2022). Domain adaptation for time series forecasting via attention sharing. arXiv preprint arXiv:2203.12501.
Peyre, G., & Cuturi, M. (2019). Computational optimal transport. Foundations and Trends in Machine Learning, 11(5-6), 355-607.

Chapter 92: Domain Adaptation for Finance

Chapter 92: Domain Adaptation for Finance

Overview

Table of Contents

Introduction to Domain Adaptation

What is Domain Adaptation?

The Domain Shift Problem in Finance

Why Domain Adaptation for Trading?

Mathematical Foundation

Domain Adaptation Theory

Ben-David’s Bound on Target Error

Maximum Mean Discrepancy (MMD)

Domain-Adversarial Neural Networks (DANN)

Correlation Alignment (CORAL)

Optimal Transport for Domain Adaptation

Domain Adaptation vs Other Transfer Methods

Applications in Trading

1. Cross-Market Adaptation (Stocks to Crypto)

2. Temporal Domain Adaptation (Regime Changes)

3. Cross-Exchange Adaptation

4. Cross-Asset Class Adaptation

Implementation in Python

Project Setup

Data Loading: Bybit and Stock Data

Feature Engineering

DANN Implementation (PyTorch)

MMD Implementation

CORAL Implementation

Optimal Transport Adaptation

Complete Training Pipeline

Implementation in Rust

Crate Structure

Bybit Data Client

Feature Engineering

Domain Adaptation Algorithms

Trading Strategy

Backtest Engine

Practical Examples with Stock and Crypto Data

Example 1: SPY to BTCUSDT Adaptation (DANN)

Example 2: Multi-Asset Cross-Domain Adaptation

Example 3: Measuring Domain Distance

Backtesting Framework

Backtesting Domain-Adapted Strategies

Walk-Forward Backtesting with Rolling Adaptation

Performance Evaluation

Evaluation Metrics

Domain Adaptation Quality Assessment

Expected Results Comparison

Ablation Study Components

Future Directions

Emerging Research Areas

Practical Considerations

References