Chapter 87: Task-Agnostic Trading
Overview
Task-Agnostic Trading addresses a fundamental limitation in ML-driven trading systems: traditional approaches require separate models for each trading objective — one for trend prediction, another for volatility forecasting, yet another for regime detection, and so on. Each model learns its own features from scratch, leading to redundant computation, fragmented insights, and poor generalization.
Task-agnostic representation learning solves this by training a single universal encoder that maps raw market data into a shared representation space useful across all downstream trading tasks simultaneously. Lightweight task-specific heads then decode these representations for each objective, while gradient harmonization ensures balanced multi-task learning.
Table of Contents
- Introduction
- Theoretical Foundation
- Architecture Design
- Multi-Task Learning for Trading
- Gradient Harmonization
- Decision Fusion
- Implementation Strategy
- Bybit Integration
- Backtesting Framework
- Performance Metrics
- References
Introduction
In quantitative trading, a model trained for trend prediction learns features like momentum indicators and moving average crossovers. A volatility model learns features like realized variance and ATR. A regime model learns features like autocorrelation decay and distribution shape. But these features overlap significantly — they all describe the same underlying market dynamics from different perspectives.
The Problem with Task-Specific Models
```
┌───────────────────────────────────────────────────────────────┐
│              Traditional Approach: Separate Models            │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│                       Raw Market Data                         │
│                              │                                │
│        ┌──────────────┬──────┴───────┬──────────────┐         │
│   ┌────▼────┐    ┌────▼────┐    ┌────▼────┐    ┌────▼────┐    │
│   │  Trend  │    │   Vol   │    │ Regime  │    │  Risk   │    │
│   │  Model  │    │  Model  │    │  Model  │    │  Model  │    │
│   │ (Full)  │    │ (Full)  │    │ (Full)  │    │ (Full)  │    │
│   └────┬────┘    └────┬────┘    └────┬────┘    └────┬────┘    │
│        │              │              │              │         │
│  Up/Down/Side    σ forecast    Trending/MR    Low/Med/High    │
│                                                               │
│  Problem: 4x redundant feature learning, no shared knowledge  │
└───────────────────────────────────────────────────────────────┘
```
The Task-Agnostic Solution
```
┌───────────────────────────────────────────────────────────────┐
│             Task-Agnostic Approach: Shared Encoder            │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│                       Raw Market Data                         │
│                              │                                │
│   ┌──────────────────────────▼────────────────────────────┐   │
│   │               Universal Encoder (Shared)              │   │
│   │  Input → [64] → BN → ReLU → [32] → BN → ReLU → [16]   │   │
│   └──────────────────────────┬────────────────────────────┘   │
│                              │  Shared Representations        │
│        ┌──────────────┬──────┴───────┬──────────────┐         │
│   ┌────▼────┐    ┌────▼────┐    ┌────▼────┐    ┌────▼────┐    │
│   │  Trend  │    │   Vol   │    │ Regime  │    │  Risk   │    │
│   │  Head   │    │  Head   │    │  Head   │    │  Head   │    │
│   │ (Light) │    │ (Light) │    │ (Light) │    │ (Light) │    │
│   └────┬────┘    └────┬────┘    └────┬────┘    └────┬────┘    │
│        │              │              │              │         │
│        └──────────────┴──────┬───────┴──────────────┘         │
│                              │                                │
│                       Decision Fusion                         │
│                              │                                │
│                    Unified Trading Signal                     │
│                                                               │
│   Advantage: Shared features, cross-task knowledge transfer   │
└───────────────────────────────────────────────────────────────┘
```
Key Benefits
| Aspect | Task-Specific | Task-Agnostic |
|---|---|---|
| Parameters | 4 × full model | 1 encoder + 4 light heads |
| Feature learning | Redundant | Shared |
| Cross-task transfer | None | Automatic |
| Inference time | 4 × forward pass | 1 encode + 4 head passes |
| Consistency | Independent signals | Coherent decisions |
| New tasks | Full retraining | Add a head |
Theoretical Foundation
Multi-Task Learning Framework
Given a set of $T$ trading tasks $\{\mathcal{T}_1, \ldots, \mathcal{T}_T\}$, we learn:
- A shared encoder $f_\theta: \mathbb{R}^d \to \mathbb{R}^k$ mapping input features to representations
- Task-specific heads $g_{\phi_t}: \mathbb{R}^k \to \mathbb{R}^{c_t}$ for each task $t$
The multi-task objective is:
$$\min_{\theta, \{\phi_t\}} \sum_{t=1}^{T} w_t \cdot \mathcal{L}_t(g_{\phi_t}(f_\theta(X)), Y_t)$$
where $w_t$ are task weights and $\mathcal{L}_t$ is the loss for task $t$.
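As a minimal PyTorch sketch of this objective (two of the four tasks for brevity; the head names, shapes, and fixed weights $w_t$ are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two of the four tasks, with illustrative shapes: 19 inputs, 16-dim representation.
encoder = nn.Linear(19, 16)                      # stands in for f_theta
heads = {"trend": nn.Linear(16, 3),              # g_phi_t: 3-class classification
         "vol": nn.Linear(16, 1)}                # g_phi_t: regression
criteria = {"trend": nn.CrossEntropyLoss(), "vol": nn.MSELoss()}
weights = {"trend": 1.0, "vol": 0.5}             # fixed w_t for clarity

x = torch.randn(8, 19)
labels = {"trend": torch.randint(0, 3, (8,)), "vol": torch.randn(8, 1)}

z = encoder(x)                                   # shared representation, reused by every head
total = sum(weights[t] * criteria[t](heads[t](z), labels[t]) for t in heads)
total.backward()                                 # encoder receives gradient from all tasks
```

Because every head reads the same `z`, a single backward pass accumulates all tasks' gradients into the encoder, which is exactly what gradient harmonization later balances.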
Task-Agnostic Representations
A representation is task-agnostic if it captures the fundamental structure of the data without bias toward any particular downstream task. Formally:
$$I(Z; Y_t) \approx I(Z; Y_{t'}) \quad \forall t, t' \in \{1, \ldots, T\}$$
where $I(Z; Y_t)$ is the mutual information between representation $Z$ and task labels $Y_t$.
This means the encoder extracts features equally useful for trend prediction, volatility forecasting, regime detection, and risk assessment.
Trading Tasks
We define four core trading tasks:
1. Trend Prediction (Classification: 3 classes)
   - Up / Sideways / Down
   - Loss: Cross-entropy
2. Volatility Forecast (Regression: 1 output)
   - Predict next-period realized volatility
   - Loss: Mean Squared Error
3. Regime Detection (Classification: 4 classes)
   - Trending / Mean-Reverting / Volatile / Calm
   - Loss: Cross-entropy
4. Risk Assessment (Classification: 3 classes)
   - Low / Medium / High risk
   - Loss: Cross-entropy
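Labels for these tasks can be derived directly from a return series. A sketch of that construction, where the thresholds, window length, and class encodings are illustrative and regime labeling is omitted for brevity:

```python
import numpy as np

def make_labels(returns, window=10, up=0.002, down=-0.002):
    """Derive trend / volatility / risk labels from a return series.
    Thresholds, window, and encodings are illustrative choices."""
    r = np.asarray(returns)
    nxt = r[window:]                                           # target return after each window
    trend = np.where(nxt > up, 0, np.where(nxt < down, 2, 1))  # 0=Up, 1=Side, 2=Down
    vol = np.array([r[i - window:i].std() for i in range(window, len(r))])
    risk = np.digitize(vol, np.quantile(vol, [0.33, 0.66]))    # 0=Low, 1=Med, 2=High
    return trend, vol, risk

rng = np.random.default_rng(0)
trend, vol, risk = make_labels(rng.normal(0.0, 0.01, 200))
```

The volatility target doubles as the regression label, while risk is its tercile bucket, so the four heads stay aligned on the same sample index.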
Architecture Design
Universal Encoder
The encoder is a feedforward neural network with:
- Batch Normalization after each hidden layer for stable training
- ReLU activations for non-linearity
- Residual connections when dimensions match
- L2 normalization of output representations to prevent collapse
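A minimal PyTorch sketch of such an encoder (dimensions from the text; residual connections are omitted here because adjacent layer widths differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UniversalEncoder(nn.Module):
    """Minimal sketch of the shared encoder described above."""
    def __init__(self, input_dim=19, hidden_dims=(64, 32), repr_dim=16, dropout=0.1):
        super().__init__()
        layers, prev = [], input_dim
        for width in hidden_dims:
            layers += [nn.Linear(prev, width), nn.BatchNorm1d(width),
                       nn.ReLU(), nn.Dropout(dropout)]
            prev = width
        layers.append(nn.Linear(prev, repr_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # L2-normalize so representations live on the unit sphere (prevents collapse)
        return F.normalize(self.net(x), dim=-1)

enc = UniversalEncoder().eval()
z = enc(torch.randn(4, 19))      # shape (4, 16), each row has unit norm
```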
```
Input (d=19 features)
          │
          ▼
┌─────────────────────┐
│  Linear(19 → 64)    │
│  BatchNorm1d(64)    │
│  ReLU               │
│  Dropout(0.1)       │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  Linear(64 → 32)    │
│  BatchNorm1d(32)    │
│  ReLU               │
│  Dropout(0.1)       │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  Linear(32 → 16)    │
│  L2 Normalize       │
└─────────┬───────────┘
          │
          ▼
Representation (k=16)
```
Task Heads
Each task head is a two-layer network:
```python
import torch.nn as nn
import torch.nn.functional as F

class TaskHead(nn.Module):
    def __init__(self, repr_dim, hidden_dim, output_dim):
        super().__init__()
        self.hidden = nn.Linear(repr_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, z):
        h = F.relu(self.hidden(z))
        return self.output(h)  # raw logits; apply softmax for classification,
                               # leave unchanged for regression targets
```
Feature Extraction
We extract 19 task-agnostic features from OHLCV data:
| # | Feature | Category | Description |
|---|---|---|---|
| 1 | return_1d | Returns | 1-day price return |
| 2 | return_5d | Returns | 5-day cumulative return |
| 3 | rsi | Momentum | Relative Strength Index (normalized) |
| 4 | macd_signal | Momentum | MACD histogram |
| 5 | ma_crossover | Momentum | Moving average crossover signal |
| 6 | realized_vol | Volatility | Realized volatility (log returns std) |
| 7 | bb_position | Volatility | Position within Bollinger Bands |
| 8 | atr_ratio | Volatility | Average True Range / Price |
| 9 | volume_ratio | Volume | Current volume / 20-day average |
| 10 | volume_trend | Volume | Short-term vs long-term volume |
| 11 | body_ratio | Candle | Candle body / total range |
| 12 | upper_shadow | Candle | Upper shadow / total range |
| 13 | lower_shadow | Candle | Lower shadow / total range |
| 14 | range_pct | Range | High-low range as percentage |
| 15 | close_position | Range | Close position within range |
| 16 | trend_strength | Trend | Linear regression slope |
| 17 | trend_consistency | Trend | Proportion of up-days |
| 18 | return_skewness | Distribution | Skewness of recent returns |
| 19 | return_kurtosis | Distribution | Excess kurtosis of returns |
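A few of these features can be computed directly from aligned OHLCV arrays. The window lengths below are illustrative:

```python
import numpy as np

def extract_features(close, high, low, volume):
    """Compute a subset of the 19 features from aligned OHLCV arrays."""
    close, high, low, volume = map(np.asarray, (close, high, low, volume))
    logret = np.diff(np.log(close))
    bar_range = max(high[-1] - low[-1], 1e-12)     # guard against zero-range bars
    return {
        "return_1d": close[-1] / close[-2] - 1.0,
        "return_5d": close[-1] / close[-6] - 1.0,
        "realized_vol": logret[-20:].std(),
        "volume_ratio": volume[-1] / volume[-20:].mean(),
        "range_pct": (high[-1] - low[-1]) / close[-1],
        "close_position": (close[-1] - low[-1]) / bar_range,
    }
```

Keeping every feature bounded or roughly unit-scale (ratios, positions within a range) lets the shared encoder consume them without per-task preprocessing.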
Multi-Task Learning for Trading
Training Process
```
┌─────────────────────────────────────────────────────────────────────────┐
│                        Multi-Task Training Loop                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  For each epoch:                                                        │
│    For each mini-batch:                                                 │
│                                                                         │
│      1. Forward pass through encoder                                    │
│           features → representations                                    │
│                                                                         │
│      2. Forward pass through each task head                             │
│           representations → predictions[task]                           │
│                                                                         │
│      3. Compute losses for each task                                    │
│           L_trend  = CrossEntropy(pred_trend, label_trend)              │
│           L_vol    = MSE(pred_vol, label_vol)                           │
│           L_regime = CrossEntropy(pred_regime, label_regime)            │
│           L_risk   = CrossEntropy(pred_risk, label_risk)                │
│                                                                         │
│      4. Harmonize gradients                                             │
│           w_t = GradNorm(L_1, ..., L_T)                                 │
│                                                                         │
│      5. Compute total loss                                              │
│           L_total = Σ w_t * L_t                                         │
│                                                                         │
│      6. Backpropagate and update                                        │
│           θ   ← θ   - lr * ∇_θ L_total                                  │
│           φ_t ← φ_t - lr * ∇_{φ_t} L_total                              │
│                                                                         │
│  Early stopping on validation loss                                      │
└─────────────────────────────────────────────────────────────────────────┘
```
Gradient Harmonization
A key challenge in multi-task learning is gradient conflict: gradients from different tasks may point in opposing directions, causing the encoder to oscillate or converge to a solution that favors one task over others.
GradNorm Algorithm
GradNorm dynamically adjusts task weights so all tasks train at similar rates:
- Track the loss ratio $r_t = L_t / L_t^{(0)}$ (current / initial)
- Compute relative training rate: $\tilde{r}_t = r_t / \bar{r}$
- Update weights: $w_t \propto \tilde{r}_t^{\alpha}$
Tasks that train slower get higher weights, ensuring balanced progress.
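A sketch of the loss-ratio part of this update. The full GradNorm algorithm also matches per-task gradient norms through an auxiliary loss; this simplification keeps only the reweighting idea:

```python
import numpy as np

def gradnorm_weights(initial_losses, current_losses, alpha=1.5):
    """Reweight tasks by relative training rate: slower tasks get larger weights."""
    r = np.asarray(current_losses) / np.asarray(initial_losses)   # r_t = L_t / L_t^(0)
    rel = r / r.mean()                                            # relative rate r~_t
    w = rel ** alpha                                              # rel > 1 (slower) → heavier
    return w * len(w) / w.sum()                                   # keep weights summing to T

w = gradnorm_weights(initial_losses=[1.0, 1.0], current_losses=[0.9, 0.5])
# task 0 has barely improved, so it receives the larger weight
```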
PCGrad (Projected Conflicting Gradients)
For conflicting gradient pairs:
$$g_i^{PC} = g_i - \frac{g_i \cdot g_j}{\|g_j\|^2} g_j \quad \text{if } g_i \cdot g_j < 0$$
This projects away the conflicting component while preserving the cooperative component.
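The projection takes only a few lines; here `g_i` and `g_j` stand for two tasks' gradients flattened into vectors:

```python
import numpy as np

def pcgrad(g_i, g_j):
    """If g_i conflicts with g_j (negative dot product), remove the conflicting component."""
    dot = float(g_i @ g_j)
    if dot < 0.0:
        return g_i - (dot / float(g_j @ g_j)) * g_j
    return g_i

g_a = np.array([1.0, 1.0])
g_b = np.array([-1.0, 0.0])
g_proj = pcgrad(g_a, g_b)   # conflict removed: result is orthogonal to g_b
```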
Uncertainty Weighting
Weight tasks by inverse prediction uncertainty:
$$w_t = \frac{1}{2\sigma_t^2}$$
where $\sigma_t$ is the learned uncertainty for task $t$. Uncertain tasks get less weight.
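In the Kendall et al. formulation the full per-task term is $\frac{1}{2\sigma_t^2}\mathcal{L}_t + \log \sigma_t$, where the $\log \sigma_t$ regularizer stops the model from inflating all uncertainties. A sketch with learnable log-uncertainties:

```python
import torch

def uncertainty_weighted_loss(task_losses, log_sigma):
    """Kendall et al. style: sum_t  L_t / (2*sigma_t^2) + log(sigma_t)."""
    sigma2 = torch.exp(2.0 * log_sigma)          # sigma_t^2, always positive
    return (task_losses / (2.0 * sigma2) + log_sigma).sum()

# One learnable log-sigma per task; starting at 0 means sigma = 1 (equal weights).
log_sigma = torch.zeros(4, requires_grad=True)
task_losses = torch.tensor([0.7, 0.4, 0.9, 0.5])
total = uncertainty_weighted_loss(task_losses, log_sigma)
total.backward()   # the optimizer would update log_sigma alongside the network
```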
Decision Fusion
After obtaining predictions from all task heads, we fuse them into a unified trading decision.
Weighted Average Fusion
```
┌─────────────────────────────────────────────────────────────────────┐
│                       Decision Fusion Pipeline                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Trend Head (w=0.35):                                               │
│    [Up: 0.6, Side: 0.3, Down: 0.1]                                  │
│    → signal += 0.35 * (0.6 - 0.1) = +0.175                          │
│                                                                     │
│  Volatility Head (w=0.20):                                          │
│    [Predicted: 0.3]                                                 │
│    → confidence *= (1 - 0.3*0.3) = 0.91                             │
│    → risk_level = 0.3                                               │
│                                                                     │
│  Regime Head (w=0.25):                                              │
│    [Trend: 0.5, MR: 0.2, Vol: 0.2, Calm: 0.1]                       │
│    → regime = "Trending"                                            │
│    → signal *= 1.2  (boost for trending regime)                     │
│                                                                     │
│  Risk Head (w=0.20):                                                │
│    [Low: 0.6, Med: 0.3, High: 0.1]                                  │
│    → signal *= (1 - 0.1 * 0.5) = 0.95                               │
│                                                                     │
│  Final: signal=+0.20, confidence=0.72, regime=Trending, risk=0.30   │
│    → Signal Type: BUY                                               │
└─────────────────────────────────────────────────────────────────────┘
```
Signal Classification
| Signal Value | Type | Position |
|---|---|---|
| ≥ 0.40 | Strong Buy | 100% long |
| 0.15 to 0.40 | Buy | 50% long |
| -0.15 to 0.15 | Hold | Flat |
| -0.40 to -0.15 | Sell | 50% short |
| ≤ -0.40 | Strong Sell | 100% short |
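The fusion arithmetic and the banding above can be sketched as plain functions. This covers the signal path only; confidence and volatility handling are omitted, and the function names are illustrative:

```python
def fuse_signal(trend_probs, regime_probs, risk_probs, w_trend=0.35):
    """Signal path of the fusion pipeline (confidence handling omitted)."""
    signal = w_trend * (trend_probs[0] - trend_probs[2])      # P(Up) - P(Down)
    if max(regime_probs) == regime_probs[0]:                  # trending regime boost
        signal *= 1.2
    signal *= 1.0 - risk_probs[2] * 0.5                       # damp by P(High risk)
    return signal

def classify_signal(signal):
    """Map the fused value to the signal bands."""
    if signal >= 0.40:   return "StrongBuy", 1.0
    if signal >= 0.15:   return "Buy", 0.5
    if signal > -0.15:   return "Hold", 0.0
    if signal > -0.40:   return "Sell", -0.5
    return "StrongSell", -1.0

sig = fuse_signal([0.6, 0.3, 0.1], [0.5, 0.2, 0.2, 0.1], [0.6, 0.3, 0.1])
# reproduces the worked example: 0.175 * 1.2 * 0.95 ≈ 0.20 → ("Buy", 0.5)
```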
Implementation Strategy
Python (PyTorch)
```python
import asyncio

from task_agnostic_trading import (
    TaskAgnosticModel, EncoderConfig,
    MultiTaskTrainer, TrainerConfig,
    FeatureExtractor, DecisionFusion,
    SignalGenerator, BacktestEngine, BybitClient,
)

# 1. Fetch data
client = BybitClient()
klines = asyncio.run(client.fetch_klines("BTCUSDT", "60", 250))

# 2. Extract features
extractor = FeatureExtractor()
features = extractor.extract_all(klines)

# 3. Create model
config = EncoderConfig(input_dim=19, hidden_dims=[64, 32], repr_dim=16)
model = TaskAgnosticModel(config)

# 4. Train
trainer = MultiTaskTrainer(TrainerConfig(epochs=100))
result = trainer.train(model, features, labels)

# 5. Predict and fuse
outputs = model.predict_single(features)
fusion = DecisionFusion()
fused = fusion.fuse(outputs)

# 6. Generate signals and backtest
signals = SignalGenerator().generate(fused)
bt = BacktestEngine().run(signals, prices)
```
Rust
```rust
use task_agnostic_trading::prelude::*;

// Create model
let config = TaskAgnosticConfig::default()
    .with_input_dim(19)
    .with_repr_dim(16);
let model = TaskAgnosticModel::new(config);

// Predict
let fused = model.predict(&features);

// Generate signals
let gen = SignalGenerator::new(SignalConfig::default());
let signals = gen.generate(&fused);

// Backtest
let engine = BacktestEngine::new(BacktestConfig::default());
let result = engine.run(&signals, &prices);
println!("Sharpe: {:.2}", result.metrics.sharpe_ratio);
```
Bybit Integration
The system supports cryptocurrency data via the Bybit exchange API:
```python
client = BybitClient()

# Single symbol
klines = await client.fetch_klines("BTCUSDT", interval="60", limit=200)

# Multiple symbols concurrently
multi_data = await client.fetch_multi_klines(
    ["BTCUSDT", "ETHUSDT", "SOLUSDT"], interval="60", limit=200
)
```
Supported Data Types
- Klines (OHLCV): Candlestick data at various intervals
- Tickers: 24-hour market summaries
- Order Books: Bid/ask depth
- Funding Rates: Perpetual contract funding
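A sketch of turning a raw kline response into model-ready bars. The row layout `[startTime, open, high, low, close, volume, turnover]` and newest-first ordering are assumptions based on the Bybit v5 market API, and the sample payload is synthetic:

```python
def parse_klines(payload):
    """Turn a Bybit v5 kline response into oldest-first OHLCV dicts.
    Row layout and ordering are assumptions based on the v5 market API."""
    rows = payload["result"]["list"]
    return [
        {"ts": int(r[0]), "open": float(r[1]), "high": float(r[2]),
         "low": float(r[3]), "close": float(r[4]), "volume": float(r[5])}
        for r in reversed(rows)          # oldest-first for feature windows
    ]

# Synthetic payload mimicking the assumed response shape
sample = {"result": {"list": [
    ["1700003600000", "101", "102", "100", "101.5", "10", "1015"],
    ["1700000000000", "100", "101", "99", "100.5", "12", "1200"],
]}}
bars = parse_klines(sample)   # oldest bar first
```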
Backtesting Framework
The backtesting engine simulates trading with:
- Transaction costs: Configurable per-trade cost (default 0.1%)
- Slippage: Execution price deviation (default 0.05%)
- Position sizing: Based on signal strength, confidence, and risk
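Costs can be charged on the notional actually traded each period. A sketch with the default rates from the text (the function name is illustrative):

```python
def apply_costs(gross_return, position_change, fee=0.001, slippage=0.0005):
    """Net per-period return: costs apply only to the traded fraction of notional."""
    return gross_return - abs(position_change) * (fee + slippage)

net = apply_costs(0.01, position_change=1.0)   # opening a full position costs 0.15%
held = apply_costs(0.01, position_change=0.0)  # holding an existing position is free
```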
Performance Metrics
| Metric | Description |
|---|---|
| Total Return | Cumulative portfolio return |
| Annualized Return | Geometric annual return |
| Sharpe Ratio | Risk-adjusted return (annualized) |
| Sortino Ratio | Downside risk-adjusted return |
| Max Drawdown | Largest peak-to-trough decline |
| Calmar Ratio | Return / Max Drawdown |
| Win Rate | Percentage of profitable trades |
| Profit Factor | Gross profit / Gross loss |
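Two of these metrics as minimal NumPy functions; the annualization factor assumes daily bars, and no risk-free rate is subtracted:

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a per-period return series."""
    r = np.asarray(returns)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a negative fraction."""
    eq = np.asarray(equity)
    peak = np.maximum.accumulate(eq)   # running high-water mark
    return ((eq - peak) / peak).min()

mdd = max_drawdown([100.0, 110.0, 99.0, 120.0])   # the 110 → 99 drop is -10%
```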
Evaluation
Model Evaluation
- Classification tasks: Cross-entropy loss, accuracy, F1-score
- Regression tasks: MSE, MAE, R²
- Representation quality: Inter-class / intra-class distance ratio
- Task balance: Weight variance across tasks (lower = more balanced)
Strategy Evaluation
- Risk-adjusted returns: Sharpe, Sortino, Calmar ratios
- Drawdown analysis: Maximum drawdown, recovery time
- Trade statistics: Win rate, profit factor, average trade
- Benchmark comparison: Strategy vs buy-and-hold
References
1. Lan et al. (2019). Task-Agnostic Representation Learning. https://arxiv.org/abs/1907.12157
2. Chen et al. (2018). GradNorm: Gradient Normalization for Adaptive Loss Balancing. https://arxiv.org/abs/1711.02257
3. Kendall et al. (2018). Multi-Task Learning Using Uncertainty to Weigh Losses. https://arxiv.org/abs/1705.07115
4. Yu et al. (2020). Gradient Surgery for Multi-Task Learning (PCGrad). https://arxiv.org/abs/2001.06782
5. Ruder (2017). An Overview of Multi-Task Learning in Deep Neural Networks. https://arxiv.org/abs/1706.05098
Directory Structure
```
87_task_agnostic_trading/
├── README.md                      # This file
├── README.ru.md                   # Russian translation
├── readme.simple.md               # Simplified explanation (English)
├── readme.simple.ru.md            # Simplified explanation (Russian)
├── README.specify.md              # Task specification
├── Cargo.toml                     # Rust project configuration
├── Cargo.lock                     # Dependency lock file
├── src/                           # Rust source code
│   ├── lib.rs                     # Library root
│   ├── model/                     # Model components
│   │   ├── mod.rs                 # Module root
│   │   ├── encoder.rs             # Universal encoder
│   │   ├── task_heads.rs          # Task-specific heads
│   │   └── fusion.rs              # Decision fusion
│   ├── data/                      # Data handling
│   │   ├── mod.rs                 # Module root
│   │   ├── bybit.rs               # Bybit API client
│   │   ├── features.rs            # Feature extraction
│   │   └── types.rs               # Data types
│   ├── training/                  # Training logic
│   │   ├── mod.rs                 # Module root
│   │   ├── multi_task.rs          # Multi-task trainer
│   │   └── gradient.rs            # Gradient harmonization
│   ├── strategy/                  # Trading strategy
│   │   ├── mod.rs                 # Module root
│   │   ├── regime.rs              # Regime classification
│   │   └── signals.rs             # Signal generation
│   └── backtest/                  # Backtesting
│       ├── mod.rs                 # Module root
│       └── engine.rs              # Backtest engine
├── examples/                      # Rust examples
│   ├── basic_task_agnostic.rs     # Basic usage
│   ├── multi_task_strategy.rs     # Full strategy pipeline
│   └── universal_encoder.rs       # Encoder analysis
└── python/                        # Python implementation
    └── task_agnostic_trading.py   # Complete Python module
```
Quick Start
Python
```bash
cd 87_task_agnostic_trading/python
pip install torch numpy aiohttp
python task_agnostic_trading.py
```
Rust
```bash
cd 87_task_agnostic_trading
cargo test
cargo run --example basic_task_agnostic
cargo run --example multi_task_strategy
cargo run --example universal_encoder
```