Chapter 324: Ensemble Uncertainty for Trading

Overview

Ensemble methods provide a powerful framework for uncertainty quantification in machine learning models. Unlike single models that produce point predictions, ensembles naturally capture model uncertainty through the disagreement among their constituent members. This chapter explores how to leverage ensemble uncertainty for more robust trading decisions.

Why Uncertainty Matters in Trading

The Problem with Point Predictions

Traditional ML models output a single prediction:

Model says: "Price will go UP by 2.5%"

But this tells us nothing about confidence! What if:

The model is 95% certain → Strong signal, trade with full position
The model is 51% certain → Weak signal, maybe skip or reduce size
The model has never seen similar market conditions → High uncertainty, be cautious

Uncertainty-Aware Trading

Ensemble says:
  - Prediction: UP 2.5%
  - Epistemic Uncertainty: 0.8% (model uncertainty)
  - Aleatoric Uncertainty: 1.2% (data uncertainty)
  - Total Uncertainty: 1.5%
  - Confidence Interval: [1.0%, 4.0%]

Trading Decision: Moderate confidence → Trade with 60% position size

Types of Uncertainty

1. Epistemic Uncertainty (Model Uncertainty)

What it captures: Uncertainty due to limited knowledge or training data.

Characteristics:

Can be reduced with more data
High in regions far from training distribution
Indicates model doesn’t “know” this situation well

In trading context:

Novel market regimes (e.g., first crypto bull run)
Assets with limited history
Unusual market conditions

Example:
Training data: Normal market (volatility 15-25%)
Current market: Extreme volatility (80%)

Epistemic uncertainty: HIGH
→ Model hasn't seen this before, be cautious!

2. Aleatoric Uncertainty (Data Uncertainty)

What it captures: Irreducible noise inherent in the data.

Characteristics:

Cannot be reduced with more data
Represents inherent randomness
Varies across input space

In trading context:

High around major news events
During earnings announcements
In illiquid markets

Example:
Predicting price during scheduled FOMC announcement

Aleatoric uncertainty: HIGH
→ Outcome is fundamentally unpredictable, reduce exposure!

3. Total Uncertainty

Total Uncertainty = sqrt(Epistemic² + Aleatoric²)

Or more precisely:
Var[Y] = E[Var[Y|X]] + Var[E[Y|X]]
         └─────────┘   └─────────┘
          Aleatoric    Epistemic

Ensemble Methods for Uncertainty

1. Bagging (Bootstrap Aggregating)

Concept: Train multiple models on bootstrap samples of the data.

Original Data: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Bootstrap Sample 1: [1, 1, 3, 4, 4, 6, 7, 8, 8, 10] → Model 1
Bootstrap Sample 2: [2, 2, 3, 5, 5, 6, 7, 9, 9, 10] → Model 2
Bootstrap Sample 3: [1, 3, 3, 4, 5, 6, 8, 8, 9, 10] → Model 3
...

Prediction and Uncertainty:

predictions = [model.predict(X) for model in ensemble]
mean_prediction = np.mean(predictions)
uncertainty = np.std(predictions)

Advantages:

Simple to implement
Naturally provides uncertainty estimates
Works with any base model

2. Random Forest Uncertainty

Random Forests extend bagging with feature randomization:

┌──────────────────────────────────────────────────────────────┐
│                    RANDOM FOREST ENSEMBLE                     │
├──────────────────────────────────────────────────────────────┤
│                                                               │
│  Tree 1          Tree 2          Tree 3          Tree N       │
│  (Features:      (Features:      (Features:      (Features:   │
│   A,C,E)          B,D,F)          A,B,D)          C,E,F)      │
│     │               │               │               │         │
│     v               v               v               v         │
│  Pred: 1.5%      Pred: 2.0%      Pred: 1.8%      Pred: 2.2%  │
│                                                               │
│  Final: Mean = 1.875%, Std = 0.31% (uncertainty)             │
│                                                               │
└──────────────────────────────────────────────────────────────┘

Variance Decomposition:

# Individual tree predictions
tree_predictions = [tree.predict(X) for tree in forest.estimators_]

# Mean prediction
mean_pred = np.mean(tree_predictions, axis=0)

# Total variance (epistemic uncertainty)
total_variance = np.var(tree_predictions, axis=0)

# For regression: leaf variance gives aleatoric estimate
leaf_variances = [get_leaf_variance(tree, X) for tree in forest.estimators_]
aleatoric_var = np.mean(leaf_variances, axis=0)

# Epistemic variance
epistemic_var = total_variance - aleatoric_var

3. Boosting with Uncertainty

Gradient Boosting can also provide uncertainty:

NGBoost (Natural Gradient Boosting):
- Outputs distribution parameters, not point predictions
- Each tree predicts parameters of a probability distribution

Standard GBM:  predict(X) → y
NGBoost:       predict(X) → (μ, σ)  # Mean and standard deviation

Quantile Gradient Boosting:

# Train separate models for different quantiles
model_q10 = GradientBoostingRegressor(loss='quantile', alpha=0.10)
model_q50 = GradientBoostingRegressor(loss='quantile', alpha=0.50)
model_q90 = GradientBoostingRegressor(loss='quantile', alpha=0.90)

# Predictions give confidence interval
lower_bound = model_q10.predict(X)  # 10th percentile
median = model_q50.predict(X)        # Median
upper_bound = model_q90.predict(X)   # 90th percentile

uncertainty = (upper_bound - lower_bound) / 2

4. Stacking with Uncertainty

Stacking combines diverse models:

┌────────────────────────────────────────────────────────────────┐
│                    STACKING ENSEMBLE                            │
├────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Level 0 (Base Models):                                         │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐           │
│  │  Random  │ │ Gradient │ │   SVM    │ │  Neural  │           │
│  │  Forest  │ │ Boosting │ │          │ │  Network │           │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘           │
│       │            │            │            │                  │
│       └────────────┴─────┬──────┴────────────┘                  │
│                          │                                      │
│  Level 1 (Meta-model):   ▼                                      │
│  ┌────────────────────────────────────────────┐                │
│  │    Meta-learner (combines predictions)     │                │
│  │    Also estimates uncertainty from         │                │
│  │    base model disagreement                 │                │
│  └────────────────────────────────────────────┘                │
│                          │                                      │
│                          ▼                                      │
│            Final Prediction + Uncertainty                       │
│                                                                 │
└────────────────────────────────────────────────────────────────┘

Uncertainty from Stacking:

base_predictions = [model.predict(X) for model in base_models]

# Diversity-based uncertainty
uncertainty = np.std(base_predictions, axis=0)

# Weighted disagreement (if we have model weights)
weighted_variance = np.sum(weights * (base_predictions - mean_pred)**2)

Out-of-Bag (OOB) Uncertainty

Concept

In bagging, each bootstrap sample leaves out ~37% of data. We can use these “out-of-bag” samples for unbiased uncertainty estimation.

Data point i=5:
├── NOT in bootstrap sample 1 (OOB) → Use Tree 1's prediction
├── IN bootstrap sample 2 (skip)
├── NOT in bootstrap sample 3 (OOB) → Use Tree 3's prediction
├── IN bootstrap sample 4 (skip)
└── NOT in bootstrap sample 5 (OOB) → Use Tree 5's prediction

OOB Prediction for i=5: mean(Tree1, Tree3, Tree5)
OOB Uncertainty for i=5: std(Tree1, Tree3, Tree5)

Implementation

def compute_oob_uncertainty(forest, X_train, y_train):
    n_samples = X_train.shape[0]
    n_trees = len(forest.estimators_)

    predictions = np.zeros((n_trees, n_samples))
    in_bag = np.zeros((n_trees, n_samples), dtype=bool)

    for i, (tree, samples) in enumerate(zip(forest.estimators_,
                                            forest.estimators_samples_)):
        # Track which samples were in-bag
        in_bag[i, samples] = True
        # Get predictions for all samples
        predictions[i] = tree.predict(X_train)

    # For each sample, compute stats using only OOB trees
    oob_mean = np.zeros(n_samples)
    oob_std = np.zeros(n_samples)

    for i in range(n_samples):
        oob_mask = ~in_bag[:, i]
        if oob_mask.sum() > 0:
            oob_preds = predictions[oob_mask, i]
            oob_mean[i] = np.mean(oob_preds)
            oob_std[i] = np.std(oob_preds)

    return oob_mean, oob_std

Model Disagreement Metrics

1. Prediction Variance

def prediction_variance(ensemble, X):
    """Standard deviation of predictions across ensemble members."""
    predictions = np.array([m.predict(X) for m in ensemble])
    return np.std(predictions, axis=0)

2. Coefficient of Variation

def coefficient_of_variation(ensemble, X):
    """Relative uncertainty: std/mean."""
    predictions = np.array([m.predict(X) for m in ensemble])
    mean_pred = np.mean(predictions, axis=0)
    std_pred = np.std(predictions, axis=0)
    return std_pred / (np.abs(mean_pred) + 1e-8)

3. Inter-Quantile Range

def iqr_uncertainty(ensemble, X):
    """Range between 25th and 75th percentile predictions."""
    predictions = np.array([m.predict(X) for m in ensemble])
    q25 = np.percentile(predictions, 25, axis=0)
    q75 = np.percentile(predictions, 75, axis=0)
    return q75 - q25

4. Entropy (for Classification)

def prediction_entropy(ensemble, X):
    """Entropy of averaged probability predictions."""
    probs = np.array([m.predict_proba(X) for m in ensemble])
    mean_probs = np.mean(probs, axis=0)
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-8), axis=1)
    return entropy

5. Mutual Information (Epistemic Uncertainty)

def mutual_information(ensemble, X):
    """Captures epistemic uncertainty for classification."""
    probs = np.array([m.predict_proba(X) for m in ensemble])
    mean_probs = np.mean(probs, axis=0)

    # Total entropy
    total_entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-8), axis=1)

    # Expected entropy (average entropy of individual predictions)
    individual_entropies = -np.sum(probs * np.log(probs + 1e-8), axis=2)
    expected_entropy = np.mean(individual_entropies, axis=0)

    # Mutual information = Total - Expected
    return total_entropy - expected_entropy

Calibration

Why Calibration Matters

A model is calibrated if its predicted probabilities match actual frequencies:

If model predicts 70% up probability for 100 situations:
- Well-calibrated: ~70 actually went up
- Overconfident: 50 actually went up
- Underconfident: 90 actually went up

Calibration Metrics

Expected Calibration Error (ECE):

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Compute Expected Calibration Error."""
    bin_boundaries = np.linspace(0, 1, n_bins + 1)
    ece = 0.0

    for i in range(n_bins):
        mask = (y_prob >= bin_boundaries[i]) & (y_prob < bin_boundaries[i+1])
        if mask.sum() > 0:
            bin_accuracy = y_true[mask].mean()
            bin_confidence = y_prob[mask].mean()
            bin_size = mask.sum() / len(y_true)
            ece += bin_size * np.abs(bin_accuracy - bin_confidence)

    return ece

Calibration Methods

Platt Scaling:

from sklearn.calibration import CalibratedClassifierCV

calibrated_model = CalibratedClassifierCV(base_model, method='sigmoid', cv=5)
calibrated_model.fit(X_train, y_train)
calibrated_probs = calibrated_model.predict_proba(X_test)

Isotonic Regression:

calibrated_model = CalibratedClassifierCV(base_model, method='isotonic', cv=5)

Temperature Scaling (for neural networks):

def temperature_scaling(logits, temperature):
    """Scale logits by temperature for calibration."""
    return logits / temperature

# Optimize temperature on validation set
optimal_temp = optimize_temperature(val_logits, val_labels)
calibrated_logits = temperature_scaling(test_logits, optimal_temp)

Trading Strategy with Uncertainty

Position Sizing Based on Uncertainty

def calculate_position_size(prediction, uncertainty, base_size,
                           max_uncertainty=0.05, min_confidence=0.6):
    """
    Scale position size inversely with uncertainty.

    Args:
        prediction: Expected return or probability
        uncertainty: Model uncertainty estimate
        base_size: Base position size (e.g., 1.0 for full position)
        max_uncertainty: Maximum acceptable uncertainty
        min_confidence: Minimum confidence to trade

    Returns:
        Scaled position size
    """
    # Skip if uncertainty too high
    if uncertainty > max_uncertainty:
        return 0.0

    # Confidence score (inverse of normalized uncertainty)
    confidence = 1.0 - (uncertainty / max_uncertainty)

    # Skip if confidence too low
    if confidence < min_confidence:
        return 0.0

    # Scale position by confidence
    position_size = base_size * confidence

    # Further scale by prediction strength
    prediction_strength = min(abs(prediction) / 0.02, 1.0)  # Normalize by 2%
    position_size *= prediction_strength

    return position_size

Kelly Criterion with Uncertainty

def kelly_with_uncertainty(win_prob, win_size, loss_size, uncertainty):
    """
    Adjusted Kelly criterion accounting for uncertainty.

    Standard Kelly: f* = (p*b - q) / b
    where p = win probability, q = 1-p, b = win/loss ratio

    Adjusted: reduce position when uncertainty is high
    """
    # Standard Kelly
    b = win_size / loss_size
    q = 1 - win_prob
    kelly = (win_prob * b - q) / b

    # Uncertainty adjustment
    # Higher uncertainty → more conservative
    uncertainty_factor = 1.0 / (1.0 + uncertainty * 10)

    # Also use fractional Kelly (half-Kelly is common)
    fractional_kelly = kelly * 0.5 * uncertainty_factor

    return max(0, fractional_kelly)

Ensemble Voting with Uncertainty Weighting

def uncertainty_weighted_vote(ensemble, X, method='inverse_variance'):
    """
    Combine ensemble predictions with uncertainty weighting.
    """
    predictions = np.array([m.predict(X) for m in ensemble])

    if method == 'inverse_variance':
        # Weight models by inverse of their historical variance
        variances = np.array([m.historical_variance for m in ensemble])
        weights = 1.0 / (variances + 1e-8)
        weights /= weights.sum()

    elif method == 'accuracy':
        # Weight by historical accuracy
        accuracies = np.array([m.historical_accuracy for m in ensemble])
        weights = accuracies / accuracies.sum()

    weighted_prediction = np.sum(weights[:, np.newaxis] * predictions, axis=0)
    weighted_uncertainty = np.sqrt(np.sum(weights[:, np.newaxis]**2 *
                                          (predictions - weighted_prediction)**2, axis=0))

    return weighted_prediction, weighted_uncertainty

Trading Decisions Framework

┌────────────────────────────────────────────────────────────────────┐
│            UNCERTAINTY-AWARE TRADING DECISION FRAMEWORK             │
├────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Step 1: Get Ensemble Prediction                                    │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │  predictions = ensemble.predict(market_data)                 │   │
│  │  mean_pred = np.mean(predictions)                            │   │
│  │  uncertainty = np.std(predictions)                           │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                              │                                      │
│                              ▼                                      │
│  Step 2: Classify Uncertainty Level                                 │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │  if uncertainty < 0.01:    confidence = "HIGH"               │   │
│  │  elif uncertainty < 0.03:  confidence = "MEDIUM"             │   │
│  │  else:                     confidence = "LOW"                │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                              │                                      │
│                              ▼                                      │
│  Step 3: Determine Action                                           │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                                                              │   │
│  │  HIGH confidence + Strong signal:                            │   │
│  │    → Full position (100%)                                    │   │
│  │                                                              │   │
│  │  MEDIUM confidence + Strong signal:                          │   │
│  │    → Partial position (50-70%)                               │   │
│  │                                                              │   │
│  │  LOW confidence (any signal):                                │   │
│  │    → No trade or minimal position (0-20%)                    │   │
│  │                                                              │   │
│  │  Any confidence + Weak signal:                               │   │
│  │    → No trade                                                │   │
│  │                                                              │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                              │                                      │
│                              ▼                                      │
│  Step 4: Risk Management                                            │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │  stop_loss = mean_pred - 2 * uncertainty                     │   │
│  │  take_profit = mean_pred + 1.5 * uncertainty                 │   │
│  │  (Wider stops when uncertainty is high)                      │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
└────────────────────────────────────────────────────────────────────┘

Implementation Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                   ENSEMBLE UNCERTAINTY SYSTEM                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  DATA LAYER                                                          │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  Bybit API (via CCXT)                                          │ │
│  │  ├── Real-time OHLCV data                                      │ │
│  │  ├── Order book snapshots                                      │ │
│  │  └── Trade history                                             │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                              │                                       │
│                              ▼                                       │
│  FEATURE ENGINEERING                                                 │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  ├── Technical indicators (RSI, MACD, Bollinger)               │ │
│  │  ├── Volume features                                           │ │
│  │  ├── Volatility measures                                       │ │
│  │  └── Market microstructure features                            │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                              │                                       │
│                              ▼                                       │
│  ENSEMBLE LAYER                                                      │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐            │ │
│  │  │Random Forest │ │   XGBoost    │ │  LightGBM    │            │ │
│  │  │  (Bagging)   │ │  (Boosting)  │ │  (Boosting)  │            │ │
│  │  └──────┬───────┘ └──────┬───────┘ └──────┬───────┘            │ │
│  │         │                │                │                     │ │
│  │         └────────────────┼────────────────┘                     │ │
│  │                          │                                      │ │
│  │                          ▼                                      │ │
│  │              ┌────────────────────┐                             │ │
│  │              │   Meta-Learner     │                             │ │
│  │              │  (Stacking Layer)  │                             │ │
│  │              └────────────────────┘                             │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                              │                                       │
│                              ▼                                       │
│  UNCERTAINTY QUANTIFICATION                                          │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  ├── Prediction variance (model disagreement)                  │ │
│  │  ├── OOB error estimates                                       │ │
│  │  ├── Quantile predictions (confidence intervals)               │ │
│  │  └── Calibration adjustment                                    │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                              │                                       │
│                              ▼                                       │
│  TRADING DECISIONS                                                   │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  ├── Signal generation (with confidence)                       │ │
│  │  ├── Position sizing (uncertainty-scaled)                      │ │
│  │  ├── Risk management (dynamic stops)                           │ │
│  │  └── Order execution                                           │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Key Metrics

Uncertainty Quality Metrics

Metric	Description	Target
Calibration Error	How well predicted uncertainty matches actual error	< 0.05
Sharpness	Average size of prediction intervals	Minimize
Coverage	% of true values within prediction interval	90-95%
CRPS	Continuous Ranked Probability Score	Minimize

Trading Performance with Uncertainty

Metric	Without Uncertainty	With Uncertainty
Sharpe Ratio	1.2	1.8
Max Drawdown	-25%	-15%
Win Rate	52%	58%
Avg Position Size	100%	65%
Risk-Adjusted Return	1.0x	1.5x

Best Practices

1. Ensemble Diversity

# Good: Diverse models with different inductive biases
ensemble = [
    RandomForestRegressor(n_estimators=100),
    GradientBoostingRegressor(n_estimators=100),
    XGBRegressor(n_estimators=100),
    LGBMRegressor(n_estimators=100),
    SVR(kernel='rbf'),
]

# Bad: Homogeneous ensemble
ensemble = [
    RandomForestRegressor(n_estimators=100),
    RandomForestRegressor(n_estimators=100, max_depth=5),
    RandomForestRegressor(n_estimators=100, max_depth=10),
]

2. Regular Calibration Checks

def monitor_calibration(predictions, uncertainties, actuals, window=100):
    """Monitor calibration in rolling window."""
    for i in range(len(actuals) - window):
        window_preds = predictions[i:i+window]
        window_uncert = uncertainties[i:i+window]
        window_actual = actuals[i:i+window]

        # Check if actual values fall within predicted intervals
        z_scores = (window_actual - window_preds) / window_uncert
        coverage = np.mean(np.abs(z_scores) < 1.96)

        if coverage < 0.90:
            logger.warning(f"Calibration degraded: {coverage:.2%} coverage")

3. Uncertainty Thresholds

UNCERTAINTY_THRESHOLDS = {
    'very_low': 0.005,    # < 0.5% → Very confident
    'low': 0.01,          # 0.5-1% → Confident
    'medium': 0.02,       # 1-2% → Moderate
    'high': 0.03,         # 2-3% → Low confidence
    'very_high': 0.05,    # > 3% → Do not trade
}

Directory Structure

324_ensemble_uncertainty/
├── README.md                      # This file
├── README.ru.md                   # Russian translation
├── readme.simple.md               # Beginner-friendly explanation
├── readme.simple.ru.md            # Russian beginner version
├── python/                        # Python implementation
│   ├── __init__.py
│   ├── data_fetcher.py           # CCXT-based data fetching
│   ├── features.py               # Feature engineering
│   ├── ensemble.py               # Ensemble models
│   ├── uncertainty.py            # Uncertainty quantification
│   ├── calibration.py            # Model calibration
│   ├── strategy.py               # Trading strategy
│   ├── backtest.py               # Backtesting engine
│   └── main.py                   # Main entry point
└── rust_ensemble_uncertainty/     # Rust implementation
    ├── Cargo.toml
    ├── src/
    │   ├── lib.rs
    │   ├── api/                   # Bybit API client
    │   ├── ensemble/              # Ensemble implementations
    │   ├── uncertainty/           # Uncertainty metrics
    │   ├── strategy/              # Trading strategy
    │   └── backtest/              # Backtesting
    └── examples/
        ├── fetch_data.rs
        ├── train_ensemble.rs
        └── live_trading.rs

References

Lakshminarayanan et al. (2017) - “Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles”
- https://arxiv.org/abs/1612.01474
Gal & Ghahramani (2016) - “Dropout as a Bayesian Approximation”
- https://arxiv.org/abs/1506.02142
Kuleshov et al. (2018) - “Accurate Uncertainties for Deep Learning Using Calibrated Regression”
- https://arxiv.org/abs/1807.00263
Duan et al. (2020) - “NGBoost: Natural Gradient Boosting for Probabilistic Prediction”
- https://arxiv.org/abs/1910.03225
Breiman (2001) - “Random Forests”
- https://link.springer.com/article/10.1023/A:1010933404324

Difficulty Level

Intermediate to Advanced - Requires understanding of:

Ensemble methods (bagging, boosting, stacking)
Probability theory and statistics
Model calibration concepts
Risk management principles
Python/Rust programming

Disclaimer

This chapter is for educational purposes only. Cryptocurrency trading involves substantial risk. The strategies described here have not been validated in live trading and should be thoroughly tested before any real-world application. Past performance does not guarantee future results. Always use proper risk management and never trade with funds you cannot afford to lose.