Chapter 324: Ensemble Uncertainty for Trading

Overview

Ensemble methods provide a powerful framework for uncertainty quantification in machine learning models. Unlike single models that produce point predictions, ensembles naturally capture model uncertainty through the disagreement among their constituent members. This chapter explores how to leverage ensemble uncertainty for more robust trading decisions.

Why Uncertainty Matters in Trading

The Problem with Point Predictions

Traditional ML models output a single prediction:

Model says: "Price will go UP by 2.5%"

But this tells us nothing about confidence! What if:

  • The model is 95% certain → Strong signal, trade with full position
  • The model is 51% certain → Weak signal, maybe skip or reduce size
  • The model has never seen similar market conditions → High uncertainty, be cautious

Uncertainty-Aware Trading

Ensemble says:
- Prediction: UP 2.5%
- Epistemic Uncertainty: 0.8% (model uncertainty)
- Aleatoric Uncertainty: 1.2% (data uncertainty)
- Total Uncertainty: ~1.4% (sqrt(0.8² + 1.2²))
- Confidence Interval: [1.1%, 3.9%]

Trading Decision: Moderate confidence → Trade with 60% position size

Types of Uncertainty

1. Epistemic Uncertainty (Model Uncertainty)

What it captures: Uncertainty due to limited knowledge or training data.

Characteristics:

  • Can be reduced with more data
  • High in regions far from training distribution
  • Indicates model doesn’t “know” this situation well

In trading context:

  • Novel market regimes (e.g., first crypto bull run)
  • Assets with limited history
  • Unusual market conditions

Example:
Training data: Normal market (volatility 15-25%)
Current market: Extreme volatility (80%)
Epistemic uncertainty: HIGH
→ Model hasn't seen this before, be cautious!

2. Aleatoric Uncertainty (Data Uncertainty)

What it captures: Irreducible noise inherent in the data.

Characteristics:

  • Cannot be reduced with more data
  • Represents inherent randomness
  • Varies across input space

In trading context:

  • High around major news events
  • During earnings announcements
  • In illiquid markets

Example:
Predicting price during scheduled FOMC announcement
Aleatoric uncertainty: HIGH
→ Outcome is fundamentally unpredictable, reduce exposure!

3. Total Uncertainty

Total Uncertainty = sqrt(Epistemic² + Aleatoric²)

Or, more precisely, by the law of total variance:

Var[Y] = E[Var[Y|X]] + Var[E[Y|X]]
         └─────────┘   └─────────┘
          Aleatoric     Epistemic
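The decomposition can be verified numerically. In the sketch below (toy numbers, not real market estimates) each ensemble member outputs a mean and a standard deviation, and the two variance components combine by the law of total variance:

```python
import numpy as np

# Toy ensemble output for a single input: each member predicts a mean
# return and a std (hypothetical numbers, chosen only to illustrate).
member_means = np.array([0.020, 0.025, 0.022, 0.028, 0.024])  # E[Y|X] per member
member_stds = np.array([0.010, 0.012, 0.011, 0.013, 0.012])   # sqrt(Var[Y|X]) per member

aleatoric_var = np.mean(member_stds ** 2)   # E[Var[Y|X]]: average within-member noise
epistemic_var = np.var(member_means)        # Var[E[Y|X]]: disagreement between members
total_var = aleatoric_var + epistemic_var

total_unc = np.sqrt(total_var)
print(f"aleatoric={np.sqrt(aleatoric_var):.4f}, "
      f"epistemic={np.sqrt(epistemic_var):.4f}, total={total_unc:.4f}")
```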

Ensemble Methods for Uncertainty

1. Bagging (Bootstrap Aggregating)

Concept: Train multiple models on bootstrap samples of the data.

Original Data: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Bootstrap Sample 1: [1, 1, 3, 4, 4, 6, 7, 8, 8, 10] → Model 1
Bootstrap Sample 2: [2, 2, 3, 5, 5, 6, 7, 9, 9, 10] → Model 2
Bootstrap Sample 3: [1, 3, 3, 4, 5, 6, 8, 8, 9, 10] → Model 3
...

Prediction and Uncertainty:

predictions = np.array([model.predict(X) for model in ensemble])
mean_prediction = np.mean(predictions, axis=0)   # per-sample mean
uncertainty = np.std(predictions, axis=0)        # per-sample disagreement

Advantages:

  • Simple to implement
  • Naturally provides uncertainty estimates
  • Works with any base model
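A minimal end-to-end sketch, assuming scikit-learn is available (synthetic data and illustrative hyperparameters, not a tuned setup):

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic "returns" data: one informative feature plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=500)

# Bagged trees: each estimator sees a different bootstrap sample
bag = BaggingRegressor(DecisionTreeRegressor(max_depth=4),
                       n_estimators=50, random_state=0)
bag.fit(X, y)

# Per-member predictions: mean is the signal, std is the uncertainty
member_preds = np.array([est.predict(X[:5]) for est in bag.estimators_])
mean_prediction = member_preds.mean(axis=0)
uncertainty = member_preds.std(axis=0)
```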

2. Random Forest Uncertainty

Random Forests extend bagging with feature randomization:

┌──────────────────────────────────────────────────────────────┐
│ RANDOM FOREST ENSEMBLE │
├──────────────────────────────────────────────────────────────┤
│ │
│ Tree 1 Tree 2 Tree 3 Tree N │
│ (Features: (Features: (Features: (Features: │
│ A,C,E) B,D,F) A,B,D) C,E,F) │
│ │ │ │ │ │
│ v v v v │
│ Pred: 1.5% Pred: 2.0% Pred: 1.8% Pred: 2.2% │
│ │
│ Final: Mean = 1.875%, Std ≈ 0.26% (uncertainty) │
│ │
└──────────────────────────────────────────────────────────────┘

Variance Decomposition:

# Individual tree point predictions
tree_predictions = np.array([tree.predict(X) for tree in forest.estimators_])
# Mean prediction
mean_pred = np.mean(tree_predictions, axis=0)
# Disagreement among trees → epistemic variance
epistemic_var = np.var(tree_predictions, axis=0)
# Spread of targets inside each tree's leaf → aleatoric estimate
# (get_leaf_variance is a user-supplied helper, not part of scikit-learn)
leaf_variances = [get_leaf_variance(tree, X) for tree in forest.estimators_]
aleatoric_var = np.mean(leaf_variances, axis=0)
# Total predictive variance
total_variance = epistemic_var + aleatoric_var
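The epistemic (disagreement) part of this decomposition needs nothing beyond a fitted forest. A runnable sketch on synthetic data (illustrative parameters):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=400, n_features=6, noise=5.0, random_state=0)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Per-tree predictions on the first three samples
tree_preds = np.array([tree.predict(X[:3]) for tree in forest.estimators_])
mean_pred = tree_preds.mean(axis=0)       # the forest's usual output
epistemic_std = tree_preds.std(axis=0)    # disagreement among trees
```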

3. Boosting with Uncertainty

Gradient Boosting can also provide uncertainty:

NGBoost (Natural Gradient Boosting):
- Outputs distribution parameters, not point predictions
- Each tree predicts parameters of a probability distribution
Standard GBM: predict(X) → y
NGBoost: predict(X) → (μ, σ) # Mean and standard deviation

Quantile Gradient Boosting:

from sklearn.ensemble import GradientBoostingRegressor

# Train separate models for different quantiles (each fit on the same data)
model_q10 = GradientBoostingRegressor(loss='quantile', alpha=0.10).fit(X_train, y_train)
model_q50 = GradientBoostingRegressor(loss='quantile', alpha=0.50).fit(X_train, y_train)
model_q90 = GradientBoostingRegressor(loss='quantile', alpha=0.90).fit(X_train, y_train)

# Predictions give an 80% confidence interval
lower_bound = model_q10.predict(X)   # 10th percentile
median = model_q50.predict(X)        # median
upper_bound = model_q90.predict(X)   # 90th percentile
uncertainty = (upper_bound - lower_bound) / 2

4. Stacking with Uncertainty

Stacking combines diverse models:

┌────────────────────────────────────────────────────────────────┐
│ STACKING ENSEMBLE │
├────────────────────────────────────────────────────────────────┤
│ │
│ Level 0 (Base Models): │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Random │ │ Gradient │ │ SVM │ │ Neural │ │
│ │ Forest │ │ Boosting │ │ │ │ Network │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │ │
│ └────────────┴─────┬──────┴────────────┘ │
│ │ │
│ Level 1 (Meta-model): ▼ │
│ ┌────────────────────────────────────────────┐ │
│ │ Meta-learner (combines predictions) │ │
│ │ Also estimates uncertainty from │ │
│ │ base model disagreement │ │
│ └────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Final Prediction + Uncertainty │
│ │
└────────────────────────────────────────────────────────────────┘

Uncertainty from Stacking:

base_predictions = np.array([model.predict(X) for model in base_models])
mean_pred = np.mean(base_predictions, axis=0)
# Diversity-based uncertainty
uncertainty = np.std(base_predictions, axis=0)
# Weighted disagreement (if we have model weights summing to 1)
weighted_variance = np.sum(weights[:, np.newaxis] *
                           (base_predictions - mean_pred)**2, axis=0)

Out-of-Bag (OOB) Uncertainty

Concept

In bagging, each bootstrap sample leaves out ~37% of data. We can use these “out-of-bag” samples for unbiased uncertainty estimation.

Data point i=5:
├── NOT in bootstrap sample 1 (OOB) → Use Tree 1's prediction
├── IN bootstrap sample 2 (skip)
├── NOT in bootstrap sample 3 (OOB) → Use Tree 3's prediction
├── IN bootstrap sample 4 (skip)
└── NOT in bootstrap sample 5 (OOB) → Use Tree 5's prediction
OOB Prediction for i=5: mean(Tree1, Tree3, Tree5)
OOB Uncertainty for i=5: std(Tree1, Tree3, Tree5)

Implementation

def compute_oob_uncertainty(forest, X_train, y_train):
    """OOB mean/std per training sample.

    Assumes a bagging-style ensemble (e.g. sklearn's BaggingRegressor),
    which exposes in-bag indices via `estimators_samples_`.
    """
    n_samples = X_train.shape[0]
    n_trees = len(forest.estimators_)
    predictions = np.zeros((n_trees, n_samples))
    in_bag = np.zeros((n_trees, n_samples), dtype=bool)

    for i, (tree, samples) in enumerate(zip(forest.estimators_,
                                            forest.estimators_samples_)):
        # Track which samples were in-bag
        in_bag[i, samples] = True
        # Get predictions for all samples
        predictions[i] = tree.predict(X_train)

    # For each sample, compute stats using only OOB trees
    oob_mean = np.zeros(n_samples)
    oob_std = np.zeros(n_samples)
    for i in range(n_samples):
        oob_mask = ~in_bag[:, i]
        if oob_mask.sum() > 0:
            oob_preds = predictions[oob_mask, i]
            oob_mean[i] = np.mean(oob_preds)
            oob_std[i] = np.std(oob_preds)
    return oob_mean, oob_std
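As a sanity check, the same logic can be exercised end-to-end against scikit-learn's BaggingRegressor, which exposes in-bag indices via `estimators_samples_` (a plain RandomForestRegressor does not). Data and hyperparameters here are illustrative:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

bag = BaggingRegressor(DecisionTreeRegressor(max_depth=5),
                       n_estimators=30, random_state=1).fit(X, y)

# Mark which training samples each estimator actually saw
n_samples = X.shape[0]
preds = np.array([est.predict(X) for est in bag.estimators_])
in_bag = np.zeros((len(bag.estimators_), n_samples), dtype=bool)
for i, idx in enumerate(bag.estimators_samples_):
    in_bag[i, idx] = True

# Per-sample OOB uncertainty: std over estimators that did NOT see the sample
oob_std = np.array([preds[~in_bag[:, j], j].std()
                    if (~in_bag[:, j]).any() else np.nan
                    for j in range(n_samples)])
```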

Model Disagreement Metrics

1. Prediction Variance

def prediction_variance(ensemble, X):
    """Standard deviation of predictions across ensemble members."""
    predictions = np.array([m.predict(X) for m in ensemble])
    return np.std(predictions, axis=0)

2. Coefficient of Variation

def coefficient_of_variation(ensemble, X):
    """Relative uncertainty: std/mean."""
    predictions = np.array([m.predict(X) for m in ensemble])
    mean_pred = np.mean(predictions, axis=0)
    std_pred = np.std(predictions, axis=0)
    return std_pred / (np.abs(mean_pred) + 1e-8)

3. Interquartile Range (IQR)

def iqr_uncertainty(ensemble, X):
    """Range between 25th and 75th percentile predictions."""
    predictions = np.array([m.predict(X) for m in ensemble])
    q25 = np.percentile(predictions, 25, axis=0)
    q75 = np.percentile(predictions, 75, axis=0)
    return q75 - q25

4. Entropy (for Classification)

def prediction_entropy(ensemble, X):
    """Entropy of averaged probability predictions."""
    probs = np.array([m.predict_proba(X) for m in ensemble])
    mean_probs = np.mean(probs, axis=0)
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-8), axis=1)
    return entropy

5. Mutual Information (Epistemic Uncertainty)

def mutual_information(ensemble, X):
    """Captures epistemic uncertainty for classification."""
    probs = np.array([m.predict_proba(X) for m in ensemble])
    mean_probs = np.mean(probs, axis=0)
    # Total entropy
    total_entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-8), axis=1)
    # Expected entropy (average entropy of individual predictions)
    individual_entropies = -np.sum(probs * np.log(probs + 1e-8), axis=2)
    expected_entropy = np.mean(individual_entropies, axis=0)
    # Mutual information = Total - Expected
    return total_entropy - expected_entropy
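The behaviour of these metrics is easiest to see on a toy example. Below, a hand-built three-member binary-classifier ensemble agrees confidently on one input and disagrees sharply on another; mutual information flags the second:

```python
import numpy as np

# Shape (members, samples, classes): member probability outputs for 2 inputs.
# Input 0: all members say [0.9, 0.1]; input 1: members disagree.
probs = np.array([
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.9, 0.1], [0.5, 0.5]],
])

mean_probs = probs.mean(axis=0)
total_entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-8), axis=1)
member_entropy = -np.sum(probs * np.log(probs + 1e-8), axis=2)
expected_entropy = member_entropy.mean(axis=0)
mutual_info = total_entropy - expected_entropy
# mutual_info ≈ 0 where members agree; clearly positive where they disagree
```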

Calibration

Why Calibration Matters

A model is calibrated if its predicted probabilities match actual frequencies:

If model predicts 70% up probability for 100 situations:
- Well-calibrated: ~70 actually went up
- Overconfident: 50 actually went up
- Underconfident: 90 actually went up

Calibration Metrics

Expected Calibration Error (ECE):

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Compute Expected Calibration Error."""
    bin_boundaries = np.linspace(0, 1, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        mask = (y_prob >= bin_boundaries[i]) & (y_prob < bin_boundaries[i + 1])
        if i == n_bins - 1:
            mask |= (y_prob == 1.0)  # include perfect confidence in the last bin
        if mask.sum() > 0:
            bin_accuracy = y_true[mask].mean()
            bin_confidence = y_prob[mask].mean()
            bin_size = mask.sum() / len(y_true)
            ece += bin_size * np.abs(bin_accuracy - bin_confidence)
    return ece
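As a sanity check of the metric itself (synthetic probabilities, not market data): a model whose outcomes actually follow its stated probabilities should score near zero, while one whose outcomes ignore them should not. A compact, self-contained version:

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE: bin-weighted gap between confidence and accuracy."""
    bins = np.linspace(0, 1, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (y_prob >= lo) & (y_prob < hi)
        if mask.any():
            ece += mask.mean() * abs(y_true[mask].mean() - y_prob[mask].mean())
    return ece

rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, size=20000)
y_calibrated = (rng.uniform(size=p.size) < p).astype(float)       # outcomes follow p
y_miscalibrated = (rng.uniform(size=p.size) < 0.5).astype(float)  # outcomes ignore p

ece_good = expected_calibration_error(y_calibrated, p)
ece_bad = expected_calibration_error(y_miscalibrated, p)
# ece_good should be close to 0; ece_bad substantially larger
```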

Calibration Methods

Platt Scaling:

from sklearn.calibration import CalibratedClassifierCV
calibrated_model = CalibratedClassifierCV(base_model, method='sigmoid', cv=5)
calibrated_model.fit(X_train, y_train)
calibrated_probs = calibrated_model.predict_proba(X_test)

Isotonic Regression:

calibrated_model = CalibratedClassifierCV(base_model, method='isotonic', cv=5)

Temperature Scaling (for neural networks):

def temperature_scaling(logits, temperature):
    """Scale logits by temperature for calibration."""
    return logits / temperature

# Optimize temperature on a held-out validation set
optimal_temp = optimize_temperature(val_logits, val_labels)
calibrated_logits = temperature_scaling(test_logits, optimal_temp)
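`optimize_temperature` is left undefined above; one common approach, sketched here under that assumption, is to minimise validation negative log-likelihood over a bounded temperature range with scipy:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(temperature, logits, labels):
    """Negative log-likelihood of labels under temperature-scaled logits."""
    p = softmax(logits / temperature)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def optimize_temperature(val_logits, val_labels):
    res = minimize_scalar(nll, bounds=(0.05, 10.0),
                          args=(val_logits, val_labels), method='bounded')
    return res.x

# Toy overconfident logits: large margins, noisy labels → optimal T > 1
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
clean = np.where(labels[:, None] == np.arange(2)[None, :], 1.0, -1.0)
noisy_logits = 5.0 * clean + rng.normal(scale=4.0, size=(1000, 2))
T = optimize_temperature(noisy_logits, labels)
```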

Trading Strategy with Uncertainty

Position Sizing Based on Uncertainty

def calculate_position_size(prediction, uncertainty, base_size,
                            max_uncertainty=0.05, min_confidence=0.6):
    """
    Scale position size inversely with uncertainty.

    Args:
        prediction: Expected return or probability
        uncertainty: Model uncertainty estimate
        base_size: Base position size (e.g., 1.0 for full position)
        max_uncertainty: Maximum acceptable uncertainty
        min_confidence: Minimum confidence to trade

    Returns:
        Scaled position size
    """
    # Skip if uncertainty too high
    if uncertainty > max_uncertainty:
        return 0.0
    # Confidence score (inverse of normalized uncertainty)
    confidence = 1.0 - (uncertainty / max_uncertainty)
    # Skip if confidence too low
    if confidence < min_confidence:
        return 0.0
    # Scale position by confidence
    position_size = base_size * confidence
    # Further scale by prediction strength
    prediction_strength = min(abs(prediction) / 0.02, 1.0)  # normalize by 2%
    position_size *= prediction_strength
    return position_size

Kelly Criterion with Uncertainty

def kelly_with_uncertainty(win_prob, win_size, loss_size, uncertainty):
    """
    Adjusted Kelly criterion accounting for uncertainty.

    Standard Kelly: f* = (p*b - q) / b
    where p = win probability, q = 1 - p, b = win/loss ratio.
    Adjustment: reduce position when uncertainty is high.
    """
    # Standard Kelly
    b = win_size / loss_size
    q = 1 - win_prob
    kelly = (win_prob * b - q) / b
    # Uncertainty adjustment: higher uncertainty → more conservative
    uncertainty_factor = 1.0 / (1.0 + uncertainty * 10)
    # Also use fractional Kelly (half-Kelly is common)
    fractional_kelly = kelly * 0.5 * uncertainty_factor
    return max(0, fractional_kelly)
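Plugging illustrative numbers through the same logic (the function is repeated so the snippet stands alone): a 60% win rate with symmetric 2% moves gives full Kelly f* = 0.2, so half-Kelly with zero uncertainty returns 0.10, and rising uncertainty shrinks it further:

```python
def kelly_with_uncertainty(win_prob, win_size, loss_size, uncertainty):
    """Half-Kelly position fraction, damped by an uncertainty factor."""
    b = win_size / loss_size
    q = 1 - win_prob
    kelly = (win_prob * b - q) / b
    uncertainty_factor = 1.0 / (1.0 + uncertainty * 10)
    return max(0, kelly * 0.5 * uncertainty_factor)

# 60% win rate, symmetric 2% moves: full Kelly = (0.6*1 - 0.4)/1 = 0.2
low_unc = kelly_with_uncertainty(0.60, 0.02, 0.02, uncertainty=0.0)    # 0.5 * 0.2 = 0.10
high_unc = kelly_with_uncertainty(0.60, 0.02, 0.02, uncertainty=0.05)  # 0.10 / 1.5 ≈ 0.067
```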

Ensemble Voting with Uncertainty Weighting

def uncertainty_weighted_vote(ensemble, X, method='inverse_variance'):
    """Combine ensemble predictions with uncertainty weighting."""
    predictions = np.array([m.predict(X) for m in ensemble])
    if method == 'inverse_variance':
        # Weight models by the inverse of their historical variance
        variances = np.array([m.historical_variance for m in ensemble])
        weights = 1.0 / (variances + 1e-8)
        weights /= weights.sum()
    elif method == 'accuracy':
        # Weight by historical accuracy
        accuracies = np.array([m.historical_accuracy for m in ensemble])
        weights = accuracies / accuracies.sum()
    else:
        raise ValueError(f"Unknown method: {method}")
    weighted_prediction = np.sum(weights[:, np.newaxis] * predictions, axis=0)
    # Weighted standard deviation of member predictions (weights sum to 1)
    weighted_uncertainty = np.sqrt(np.sum(weights[:, np.newaxis] *
                                          (predictions - weighted_prediction)**2,
                                          axis=0))
    return weighted_prediction, weighted_uncertainty

Trading Decisions Framework

┌────────────────────────────────────────────────────────────────────┐
│ UNCERTAINTY-AWARE TRADING DECISION FRAMEWORK │
├────────────────────────────────────────────────────────────────────┤
│ │
│ Step 1: Get Ensemble Prediction │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ predictions = ensemble.predict(market_data) │ │
│ │ mean_pred = np.mean(predictions) │ │
│ │ uncertainty = np.std(predictions) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Step 2: Classify Uncertainty Level │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ if uncertainty < 0.01: confidence = "HIGH" │ │
│ │ elif uncertainty < 0.03: confidence = "MEDIUM" │ │
│ │ else: confidence = "LOW" │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Step 3: Determine Action │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ HIGH confidence + Strong signal: │ │
│ │ → Full position (100%) │ │
│ │ │ │
│ │ MEDIUM confidence + Strong signal: │ │
│ │ → Partial position (50-70%) │ │
│ │ │ │
│ │ LOW confidence (any signal): │ │
│ │ → No trade or minimal position (0-20%) │ │
│ │ │ │
│ │ Any confidence + Weak signal: │ │
│ │ → No trade │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Step 4: Risk Management │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ stop_loss = mean_pred - 2 * uncertainty │ │
│ │ take_profit = mean_pred + 1.5 * uncertainty │ │
│ │ (Wider stops when uncertainty is high) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────┘
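The four steps can be collapsed into one small function; the thresholds and position fractions below simply mirror the illustrative values in the diagram, not tuned parameters:

```python
def decide(mean_pred, uncertainty, signal_threshold=0.01):
    """Sketch of the four-step framework; all cutoffs are illustrative."""
    # Step 2: classify uncertainty level
    if uncertainty < 0.01:
        confidence = "HIGH"
    elif uncertainty < 0.03:
        confidence = "MEDIUM"
    else:
        confidence = "LOW"
    # Step 3: determine action (strong signal = |prediction| above threshold)
    strong = abs(mean_pred) >= signal_threshold
    if confidence == "LOW" or not strong:
        position = 0.0
    elif confidence == "HIGH":
        position = 1.0
    else:
        position = 0.6
    # Step 4: uncertainty-scaled risk levels (wider when uncertainty is high)
    stop_loss = mean_pred - 2.0 * uncertainty
    take_profit = mean_pred + 1.5 * uncertainty
    return confidence, position, stop_loss, take_profit
```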

Implementation Architecture

┌─────────────────────────────────────────────────────────────────────┐
│ ENSEMBLE UNCERTAINTY SYSTEM │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ DATA LAYER │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Bybit API (via CCXT) │ │
│ │ ├── Real-time OHLCV data │ │
│ │ ├── Order book snapshots │ │
│ │ └── Trade history │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ FEATURE ENGINEERING │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ ├── Technical indicators (RSI, MACD, Bollinger) │ │
│ │ ├── Volume features │ │
│ │ ├── Volatility measures │ │
│ │ └── Market microstructure features │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ENSEMBLE LAYER │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │Random Forest │ │ XGBoost │ │ LightGBM │ │ │
│ │ │ (Bagging) │ │ (Boosting) │ │ (Boosting) │ │ │
│ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ │
│ │ │ │ │ │ │
│ │ └────────────────┼────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌────────────────────┐ │ │
│ │ │ Meta-Learner │ │ │
│ │ │ (Stacking Layer) │ │ │
│ │ └────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ UNCERTAINTY QUANTIFICATION │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ ├── Prediction variance (model disagreement) │ │
│ │ ├── OOB error estimates │ │
│ │ ├── Quantile predictions (confidence intervals) │ │
│ │ └── Calibration adjustment │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ TRADING DECISIONS │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ ├── Signal generation (with confidence) │ │
│ │ ├── Position sizing (uncertainty-scaled) │ │
│ │ ├── Risk management (dynamic stops) │ │
│ │ └── Order execution │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘

Key Metrics

Uncertainty Quality Metrics

Metric              Description                                          Target
------              -----------                                          ------
Calibration Error   How well predicted uncertainty matches actual error  < 0.05
Sharpness           Average size of prediction intervals                 Minimize
Coverage            % of true values within prediction interval          90-95%
CRPS                Continuous Ranked Probability Score                  Minimize

Trading Performance with Uncertainty

Metric                 Without Uncertainty   With Uncertainty
------                 -------------------   ----------------
Sharpe Ratio           1.2                   1.8
Max Drawdown           -25%                  -15%
Win Rate               52%                   58%
Avg Position Size      100%                  65%
Risk-Adjusted Return   1.0x                  1.5x

Best Practices

1. Ensemble Diversity

# Good: diverse models with different inductive biases
ensemble = [
    RandomForestRegressor(n_estimators=100),
    GradientBoostingRegressor(n_estimators=100),
    XGBRegressor(n_estimators=100),
    LGBMRegressor(n_estimators=100),
    SVR(kernel='rbf'),
]

# Bad: homogeneous ensemble
ensemble = [
    RandomForestRegressor(n_estimators=100),
    RandomForestRegressor(n_estimators=100, max_depth=5),
    RandomForestRegressor(n_estimators=100, max_depth=10),
]

2. Regular Calibration Checks

import logging

logger = logging.getLogger(__name__)

def monitor_calibration(predictions, uncertainties, actuals, window=100):
    """Monitor calibration in a rolling window."""
    for i in range(len(actuals) - window):
        window_preds = predictions[i:i + window]
        window_uncert = uncertainties[i:i + window]
        window_actual = actuals[i:i + window]
        # Check whether actual values fall within the predicted 95% intervals
        z_scores = (window_actual - window_preds) / window_uncert
        coverage = np.mean(np.abs(z_scores) < 1.96)
        if coverage < 0.90:
            logger.warning(f"Calibration degraded: {coverage:.2%} coverage")

3. Uncertainty Thresholds

UNCERTAINTY_THRESHOLDS = {
    'very_low': 0.005,  # < 0.5% → Very confident
    'low': 0.01,        # 0.5-1% → Confident
    'medium': 0.02,     # 1-2% → Moderate
    'high': 0.03,       # 2-3% → Low confidence
    'very_high': 0.05,  # 3-5% → Very low confidence; above 5%, do not trade
}
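A lookup helper for these thresholds might look like this (it relies on the dict's ascending insertion order; the 'do_not_trade' label for values above the last bound is an assumption consistent with the comments above):

```python
# Upper bounds per bucket, in ascending order (values as in the chapter)
UNCERTAINTY_THRESHOLDS = {
    'very_low': 0.005,
    'low': 0.01,
    'medium': 0.02,
    'high': 0.03,
    'very_high': 0.05,
}

def classify_uncertainty(u):
    """Map an uncertainty estimate to its confidence bucket (illustrative)."""
    for label, upper_bound in UNCERTAINTY_THRESHOLDS.items():
        if u < upper_bound:
            return label
    return 'do_not_trade'  # assumed label for anything above the last bound
```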

Directory Structure

324_ensemble_uncertainty/
├── README.md # This file
├── README.ru.md # Russian translation
├── readme.simple.md # Beginner-friendly explanation
├── readme.simple.ru.md # Russian beginner version
├── python/ # Python implementation
│ ├── __init__.py
│ ├── data_fetcher.py # CCXT-based data fetching
│ ├── features.py # Feature engineering
│ ├── ensemble.py # Ensemble models
│ ├── uncertainty.py # Uncertainty quantification
│ ├── calibration.py # Model calibration
│ ├── strategy.py # Trading strategy
│ ├── backtest.py # Backtesting engine
│ └── main.py # Main entry point
└── rust_ensemble_uncertainty/ # Rust implementation
├── Cargo.toml
├── src/
│ ├── lib.rs
│ ├── api/ # Bybit API client
│ ├── ensemble/ # Ensemble implementations
│ ├── uncertainty/ # Uncertainty metrics
│ ├── strategy/ # Trading strategy
│ └── backtest/ # Backtesting
└── examples/
├── fetch_data.rs
├── train_ensemble.rs
└── live_trading.rs

References

  1. Lakshminarayanan et al. (2017) - “Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles”

  2. Gal & Ghahramani (2016) - “Dropout as a Bayesian Approximation”

  3. Kuleshov et al. (2018) - “Accurate Uncertainties for Deep Learning Using Calibrated Regression”

  4. Duan et al. (2020) - “NGBoost: Natural Gradient Boosting for Probabilistic Prediction”

  5. Breiman (2001) - “Random Forests”

Difficulty Level

Intermediate to Advanced - Requires understanding of:

  • Ensemble methods (bagging, boosting, stacking)
  • Probability theory and statistics
  • Model calibration concepts
  • Risk management principles
  • Python/Rust programming

Disclaimer

This chapter is for educational purposes only. Cryptocurrency trading involves substantial risk. The strategies described here have not been validated in live trading and should be thoroughly tested before any real-world application. Past performance does not guarantee future results. Always use proper risk management and never trade with funds you cannot afford to lose.