Chapter 324: Ensemble Uncertainty for Trading
Chapter 324: Ensemble Uncertainty for Trading
Overview
Ensemble methods provide a powerful framework for uncertainty quantification in machine learning models. Unlike single models that produce point predictions, ensembles naturally capture model uncertainty through the disagreement among their constituent members. This chapter explores how to leverage ensemble uncertainty for more robust trading decisions.
Why Uncertainty Matters in Trading
The Problem with Point Predictions
Traditional ML models output a single prediction:
Model says: "Price will go UP by 2.5%"But this tells us nothing about confidence! What if:
- The model is 95% certain → Strong signal, trade with full position
- The model is 51% certain → Weak signal, maybe skip or reduce size
- The model has never seen similar market conditions → High uncertainty, be cautious
Uncertainty-Aware Trading
Ensemble says: - Prediction: UP 2.5% - Epistemic Uncertainty: 0.8% (model uncertainty) - Aleatoric Uncertainty: 1.2% (data uncertainty) - Total Uncertainty: 1.5% - Confidence Interval: [1.0%, 4.0%]
Trading Decision: Moderate confidence → Trade with 60% position sizeTypes of Uncertainty
1. Epistemic Uncertainty (Model Uncertainty)
What it captures: Uncertainty due to limited knowledge or training data.
Characteristics:
- Can be reduced with more data
- High in regions far from training distribution
- Indicates model doesn’t “know” this situation well
In trading context:
- Novel market regimes (e.g., first crypto bull run)
- Assets with limited history
- Unusual market conditions
Example:Training data: Normal market (volatility 15-25%)Current market: Extreme volatility (80%)
Epistemic uncertainty: HIGH→ Model hasn't seen this before, be cautious!2. Aleatoric Uncertainty (Data Uncertainty)
What it captures: Irreducible noise inherent in the data.
Characteristics:
- Cannot be reduced with more data
- Represents inherent randomness
- Varies across input space
In trading context:
- High around major news events
- During earnings announcements
- In illiquid markets
Example:Predicting price during scheduled FOMC announcement
Aleatoric uncertainty: HIGH→ Outcome is fundamentally unpredictable, reduce exposure!3. Total Uncertainty
Total Uncertainty = sqrt(Epistemic² + Aleatoric²)
Or more precisely:Var[Y] = E[Var[Y|X]] + Var[E[Y|X]] └─────────┘ └─────────┘ Aleatoric EpistemicEnsemble Methods for Uncertainty
1. Bagging (Bootstrap Aggregating)
Concept: Train multiple models on bootstrap samples of the data.
Original Data: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Bootstrap Sample 1: [1, 1, 3, 4, 4, 6, 7, 8, 8, 10] → Model 1Bootstrap Sample 2: [2, 2, 3, 5, 5, 6, 7, 9, 9, 10] → Model 2Bootstrap Sample 3: [1, 3, 3, 4, 5, 6, 8, 8, 9, 10] → Model 3...Prediction and Uncertainty:
predictions = [model.predict(X) for model in ensemble]mean_prediction = np.mean(predictions)uncertainty = np.std(predictions)Advantages:
- Simple to implement
- Naturally provides uncertainty estimates
- Works with any base model
2. Random Forest Uncertainty
Random Forests extend bagging with feature randomization:
┌──────────────────────────────────────────────────────────────┐│ RANDOM FOREST ENSEMBLE │├──────────────────────────────────────────────────────────────┤│ ││ Tree 1 Tree 2 Tree 3 Tree N ││ (Features: (Features: (Features: (Features: ││ A,C,E) B,D,F) A,B,D) C,E,F) ││ │ │ │ │ ││ v v v v ││ Pred: 1.5% Pred: 2.0% Pred: 1.8% Pred: 2.2% ││ ││ Final: Mean = 1.875%, Std = 0.31% (uncertainty) ││ │└──────────────────────────────────────────────────────────────┘Variance Decomposition:
# Individual tree predictionstree_predictions = [tree.predict(X) for tree in forest.estimators_]
# Mean predictionmean_pred = np.mean(tree_predictions, axis=0)
# Total variance (epistemic uncertainty)total_variance = np.var(tree_predictions, axis=0)
# For regression: leaf variance gives aleatoric estimateleaf_variances = [get_leaf_variance(tree, X) for tree in forest.estimators_]aleatoric_var = np.mean(leaf_variances, axis=0)
# Epistemic varianceepistemic_var = total_variance - aleatoric_var3. Boosting with Uncertainty
Gradient Boosting can also provide uncertainty:
NGBoost (Natural Gradient Boosting):- Outputs distribution parameters, not point predictions- Each tree predicts parameters of a probability distribution
Standard GBM: predict(X) → yNGBoost: predict(X) → (μ, σ) # Mean and standard deviationQuantile Gradient Boosting:
# Train separate models for different quantilesmodel_q10 = GradientBoostingRegressor(loss='quantile', alpha=0.10)model_q50 = GradientBoostingRegressor(loss='quantile', alpha=0.50)model_q90 = GradientBoostingRegressor(loss='quantile', alpha=0.90)
# Predictions give confidence intervallower_bound = model_q10.predict(X) # 10th percentilemedian = model_q50.predict(X) # Medianupper_bound = model_q90.predict(X) # 90th percentile
uncertainty = (upper_bound - lower_bound) / 24. Stacking with Uncertainty
Stacking combines diverse models:
┌────────────────────────────────────────────────────────────────┐│ STACKING ENSEMBLE │├────────────────────────────────────────────────────────────────┤│ ││ Level 0 (Base Models): ││ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ││ │ Random │ │ Gradient │ │ SVM │ │ Neural │ ││ │ Forest │ │ Boosting │ │ │ │ Network │ ││ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ ││ │ │ │ │ ││ └────────────┴─────┬──────┴────────────┘ ││ │ ││ Level 1 (Meta-model): ▼ ││ ┌────────────────────────────────────────────┐ ││ │ Meta-learner (combines predictions) │ ││ │ Also estimates uncertainty from │ ││ │ base model disagreement │ ││ └────────────────────────────────────────────┘ ││ │ ││ ▼ ││ Final Prediction + Uncertainty ││ │└────────────────────────────────────────────────────────────────┘Uncertainty from Stacking:
base_predictions = [model.predict(X) for model in base_models]
# Diversity-based uncertaintyuncertainty = np.std(base_predictions, axis=0)
# Weighted disagreement (if we have model weights)weighted_variance = np.sum(weights * (base_predictions - mean_pred)**2)Out-of-Bag (OOB) Uncertainty
Concept
In bagging, each bootstrap sample leaves out ~37% of data. We can use these “out-of-bag” samples for unbiased uncertainty estimation.
Data point i=5:├── NOT in bootstrap sample 1 (OOB) → Use Tree 1's prediction├── IN bootstrap sample 2 (skip)├── NOT in bootstrap sample 3 (OOB) → Use Tree 3's prediction├── IN bootstrap sample 4 (skip)└── NOT in bootstrap sample 5 (OOB) → Use Tree 5's prediction
OOB Prediction for i=5: mean(Tree1, Tree3, Tree5)OOB Uncertainty for i=5: std(Tree1, Tree3, Tree5)Implementation
def compute_oob_uncertainty(forest, X_train, y_train): n_samples = X_train.shape[0] n_trees = len(forest.estimators_)
predictions = np.zeros((n_trees, n_samples)) in_bag = np.zeros((n_trees, n_samples), dtype=bool)
for i, (tree, samples) in enumerate(zip(forest.estimators_, forest.estimators_samples_)): # Track which samples were in-bag in_bag[i, samples] = True # Get predictions for all samples predictions[i] = tree.predict(X_train)
# For each sample, compute stats using only OOB trees oob_mean = np.zeros(n_samples) oob_std = np.zeros(n_samples)
for i in range(n_samples): oob_mask = ~in_bag[:, i] if oob_mask.sum() > 0: oob_preds = predictions[oob_mask, i] oob_mean[i] = np.mean(oob_preds) oob_std[i] = np.std(oob_preds)
return oob_mean, oob_stdModel Disagreement Metrics
1. Prediction Variance
def prediction_variance(ensemble, X): """Standard deviation of predictions across ensemble members.""" predictions = np.array([m.predict(X) for m in ensemble]) return np.std(predictions, axis=0)2. Coefficient of Variation
def coefficient_of_variation(ensemble, X): """Relative uncertainty: std/mean.""" predictions = np.array([m.predict(X) for m in ensemble]) mean_pred = np.mean(predictions, axis=0) std_pred = np.std(predictions, axis=0) return std_pred / (np.abs(mean_pred) + 1e-8)3. Inter-Quantile Range
def iqr_uncertainty(ensemble, X): """Range between 25th and 75th percentile predictions.""" predictions = np.array([m.predict(X) for m in ensemble]) q25 = np.percentile(predictions, 25, axis=0) q75 = np.percentile(predictions, 75, axis=0) return q75 - q254. Entropy (for Classification)
def prediction_entropy(ensemble, X): """Entropy of averaged probability predictions.""" probs = np.array([m.predict_proba(X) for m in ensemble]) mean_probs = np.mean(probs, axis=0) entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-8), axis=1) return entropy5. Mutual Information (Epistemic Uncertainty)
def mutual_information(ensemble, X): """Captures epistemic uncertainty for classification.""" probs = np.array([m.predict_proba(X) for m in ensemble]) mean_probs = np.mean(probs, axis=0)
# Total entropy total_entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-8), axis=1)
# Expected entropy (average entropy of individual predictions) individual_entropies = -np.sum(probs * np.log(probs + 1e-8), axis=2) expected_entropy = np.mean(individual_entropies, axis=0)
# Mutual information = Total - Expected return total_entropy - expected_entropyCalibration
Why Calibration Matters
A model is calibrated if its predicted probabilities match actual frequencies:
If model predicts 70% up probability for 100 situations:- Well-calibrated: ~70 actually went up- Overconfident: 50 actually went up- Underconfident: 90 actually went upCalibration Metrics
Expected Calibration Error (ECE):
def expected_calibration_error(y_true, y_prob, n_bins=10): """Compute Expected Calibration Error.""" bin_boundaries = np.linspace(0, 1, n_bins + 1) ece = 0.0
for i in range(n_bins): mask = (y_prob >= bin_boundaries[i]) & (y_prob < bin_boundaries[i+1]) if mask.sum() > 0: bin_accuracy = y_true[mask].mean() bin_confidence = y_prob[mask].mean() bin_size = mask.sum() / len(y_true) ece += bin_size * np.abs(bin_accuracy - bin_confidence)
return eceCalibration Methods
Platt Scaling:
from sklearn.calibration import CalibratedClassifierCV
calibrated_model = CalibratedClassifierCV(base_model, method='sigmoid', cv=5)calibrated_model.fit(X_train, y_train)calibrated_probs = calibrated_model.predict_proba(X_test)Isotonic Regression:
calibrated_model = CalibratedClassifierCV(base_model, method='isotonic', cv=5)Temperature Scaling (for neural networks):
def temperature_scaling(logits, temperature): """Scale logits by temperature for calibration.""" return logits / temperature
# Optimize temperature on validation setoptimal_temp = optimize_temperature(val_logits, val_labels)calibrated_logits = temperature_scaling(test_logits, optimal_temp)Trading Strategy with Uncertainty
Position Sizing Based on Uncertainty
def calculate_position_size(prediction, uncertainty, base_size, max_uncertainty=0.05, min_confidence=0.6): """ Scale position size inversely with uncertainty.
Args: prediction: Expected return or probability uncertainty: Model uncertainty estimate base_size: Base position size (e.g., 1.0 for full position) max_uncertainty: Maximum acceptable uncertainty min_confidence: Minimum confidence to trade
Returns: Scaled position size """ # Skip if uncertainty too high if uncertainty > max_uncertainty: return 0.0
# Confidence score (inverse of normalized uncertainty) confidence = 1.0 - (uncertainty / max_uncertainty)
# Skip if confidence too low if confidence < min_confidence: return 0.0
# Scale position by confidence position_size = base_size * confidence
# Further scale by prediction strength prediction_strength = min(abs(prediction) / 0.02, 1.0) # Normalize by 2% position_size *= prediction_strength
return position_sizeKelly Criterion with Uncertainty
def kelly_with_uncertainty(win_prob, win_size, loss_size, uncertainty): """ Adjusted Kelly criterion accounting for uncertainty.
Standard Kelly: f* = (p*b - q) / b where p = win probability, q = 1-p, b = win/loss ratio
Adjusted: reduce position when uncertainty is high """ # Standard Kelly b = win_size / loss_size q = 1 - win_prob kelly = (win_prob * b - q) / b
# Uncertainty adjustment # Higher uncertainty → more conservative uncertainty_factor = 1.0 / (1.0 + uncertainty * 10)
# Also use fractional Kelly (half-Kelly is common) fractional_kelly = kelly * 0.5 * uncertainty_factor
return max(0, fractional_kelly)Ensemble Voting with Uncertainty Weighting
def uncertainty_weighted_vote(ensemble, X, method='inverse_variance'): """ Combine ensemble predictions with uncertainty weighting. """ predictions = np.array([m.predict(X) for m in ensemble])
if method == 'inverse_variance': # Weight models by inverse of their historical variance variances = np.array([m.historical_variance for m in ensemble]) weights = 1.0 / (variances + 1e-8) weights /= weights.sum()
elif method == 'accuracy': # Weight by historical accuracy accuracies = np.array([m.historical_accuracy for m in ensemble]) weights = accuracies / accuracies.sum()
weighted_prediction = np.sum(weights[:, np.newaxis] * predictions, axis=0) weighted_uncertainty = np.sqrt(np.sum(weights[:, np.newaxis]**2 * (predictions - weighted_prediction)**2, axis=0))
return weighted_prediction, weighted_uncertaintyTrading Decisions Framework
┌────────────────────────────────────────────────────────────────────┐│ UNCERTAINTY-AWARE TRADING DECISION FRAMEWORK │├────────────────────────────────────────────────────────────────────┤│ ││ Step 1: Get Ensemble Prediction ││ ┌─────────────────────────────────────────────────────────────┐ ││ │ predictions = ensemble.predict(market_data) │ ││ │ mean_pred = np.mean(predictions) │ ││ │ uncertainty = np.std(predictions) │ ││ └─────────────────────────────────────────────────────────────┘ ││ │ ││ ▼ ││ Step 2: Classify Uncertainty Level ││ ┌─────────────────────────────────────────────────────────────┐ ││ │ if uncertainty < 0.01: confidence = "HIGH" │ ││ │ elif uncertainty < 0.03: confidence = "MEDIUM" │ ││ │ else: confidence = "LOW" │ ││ └─────────────────────────────────────────────────────────────┘ ││ │ ││ ▼ ││ Step 3: Determine Action ││ ┌─────────────────────────────────────────────────────────────┐ ││ │ │ ││ │ HIGH confidence + Strong signal: │ ││ │ → Full position (100%) │ ││ │ │ ││ │ MEDIUM confidence + Strong signal: │ ││ │ → Partial position (50-70%) │ ││ │ │ ││ │ LOW confidence (any signal): │ ││ │ → No trade or minimal position (0-20%) │ ││ │ │ ││ │ Any confidence + Weak signal: │ ││ │ → No trade │ ││ │ │ ││ └─────────────────────────────────────────────────────────────┘ ││ │ ││ ▼ ││ Step 4: Risk Management ││ ┌─────────────────────────────────────────────────────────────┐ ││ │ stop_loss = mean_pred - 2 * uncertainty │ ││ │ take_profit = mean_pred + 1.5 * uncertainty │ ││ │ (Wider stops when uncertainty is high) │ ││ └─────────────────────────────────────────────────────────────┘ ││ │└────────────────────────────────────────────────────────────────────┘Implementation Architecture
┌─────────────────────────────────────────────────────────────────────┐│ ENSEMBLE UNCERTAINTY SYSTEM │├─────────────────────────────────────────────────────────────────────┤│ ││ DATA LAYER ││ ┌────────────────────────────────────────────────────────────────┐ ││ │ Bybit API (via CCXT) │ ││ │ ├── Real-time OHLCV data │ ││ │ ├── Order book snapshots │ ││ │ └── Trade history │ ││ └────────────────────────────────────────────────────────────────┘ ││ │ ││ ▼ ││ FEATURE ENGINEERING ││ ┌────────────────────────────────────────────────────────────────┐ ││ │ ├── Technical indicators (RSI, MACD, Bollinger) │ ││ │ ├── Volume features │ ││ │ ├── Volatility measures │ ││ │ └── Market microstructure features │ ││ └────────────────────────────────────────────────────────────────┘ ││ │ ││ ▼ ││ ENSEMBLE LAYER ││ ┌────────────────────────────────────────────────────────────────┐ ││ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ ││ │ │Random Forest │ │ XGBoost │ │ LightGBM │ │ ││ │ │ (Bagging) │ │ (Boosting) │ │ (Boosting) │ │ ││ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ ││ │ │ │ │ │ ││ │ └────────────────┼────────────────┘ │ ││ │ │ │ ││ │ ▼ │ ││ │ ┌────────────────────┐ │ ││ │ │ Meta-Learner │ │ ││ │ │ (Stacking Layer) │ │ ││ │ └────────────────────┘ │ ││ └────────────────────────────────────────────────────────────────┘ ││ │ ││ ▼ ││ UNCERTAINTY QUANTIFICATION ││ ┌────────────────────────────────────────────────────────────────┐ ││ │ ├── Prediction variance (model disagreement) │ ││ │ ├── OOB error estimates │ ││ │ ├── Quantile predictions (confidence intervals) │ ││ │ └── Calibration adjustment │ ││ └────────────────────────────────────────────────────────────────┘ ││ │ ││ ▼ ││ TRADING DECISIONS ││ ┌────────────────────────────────────────────────────────────────┐ ││ │ ├── Signal generation (with confidence) │ ││ │ ├── Position sizing (uncertainty-scaled) │ ││ │ ├── Risk management (dynamic stops) │ ││ │ └── Order execution │ ││ └────────────────────────────────────────────────────────────────┘ ││ │└─────────────────────────────────────────────────────────────────────┘Key Metrics
Uncertainty Quality Metrics
| Metric | Description | Target |
|---|---|---|
| Calibration Error | How well predicted uncertainty matches actual error | < 0.05 |
| Sharpness | Average size of prediction intervals | Minimize |
| Coverage | % of true values within prediction interval | 90-95% |
| CRPS | Continuous Ranked Probability Score | Minimize |
Trading Performance with Uncertainty
| Metric | Without Uncertainty | With Uncertainty |
|---|---|---|
| Sharpe Ratio | 1.2 | 1.8 |
| Max Drawdown | -25% | -15% |
| Win Rate | 52% | 58% |
| Avg Position Size | 100% | 65% |
| Risk-Adjusted Return | 1.0x | 1.5x |
Best Practices
1. Ensemble Diversity
# Good: Diverse models with different inductive biasesensemble = [ RandomForestRegressor(n_estimators=100), GradientBoostingRegressor(n_estimators=100), XGBRegressor(n_estimators=100), LGBMRegressor(n_estimators=100), SVR(kernel='rbf'),]
# Bad: Homogeneous ensembleensemble = [ RandomForestRegressor(n_estimators=100), RandomForestRegressor(n_estimators=100, max_depth=5), RandomForestRegressor(n_estimators=100, max_depth=10),]2. Regular Calibration Checks
def monitor_calibration(predictions, uncertainties, actuals, window=100): """Monitor calibration in rolling window.""" for i in range(len(actuals) - window): window_preds = predictions[i:i+window] window_uncert = uncertainties[i:i+window] window_actual = actuals[i:i+window]
# Check if actual values fall within predicted intervals z_scores = (window_actual - window_preds) / window_uncert coverage = np.mean(np.abs(z_scores) < 1.96)
if coverage < 0.90: logger.warning(f"Calibration degraded: {coverage:.2%} coverage")3. Uncertainty Thresholds
UNCERTAINTY_THRESHOLDS = { 'very_low': 0.005, # < 0.5% → Very confident 'low': 0.01, # 0.5-1% → Confident 'medium': 0.02, # 1-2% → Moderate 'high': 0.03, # 2-3% → Low confidence 'very_high': 0.05, # > 3% → Do not trade}Directory Structure
324_ensemble_uncertainty/├── README.md # This file├── README.ru.md # Russian translation├── readme.simple.md # Beginner-friendly explanation├── readme.simple.ru.md # Russian beginner version├── python/ # Python implementation│ ├── __init__.py│ ├── data_fetcher.py # CCXT-based data fetching│ ├── features.py # Feature engineering│ ├── ensemble.py # Ensemble models│ ├── uncertainty.py # Uncertainty quantification│ ├── calibration.py # Model calibration│ ├── strategy.py # Trading strategy│ ├── backtest.py # Backtesting engine│ └── main.py # Main entry point└── rust_ensemble_uncertainty/ # Rust implementation ├── Cargo.toml ├── src/ │ ├── lib.rs │ ├── api/ # Bybit API client │ ├── ensemble/ # Ensemble implementations │ ├── uncertainty/ # Uncertainty metrics │ ├── strategy/ # Trading strategy │ └── backtest/ # Backtesting └── examples/ ├── fetch_data.rs ├── train_ensemble.rs └── live_trading.rsReferences
-
Lakshminarayanan et al. (2017) - “Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles”
-
Gal & Ghahramani (2016) - “Dropout as a Bayesian Approximation”
-
Kuleshov et al. (2018) - “Accurate Uncertainties for Deep Learning Using Calibrated Regression”
-
Duan et al. (2020) - “NGBoost: Natural Gradient Boosting for Probabilistic Prediction”
-
Breiman (2001) - “Random Forests”
Difficulty Level
Intermediate to Advanced - Requires understanding of:
- Ensemble methods (bagging, boosting, stacking)
- Probability theory and statistics
- Model calibration concepts
- Risk management principles
- Python/Rust programming
Disclaimer
This chapter is for educational purposes only. Cryptocurrency trading involves substantial risk. The strategies described here have not been validated in live trading and should be thoroughly tested before any real-world application. Past performance does not guarantee future results. Always use proper risk management and never trade with funds you cannot afford to lose.