Skip to content

Chapter 356: DenseNet for Cryptocurrency Trading

Chapter 356: DenseNet for Cryptocurrency Trading

Overview

DenseNet (Densely Connected Convolutional Networks) represents a breakthrough architecture where each layer is connected to every other layer in a feed-forward fashion. For trading applications, this dense connectivity enables exceptional feature reuse, improved gradient flow, and remarkably efficient parameter usage — critical advantages when extracting complex patterns from cryptocurrency market data.

Trading Strategy

Core Concept: Apply DenseNet’s dense connectivity pattern to extract hierarchical features from cryptocurrency OHLCV data and order book dynamics. The architecture treats time-series data as structured inputs where early-layer features (short-term patterns) directly contribute to later-layer predictions (long-term trends).

Key Advantages for Trading:

  1. Feature Reuse — Early detected patterns (support/resistance, candlestick formations) remain accessible to all subsequent layers
  2. Gradient Flow — Deep networks train effectively without vanishing gradients
  3. Parameter Efficiency — Fewer parameters than ResNet while maintaining performance
  4. Implicit Ensemble — Dense connections create an implicit deep supervision effect

Edge: DenseNet’s architecture naturally captures multi-scale temporal dependencies in market data, from tick-level microstructure to daily trends.

Technical Specification

Architecture Components

┌─────────────────────────────────────────────────────────────────┐
│ DenseNet Trading Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Input: OHLCV + Technical Indicators + Order Book Features │
│ [batch_size, sequence_length, features] │
│ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ Initial Convolution │ │
│ │ (7x1, stride 2) │ │
│ └───────────┬─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ Dense Block 1 │ ◄── Growth Rate k=32 │
│ │ (6 dense layers) │ │
│ └───────────┬─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ Transition Layer 1 │ ◄── Compression θ=0.5 │
│ │ (1x1 conv + AvgPool) │ │
│ └───────────┬─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ Dense Block 2 │ │
│ │ (12 dense layers) │ │
│ └───────────┬─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ Transition Layer 2 │ │
│ └───────────┬─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ Dense Block 3 │ │
│ │ (24 dense layers) │ │
│ └───────────┬─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ Global Average Pool │ │
│ └───────────┬─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ Trading Head │ │
│ │ (Position: -1, 0, +1) │ │
│ └─────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘

Dense Block Mechanism

class DenseLayer:
"""
Single layer within a Dense Block
H_l(x) = BN → ReLU → Conv(3x1) → Dropout
Each layer receives ALL previous feature maps as input
"""
def __init__(self, in_channels, growth_rate, dropout=0.2):
self.bn = BatchNorm1d(in_channels)
self.conv = Conv1d(in_channels, growth_rate, kernel_size=3, padding=1)
self.dropout = Dropout(dropout)
def forward(self, x):
# x is concatenation of all previous outputs
out = self.dropout(self.conv(relu(self.bn(x))))
return out
class DenseBlock:
"""
Dense Block: each layer connected to all previous layers
Layer 0: x_0 → H_0(x_0) = y_0
Layer 1: [x_0, y_0] → H_1([x_0, y_0]) = y_1
Layer 2: [x_0, y_0, y_1] → H_2([x_0, y_0, y_1]) = y_2
...
"""
def __init__(self, n_layers, in_channels, growth_rate):
self.layers = []
for i in range(n_layers):
layer_in = in_channels + i * growth_rate
self.layers.append(DenseLayer(layer_in, growth_rate))
def forward(self, x):
features = [x]
for layer in self.layers:
# Concatenate ALL previous feature maps
concat_features = torch.cat(features, dim=1)
new_features = layer(concat_features)
features.append(new_features)
return torch.cat(features, dim=1)

Transition Layer

class TransitionLayer:
"""
Reduces feature maps between Dense Blocks
Compression factor θ reduces channels: out = floor(θ * in_channels)
"""
def __init__(self, in_channels, compression=0.5):
out_channels = int(in_channels * compression)
self.bn = BatchNorm1d(in_channels)
self.conv = Conv1d(in_channels, out_channels, kernel_size=1)
self.pool = AvgPool1d(kernel_size=2, stride=2)
def forward(self, x):
return self.pool(self.conv(relu(self.bn(x))))

Trading-Specific DenseNet

class DenseNetTrader:
"""
DenseNet architecture adapted for cryptocurrency trading
"""
def __init__(
self,
input_features: int = 32, # OHLCV + indicators + orderbook
sequence_length: int = 128, # Lookback window
growth_rate: int = 32, # k: new features per layer
block_config: tuple = (6, 12, 24), # Layers per block
compression: float = 0.5, # θ: channel reduction
num_classes: int = 3, # Long, Hold, Short
dropout: float = 0.2
):
# Initial convolution
self.init_conv = Conv1d(input_features, 64, kernel_size=7, stride=2, padding=3)
self.init_bn = BatchNorm1d(64)
self.init_pool = MaxPool1d(kernel_size=3, stride=2, padding=1)
# Dense Blocks and Transitions
self.blocks = []
num_features = 64
for i, num_layers in enumerate(block_config):
block = DenseBlock(num_layers, num_features, growth_rate)
self.blocks.append(block)
num_features = num_features + num_layers * growth_rate
if i != len(block_config) - 1:
trans = TransitionLayer(num_features, compression)
self.blocks.append(trans)
num_features = int(num_features * compression)
# Final layers
self.final_bn = BatchNorm1d(num_features)
self.global_pool = AdaptiveAvgPool1d(1)
# Trading head
self.classifier = Sequential(
Linear(num_features, 128),
ReLU(),
Dropout(dropout),
Linear(128, num_classes)
)
# Risk management head
self.risk_head = Sequential(
Linear(num_features, 64),
ReLU(),
Linear(64, 2) # Volatility prediction, Position size
)
def forward(self, x):
# x: [batch, features, sequence]
# Initial processing
x = self.init_pool(relu(self.init_bn(self.init_conv(x))))
# Dense blocks
for block in self.blocks:
x = block(x)
# Final processing
x = relu(self.final_bn(x))
x = self.global_pool(x).squeeze(-1)
# Trading outputs
position = self.classifier(x)
risk = self.risk_head(x)
return {
'position': softmax(position, dim=-1),
'volatility': relu(risk[:, 0]),
'position_size': sigmoid(risk[:, 1])
}

Feature Engineering for Crypto

class CryptoFeatureExtractor:
"""
Extract features from Bybit cryptocurrency data
"""
def __init__(self, sequence_length=128):
self.seq_len = sequence_length
def extract(self, ohlcv_data, orderbook_data):
features = {}
# Price features
features['returns'] = np.log(ohlcv_data['close'] / ohlcv_data['close'].shift(1))
features['high_low_range'] = (ohlcv_data['high'] - ohlcv_data['low']) / ohlcv_data['close']
features['close_position'] = (ohlcv_data['close'] - ohlcv_data['low']) / (ohlcv_data['high'] - ohlcv_data['low'])
# Volume features
features['volume_ma_ratio'] = ohlcv_data['volume'] / ohlcv_data['volume'].rolling(20).mean()
features['volume_std'] = ohlcv_data['volume'].rolling(20).std() / ohlcv_data['volume'].rolling(20).mean()
# Volatility features
features['realized_vol'] = features['returns'].rolling(20).std() * np.sqrt(365 * 24)
features['parkinson_vol'] = self._parkinson_volatility(ohlcv_data)
# Momentum indicators
features['rsi'] = self._compute_rsi(ohlcv_data['close'], 14)
features['macd'], features['macd_signal'] = self._compute_macd(ohlcv_data['close'])
# Order book features
features['bid_ask_spread'] = (orderbook_data['best_ask'] - orderbook_data['best_bid']) / orderbook_data['mid_price']
features['order_imbalance'] = (orderbook_data['bid_volume'] - orderbook_data['ask_volume']) / (orderbook_data['bid_volume'] + orderbook_data['ask_volume'])
features['depth_imbalance'] = self._compute_depth_imbalance(orderbook_data)
# Funding rate (perpetual futures)
features['funding_rate'] = orderbook_data.get('funding_rate', 0)
# Open interest dynamics
features['oi_change'] = orderbook_data['open_interest'].pct_change()
return self._normalize_and_stack(features)
def _parkinson_volatility(self, ohlcv, window=20):
"""High-low volatility estimator"""
log_hl = np.log(ohlcv['high'] / ohlcv['low'])
return np.sqrt((1 / (4 * np.log(2))) * (log_hl ** 2).rolling(window).mean())
def _compute_rsi(self, prices, period=14):
delta = prices.diff()
gain = (delta.where(delta > 0, 0)).rolling(window=period).mean()
loss = (-delta.where(delta < 0, 0)).rolling(window=period).mean()
rs = gain / loss
return 100 - (100 / (1 + rs))
def _compute_macd(self, prices, fast=12, slow=26, signal=9):
ema_fast = prices.ewm(span=fast).mean()
ema_slow = prices.ewm(span=slow).mean()
macd = ema_fast - ema_slow
macd_signal = macd.ewm(span=signal).mean()
return macd, macd_signal
def _compute_depth_imbalance(self, orderbook, levels=5):
"""Compute order book depth imbalance across multiple levels"""
bid_depth = sum(orderbook[f'bid_vol_{i}'] for i in range(levels))
ask_depth = sum(orderbook[f'ask_vol_{i}'] for i in range(levels))
return (bid_depth - ask_depth) / (bid_depth + ask_depth + 1e-8)

Multi-Scale DenseNet

class MultiScaleDenseNet:
"""
DenseNet with multiple time-scale inputs for comprehensive market analysis
"""
def __init__(self, scales=[1, 5, 15, 60]): # 1m, 5m, 15m, 1h
self.scales = scales
self.encoders = {
scale: DenseNetEncoder(growth_rate=24)
for scale in scales
}
# Fusion layer
total_features = len(scales) * 256
self.fusion = Sequential(
Linear(total_features, 512),
BatchNorm1d(512),
ReLU(),
Dropout(0.3),
Linear(512, 256)
)
# Multi-task heads
self.direction_head = Linear(256, 3) # Long, Hold, Short
self.magnitude_head = Linear(256, 1) # Expected return
self.confidence_head = Linear(256, 1) # Prediction confidence
def forward(self, multi_scale_data):
"""
multi_scale_data: dict of {scale: tensor}
"""
# Encode each time scale
encoded = []
for scale in self.scales:
x = multi_scale_data[scale]
enc = self.encoders[scale](x)
encoded.append(enc)
# Concatenate and fuse
combined = torch.cat(encoded, dim=-1)
fused = self.fusion(combined)
return {
'direction': softmax(self.direction_head(fused), dim=-1),
'magnitude': self.magnitude_head(fused),
'confidence': sigmoid(self.confidence_head(fused))
}

Training Pipeline

class DenseNetTradingTrainer:
"""
Training pipeline with proper loss functions for trading
"""
def __init__(self, model, learning_rate=1e-4):
self.model = model
self.optimizer = Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)
self.scheduler = CosineAnnealingWarmRestarts(self.optimizer, T_0=10)
def train_epoch(self, train_loader):
self.model.train()
total_loss = 0
for batch in train_loader:
features, labels = batch
self.optimizer.zero_grad()
outputs = self.model(features)
loss = self._compute_loss(outputs, labels)
loss.backward()
clip_grad_norm_(self.model.parameters(), max_norm=1.0)
self.optimizer.step()
total_loss += loss.item()
self.scheduler.step()
return total_loss / len(train_loader)
def _compute_loss(self, outputs, labels):
"""
Multi-task loss combining:
1. Direction classification (cross-entropy)
2. Return prediction (MSE weighted by confidence)
3. Sharpe ratio maximization (custom)
"""
# Direction loss
direction_loss = cross_entropy(
outputs['position'],
labels['direction'],
weight=torch.tensor([1.0, 0.5, 1.0]) # Weight action classes
)
# Return prediction loss (only for non-hold positions)
mask = labels['direction'] != 1
if mask.any():
pred_returns = outputs['magnitude'][mask]
true_returns = labels['future_return'][mask]
return_loss = mse_loss(pred_returns, true_returns)
else:
return_loss = 0
# Sharpe-inspired loss
sharpe_loss = self._sharpe_loss(outputs, labels)
# Combine losses
total_loss = direction_loss + 0.5 * return_loss + 0.3 * sharpe_loss
return total_loss
def _sharpe_loss(self, outputs, labels, eps=1e-8):
"""
Differentiable Sharpe ratio loss
"""
# Predicted positions (-1, 0, 1)
positions = outputs['position'][:, 2] - outputs['position'][:, 0]
# Strategy returns
strategy_returns = positions * labels['future_return']
# Negative Sharpe (for minimization)
mean_return = strategy_returns.mean()
std_return = strategy_returns.std() + eps
return -mean_return / std_return

Backtesting Integration

class DenseNetBacktester:
"""
Backtesting framework for DenseNet trading signals
"""
def __init__(self, model, fee_rate=0.0004): # Bybit taker fee
self.model = model
self.fee_rate = fee_rate
def run_backtest(self, data, initial_capital=10000):
self.model.eval()
results = {
'timestamps': [],
'positions': [],
'equity': [],
'returns': [],
'trades': []
}
capital = initial_capital
position = 0
entry_price = 0
for i in range(len(data) - 1):
features = self._prepare_features(data, i)
with torch.no_grad():
output = self.model(features.unsqueeze(0))
signal = self._get_signal(output)
current_price = data['close'].iloc[i]
next_price = data['close'].iloc[i + 1]
# Execute trade if signal changed
if signal != position:
# Close existing position
if position != 0:
pnl = position * (current_price - entry_price) / entry_price
fee = abs(position) * self.fee_rate * 2
capital *= (1 + pnl - fee)
results['trades'].append({
'exit_time': data.index[i],
'exit_price': current_price,
'pnl': pnl - fee
})
# Open new position
if signal != 0:
entry_price = current_price
results['trades'].append({
'entry_time': data.index[i],
'entry_price': current_price,
'direction': signal
})
position = signal
# Mark-to-market
if position != 0:
unrealized_pnl = position * (current_price - entry_price) / entry_price
equity = capital * (1 + unrealized_pnl)
else:
equity = capital
results['timestamps'].append(data.index[i])
results['positions'].append(position)
results['equity'].append(equity)
return self._compute_metrics(results)
def _get_signal(self, output, threshold=0.4):
"""Convert model output to trading signal"""
probs = output['position'][0]
confidence = output['confidence'][0].item()
if confidence < 0.3:
return 0 # Hold if uncertain
if probs[2] > threshold:
return 1 # Long
elif probs[0] > threshold:
return -1 # Short
return 0 # Hold
def _compute_metrics(self, results):
"""Compute trading performance metrics"""
equity = np.array(results['equity'])
returns = np.diff(equity) / equity[:-1]
metrics = {
'total_return': (equity[-1] / equity[0]) - 1,
'sharpe_ratio': np.mean(returns) / (np.std(returns) + 1e-8) * np.sqrt(365 * 24),
'max_drawdown': self._max_drawdown(equity),
'win_rate': self._win_rate(results['trades']),
'profit_factor': self._profit_factor(results['trades']),
'num_trades': len([t for t in results['trades'] if 'pnl' in t])
}
return metrics
def _max_drawdown(self, equity):
peak = np.maximum.accumulate(equity)
drawdown = (peak - equity) / peak
return np.max(drawdown)
def _win_rate(self, trades):
closed_trades = [t for t in trades if 'pnl' in t]
if not closed_trades:
return 0
wins = sum(1 for t in closed_trades if t['pnl'] > 0)
return wins / len(closed_trades)
def _profit_factor(self, trades):
closed_trades = [t for t in trades if 'pnl' in t]
profits = sum(t['pnl'] for t in closed_trades if t['pnl'] > 0)
losses = abs(sum(t['pnl'] for t in closed_trades if t['pnl'] < 0))
return profits / (losses + 1e-8)

Key Metrics

  • Model Performance: Accuracy, F1-score, AUC-ROC for direction prediction
  • Trading Performance: Sharpe Ratio, Sortino Ratio, Maximum Drawdown
  • Efficiency Metrics: Parameter count, Inference time, Memory usage
  • Risk Metrics: VaR, Expected Shortfall, Win Rate

DenseNet Variants for Trading

VariantBlocksGrowth RateParametersUse Case
DenseNet-Tiny(4, 4, 4)16~0.5MReal-time HFT
DenseNet-Small(6, 12, 8)24~1.5MIntraday trading
DenseNet-Medium(6, 12, 24)32~4MSwing trading
DenseNet-Large(6, 12, 32, 24)48~12MResearch

Dependencies

[dependencies]
# Core ML
torch = ">=2.0.0"
numpy = ">=1.23.0"
# Data handling
pandas = ">=2.0.0"
polars = ">=0.19.0"
# Bybit API
pybit = ">=5.6.0"
# Technical analysis
ta-lib = ">=0.4.26"
pandas-ta = ">=0.3.14"
# Visualization
matplotlib = ">=3.6.0"
plotly = ">=5.18.0"

Expected Outcomes

  1. DenseNet Trading Model — Optimized architecture for crypto price prediction
  2. Multi-scale Feature Extraction — Comprehensive market representation
  3. Real-time Trading Pipeline — Integration with Bybit exchange
  4. Backtesting Framework — Rigorous performance evaluation
  5. Risk Management Module — Position sizing and stop-loss automation

References

Difficulty Level

⭐⭐⭐⭐ (Advanced)

Prerequisites: Deep Learning fundamentals, CNN architectures, Time series analysis, Cryptocurrency market microstructure