Chapter 84: Matching Networks for Finance
Chapter 84: Matching Networks for Finance
Overview
Matching Networks are a meta-learning approach for one-shot and few-shot learning that uses an attention-based classifier over learned embeddings. Unlike Prototypical Networks that compute class centroids, Matching Networks use a soft nearest-neighbor approach with attention weights to classify new examples. This makes them particularly powerful for financial applications where market patterns may not form simple clusters.
Table of Contents
- Introduction
- Theoretical Foundation
- Architecture Components
- Full Context Embeddings (FCE)
- Application to Financial Markets
- Few-Shot Market Pattern Recognition
- Implementation Strategy
- Bybit Integration
- Risk Management
- Performance Metrics
- Comparison with Prototypical Networks
- References
Introduction
Traditional machine learning approaches for trading require large amounts of labeled data for each market pattern or regime. However, financial markets present unique challenges:
- Pattern scarcity: Some trading patterns (cup-and-handle, head-and-shoulders) occur rarely
- Context sensitivity: The same pattern may have different meanings in different market contexts
- Rapid adaptation: Need to recognize new patterns with few examples
Why Matching Networks for Trading?
┌─────────────────────────────────────────────────────────────────────────┐│ The Attention-Based Trading Problem │├─────────────────────────────────────────────────────────────────────────┤│ ││ Prototypical Networks: Matching Networks: ││ ───────────────────── ───────────────── ││ Compare to class centroids Compare to ALL support examples ││ Simple averaging Attention-weighted voting ││ ││ Problem: Patterns may not Solution: Learn similarity to ││ cluster nicely around a individual examples, using ││ single centroid context-aware embeddings ││ ││ ┌────────────────────────────────────────────────────────────┐ ││ │ │ ││ │ Prototypical: [●] ← centroid Matching: │ ││ │ ▲ Query → [?] │ ││ │ / | \ ↓ │ ││ │ ○ ○ ○ [a₁○ a₂○ a₃○ ...]│ ││ │ (support) attention weights │ ││ │ │ ││ └────────────────────────────────────────────────────────────┘ ││ │└─────────────────────────────────────────────────────────────────────────┘Key Advantages
| Aspect | Prototypical Networks | Matching Networks |
|---|---|---|
| Classification method | Distance to centroids | Attention over all examples |
| Context awareness | Limited | Full via FCE |
| Embedding | Fixed | Context-dependent |
| Pattern flexibility | Assumes cluster structure | Handles complex distributions |
| Computational cost | Lower | Higher (attention computation) |
| Interpretability | High (prototype distances) | High (attention weights) |
Theoretical Foundation
The Matching Networks Framework
Matching Networks learn a neural network that maps a small labeled support set S and an unlabeled example x̂ to its label ŷ, without the need for fine-tuning:
$$P(ŷ | x̂, S) = \sum_{i=1}^{k} a(x̂, x_i) y_i$$
where $a(x̂, x_i)$ is an attention mechanism that computes how similar the query $x̂$ is to each support example $x_i$.
Mathematical Formulation
Attention Kernel: The attention is computed as a softmax over cosine similarities:
$$a(x̂, x_i) = \frac{\exp(c(f(x̂), g(x_i)))}{\sum_{j=1}^{k} \exp(c(f(x̂), g(x_j)))}$$
where:
- $f$ is the embedding function for the query
- $g$ is the embedding function for support examples
- $c$ is the cosine similarity: $c(a, b) = \frac{a \cdot b}{|a| |b|}$
Key Components
┌────────────────────────────────────────────────────────────────────────┐│ Matching Networks Architecture │├────────────────────────────────────────────────────────────────────────┤│ ││ SUPPORT SET S = {(x₁,y₁), (x₂,y₂), ..., (xₖ,yₖ)} ││ ↓ ││ ┌────────────────────────────────────────┐ ││ │ g(xᵢ, S) - Support Embedding │ ││ │ ────────────────────────────── │ ││ │ Uses bidirectional LSTM to encode │ ││ │ each support example with context │ ││ │ from the entire support set │ ││ └────────────────────────────────────────┘ ││ ││ QUERY x̂ ││ ↓ ││ ┌────────────────────────────────────────┐ ││ │ f(x̂, S) - Query Embedding (FCE) │ ││ │ ────────────────────────────── │ ││ │ LSTM with attention that reads over │ ││ │ the support set embeddings │ ││ └────────────────────────────────────────┘ ││ ││ CLASSIFICATION ││ ↓ ││ ┌────────────────────────────────────────┐ ││ │ Attention-weighted voting: │ ││ │ P(y|x̂,S) = Σᵢ a(x̂,xᵢ)yᵢ │ ││ │ │ ││ │ where a = softmax(cosine_similarity) │ ││ └────────────────────────────────────────┘ ││ │└────────────────────────────────────────────────────────────────────────┘Episodic Training
Training mirrors the test-time task structure:
┌────────────────────────────────────────────────────────────────────────┐│ Episodic Training Process │├────────────────────────────────────────────────────────────────────────┤│ ││ Each episode simulates a few-shot classification task: ││ ││ Step 1: Sample N classes (e.g., 5 market patterns) ││ ┌─────────────────────────────────────────────────────────────┐ ││ │ Double Top │ Double Bottom │ Breakout │ Reversal │ Trend │ ││ └─────────────────────────────────────────────────────────────┘ ││ ││ Step 2: For each class, sample K support + Q query examples ││ ┌─────────────────────────────────────────────────────────────┐ ││ │ Double Top: [s1, s2, s3, s4, s5] | [q1, q2, q3] │ ││ │ Double Bottom: [s1, s2, s3, s4, s5] | [q1, q2, q3] │ ││ │ ... │ ││ └─────────────────────────────────────────────────────────────┘ ││ Support Set (5-shot) Query Set ││ ││ Step 3: Embed support examples using g(·, S) ││ Step 4: Embed query examples using f(·, S) ││ Step 5: Compute attention weights and predictions ││ Step 6: Compute cross-entropy loss and backpropagate ││ ││ Key Insight: Train and test conditions must match! ││ │└────────────────────────────────────────────────────────────────────────┘Architecture Components
Embedding Functions
Basic Embedding (without FCE):
When using simple embeddings without Full Context Embeddings, both f and g are the same neural network:
# Simple embedding: f = g = neural_networkembedding = neural_network(x) # Same for query and supportFull Context Embeddings (FCE):
FCE uses context-dependent embeddings that condition on the entire support set:
┌────────────────────────────────────────────────────────────────────────┐│ Full Context Embeddings (FCE) │├────────────────────────────────────────────────────────────────────────┤│ ││ SUPPORT EMBEDDING g(xᵢ, S): ││ ───────────────────────────── ││ Step 1: Compute basic embeddings g'(xⱼ) for all xⱼ ∈ S ││ Step 2: Run bidirectional LSTM over all g'(xⱼ) ││ Step 3: g(xᵢ, S) = h→ᵢ + h←ᵢ + g'(xᵢ) (forward + backward + skip) ││ ││ ┌─────────────────────────────────────────────────────────┐ ││ │ g'(x₁) → [LSTM→] → h→₁ │ ││ │ g'(x₂) → [LSTM→] → h→₂ │ ││ │ g'(x₃) → [LSTM→] → h→₃ │ ││ │ ← [LSTM←] │ ││ │ g(xᵢ,S) = h→ᵢ + h←ᵢ + g'(xᵢ) │ ││ └─────────────────────────────────────────────────────────┘ ││ ││ QUERY EMBEDDING f(x̂, S): ││ ───────────────────────── ││ Uses LSTM with attention over support embeddings ││ ││ for k = 1 to K processing steps: ││ h, c = LSTM(f'(x̂), [h, r], c) ││ h = h + f'(x̂) # Skip connection ││ r = attention_readout(h, g(S)) # Attend over support ││ f(x̂, S) = h ││ ││ ┌─────────────────────────────────────────────────────────┐ ││ │ f'(x̂) → [LSTM] → h₁ → attention → r₁ │ ││ │ → [LSTM] → h₂ → attention → r₂ │ ││ │ → ... │ ││ │ → [LSTM] → hₖ = f(x̂, S) │ ││ └─────────────────────────────────────────────────────────┘ ││ │└────────────────────────────────────────────────────────────────────────┘Attention Readout Mechanism
The query embedding uses attention to gather information from support examples:
def attention_readout(h, g_support): """ Compute attention-weighted readout from support embeddings.
Args: h: Current query hidden state g_support: Support embeddings (n_support, embed_dim)
Returns: r: Readout vector (weighted sum of support embeddings) """ # Compute attention weights scores = h @ g_support.T # (embed_dim,) @ (n_support, embed_dim).T attention = softmax(scores) # (n_support,)
# Compute weighted sum r = attention @ g_support # (embed_dim,)
return rEmbedding Network for Trading
┌────────────────────────────────────────────────────────────────────────┐│ Base Embedding Network for Trading │├────────────────────────────────────────────────────────────────────────┤│ ││ Input: Market features [OHLCV, indicators, order book, ...] ││ Shape: (batch_size, sequence_length, feature_dim) ││ ││ ┌─────────────────────────────────────────────────────────────┐ ││ │ Temporal Convolutional Block │ ││ │ ───────────────────────────── │ ││ │ Conv1D(in=features, out=64, kernel=3) → BatchNorm → ReLU │ ││ │ Conv1D(in=64, out=128, kernel=3) → BatchNorm → ReLU │ ││ │ Conv1D(in=128, out=128, kernel=3) → BatchNorm → ReLU │ ││ │ MaxPool1D(kernel=2) │ ││ └─────────────────────────────────────────────────────────────┘ ││ ↓ ││ ┌─────────────────────────────────────────────────────────────┐ ││ │ Bidirectional LSTM Block │ ││ │ ───────────────────────── │ ││ │ BiLSTM(hidden=128, layers=2) │ ││ │ Output: concatenate(forward, backward) = 256 dims │ ││ └─────────────────────────────────────────────────────────────┘ ││ ↓ ││ ┌─────────────────────────────────────────────────────────────┐ ││ │ Projection Head │ ││ │ ─────────────── │ ││ │ Linear(in=256, out=128) → ReLU │ ││ │ Linear(in=128, out=embedding_dim) │ ││ │ L2 Normalize │ ││ └─────────────────────────────────────────────────────────────┘ ││ ↓ ││ Output: g'(x) - Base embedding vector (embedding_dim,) ││ │└────────────────────────────────────────────────────────────────────────┘Full Context Embeddings (FCE)
Why FCE Matters for Trading
Financial markets are highly contextual. The same pattern may have different meanings depending on:
- Market regime: Bull vs. bear market
- Volatility environment: High vs. low volatility
- Other assets: Correlated movements
- Support set composition: Which examples are available
FCE allows the network to:
- Condition support embeddings on the entire support set
- Condition query embeddings on the available support examples
- Create context-aware representations
Support Set Encoding with BiLSTM
/// Full Context Embedding for Support Setpub struct SupportSetEncoder { /// Base embedding network base_embedding: BaseEmbedding, /// Bidirectional LSTM for context bi_lstm: BiLSTM,}
impl SupportSetEncoder { /// Encode support set with full context pub fn encode(&self, support_set: &[MarketWindow]) -> Vec<Embedding> { // Step 1: Get base embeddings for all support examples let base_embeddings: Vec<Embedding> = support_set .iter() .map(|x| self.base_embedding.forward(x)) .collect();
// Step 2: Run bidirectional LSTM let forward_states = self.bi_lstm.forward(&base_embeddings); let backward_states = self.bi_lstm.backward(&base_embeddings);
// Step 3: Combine with skip connection base_embeddings .iter() .enumerate() .map(|(i, base)| { &forward_states[i] + &backward_states[i] + base }) .collect() }}Query Encoding with Attention
/// Full Context Embedding for Querypub struct QueryEncoder { /// Base embedding network base_embedding: BaseEmbedding, /// LSTM for iterative refinement lstm: LSTM, /// Number of processing steps num_steps: usize,}
impl QueryEncoder { /// Encode query with attention over support set pub fn encode( &self, query: &MarketWindow, support_embeddings: &[Embedding], ) -> Embedding { // Get base query embedding let query_base = self.base_embedding.forward(query);
// Initialize LSTM state let mut h = query_base.clone(); let mut c = Embedding::zeros(self.lstm.hidden_dim);
// Iterative refinement with attention for _ in 0..self.num_steps { // Concatenate h with readout for LSTM input let r = self.attention_readout(&h, support_embeddings); let input = concat(&[&query_base, &r]);
// LSTM step (h, c) = self.lstm.step(&input, (&h, &c));
// Skip connection h = h + &query_base; }
h }
/// Compute attention-weighted readout from support embeddings fn attention_readout( &self, h: &Embedding, support_embeddings: &[Embedding], ) -> Embedding { // Compute attention scores (cosine similarity) let scores: Vec<f32> = support_embeddings .iter() .map(|s| cosine_similarity(h, s)) .collect();
// Softmax let attention = softmax(&scores);
// Weighted sum support_embeddings .iter() .zip(attention.iter()) .fold(Embedding::zeros(h.dim()), |acc, (s, &a)| { acc + s * a }) }}Application to Financial Markets
Market Pattern Classification
┌────────────────────────────────────────────────────────────────────────┐│ Market Pattern Classes │├────────────────────────────────────────────────────────────────────────┤│ ││ Class 0: TREND_CONTINUATION ││ ┌───────────────────────────────────────────────────────────┐ ││ │ Characteristics: │ ││ │ • Price moving in established direction │ ││ │ • Pullbacks within trend structure │ ││ │ • Volume confirming moves │ ││ │ • Momentum indicators aligned │ ││ └───────────────────────────────────────────────────────────┘ ││ ││ Class 1: TREND_REVERSAL ││ ┌───────────────────────────────────────────────────────────┐ ││ │ Characteristics: │ ││ │ • Divergence in momentum indicators │ ││ │ • Volume spike on reversal candle │ ││ │ • Break of key support/resistance │ ││ │ • Change in market structure │ ││ └───────────────────────────────────────────────────────────┘ ││ ││ Class 2: BREAKOUT ││ ┌───────────────────────────────────────────────────────────┐ ││ │ Characteristics: │ ││ │ • Price breaking consolidation range │ ││ │ • Volume expansion │ ││ │ • Volatility increase │ ││ │ • Strong directional move │ ││ └───────────────────────────────────────────────────────────┘ ││ ││ Class 3: FALSE_BREAKOUT ││ ┌───────────────────────────────────────────────────────────┐ ││ │ Characteristics: │ ││ │ • Initial break of key level │ ││ │ • Quick reversal back into range │ ││ │ • Volume doesn't confirm │ ││ │ • Trapping traders │ ││ └───────────────────────────────────────────────────────────┘ ││ ││ Class 4: CONSOLIDATION ││ ┌───────────────────────────────────────────────────────────┐ ││ │ Characteristics: │ ││ │ • Range-bound price action │ ││ │ • Decreasing volatility │ ││ │ • Volume contraction │ ││ │ • Coiling for next move │ ││ └───────────────────────────────────────────────────────────┘ ││ │└────────────────────────────────────────────────────────────────────────┘Trading Strategy Based on Pattern Recognition
┌────────────────────────────────────────────────────────────────────────┐│ Pattern-Based Trading Signals │├────────────────────────────────────────────────────────────────────────┤│ ││ Pattern Detection Pipeline: ││ ────────────────────────── ││ 1. Maintain support set of labeled pattern examples ││ 2. Extract features for current market window ││ 3. Embed support set with FCE (g function) ││ 4. Embed query with attention over support (f function) ││ 5. Compute attention weights to all support examples ││ 6. Generate prediction via weighted voting ││ 7. Execute trading signal based on pattern and confidence ││ ││ Pattern → Signal Mapping: ││ ┌───────────────────────┬──────────────────────────────────────┐ ││ │ Pattern │ Action │ ││ ├───────────────────────┼──────────────────────────────────────┤ ││ │ TREND_CONTINUATION │ Enter/hold with trend direction │ ││ │ TREND_REVERSAL │ Close positions, consider reversal │ ││ │ BREAKOUT │ Enter in breakout direction │ ││ │ FALSE_BREAKOUT │ Fade the move, tight stop │ ││ │ CONSOLIDATION │ Wait, reduce position size │ ││ └───────────────────────┴──────────────────────────────────────┘ ││ ││ Confidence-Based Position Sizing: ││ ───────────────────────────────── ││ position_size = base_size × max_attention_weight ││ ││ Interpretability via Attention: ││ ──────────────────────────────── ││ • Highest attention weights reveal which support examples ││ are most similar to the current market ││ • Provides explainable predictions ││ │└────────────────────────────────────────────────────────────────────────┘Few-Shot Market Pattern Recognition
Episode Generation for Training
def generate_matching_episode( dataset, n_way: int = 5, k_shot: int = 5, n_query: int = 10): """ Generate one training episode for matching network.
Args: dataset: Historical market data with pattern labels n_way: Number of pattern classes per episode k_shot: Number of support examples per class n_query: Number of query examples per class
Returns: support_set: List of (features, label) tuples query_set: List of (features, label) tuples """ # Sample n_way classes from available pattern classes available_classes = dataset.get_pattern_classes() sampled_classes = random.sample(available_classes, n_way)
support_set = [] query_set = []
for new_label, original_class in enumerate(sampled_classes): # Get all samples for this pattern class_samples = dataset.get_samples_for_pattern(original_class)
# Sample k_shot + n_query examples sampled_indices = random.sample( range(len(class_samples)), k_shot + n_query )
# Split into support and query for i, idx in enumerate(sampled_indices): if i < k_shot: support_set.append((class_samples[idx], new_label)) else: query_set.append((class_samples[idx], new_label))
return support_set, query_setMatching Network Forward Pass
def matching_network_forward( support_set, query, base_embedding_fn, support_encoder, query_encoder,): """ Forward pass of matching network.
Args: support_set: List of (features, label) tuples query: Query features base_embedding_fn: Base embedding function g' support_encoder: FCE support encoder query_encoder: FCE query encoder
Returns: class_probabilities: Probability distribution over classes attention_weights: Attention weights to each support example """ # Separate features and labels support_features = [s[0] for s in support_set] support_labels = [s[1] for s in support_set] n_classes = len(set(support_labels))
# Encode support set with FCE support_embeddings = support_encoder.encode(support_features)
# Encode query with attention over support query_embedding = query_encoder.encode(query, support_embeddings)
# Compute attention weights (cosine similarity + softmax) similarities = [ cosine_similarity(query_embedding, s_emb) for s_emb in support_embeddings ] attention_weights = softmax(similarities)
# Weighted voting: P(y|x) = Σ a(x, x_i) * y_i class_probabilities = np.zeros(n_classes) for i, (weight, label) in enumerate(zip(attention_weights, support_labels)): class_probabilities[label] += weight
return class_probabilities, attention_weightsImplementation Strategy
Python Implementation (PyTorch)
import torchimport torch.nn as nnimport torch.nn.functional as F
class MatchingNetwork(nn.Module): """ Matching Network for few-shot market pattern classification. """
def __init__( self, input_dim: int, hidden_dim: int = 64, embedding_dim: int = 64, lstm_layers: int = 1, fce_steps: int = 5, use_fce: bool = True, ): super().__init__()
self.use_fce = use_fce self.fce_steps = fce_steps
# Base embedding network self.base_embedding = nn.Sequential( nn.Linear(input_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, embedding_dim), )
if use_fce: # Support set encoder (bidirectional LSTM) self.support_lstm = nn.LSTM( embedding_dim, embedding_dim, num_layers=lstm_layers, bidirectional=True, batch_first=True, ) self.support_proj = nn.Linear(embedding_dim * 2, embedding_dim)
# Query encoder (LSTM with attention) self.query_lstm = nn.LSTMCell( embedding_dim * 2, # Query embedding + readout embedding_dim, )
def encode_support(self, support: torch.Tensor) -> torch.Tensor: """ Encode support set with FCE.
Args: support: (n_support, input_dim)
Returns: embeddings: (n_support, embedding_dim) """ # Base embeddings base_emb = self.base_embedding(support) # (n_support, embedding_dim)
if not self.use_fce: return F.normalize(base_emb, dim=-1)
# Apply bidirectional LSTM base_emb_seq = base_emb.unsqueeze(0) # (1, n_support, embedding_dim) lstm_out, _ = self.support_lstm(base_emb_seq) # (1, n_support, 2*embedding_dim) lstm_out = lstm_out.squeeze(0) # (n_support, 2*embedding_dim)
# Project and add skip connection context_emb = self.support_proj(lstm_out) # (n_support, embedding_dim) full_emb = context_emb + base_emb # Skip connection
return F.normalize(full_emb, dim=-1)
def encode_query( self, query: torch.Tensor, support_embeddings: torch.Tensor, ) -> torch.Tensor: """ Encode query with attention over support.
Args: query: (batch_size, input_dim) support_embeddings: (n_support, embedding_dim)
Returns: embeddings: (batch_size, embedding_dim) """ batch_size = query.shape[0] embed_dim = support_embeddings.shape[-1]
# Base embedding query_base = self.base_embedding(query) # (batch_size, embedding_dim)
if not self.use_fce: return F.normalize(query_base, dim=-1)
# Initialize LSTM state h = query_base c = torch.zeros_like(h)
# Iterative refinement with attention for _ in range(self.fce_steps): # Attention readout r = self._attention_readout(h, support_embeddings) # (batch_size, embedding_dim)
# LSTM input: concatenate query base and readout lstm_input = torch.cat([query_base, r], dim=-1) # (batch_size, 2*embedding_dim)
# LSTM step h, c = self.query_lstm(lstm_input, (h, c))
# Skip connection h = h + query_base
return F.normalize(h, dim=-1)
def _attention_readout( self, query: torch.Tensor, support_embeddings: torch.Tensor, ) -> torch.Tensor: """ Compute attention-weighted readout from support.
Args: query: (batch_size, embedding_dim) support_embeddings: (n_support, embedding_dim)
Returns: readout: (batch_size, embedding_dim) """ # Cosine similarity query_norm = F.normalize(query, dim=-1) support_norm = F.normalize(support_embeddings, dim=-1)
# (batch_size, n_support) similarities = torch.mm(query_norm, support_norm.t()) attention = F.softmax(similarities, dim=-1)
# Weighted sum readout = torch.mm(attention, support_embeddings) # (batch_size, embedding_dim)
return readout
def forward( self, support: torch.Tensor, support_labels: torch.Tensor, query: torch.Tensor, ) -> tuple[torch.Tensor, torch.Tensor]: """ Forward pass for classification.
Args: support: (n_support, input_dim) support_labels: (n_support,) query: (n_query, input_dim)
Returns: class_probs: (n_query, n_classes) attention: (n_query, n_support) """ # Encode support and query support_emb = self.encode_support(support) query_emb = self.encode_query(query, support_emb)
# Compute attention weights similarities = torch.mm(query_emb, support_emb.t()) # (n_query, n_support) attention = F.softmax(similarities, dim=-1)
# Aggregate by class n_classes = support_labels.max().item() + 1 class_probs = torch.zeros(query.shape[0], n_classes, device=query.device)
for i in range(support.shape[0]): class_probs[:, support_labels[i]] += attention[:, i]
return class_probs, attentionRust Implementation
See the src/ directory for the complete Rust implementation with:
network/- Matching network architecturedata/- Bybit and stock data integrationstrategy/- Trading strategy implementationtraining/- Episodic training utilities
Bybit Integration
Real-Time Pattern Detection
use crate::data::bybit::{BybitClient, KlineData};use crate::network::MatchingNetwork;use crate::strategy::PatternStrategy;
/// Real-time pattern detector using Matching Networkspub struct BybitPatternDetector { client: BybitClient, network: MatchingNetwork, support_set: SupportSet, strategy: PatternStrategy,}
impl BybitPatternDetector { /// Detect pattern in current market pub async fn detect_pattern(&self, symbol: &str) -> Result<PatternDetection> { // Fetch recent klines let klines = self.client .get_klines(symbol, "15m", 100) .await?;
// Extract features let features = self.extract_features(&klines);
// Classify using matching network let (class_probs, attention) = self.network.forward( &self.support_set, &features, );
// Get prediction and confidence let predicted_class = class_probs.argmax(); let confidence = class_probs[predicted_class];
// Find most similar support examples let similar_examples = self.get_top_attention(attention, 3);
Ok(PatternDetection { pattern: predicted_class.into(), confidence, similar_examples, trading_signal: self.strategy.generate_signal(predicted_class, confidence), }) }}Feature Extraction
┌────────────────────────────────────────────────────────────────────────┐│ Feature Extraction for Matching │├────────────────────────────────────────────────────────────────────────┤│ ││ Price Features: ││ • Returns (multiple timeframes) ││ • Price position relative to moving averages ││ • Higher highs / lower lows pattern ││ • Support/resistance proximity ││ ││ Volume Features: ││ • Volume relative to average ││ • Volume profile (distribution across price levels) ││ • Buy/Sell volume ratio (from Bybit trades) ││ ││ Volatility Features: ││ • ATR (Average True Range) ││ • Bollinger Band width ││ • Volatility regime indicator ││ ││ Momentum Features: ││ • RSI and RSI divergence ││ • MACD and histogram ││ • Stochastic oscillator ││ ││ Crypto-Specific Features: ││ • Funding rate ││ • Open interest change ││ • Long/Short ratio ││ • Liquidation levels ││ │└────────────────────────────────────────────────────────────────────────┘Risk Management
Position Sizing Based on Confidence
impl PatternStrategy { /// Calculate position size based on pattern confidence pub fn calculate_position_size( &self, confidence: f32, attention_entropy: f32, account_balance: f32, ) -> f32 { // Base position as percentage of account let base_position = account_balance * self.base_risk_pct;
// Confidence scaling (higher confidence = larger position) let confidence_factor = if confidence > 0.8 { 1.0 } else if confidence > 0.6 { 0.7 } else if confidence > 0.4 { 0.4 } else { 0.0 // Don't trade if confidence is too low };
// Attention entropy scaling (lower entropy = more decisive) // High entropy means attention is spread across many examples (uncertain) let entropy_factor = (1.0 - attention_entropy / self.max_entropy).max(0.5);
base_position * confidence_factor * entropy_factor }}Stop Loss and Take Profit
┌────────────────────────────────────────────────────────────────────────┐│ Risk Management Rules by Pattern │├────────────────────────────────────────────────────────────────────────┤│ ││ TREND_CONTINUATION: ││ • Stop: Below recent swing low (long) / above swing high (short) ││ • Take Profit: Trail with ATR-based stop ││ • Risk/Reward: 1:2 minimum ││ ││ TREND_REVERSAL: ││ • Stop: Beyond the reversal level ││ • Take Profit: Previous support/resistance ││ • Risk/Reward: 1:3 minimum (compensate for lower win rate) ││ ││ BREAKOUT: ││ • Stop: Below/above the breakout level ││ • Take Profit: Measured move equal to range ││ • Risk/Reward: 1:2 minimum ││ ││ FALSE_BREAKOUT: ││ • Stop: Beyond the false breakout high/low ││ • Take Profit: Opposite side of range ││ • Risk/Reward: 1:2 minimum ││ ││ CONSOLIDATION: ││ • Action: Reduce position size or stay flat ││ • Wait for breakout signal ││ │└────────────────────────────────────────────────────────────────────────┘Performance Metrics
Classification Metrics
| Metric | Description | Target |
|---|---|---|
| Accuracy | Overall correct predictions | > 60% |
| F1-Score | Harmonic mean of precision/recall | > 0.55 |
| Top-2 Accuracy | Correct class in top 2 predictions | > 80% |
| Confidence Calibration | Predicted confidence matches accuracy | ECE < 0.1 |
Trading Metrics
| Metric | Description | Target |
|---|---|---|
| Sharpe Ratio | Risk-adjusted return | > 1.5 |
| Sortino Ratio | Downside risk-adjusted return | > 2.0 |
| Maximum Drawdown | Largest peak-to-trough decline | < 15% |
| Win Rate | Percentage of winning trades | > 50% |
| Profit Factor | Gross profit / Gross loss | > 1.5 |
Few-Shot Specific Metrics
| Metric | Description | Formula |
|---|---|---|
| 5-shot Accuracy | Accuracy with 5 examples per class | Standard metric |
| 1-shot Accuracy | Accuracy with 1 example per class | Harder metric |
| Attention Entropy | How spread out attention is | $-\sum a_i \log a_i$ |
| Prototype Consistency | Stability across episodes | Variance of predictions |
Comparison with Prototypical Networks
Key Differences
┌────────────────────────────────────────────────────────────────────────┐│ Matching Networks vs Prototypical Networks │├────────────────────────────────────────────────────────────────────────┤│ ││ Classification Mechanism: ││ ───────────────────────── ││ Prototypical: Compare to class centroid (prototype) ││ Matching: Attention-weighted voting over ALL examples ││ ││ Embedding: ││ ────────── ││ Prototypical: Fixed embedding f(x) ││ Matching: Context-dependent via FCE: f(x, S), g(x, S) ││ ││ Computation: ││ ─────────── ││ Prototypical: O(n_classes) distances ││ Matching: O(n_support) attention weights ││ ││ When to Use Each: ││ ───────────────── ││ Prototypical: Patterns form tight clusters ││ Matching: Patterns are more spread out / context-dependent ││ ││ Interpretability: ││ ──────────────── ││ Prototypical: "This looks like the typical X pattern" ││ Matching: "This looks like examples 3, 7, 12 (attention)" ││ │└────────────────────────────────────────────────────────────────────────┘When to Choose Matching Networks
- Complex pattern distributions: When patterns don’t cluster nicely
- Context matters: When the same features mean different things in different contexts
- Interpretability via examples: When you want to know which specific examples are similar
- Small support sets: Works well even with very few examples per class
References
-
Matching Networks for One Shot Learning
- Authors: Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.
- URL: https://arxiv.org/abs/1606.04080
- Year: 2016
- Conference: NeurIPS 2016
-
Prototypical Networks for Few-shot Learning
- Authors: Snell, J., Swersky, K., Zemel, R.
- URL: https://arxiv.org/abs/1703.05175
- Year: 2017
-
Meta-Learning for Semi-Supervised Few-Shot Classification
- Authors: Ren, M., et al.
- URL: https://arxiv.org/abs/1803.00676
- Year: 2018
-
Few-Shot Learning: A Survey
- Authors: Wang, Y., et al.
- URL: https://arxiv.org/abs/1904.05046
- Year: 2019
Quick Start
Python
from matching_network import MatchingNetwork, MarketPatternClassifier
# Create the networknetwork = MatchingNetwork( input_dim=15, hidden_dim=64, embedding_dim=64, use_fce=True, fce_steps=5,)
# Create classifierclassifier = MarketPatternClassifier(network)
# Fit with support setclassifier.fit(support_features, support_labels)
# Classify new patternspredictions, attention = classifier.predict(query_features)
# Interpret predictionsfor i, (pred, att) in enumerate(zip(predictions, attention)): print(f"Query {i}: Pattern {pred}") print(f" Most similar support examples: {att.argsort()[-3:][::-1]}")Rust
use matching_networks_finance::{ MatchingNetwork, Config, BybitClient, PatternStrategy};
#[tokio::main]async fn main() -> Result<()> { // Initialize network let config = Config::default(); let network = MatchingNetwork::new(config)?;
// Load support set let support_set = load_pattern_examples("data/support_set.json")?;
// Connect to Bybit let client = BybitClient::new()?;
// Real-time pattern detection let detection = network.detect_pattern( &support_set, &client.get_klines("BTCUSDT", "15m", 100).await?, )?;
println!("Pattern: {:?}", detection.pattern); println!("Confidence: {:.2}", detection.confidence); println!("Trading Signal: {:?}", detection.trading_signal);
Ok(())}File Structure
84_matching_networks_finance/├── README.md # This file (English)├── README.ru.md # Russian translation├── readme.simple.md # Simple explanation (English)├── readme.simple.ru.md # Simple explanation (Russian)├── README.specify.md # Technical specification├── Cargo.toml # Rust dependencies├── python/│ ├── matching_network.py # Core implementation│ ├── data_loader.py # Data loading utilities│ ├── trainer.py # Training utilities│ └── backtest.py # Backtesting framework├── src/│ ├── lib.rs # Rust library root│ ├── network/│ │ ├── mod.rs│ │ ├── embedding.rs # Embedding networks│ │ ├── attention.rs # Attention mechanisms│ │ └── fce.rs # Full Context Embeddings│ ├── data/│ │ ├── mod.rs│ │ ├── bybit.rs # Bybit API integration│ │ ├── stock.rs # Stock data (yfinance)│ │ └── features.rs # Feature extraction│ ├── strategy/│ │ ├── mod.rs│ │ └── pattern_strategy.rs # Trading strategy│ ├── training/│ │ ├── mod.rs│ │ └── episodic.rs # Episodic training│ └── utils/│ ├── mod.rs│ └── metrics.rs # Performance metrics├── examples/│ ├── basic_matching.rs # Basic usage example│ ├── pattern_detection.rs # Pattern detection example│ └── backtest.rs # Backtesting example└── tests/ └── integration_tests.rs # Integration tests