Chapter 37: Sentiment-Momentum Fusion — Social Media Enhanced Momentum
Chapter 37: Sentiment-Momentum Fusion — Social Media Enhanced Momentum
Overview
Price momentum и sentiment momentum из социальных сетей могут усиливать друг друга. В этой главе мы комбинируем традиционный price momentum с sentiment сигналами из Twitter, Reddit и StockTwits для построения enhanced momentum стратегии.
Trading Strategy
Суть стратегии: Покупаем акции с положительным price momentum И положительным sentiment momentum. Продаем при дивергенции между ценой и sentiment.
Сигнал на вход:
- Strong Long: Positive price momentum + positive sentiment momentum
- Long: Positive price momentum, neutral sentiment
- Short: Negative price momentum + negative sentiment momentum
- Exit: Divergence (price up, sentiment down or vice versa)
Edge: Sentiment предсказывает будущий momentum; divergence предсказывает reversal
Technical Specification
Notebooks to Create
| # | Notebook | Description |
|---|---|---|
| 1 | 01_data_collection.ipynb | Сбор данных из Twitter, Reddit, StockTwits |
| 2 | 02_text_preprocessing.ipynb | Очистка и подготовка текста |
| 3 | 03_finbert_sentiment.ipynb | FinBERT для финансового sentiment |
| 4 | 04_sentiment_aggregation.ipynb | Агрегация sentiment по тикерам |
| 5 | 05_sentiment_momentum.ipynb | Расчет sentiment momentum |
| 6 | 06_price_momentum.ipynb | Традиционный price momentum |
| 7 | 07_fusion_features.ipynb | Комбинация price + sentiment features |
| 8 | 08_divergence_detection.ipynb | Обнаружение divergence сигналов |
| 9 | 09_ml_model.ipynb | ML модель для combined signal |
| 10 | 10_backtesting.ipynb | Backtest с transaction costs |
| 11 | 11_real_time_pipeline.ipynb | Real-time scoring pipeline |
Data Sources
Twitter/X:├── Cashtags ($AAPL, $TSLA)├── Financial influencers├── Company mentions└── API: Twitter API v2 / Scrapers
Reddit:├── r/wallstreetbets├── r/stocks├── r/investing├── r/options└── API: PRAW (Python Reddit API Wrapper)
StockTwits:├── Message stream per ticker├── Bullish/Bearish labels (user-provided)├── Trending tickers└── API: StockTwits API
News Headlines:├── Financial news (Reuters, Bloomberg)├── Seeking Alpha├── Yahoo Finance└── BenzingaSentiment Extraction with FinBERT
from transformers import AutoTokenizer, AutoModelForSequenceClassificationimport torch
class FinBERTSentiment: def __init__(self): self.tokenizer = AutoTokenizer.from_pretrained('ProsusAI/finbert') self.model = AutoModelForSequenceClassification.from_pretrained('ProsusAI/finbert') self.labels = ['negative', 'neutral', 'positive']
def predict(self, texts): """ Score sentiment for list of texts Returns: scores in [-1, 1] range """ inputs = self.tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors='pt')
with torch.no_grad(): outputs = self.model(**inputs) probs = torch.softmax(outputs.logits, dim=1)
# Convert to continuous score: -1 (negative) to +1 (positive) scores = probs[:, 2] - probs[:, 0] # positive - negative
return scores.numpy()
def predict_with_confidence(self, texts): """Return sentiment score and confidence""" inputs = self.tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors='pt')
with torch.no_grad(): outputs = self.model(**inputs) probs = torch.softmax(outputs.logits, dim=1)
scores = probs[:, 2] - probs[:, 0] confidence = probs.max(dim=1).values
return scores.numpy(), confidence.numpy()Sentiment Aggregation
class SentimentAggregator: """ Aggregate sentiment across multiple sources for each ticker """ def __init__(self, decay_halflife_hours=24): self.decay_halflife = decay_halflife_hours
def aggregate_daily(self, ticker, date, messages): """ Aggregate sentiment for ticker on given date """ if len(messages) == 0: return {'sentiment': 0, 'volume': 0, 'confidence': 0}
# Time decay weighting (more recent = higher weight) hours_ago = (date - messages['timestamp']).dt.total_seconds() / 3600 time_weights = 0.5 ** (hours_ago / self.decay_halflife)
# Volume weighting (engagement) engagement = messages['likes'] + messages['retweets'] + messages['replies'] engagement_weights = np.log1p(engagement)
# Combine weights weights = time_weights * engagement_weights weights = weights / weights.sum()
# Weighted average sentiment weighted_sentiment = (messages['sentiment'] * weights).sum()
# Confidence based on volume and agreement sentiment_std = messages['sentiment'].std() confidence = 1 - sentiment_std # Higher agreement = higher confidence
return { 'sentiment': weighted_sentiment, 'volume': len(messages), 'confidence': confidence, 'bullish_ratio': (messages['sentiment'] > 0).mean() }Sentiment Momentum
def calculate_sentiment_momentum(sentiment_series, windows=[1, 5, 20]): """ Calculate sentiment momentum over multiple windows """ momentum = {}
for window in windows: # Level momentum: current vs past average sma = sentiment_series.rolling(window).mean() momentum[f'sent_mom_{window}d'] = sentiment_series - sma.shift(1)
# Change momentum: acceleration of sentiment momentum[f'sent_change_{window}d'] = sentiment_series.diff(window)
# Volume-weighted momentum if 'volume' in sentiment_series.columns: vw_sent = (sentiment_series['sentiment'] * sentiment_series['volume']).rolling(window).sum() vw_sent /= sentiment_series['volume'].rolling(window).sum() momentum[f'sent_vw_mom_{window}d'] = vw_sent - vw_sent.shift(window)
return pd.DataFrame(momentum)Price-Sentiment Fusion
class SentimentMomentumFusion: """ Combine price momentum with sentiment momentum """ def __init__(self): self.price_weight = 0.6 self.sentiment_weight = 0.4
def calculate_combined_score(self, price_data, sentiment_data): """ Calculate combined momentum score """ # Price momentum (normalized) price_mom = price_data['return_20d'] price_mom_z = (price_mom - price_mom.mean()) / price_mom.std()
# Sentiment momentum (normalized) sent_mom = sentiment_data['sent_mom_5d'] sent_mom_z = (sent_mom - sent_mom.mean()) / sent_mom.std()
# Combined score combined = self.price_weight * price_mom_z + self.sentiment_weight * sent_mom_z
return combined
def detect_divergence(self, price_data, sentiment_data, threshold=1.5): """ Detect divergence between price and sentiment momentum """ price_direction = np.sign(price_data['return_5d']) sent_direction = np.sign(sentiment_data['sent_change_5d'])
# Divergence: price and sentiment moving in opposite directions divergence = price_direction != sent_direction
# Strong divergence: magnitude also significant price_magnitude = abs(price_data['return_5d']) > price_data['return_5d'].std() sent_magnitude = abs(sentiment_data['sent_change_5d']) > sentiment_data['sent_change_5d'].std()
strong_divergence = divergence & price_magnitude & sent_magnitude
return { 'divergence': divergence, 'strong_divergence': strong_divergence, 'type': np.where(price_direction > sent_direction, 'bearish', 'bullish') }ML Model for Signal Generation
class SentimentMomentumModel: """ ML model combining price and sentiment features """ def __init__(self): self.model = LGBMClassifier( n_estimators=100, max_depth=6, learning_rate=0.05 )
self.features = [ # Price features 'return_5d', 'return_20d', 'return_60d', 'volatility_20d', 'volume_ratio',
# Sentiment features 'sentiment_score', 'sentiment_volume', 'sent_mom_5d', 'sent_mom_20d', 'bullish_ratio',
# Interaction features 'price_sent_correlation', 'divergence_flag', 'sentiment_leads_price' ]
def create_labels(self, returns, forward_days=5, threshold=0.02): """ Create classification labels based on forward returns """ forward_returns = returns.shift(-forward_days) labels = np.where(forward_returns > threshold, 1, np.where(forward_returns < -threshold, -1, 0)) return labels
def train(self, X, y): # Time series cross-validation tscv = TimeSeriesSplit(n_splits=5)
for train_idx, val_idx in tscv.split(X): self.model.fit(X.iloc[train_idx], y.iloc[train_idx]) val_pred = self.model.predict(X.iloc[val_idx]) print(f"Validation accuracy: {accuracy_score(y.iloc[val_idx], val_pred)}")
def predict_signal(self, X): proba = self.model.predict_proba(X) # Return probability of positive class minus negative return proba[:, 2] - proba[:, 0]Key Metrics
- Sentiment Quality: Correlation with forward returns, IC
- Signal Quality: Hit rate, Average return per signal
- Strategy: Sharpe, Max DD, Turnover
- Comparison: vs price-only momentum, vs sentiment-only
Dependencies
transformers>=4.30.0torch>=2.0.0tweepy>=4.14.0 # Twitter APIpraw>=7.7.0 # Reddit APIpandas>=1.5.0numpy>=1.23.0lightgbm>=4.0.0yfinance>=0.2.0Expected Outcomes
- Data pipeline для Twitter, Reddit, StockTwits
- FinBERT sentiment scoring для финансовых текстов
- Sentiment aggregation по тикерам и времени
- Fusion strategy комбинирующая price + sentiment
- Divergence signals для reversal prediction
- Backtest results: Improved Sharpe vs price-only momentum
References
- FinBERT: A Pretrained Language Model for Financial Communications
- Predicting Stock Movements with Social Media
- Twitter Mood Predicts the Stock Market
- Retail Investor Sentiment and Behavior
Difficulty Level
⭐⭐⭐⭐☆ (Advanced)
Требуется понимание: NLP/Transformers, Social media APIs, Sentiment analysis, Momentum strategies