Chapter 74: LLM Portfolio Construction
Overview
Large Language Models (LLMs) can support portfolio construction by analyzing diverse data sources (news, earnings reports, market commentary, and fundamental data) and turning them into asset allocation recommendations. This chapter explores how to use LLMs to build and manage investment portfolios by combining natural language understanding with quantitative optimization techniques.
Trading Strategy
Core Concept: LLMs process financial documents, news sentiment, and market data to generate portfolio weights, asset recommendations, and rebalancing signals.
Entry Signals:
- Long allocation: Positive sentiment + favorable fundamentals identified by LLM
- Increased weight: LLM identifies undervalued assets with growth catalysts
- Reduced weight: LLM detects deteriorating fundamentals or negative sentiment
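The signal rules above can be sketched as a small decision function. This is only an illustration: the `Signal` type, the 1-10 score scale, and the `bullish`/`bearish` thresholds are assumptions made here, not values from the strategy itself.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """Hypothetical container for two LLM-produced scores (assumed 1-10 scale)."""
    symbol: str
    sentiment: float     # higher = more positive news/sentiment
    fundamental: float   # higher = stronger fundamentals


def allocation_action(sig: Signal, current_weight: float,
                      bullish: float = 7.0, bearish: float = 4.0) -> str:
    """Map scores to 'open_long', 'increase', 'reduce', or 'hold'.

    Thresholds are illustrative assumptions and would need calibration.
    """
    positive = sig.sentiment >= bullish and sig.fundamental >= bullish
    negative = sig.sentiment <= bearish or sig.fundamental <= bearish
    if negative:
        # Deteriorating fundamentals or negative sentiment: cut existing exposure
        return "reduce" if current_weight > 0 else "hold"
    if positive:
        # Positive sentiment plus favorable fundamentals: add or open exposure
        return "increase" if current_weight > 0 else "open_long"
    return "hold"


print(allocation_action(Signal("BTCUSDT", 8.0, 8.5), current_weight=0.0))
print(allocation_action(Signal("ETHUSDT", 3.0, 8.0), current_weight=0.1))
```

Negative signals are checked first so that a single weak score overrides an otherwise bullish reading, which is one conservative design choice among several possible ones.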
Edge: LLMs can synthesize vast amounts of unstructured data (earnings calls, news, SEC filings) into actionable portfolio recommendations faster than human analysts, identifying subtle patterns and cross-asset relationships.
Technical Specification
Key Components
- Data Ingestion Pipeline - Collect market data, news, and fundamental data
- LLM Analysis Engine - Generate asset assessments and recommendations
- Portfolio Optimizer - Convert LLM insights into optimal weights
- Risk Management - Constraint-based portfolio construction
- Backtesting Framework - Validate strategy performance
Architecture
┌─────────────────────┐
│    Data Sources     │
│  (News, Filings,    │
│   Market Data)      │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Text Processor    │
│  (Summarization,    │
│   Feature Extract)  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   LLM Portfolio     │
│      Engine         │
│  (Analysis + Recs)  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Portfolio Optimizer │
│  (Mean-Variance,    │
│   Risk Parity)      │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Execution Layer   │
│ (Orders + Rebalance)│
└─────────────────────┘

Data Requirements

Market Data:
├── OHLCV price data (Bybit for crypto, Yahoo for stocks)
├── Trading volume and liquidity metrics
├── Volatility and correlation data
└── Benchmark indices

Fundamental Data:
├── Earnings reports and guidance
├── SEC filings (10-K, 10-Q, 8-K)
├── Analyst estimates and revisions
└── Financial ratios and metrics

Alternative Data:
├── News articles and headlines
├── Social media sentiment
├── Earnings call transcripts
└── Macroeconomic indicators

LLM Portfolio Approaches
The LLM can be used in several ways for portfolio construction:
| Approach | Description | Use Case |
|---|---|---|
| Direct Allocation | LLM outputs portfolio weights directly | Simple, interpretable |
| Scoring + Optimization | LLM scores assets, optimizer sets weights | Combines LLM insight with math |
| Multi-Agent Ensemble | Multiple LLM personas vote on allocation | Robust, diverse perspectives |
| RAG-Enhanced | LLM retrieves relevant data before deciding | Access to real-time information |
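As an illustration of the Multi-Agent Ensemble row, the sketch below averages per-persona weights and renormalizes them. It is one possible reading of the `aggregate_portfolios` helper that the multi-agent use case calls but does not define; the plain-dict input shape and equal persona weighting are assumptions.

```python
from typing import Dict


def aggregate_portfolios(persona_weights: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average symbol weights across personas, then renormalize to sum to 1.

    Assumes each persona maps symbol -> weight; a symbol a persona omits
    counts as a zero weight for that persona.
    """
    if not persona_weights:
        return {}
    symbols = sorted({s for w in persona_weights.values() for s in w})
    n = len(persona_weights)
    avg = {s: sum(w.get(s, 0.0) for w in persona_weights.values()) / n
           for s in symbols}
    total = sum(avg.values())
    return {s: v / total for s, v in avg.items()} if total > 0 else avg


votes = {
    "value_investor": {"AAPL": 0.6, "MSFT": 0.4},
    "momentum_trader": {"NVDA": 0.7, "MSFT": 0.3},
}
final = aggregate_portfolios(votes)
print(final)
```

Simple averaging treats every persona as equally reliable; weighting personas by their historical accuracy would be a natural variation.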
Prompt Engineering for Portfolio Construction
PORTFOLIO_CONSTRUCTION_PROMPT = """You are a quantitative portfolio manager. Analyze the following assets and market conditions.

Assets to consider:
{asset_list}

Recent market data:
{market_data}

News and sentiment:
{news_summary}

Current portfolio:
{current_portfolio}

Based on this information, provide:

1. Asset Scores (1-10 scale):
   - Fundamental Score: Quality of financials and business
   - Momentum Score: Price trend and technical indicators
   - Sentiment Score: News and social sentiment
   - Risk Score: Volatility and downside risk

2. Recommended Portfolio Weights (must sum to 100%):
   - For each asset, provide target weight and reasoning

3. Rebalancing Actions:
   - What trades to execute
   - Priority order of trades
   - Risk considerations

4. Confidence Level: (low/medium/high)

Output as JSON format."""

Key Metrics
Portfolio Performance:
- Sharpe Ratio (risk-adjusted return)
- Sortino Ratio (downside risk-adjusted)
- Maximum Drawdown
- Calmar Ratio
- Information Ratio vs benchmark
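Of the metrics listed, the information ratio is the one that needs a benchmark series in addition to the portfolio's own returns. A minimal sketch follows, assuming aligned per-period simple returns and 252 periods per year; the function name and signature are illustrative.

```python
import math
import statistics
from typing import Sequence


def information_ratio(portfolio_returns: Sequence[float],
                      benchmark_returns: Sequence[float],
                      periods_per_year: int = 252) -> float:
    """Annualized mean active return divided by tracking error."""
    if len(portfolio_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    # Active return: portfolio minus benchmark, period by period
    active = [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
    if len(active) < 2:
        return 0.0
    tracking_error = statistics.stdev(active)  # sample standard deviation
    if tracking_error == 0:
        return 0.0
    return statistics.mean(active) / tracking_error * math.sqrt(periods_per_year)


ir = information_ratio([0.01, 0.02, 0.00, 0.01], [0.00, 0.01, 0.01, 0.00])
print(round(ir, 3))
```

For assets that trade every calendar day, such as crypto, `periods_per_year=365` may be the more appropriate assumption.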
LLM Quality Metrics:
- Recommendation accuracy
- Ranking correlation (Spearman)
- Hit rate on direction predictions
- Turnover efficiency
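Two of the LLM quality metrics above can be computed with the standard library alone. The sketch below implements Spearman rank correlation (Pearson correlation of average ranks) and a directional hit rate; the function names and input shapes are assumptions chosen for illustration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n in ascending order; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(scores: Sequence[float], realized: Sequence[float]) -> float:
    """Spearman correlation between LLM scores and realized returns."""
    rx, ry = average_ranks(scores), average_ranks(realized)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # undefined when one side has no rank variation
    return cov / (vx * vy) ** 0.5


def hit_rate(predicted_direction: Sequence[int], realized: Sequence[float]) -> float:
    """Share of assets where the predicted sign (+1/-1) matches the realized return."""
    if not realized:
        return 0.0
    hits = sum(1 for p, r in zip(predicted_direction, realized) if p * r > 0)
    return hits / len(realized)


print(spearman([8, 6, 4, 2], [0.05, 0.03, 0.01, -0.02]))
print(hit_rate([1, 1, -1, -1], [0.02, -0.01, -0.03, 0.01]))
```

A rank correlation near zero over many rebalance dates would suggest the scores carry little ordering information, whatever the quality of the accompanying reasoning text.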
Dependencies
# Python dependencies
openai>=1.0.0          # OpenAI API client
anthropic>=0.5.0       # Claude API client
transformers>=4.30.0   # HuggingFace models
torch>=2.0.0           # PyTorch
pandas>=2.0.0          # Data manipulation
numpy>=1.24.0          # Numerical computing
yfinance>=0.2.0        # Stock data
scipy>=1.10.0          # Optimization
cvxpy>=1.4.0           # Convex optimization
requests>=2.28.0       # HTTP client

# Rust dependencies (Cargo.toml)
reqwest = "0.12"       # HTTP client
serde = "1.0"          # Serialization
tokio = "1.0"          # Async runtime
ndarray = "0.16"       # Arrays
polars = "0.46"        # DataFrames

Python Implementation
Portfolio Data Structures
from dataclasses import dataclass, field
from typing import List, Dict, Optional
from enum import Enum
import numpy as np


class AssetClass(Enum):
    EQUITY = "equity"
    CRYPTO = "crypto"
    BOND = "bond"
    COMMODITY = "commodity"


@dataclass
class Asset:
    """Represents a tradeable asset."""
    symbol: str
    name: str
    asset_class: AssetClass
    current_price: float
    market_cap: Optional[float] = None


@dataclass
class AssetScore:
    """LLM-generated scores for an asset."""
    symbol: str
    fundamental_score: float  # 1-10
    momentum_score: float     # 1-10
    sentiment_score: float    # 1-10
    risk_score: float         # 1-10 (higher = more risk)
    overall_score: float      # Weighted combination
    reasoning: str            # LLM explanation
    confidence: str           # low/medium/high

    @property
    def composite_score(self) -> float:
        """Calculate weighted composite score."""
        # Higher is better, so invert risk score
        weights = {
            'fundamental': 0.30,
            'momentum': 0.25,
            'sentiment': 0.25,
            'risk': 0.20
        }
        return (
            weights['fundamental'] * self.fundamental_score +
            weights['momentum'] * self.momentum_score +
            weights['sentiment'] * self.sentiment_score +
            weights['risk'] * (10 - self.risk_score)  # Invert risk
        )


@dataclass
class Portfolio:
    """Represents a portfolio allocation."""
    weights: Dict[str, float]  # symbol -> weight
    cash_weight: float = 0.0
    timestamp: str = ""

    def __post_init__(self):
        # Normalize weights to sum to 1
        total = sum(self.weights.values()) + self.cash_weight
        if total > 0:
            self.weights = {k: v/total for k, v in self.weights.items()}
            self.cash_weight = self.cash_weight / total

    def get_weight(self, symbol: str) -> float:
        return self.weights.get(symbol, 0.0)

    def to_dict(self) -> Dict:
        return {
            "weights": self.weights,
            "cash_weight": self.cash_weight,
            "timestamp": self.timestamp
        }

LLM Portfolio Engine
import json
from typing import List, Dict, Tuple
import openai

# Asset, AssetScore and Portfolio are the dataclasses defined under
# "Portfolio Data Structures"


class LLMPortfolioEngine:
    """LLM-based portfolio construction engine."""

    # The default model must support JSON mode (response_format below);
    # the original "gpt-4" default may reject that parameter
    def __init__(self, api_key: str, model: str = "gpt-4o"):
        self.client = openai.OpenAI(api_key=api_key)
        self.model = model

    def analyze_assets(
        self,
        assets: List[Asset],
        market_data: Dict,
        news_data: List[str]
    ) -> List[AssetScore]:
        """Analyze assets and generate scores using LLM."""

        # Prepare asset information
        asset_info = self._format_assets(assets)
        market_summary = self._format_market_data(market_data)
        news_summary = self._format_news(news_data)

        prompt = self._build_analysis_prompt(asset_info, market_summary, news_summary)

        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": "You are a quantitative analyst specializing in portfolio construction."},
                {"role": "user", "content": prompt}
            ],
            response_format={"type": "json_object"}
        )

        # message.content can be None (for example on a refusal), so fall
        # back to an empty JSON object rather than crashing in json.loads
        result = json.loads(response.choices[0].message.content or "{}")
        return self._parse_scores(result)

    def generate_portfolio(
        self,
        scores: List[AssetScore],
        constraints: Dict = None
    ) -> Portfolio:
        """Generate portfolio weights from asset scores."""

        # Merge caller overrides into the defaults so that a partial
        # constraints dict (e.g. without "min_score") does not raise KeyError
        defaults = {
            "max_weight": 0.30,
            "min_weight": 0.02,
            "max_assets": 10,
            "min_score": 5.0
        }
        constraints = {**defaults, **(constraints or {})}

        # Filter assets by minimum score
        valid_scores = [s for s in scores if s.composite_score >= constraints["min_score"]]

        # Sort by composite score
        valid_scores.sort(key=lambda x: x.composite_score, reverse=True)

        # Take top N assets
        selected = valid_scores[:constraints["max_assets"]]

        # Nothing passed the filter: hold cash instead of dividing by zero
        if not selected:
            return Portfolio(weights={}, cash_weight=1.0)

        # Calculate weights proportional to scores
        total_score = sum(s.composite_score for s in selected)

        weights = {}
        for score in selected:
            raw_weight = score.composite_score / total_score
            # Apply constraints
            weight = max(constraints["min_weight"], min(constraints["max_weight"], raw_weight))
            weights[score.symbol] = weight

        # Normalize. Note: rescaling after clipping can push a clipped weight
        # back above max_weight, so the cap is only approximate here
        total_weight = sum(weights.values())
        weights = {k: v/total_weight for k, v in weights.items()}

        return Portfolio(weights=weights)

    def _build_analysis_prompt(
        self,
        assets: str,
        market: str,
        news: str
    ) -> str:
        return f"""Analyze the following assets for portfolio construction.

ASSETS:
{assets}

MARKET CONDITIONS:
{market}

RECENT NEWS:
{news}

For each asset, provide scores (1-10) and analysis:
- fundamental_score: Quality of business and financials
- momentum_score: Price trend strength
- sentiment_score: News and market sentiment
- risk_score: Volatility and downside risk (10 = highest risk)
- reasoning: Brief explanation
- confidence: low/medium/high

Return JSON with "scores" array containing objects for each asset."""

    def _format_assets(self, assets: List[Asset]) -> str:
        lines = []
        for a in assets:
            lines.append(f"- {a.symbol}: {a.name} ({a.asset_class.value}), Price: ${a.current_price:.2f}")
        return "\n".join(lines)

    def _format_market_data(self, data: Dict) -> str:
        lines = []
        for symbol, info in data.items():
            lines.append(f"- {symbol}: Return 7d: {info.get('return_7d', 0):.1%}, Volatility: {info.get('volatility', 0):.1%}")
        return "\n".join(lines)

    def _format_news(self, news: List[str]) -> str:
        return "\n".join([f"- {n}" for n in news[:10]])

    def _parse_scores(self, data: Dict) -> List[AssetScore]:
        scores = []
        for item in data.get("scores", []):
            scores.append(AssetScore(
                symbol=item.get("symbol", ""),
                fundamental_score=float(item.get("fundamental_score", 5)),
                momentum_score=float(item.get("momentum_score", 5)),
                sentiment_score=float(item.get("sentiment_score", 5)),
                risk_score=float(item.get("risk_score", 5)),
                overall_score=float(item.get("overall_score", 5)),
                reasoning=item.get("reasoning", ""),
                confidence=item.get("confidence", "medium")
            ))
        return scores

Mean-Variance Optimizer
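In symbols, the optimizer in this section can be read as the problem below. The notation is introduced here for illustration only: $\mu$ for expected returns, $\Sigma$ for the covariance matrix, $s$ for LLM composite scores, $r_f$ for the risk-free rate, and $b$ for the risk budget. The constants $\lambda = 0.3$ and $0.01$ correspond to `blend_weight` and the scale factor in the code.

```latex
% Score blending (only when LLM scores are supplied)
z_i = \frac{s_i - \bar{s}}{\sigma_s}, \qquad
\tilde{\mu} = (1 - \lambda)\,\mu + \lambda \cdot 0.01 \cdot z, \qquad \lambda = 0.3

% Maximum-Sharpe problem solved by optimize()
\max_{w} \; \frac{w^\top \tilde{\mu} - r_f}{\sqrt{w^\top \Sigma w}}
\quad \text{s.t.} \quad \sum_i w_i = 1, \qquad w_{\min} \le w_i \le w_{\max}

% Risk parity solved by risk_parity()
\sigma_p = \sqrt{w^\top \Sigma w}, \qquad
RC_i = \frac{w_i \,(\Sigma w)_i}{\sigma_p}, \qquad
\min_{w} \; \sum_i \left( RC_i - b_i\,\sigma_p \right)^2
\quad \text{s.t.} \quad \sum_i w_i = 1
```

Because $\sum_i RC_i = \sigma_p$, a budget $b$ that sums to one makes the risk-parity target attainable in principle; the bounds on $w$ decide whether the solver can reach it.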
import numpy as np
from scipy.optimize import minimize
from typing import Dict, List, Tuple, Optional


class MeanVarianceOptimizer:
    """Mean-variance portfolio optimization with LLM score integration."""

    def __init__(
        self,
        risk_free_rate: float = 0.04,
        target_volatility: Optional[float] = None
    ):
        self.risk_free_rate = risk_free_rate
        self.target_volatility = target_volatility

    def optimize(
        self,
        expected_returns: np.ndarray,
        covariance_matrix: np.ndarray,
        llm_scores: Optional[np.ndarray] = None,
        constraints: Dict = None
    ) -> Tuple[np.ndarray, Dict]:
        """
        Optimize portfolio weights.

        Args:
            expected_returns: Expected returns for each asset
            covariance_matrix: Covariance matrix of returns
            llm_scores: Optional LLM composite scores to blend
            constraints: Portfolio constraints

        Returns:
            Tuple of (weights, metrics)
        """
        n_assets = len(expected_returns)

        # Merge caller overrides into the defaults so partial dicts work.
        # Weights must sum to 1, so max_weight * n_assets must be >= 1 for
        # the long-only problem to be feasible.
        defaults = {
            "max_weight": 0.30,
            "min_weight": 0.0,
            "long_only": True
        }
        constraints = {**defaults, **(constraints or {})}

        # Blend LLM scores with expected returns if provided
        if llm_scores is not None:
            # Normalize LLM scores to be on similar scale as returns;
            # identical scores have zero std, so treat them as no signal
            score_std = llm_scores.std()
            if score_std > 0:
                normalized_scores = (llm_scores - llm_scores.mean()) / score_std
            else:
                normalized_scores = np.zeros_like(llm_scores, dtype=float)
            blend_weight = 0.3  # Weight given to LLM scores
            adjusted_returns = (
                (1 - blend_weight) * expected_returns +
                blend_weight * normalized_scores * 0.01  # Scale factor
            )
        else:
            adjusted_returns = expected_returns

        # Initial guess: equal weights
        x0 = np.ones(n_assets) / n_assets

        # Objective: maximize Sharpe ratio (minimize negative Sharpe)
        def neg_sharpe(weights):
            port_return = np.dot(weights, adjusted_returns)
            port_vol = np.sqrt(np.dot(weights.T, np.dot(covariance_matrix, weights)))
            return -(port_return - self.risk_free_rate) / port_vol

        # Constraints
        cons = [
            {'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0}  # Weights sum to 1
        ]

        # Bounds
        if constraints["long_only"]:
            bounds = [(constraints["min_weight"], constraints["max_weight"]) for _ in range(n_assets)]
        else:
            bounds = [(-constraints["max_weight"], constraints["max_weight"]) for _ in range(n_assets)]

        # Optimize
        result = minimize(
            neg_sharpe,
            x0,
            method='SLSQP',
            bounds=bounds,
            constraints=cons
        )

        weights = result.x

        # Calculate metrics (based on the blended returns, not the raw inputs)
        port_return = np.dot(weights, adjusted_returns)
        port_vol = np.sqrt(np.dot(weights.T, np.dot(covariance_matrix, weights)))
        sharpe = (port_return - self.risk_free_rate) / port_vol

        metrics = {
            "expected_return": port_return,
            "volatility": port_vol,
            "sharpe_ratio": sharpe,
            "optimization_success": result.success
        }

        return weights, metrics

    def risk_parity(
        self,
        covariance_matrix: np.ndarray,
        risk_budget: Optional[np.ndarray] = None
    ) -> Tuple[np.ndarray, Dict]:
        """
        Risk parity portfolio allocation.

        Each asset contributes equally to portfolio risk.
        """
        n_assets = covariance_matrix.shape[0]

        if risk_budget is None:
            risk_budget = np.ones(n_assets) / n_assets

        def risk_contribution_error(weights):
            port_vol = np.sqrt(np.dot(weights.T, np.dot(covariance_matrix, weights)))
            marginal_contrib = np.dot(covariance_matrix, weights)
            risk_contrib = weights * marginal_contrib / port_vol
            target_contrib = risk_budget * port_vol
            return np.sum((risk_contrib - target_contrib) ** 2)

        x0 = np.ones(n_assets) / n_assets
        bounds = [(0.01, 0.5) for _ in range(n_assets)]
        cons = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0}]

        result = minimize(
            risk_contribution_error,
            x0,
            method='SLSQP',
            bounds=bounds,
            constraints=cons
        )

        weights = result.x
        port_vol = np.sqrt(np.dot(weights.T, np.dot(covariance_matrix, weights)))

        metrics = {
            "volatility": port_vol,
            "optimization_success": result.success
        }

        return weights, metrics

Backtesting Framework
import pandas as pd
import numpy as np
from typing import Dict, List, Optional
from dataclasses import dataclass, field


@dataclass
class BacktestResult:
    """Results from portfolio backtest."""
    total_return: float
    annualized_return: float
    volatility: float
    sharpe_ratio: float
    sortino_ratio: float
    max_drawdown: float
    calmar_ratio: float
    win_rate: float
    num_trades: int
    portfolio_values: List[float] = field(default_factory=list)
    dates: List[str] = field(default_factory=list)

    def summary(self) -> str:
        return f"""Portfolio Backtest Results
==========================
Total Return: {self.total_return:.2%}
Annualized Return: {self.annualized_return:.2%}
Volatility: {self.volatility:.2%}
Sharpe Ratio: {self.sharpe_ratio:.2f}
Sortino Ratio: {self.sortino_ratio:.2f}
Max Drawdown: {self.max_drawdown:.2%}
Calmar Ratio: {self.calmar_ratio:.2f}
Win Rate: {self.win_rate:.2%}
Number of Trades: {self.num_trades}"""

class PortfolioBacktester:
    """Backtest LLM-based portfolio strategies."""

    def __init__(
        self,
        initial_capital: float = 100000,
        rebalance_frequency: str = "weekly",  # daily, weekly, monthly
        transaction_cost: float = 0.001,      # 0.1%
        slippage: float = 0.0005              # 0.05%
    ):
        self.initial_capital = initial_capital
        self.rebalance_frequency = rebalance_frequency
        self.transaction_cost = transaction_cost
        self.slippage = slippage

    def run(
        self,
        price_data: pd.DataFrame,
        portfolio_weights: Dict[str, Dict[str, float]],
        start_date: str,
        end_date: str
    ) -> BacktestResult:
        """
        Run backtest with given portfolio weights.

        Args:
            price_data: DataFrame with asset prices (columns = symbols),
                indexed by a DatetimeIndex
            portfolio_weights: Dict mapping str(timestamp) keys, e.g.
                "2024-01-08 00:00:00", to {symbol: weight} dicts
            start_date: Backtest start date
            end_date: Backtest end date

        Returns:
            BacktestResult with performance metrics
        """
        # Filter data
        mask = (price_data.index >= start_date) & (price_data.index <= end_date)
        prices = price_data.loc[mask].copy()
        if len(prices) < 2:
            raise ValueError("Need at least two price rows in the backtest window")

        # Initialize
        capital = self.initial_capital
        portfolio_values = [capital]
        dates = [prices.index[0]]
        current_weights = {}
        num_trades = 0

        # Determine rebalance dates
        rebalance_dates = self._get_rebalance_dates(prices.index)

        for i in range(1, len(prices)):
            date = prices.index[i]

            # Return over the bar ending at `date`, earned with the weights
            # that were already held going into the bar
            daily_returns = (prices.iloc[i] / prices.iloc[i-1]) - 1
            port_return = sum(
                current_weights.get(symbol, 0) * daily_returns.get(symbol, 0)
                for symbol in current_weights
            )
            capital *= (1 + port_return)

            # Rebalance after the bar's return is booked. Switching weights
            # first would let them earn a return that preceded the decision
            # (look-ahead bias).
            if date in rebalance_dates and str(date) in portfolio_weights:
                new_weights = portfolio_weights[str(date)]

                # Calculate turnover and costs
                turnover = self._calculate_turnover(current_weights, new_weights)
                cost = turnover * (self.transaction_cost + self.slippage)
                capital *= (1 - cost)
                num_trades += sum(
                    1 for s in new_weights
                    if new_weights.get(s, 0) != current_weights.get(s, 0)
                )

                current_weights = new_weights

            portfolio_values.append(capital)
            dates.append(date)

        # Calculate metrics
        returns = pd.Series(portfolio_values).pct_change().dropna()

        total_return = (capital / self.initial_capital) - 1
        trading_days = len(returns)
        # 252 periods per year suits equities; crypto trades every calendar day
        annualized_return = (1 + total_return) ** (252 / trading_days) - 1
        volatility = returns.std() * np.sqrt(252)
        # Sharpe here assumes a zero risk-free rate
        sharpe = annualized_return / volatility if volatility > 0 else 0

        # Sortino (downside deviation); std needs at least two observations
        downside_returns = returns[returns < 0]
        downside_std = downside_returns.std() * np.sqrt(252) if len(downside_returns) > 1 else 0.001
        sortino = annualized_return / downside_std

        # Max drawdown
        cumulative = pd.Series(portfolio_values)
        rolling_max = cumulative.expanding().max()
        drawdowns = (cumulative - rolling_max) / rolling_max
        max_drawdown = drawdowns.min()

        # Calmar
        calmar = annualized_return / abs(max_drawdown) if max_drawdown != 0 else 0

        # Win rate
        win_rate = len(returns[returns > 0]) / len(returns) if len(returns) > 0 else 0

        return BacktestResult(
            total_return=total_return,
            annualized_return=annualized_return,
            volatility=volatility,
            sharpe_ratio=sharpe,
            sortino_ratio=sortino,
            max_drawdown=max_drawdown,
            calmar_ratio=calmar,
            win_rate=win_rate,
            num_trades=num_trades,
            portfolio_values=portfolio_values,
            dates=[str(d) for d in dates]
        )

    def _get_rebalance_dates(self, dates: pd.DatetimeIndex) -> set:
        """Get rebalance dates based on frequency."""
        if self.rebalance_frequency == "daily":
            return set(dates)
        elif self.rebalance_frequency == "weekly":
            # Rebalance on Mondays
            return set(dates[dates.dayofweek == 0])
        elif self.rebalance_frequency == "monthly":
            # Rebalance on first trading day of month
            return set(dates.to_series().groupby(dates.to_period('M')).first())
        return set()

    def _calculate_turnover(
        self,
        old_weights: Dict[str, float],
        new_weights: Dict[str, float]
    ) -> float:
        """Calculate portfolio turnover."""
        all_symbols = set(old_weights.keys()) | set(new_weights.keys())
        turnover = sum(
            abs(new_weights.get(s, 0) - old_weights.get(s, 0))
            for s in all_symbols
        ) / 2
        return turnover

Rust Implementation
See the rust_llm_portfolio/ directory for the complete Rust implementation, which includes:
- Data fetching from Bybit and Yahoo Finance
- LLM API integration (OpenAI compatible)
- Portfolio optimization algorithms
- Backtesting framework
- Performance metrics calculation
Quick Start (Rust)
cd rust_llm_portfolio

# Build the project
cargo build --release

# Fetch market data
cargo run --example fetch_data

# Run portfolio analysis
cargo run --example analyze_portfolio -- --symbols BTCUSDT,ETHUSDT,SOLUSDT

# Backtest strategy
cargo run --example backtest -- --start 2024-01-01 --end 2024-06-01

Expected Outcomes
- LLM Analysis Pipeline - End-to-end system for asset scoring
- Portfolio Construction - Optimized weights based on LLM insights
- Risk Management - Constraint-based portfolio construction
- Backtesting Results - Historical performance validation
- Rebalancing Strategy - Dynamic portfolio adjustment rules
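Constraint-based construction needs some care with position caps: clipping score-proportional weights to a maximum and then renormalizing can push a clipped weight back above the cap. The sketch below shows one way to enforce the cap exactly by redistributing the excess to uncapped assets. The function name is hypothetical, it assumes positive raw scores, and it ignores minimum weights for brevity.

```python
from typing import Dict


def cap_weights(raw: Dict[str, float], max_weight: float) -> Dict[str, float]:
    """Scale positive raw scores to weights summing to 1, none above max_weight."""
    if not raw:
        return {}
    if max_weight * len(raw) < 1.0 - 1e-12:
        raise ValueError("max_weight is too small for this many assets")
    total = sum(raw.values())
    weights = {s: v / total for s, v in raw.items()}
    capped = set()
    while True:
        over = [s for s, w in weights.items()
                if s not in capped and w > max_weight + 1e-12]
        if not over:
            return weights
        capped.update(over)
        free = [s for s in weights if s not in capped]
        # Budget left for uncapped assets once capped ones sit at the cap
        remaining = 1.0 - max_weight * len(capped)
        free_total = sum(weights[s] for s in free)
        for s in capped:
            weights[s] = max_weight
        for s in free:
            # Share the remaining budget in proportion to current weights
            weights[s] = remaining * weights[s] / free_total


w = cap_weights({"BTCUSDT": 8.0, "ETHUSDT": 1.0, "SOLUSDT": 1.0}, max_weight=0.4)
print(w)
```

With the same inputs, clip-then-normalize would give the first asset about 67% of the portfolio, well above the 40% cap; the loop above repeats until no uncapped asset exceeds the limit.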
Use Cases
Cryptocurrency Portfolio
# Example: Build crypto portfolio with LLM
# (engine is an LLMPortfolioEngine; market_data and news are assumed to be loaded)
assets = [
    Asset("BTCUSDT", "Bitcoin", AssetClass.CRYPTO, 65000),
    Asset("ETHUSDT", "Ethereum", AssetClass.CRYPTO, 3200),
    Asset("SOLUSDT", "Solana", AssetClass.CRYPTO, 140),
    Asset("BNBUSDT", "Binance Coin", AssetClass.CRYPTO, 580),
]

# Get LLM scores
scores = engine.analyze_assets(assets, market_data, news)

# Generate portfolio
portfolio = engine.generate_portfolio(scores, constraints={
    "max_weight": 0.40,  # Max 40% in single asset
    "min_weight": 0.05,  # Min 5% allocation
    "max_assets": 5,
    "min_score": 5.0     # Minimum composite score to be included
})

Stock Portfolio

# Example: Build diversified stock portfolio
assets = [
    Asset("AAPL", "Apple Inc", AssetClass.EQUITY, 185),
    Asset("MSFT", "Microsoft", AssetClass.EQUITY, 420),
    Asset("GOOGL", "Alphabet", AssetClass.EQUITY, 175),
    Asset("NVDA", "NVIDIA", AssetClass.EQUITY, 880),
    Asset("AMZN", "Amazon", AssetClass.EQUITY, 185),
]

# Analyze with sector constraints
scores = engine.analyze_assets(assets, market_data, news)
portfolio = engine.generate_portfolio(scores, constraints={
    "max_weight": 0.25,
    "min_weight": 0.05,
    "max_assets": 10,
    "min_score": 5.0,
    # Max 60% in tech. generate_portfolio as listed does not read this key,
    # so enforcing it needs additional logic.
    "sector_limits": {"tech": 0.60}
})

Multi-Agent Ensemble

# Example: Use multiple LLM personas for robust allocation
# analyze_with_persona and aggregate_portfolios are assumed helpers;
# LLMPortfolioEngine as listed does not implement them.
personas = [
    "value_investor",   # Focus on fundamentals
    "momentum_trader",  # Focus on trends
    "risk_manager",     # Focus on downside
    "contrarian"        # Opposite of consensus
]

ensemble_weights = {}
for persona in personas:
    scores = engine.analyze_with_persona(assets, persona)
    portfolio = engine.generate_portfolio(scores)
    ensemble_weights[persona] = portfolio.weights  # symbol -> weight

# Aggregate: average weights across personas
final_weights = aggregate_portfolios(ensemble_weights)

Best Practices
- Prompt Engineering - Test prompts for consistent, actionable output
- Score Calibration - Validate LLM scores against historical outcomes
- Constraint Setting - Use reasonable position limits and diversification
- Regular Validation - Backtest frequently with out-of-sample data
- Human Oversight - Review LLM recommendations before execution
- Cost Management - Cache LLM responses to reduce API costs
- Fallback Logic - Have rules-based backup if LLM fails
Difficulty Level
Expert
Required knowledge: LLM prompting, portfolio optimization, quantitative finance, API integration, backtesting methodology