Chapter 250: GPT Financial Analysis
Introduction
Generative Pre-trained Transformers (GPT) have emerged as a powerful paradigm for financial analysis, leveraging self-supervised learning on massive text corpora to develop rich representations of language, numbers, and context. Unlike traditional NLP models that require task-specific architectures, GPT models learn general-purpose representations during pre-training and can be adapted to diverse financial tasks — from earnings call summarization and sentiment analysis to numerical reasoning over financial statements and market commentary generation.
The core insight of GPT for finance is that financial text follows patterns at multiple levels: syntactic structure, domain-specific terminology, numerical relationships, and implicit sentiment. A sufficiently large language model pre-trained on financial corpora can internalize these patterns and then apply them to downstream tasks with minimal fine-tuning. This “pre-train then adapt” approach has proven particularly effective in finance, where labeled data is scarce but raw text (SEC filings, analyst reports, news articles, earnings transcripts) is abundant.
This chapter presents a framework for applying GPT-style models to financial analysis. We cover the transformer architecture underpinning GPT, the key adaptation strategies (prompt engineering, fine-tuning, few-shot learning), and a working Rust implementation that performs sentiment-driven trading signal generation using data from both stock markets and the Bybit cryptocurrency exchange.
Key Concepts
The Transformer Architecture
The GPT family is built on the transformer decoder architecture introduced by Vaswani et al. (2017). The key innovation is the self-attention mechanism, which allows each token in a sequence to attend to all previous tokens, capturing long-range dependencies without the sequential bottleneck of RNNs.
Given an input sequence of token embeddings $\mathbf{X} \in \mathbb{R}^{T \times d}$, self-attention computes:
$$\text{Attention}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \text{softmax}\left(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d_k}}\right)\mathbf{V}$$
where the queries, keys, and values are linear projections of the input:
$$\mathbf{Q} = \mathbf{X}\mathbf{W}_Q, \quad \mathbf{K} = \mathbf{X}\mathbf{W}_K, \quad \mathbf{V} = \mathbf{X}\mathbf{W}_V$$
Multi-head attention extends this by running $h$ parallel attention heads with different learned projections, enabling the model to capture different types of relationships simultaneously:
$$\text{MultiHead}(\mathbf{X}) = \text{Concat}(\text{head}_1, \ldots, \text{head}_h)\mathbf{W}_O$$
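The attention formula above can be sketched directly in code. The following is a minimal single-head version (the names `softmax_row` and `attention` are illustrative, not part of any library), operating on $T \times d_k$ matrices stored as `Vec<Vec<f64>>`:

```rust
// Numerically stable softmax over one row of attention scores.
fn softmax_row(row: &[f64]) -> Vec<f64> {
    let max = row.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = row.iter().map(|x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
fn attention(q: &[Vec<f64>], k: &[Vec<f64>], v: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let d_k = q[0].len() as f64;
    q.iter()
        .map(|qi| {
            // Scaled dot product of this query against every key.
            let scores: Vec<f64> = k
                .iter()
                .map(|kj| qi.iter().zip(kj).map(|(a, b)| a * b).sum::<f64>() / d_k.sqrt())
                .collect();
            let weights = softmax_row(&scores);
            // Attention-weighted sum of the value vectors.
            (0..v[0].len())
                .map(|c| weights.iter().zip(v).map(|(w, vj)| w * vj[c]).sum())
                .collect()
        })
        .collect()
}
```

Multi-head attention would run $h$ copies of this with separate projection matrices and concatenate the results.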
Causal Language Modeling
GPT models are trained with a causal (autoregressive) language modeling objective. Given a sequence of tokens $(x_1, x_2, \ldots, x_T)$, the model maximizes the log-likelihood:
$$\mathcal{L} = \sum_{t=1}^{T} \log P(x_t | x_1, \ldots, x_{t-1}; \theta)$$
This is enforced by a causal mask in the attention computation that prevents each position from attending to future positions. The model learns to predict the next token given all preceding context, which implicitly requires understanding syntax, semantics, and factual relationships.
For financial text, this objective teaches the model patterns like: after “Revenue increased by 15% to $”, the next tokens are likely a dollar amount consistent with 15% growth from the previous period’s revenue.
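The causal mask itself is simple to implement: before the softmax, every score for a future position is set to $-\infty$ so that its attention weight becomes zero. A minimal sketch (the function name is illustrative):

```rust
// Enforce causality on a T x T score matrix: position i may only
// attend to positions j <= i; future scores become -inf, so
// softmax assigns them zero weight.
fn apply_causal_mask(scores: &mut [Vec<f64>]) {
    let t = scores.len();
    for i in 0..t {
        for j in (i + 1)..t {
            scores[i][j] = f64::NEG_INFINITY;
        }
    }
}
```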
Financial Sentiment Analysis
Sentiment analysis in finance goes beyond simple positive/negative classification. Financial sentiment is nuanced and context-dependent:
- Bullish/Bearish signals: Direct indicators of market direction expectations
- Uncertainty quantification: Language indicating confidence or doubt about forecasts
- Forward-looking statements: Distinguishing past performance from future guidance
- Numerical sentiment: “Revenue of $1.2B” is positive if consensus was $1.1B, negative if it was $1.3B
A GPT model fine-tuned on financial text can capture these nuances because it has learned the distributional patterns of financial language during pre-training. The sentiment score for a document $d$ can be expressed as:
$$S(d) = \sigma\left(\mathbf{w}^T \mathbf{h}_{T} + b\right)$$
where $\mathbf{h}_{T}$ is the final-layer hidden state used for pooling (decoder-only GPT models have no [CLS] token, so the last token's hidden state is typically used), and $\sigma$ is the sigmoid function mapping to $[0, 1]$.
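As a sketch, the scoring head is a dot product followed by a sigmoid; `h` stands in for the pooled hidden state and `w`, `b` for learned parameters (all names illustrative):

```rust
// S(d) = sigmoid(w . h + b), mapping the pooled representation
// to a sentiment score in (0, 1).
fn sentiment_score(h: &[f64], w: &[f64], b: f64) -> f64 {
    let z: f64 = w.iter().zip(h).map(|(wi, hi)| wi * hi).sum::<f64>() + b;
    1.0 / (1.0 + (-z).exp())
}
```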
Prompt Engineering for Financial Tasks
Rather than being fine-tuned, GPT models can be steered via carefully crafted prompts. This is particularly useful in finance, where labeled data is scarce; the main strategies are:
- Zero-shot analysis: “Classify the following earnings report excerpt as bullish, neutral, or bearish: [text]”
- Few-shot learning: Providing 2-5 labeled examples before the query teaches the model the task format and expected output
- Chain-of-thought reasoning: “Analyze the following financial statement step by step, then provide a trading recommendation: [text]”
The effectiveness of prompt engineering depends on the prompt template $\mathcal{T}$, the verbalizer $\mathcal{V}$ mapping label words to classes, and the number of demonstrations $k$:
$$P(y \mid x) = P(\mathcal{V}(y) \mid \mathcal{T}(x, \{(x_i, y_i)\}_{i=1}^{k}))$$
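Assembling the template $\mathcal{T}$ from $k$ demonstrations is plain string construction. A hedged sketch, where the instruction wording and label set are illustrative assumptions:

```rust
// Build a k-shot sentiment prompt: an instruction, k labeled
// demonstrations, then the unlabeled query for the model to complete.
fn build_prompt(demos: &[(&str, &str)], query: &str) -> String {
    let mut prompt =
        String::from("Classify each excerpt as bullish, neutral, or bearish.\n\n");
    for (text, label) in demos {
        prompt.push_str(&format!("Excerpt: {}\nSentiment: {}\n\n", text, label));
    }
    // The prompt ends at "Sentiment:" so the model's next tokens are the label.
    prompt.push_str(&format!("Excerpt: {}\nSentiment:", query));
    prompt
}
```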
Numerical Reasoning in Financial Context
A key challenge for GPT in finance is numerical reasoning. Financial analysis requires:
- Percentage calculations: Computing growth rates, margins, ratios
- Comparative reasoning: Determining if a metric exceeds expectations
- Temporal reasoning: Understanding year-over-year and quarter-over-quarter changes
- Scale awareness: Distinguishing millions from billions, basis points from percentages
Recent research shows that GPT models can perform basic numerical reasoning when numbers are tokenized appropriately and the model has seen sufficient examples of financial calculations during pre-training.
ML Approaches
Fine-Tuned GPT for Sentiment Classification
The most direct application fine-tunes a pre-trained GPT model on labeled financial sentiment data. The process involves:
- Pre-processing: Tokenize financial texts, handling domain-specific tokens (ticker symbols, financial abbreviations, numerical formats)
- Supervised fine-tuning: Train on labeled examples with cross-entropy loss:
$$\mathcal{L}_{\text{FT}} = -\sum_{i=1}^{N} \sum_{c=1}^{C} y_{ic} \log \hat{y}_{ic}$$
- Regularization: Apply dropout, weight decay, and early stopping to prevent overfitting on small financial datasets
The fine-tuned model produces a probability distribution over sentiment classes (bullish, neutral, bearish) for each input text.
GPT-Based Feature Extraction for Trading
Instead of using GPT’s output directly, we can extract embedding features and feed them into traditional trading models:
- Extract the hidden state $\mathbf{h} \in \mathbb{R}^d$ from the last transformer layer
- Combine with numerical features (price, volume, technical indicators) to form $\mathbf{x} = [\mathbf{h}; \mathbf{f}_{\text{numerical}}]$
- Train a gradient boosting or neural network model on the combined features:
$$\hat{y} = f_{\text{trading}}(\mathbf{x}) = f_{\text{trading}}([\mathbf{h}; \mathbf{f}_{\text{numerical}}])$$
This approach leverages GPT’s language understanding while allowing specialized trading models to handle the prediction.
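The concatenation step $\mathbf{x} = [\mathbf{h}; \mathbf{f}_{\text{numerical}}]$ is trivial but worth making explicit (the function name is illustrative):

```rust
// x = [h; f_numerical]: concatenate the text embedding with the
// numerical market features before feeding the downstream model.
fn combine_features(h: &[f64], numerical: &[f64]) -> Vec<f64> {
    let mut x = Vec::with_capacity(h.len() + numerical.len());
    x.extend_from_slice(h);
    x.extend_from_slice(numerical);
    x
}
```

In practice the two feature groups should be scaled to comparable ranges before a gradient boosting or neural model consumes them.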
Ensemble with Technical Analysis
The strongest approach combines GPT sentiment signals with traditional technical and order flow features:
$$\text{Signal}_t = \alpha \cdot S_t^{\text{GPT}} + \beta \cdot S_t^{\text{technical}} + \gamma \cdot S_t^{\text{flow}}$$
where $S_t^{\text{GPT}}$ is the GPT-derived sentiment score, $S_t^{\text{technical}}$ aggregates moving averages, RSI, and MACD, and $S_t^{\text{flow}}$ captures order flow imbalance. The weights $\alpha, \beta, \gamma$ are learned via cross-validation on historical data.
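The combination rule above reduces to a weighted sum; a minimal sketch, assuming the weights have already been fitted elsewhere:

```rust
// Signal_t = alpha * S_gpt + beta * S_tech + gamma * S_flow,
// with (alpha, beta, gamma) learned via cross-validation.
fn ensemble_signal(s_gpt: f64, s_tech: f64, s_flow: f64, w: (f64, f64, f64)) -> f64 {
    let (alpha, beta, gamma) = w;
    alpha * s_gpt + beta * s_tech + gamma * s_flow
}
```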
Feature Engineering
Text Features from Financial Documents
Key text features that GPT can extract or enhance:
- Entity-level sentiment: Sentiment about specific companies, sectors, or products mentioned in the text
- Event detection: Identifying material events (mergers, earnings surprises, regulatory actions)
- Topic distribution: The mix of topics discussed (revenue, margins, guidance, macro outlook)
- Linguistic complexity: Readability and obfuscation metrics that correlate with negative information hiding
Numerical Feature Integration
Financial GPT analysis benefits from combining text features with structured data:
- Price momentum: Returns over 1-, 5-, 20-, and 60-day windows
- Volatility regime: Rolling standard deviation and implied volatility levels
- Volume profile: Volume relative to moving average, indicating unusual activity
- Market microstructure: Bid-ask spread, depth, and order flow metrics from exchange data
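Two of these features can be sketched directly from a price series (function names are illustrative):

```rust
// Trailing momentum: simple return over the last `window` bars.
fn momentum(prices: &[f64], window: usize) -> f64 {
    let n = prices.len();
    prices[n - 1] / prices[n - 1 - window] - 1.0
}

// Realized volatility: standard deviation of log-returns over the series.
fn volatility(prices: &[f64]) -> f64 {
    let rets: Vec<f64> = prices.windows(2).map(|w| (w[1] / w[0]).ln()).collect();
    let mean = rets.iter().sum::<f64>() / rets.len() as f64;
    (rets.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / rets.len() as f64).sqrt()
}
```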
Applications
Earnings Call Analysis
GPT models excel at analyzing earnings call transcripts, which contain both prepared remarks and spontaneous Q&A responses. The model can:
- Detect changes in management tone compared to previous quarters
- Identify hedging language and qualifiers that signal uncertainty
- Extract forward guidance and compare it to consensus expectations
- Score the overall bullish/bearish lean of the call
News-Based Trading Signals
Real-time news analysis is one of the highest-value applications. The GPT model processes news articles and produces:
- Relevance score: Is this news material for a given asset?
- Sentiment score: Bullish or bearish implications
- Novelty score: Is this new information or a rehash of known facts?
- Urgency score: How quickly should a trader react?
Risk Report Generation
GPT can generate structured risk reports from unstructured data, summarizing:
- Key risk factors identified across multiple filings
- Changes in risk language over time
- Emerging risks not previously disclosed
- Correlation between risk language and subsequent market moves
Rust Implementation
Architecture Overview
Our Rust implementation provides a lightweight GPT-inspired framework for financial text analysis. Rather than implementing a full transformer (impractical at scale without GPU acceleration), we implement a simplified attention-based text analysis pipeline that demonstrates the key concepts:
- Tokenizer: Splits financial text into tokens with domain-specific vocabulary
- Embedding Layer: Maps tokens to dense vectors using pre-computed financial embeddings
- Attention Mechanism: Single-head scaled dot-product attention for feature extraction
- Sentiment Classifier: Linear classifier on attended representations
- Trading Signal Generator: Combines text sentiment with price/volume features
Token Embedding
The FinancialTokenizer struct handles text preprocessing with awareness of financial terminology:
```rust
pub struct FinancialTokenizer {
    vocab: HashMap<String, usize>,
    embeddings: Vec<Vec<f64>>,
}
```

It recognizes ticker symbols, numerical patterns, and financial keywords, mapping each to a learned embedding vector.
Attention Computation
The AttentionLayer implements scaled dot-product attention:
```rust
pub struct AttentionLayer {
    w_query: Vec<Vec<f64>>,
    w_key: Vec<Vec<f64>>,
    w_value: Vec<Vec<f64>>,
    d_k: f64,
}
```

This allows the model to weight different parts of the input text based on their relevance to the financial analysis task.
Sentiment Classification
The SentimentClassifier produces a three-class output (bullish, neutral, bearish) using a softmax layer:
```rust
pub struct SentimentClassifier {
    weights: Vec<Vec<f64>>,
    bias: Vec<f64>,
}
```

Trading Signal Integration
The TradingSignalGenerator combines GPT sentiment with technical features:
```rust
pub struct TradingSignalGenerator {
    sentiment_weight: f64,
    momentum_weight: f64,
    volatility_weight: f64,
    threshold: f64,
}
```

Bybit API Integration
The BybitClient struct connects to the Bybit V5 API to fetch real-time market data:
```rust
pub struct BybitClient {
    base_url: String,
    client: reqwest::Client,
}
```

It retrieves kline (candlestick) data and order book snapshots, providing the numerical context that complements the text analysis.
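As a sketch of the request side, a client can assemble the kline query as a URL string; the query parameters below follow Bybit's public V5 market endpoints, but the helper function itself is illustrative rather than part of the chapter's codebase:

```rust
// Build a Bybit V5 kline request URL, e.g. for 1-hour candles on a
// linear perpetual. `interval` is in minutes ("60" = hourly).
fn kline_url(base: &str, symbol: &str, interval: &str, limit: u32) -> String {
    format!(
        "{}/v5/market/kline?category=linear&symbol={}&interval={}&limit={}",
        base, symbol, interval, limit
    )
}
```

The returned URL would then be fetched with `reqwest` and the JSON response deserialized into candle structs.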
Domain-Adaptive Pretraining for Financial Time Series
While the sections above focus on GPT applied to financial text, GPT-style autoregressive pretraining can also be applied directly to financial time series by treating discretized price movements as a “language” the model learns to predict. This section covers techniques from domain-adaptive pretraining that complement text-based GPT analysis.
Financial Tokenization
Price Discretization
The first challenge in applying GPT to numerical financial data is converting continuous price series into discrete tokens. Given a price series $\{p_t\}$, compute log-returns:
$$r_t = \ln\left(\frac{p_t}{p_{t-1}}\right)$$
Then quantize into $V$ bins using percentile-based boundaries:
$$\text{token}(r_t) = \arg\min_k |r_t - c_k|$$
where $\{c_k\}_{k=1}^{V}$ are bin centers derived from the empirical return distribution. A typical vocabulary might use $V = 256$ bins, providing sufficient granularity to capture meaningful price movements while keeping the vocabulary manageable.
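The two steps (log-returns, then nearest-center quantization) can be sketched as follows; function names are illustrative:

```rust
// r_t = ln(p_t / p_{t-1}) for each consecutive pair of prices.
fn log_returns(prices: &[f64]) -> Vec<f64> {
    prices.windows(2).map(|w| (w[1] / w[0]).ln()).collect()
}

// token(r) = argmin_k |r - c_k|: index of the nearest bin center.
fn tokenize(r: f64, centers: &[f64]) -> usize {
    let mut best = 0;
    for (k, c) in centers.iter().enumerate() {
        if (r - c).abs() < (r - centers[best]).abs() {
            best = k;
        }
    }
    best
}
```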
Special Tokens
Beyond return tokens, the vocabulary includes:
- [BOS]: Beginning of sequence marker
- [EOS]: End of sequence marker
- [SEP]: Separator between different instruments or timeframes
- [MASK]: Used for optional masked language model auxiliary objectives
Multi-Feature Tokenization
For richer representations, multiple features can be tokenized and interleaved:
- Price returns (primary signal)
- Volume changes (liquidity information)
- Volatility regime indicators
- Time-of-day tokens (for intraday data)
This creates a multi-channel token sequence: $(r_t^{\text{price}}, r_t^{\text{vol}}, r_t^{\text{regime}}, \ldots)$ that captures the full market microstructure.
Vocabulary Construction
The bin boundaries are determined during preprocessing:
- Collect all returns from the training corpus
- Compute percentiles $\{q_1, q_2, \ldots, q_{V-1}\}$ evenly spaced from 0% to 100%
- Bin centers $c_k = (q_k + q_{k-1})/2$, with $q_0$ and $q_V$ taken as the minimum and maximum observed returns
- Special handling for extreme returns (fat tails): the outermost bins capture all returns beyond the boundary percentiles
This percentile-based approach ensures roughly uniform token frequencies, which improves training stability and prevents the model from being dominated by common small movements.
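The construction steps above can be sketched as a single function; the empirical-percentile indexing scheme here is one reasonable choice among several, and the function name is illustrative:

```rust
// Percentile-based bin centers: sort the return sample, take V+1
// evenly spaced empirical quantile boundaries q_0..q_V (q_0 = min,
// q_V = max), and use midpoints of adjacent boundaries as centers.
fn bin_centers(mut returns: Vec<f64>, v: usize) -> Vec<f64> {
    returns.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = returns.len();
    let q: Vec<f64> = (0..=v)
        .map(|k| returns[((k * (n - 1)) / v).min(n - 1)])
        .collect();
    (1..=v).map(|k| (q[k] + q[k - 1]) / 2.0).collect()
}
```

Because each bin spans an equal share of the empirical distribution, token frequencies come out roughly uniform, as noted above.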
Fine-Tuning for Trading Tasks
Regime Prediction
After pretraining, a classification head maps the model’s hidden states to regime labels:
$$\hat{y}_t = \text{softmax}(W_{\text{cls}} \cdot h_t + b_{\text{cls}})$$
where $h_t$ is the transformer output at position $t$, and regimes might include: trending-up, trending-down, mean-reverting, high-volatility, low-volatility.
Direct Signal Generation
For direct trading signal generation, the fine-tuning head produces continuous signals:
$$s_t = \tanh(W_{\text{sig}} \cdot h_t + b_{\text{sig}})$$
where $s_t \in [-1, 1]$ represents the desired position (short to long). The fine-tuning loss combines prediction accuracy with trading-relevant metrics:
$$\mathcal{L}_{\text{trade}} = -\frac{1}{T}\sum_{t=1}^{T} s_t \cdot r_{t+1} + \lambda \sum_{t=2}^{T} |s_t - s_{t-1}|$$
The second term penalizes excessive position changes (turnover regularization).
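Evaluating this loss for a given signal path is straightforward; a minimal sketch (the function name is illustrative):

```rust
// L_trade = -(1/T) * sum_t s_t * r_{t+1} + lambda * sum_t |s_t - s_{t-1}|
// signals[t] is the position at t; next_returns[t] is the return
// realized over the following period.
fn trading_loss(signals: &[f64], next_returns: &[f64], lambda: f64) -> f64 {
    let t = signals.len() as f64;
    let pnl: f64 = signals
        .iter()
        .zip(next_returns)
        .map(|(s, r)| s * r)
        .sum::<f64>() / t;
    let turnover: f64 = signals.windows(2).map(|w| (w[1] - w[0]).abs()).sum();
    -pnl + lambda * turnover
}
```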
Transfer Learning Benefits
Pretraining provides several advantages for the fine-tuning phase:
- Data efficiency: Fine-tuning requires far fewer labeled examples than training from scratch
- Robustness: Pretrained representations generalize better to unseen market conditions
- Multi-task capability: The same pretrained backbone supports multiple downstream tasks
- Cross-asset transfer: A model pretrained on one set of instruments can be fine-tuned on others
Autoregressive Generation for Scenario Analysis
Beyond point predictions, the generative nature of GPT allows sampling multiple future trajectories. Temperature scaling controls the randomness of generation: lower temperature produces more conservative (peaked) predictions, while higher temperature explores a wider range of scenarios. This provides a natural framework for risk assessment and uncertainty quantification.
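Temperature scaling amounts to dividing the next-token logits by $\tau$ before the softmax; a minimal sketch (function name illustrative):

```rust
// softmax(logits / tau): tau < 1 sharpens the distribution
// (conservative sampling), tau > 1 flattens it (wider scenarios).
fn temperature_softmax(logits: &[f64], tau: f64) -> Vec<f64> {
    let scaled: Vec<f64> = logits.iter().map(|l| l / tau).collect();
    let max = scaled.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scaled.iter().map(|x| (x - max).exp()).collect();
    let z: f64 = exps.iter().sum();
    exps.iter().map(|e| e / z).collect()
}
```

Sampling repeatedly from this distribution and rolling the sampled tokens forward yields the Monte Carlo scenario paths described above.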
References
- Vaswani, A., et al. “Attention Is All You Need.” NeurIPS, 2017.
- Radford, A., et al. “Language Models are Unsupervised Multitask Learners.” OpenAI, 2019.
- Brown, T., et al. “Language Models are Few-Shot Learners.” NeurIPS, 2020.
- Yang, Y., Uy, M.C.S., and Huang, A. “FinBERT: A Pretrained Language Model for Financial Communications.” arXiv:2006.08097, 2020.
- Lopez-Lira, A. and Tang, Y. “Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models.” arXiv:2304.07619, 2023.
- Wu, S., et al. “BloombergGPT: A Large Language Model for Finance.” arXiv:2303.17564, 2023.
- Loughran, T. and McDonald, B. “When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks.” Journal of Finance, 2011.
Running the Code
```bash
cd rust
cargo test
cargo run --example trading_example
```