# Chapter 168: NT-Xent Loss and Temperature Scaling
## Overview
The Normalized Temperature-scaled Cross Entropy (NT-Xent) loss is a foundational component of modern contrastive learning (e.g., SimCLR). In trading, it is used to learn robust representations by maximizing the agreement between different “views” of the same market situation through a contrastive objective.
The power of NT-Xent lies in the temperature parameter ($\tau$), which controls how strongly the model penalizes "hard" negatives (those most similar to the anchor) relative to "easy" ones.
## The Loss Formula
Given a pair of positive samples $(z_i, z_j)$ in a batch of size $N$, the loss for that pair is:
$$\ell_{i,j} = -\log \frac{\exp(\text{sim}(z_i, z_j) / \tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\text{sim}(z_i, z_k) / \tau)}$$
where $\text{sim}(u, v) = \frac{u^\top v}{\|u\| \, \|v\|}$ is the cosine similarity.
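The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not the chapter's actual `nt_xent_loss.py`; the function name, shapes, and default $\tau$ are assumptions:

```python
import numpy as np

def nt_xent_loss(z_i, z_j, tau=0.5):
    """NT-Xent loss for a batch of N positive pairs.

    z_i, z_j: (N, d) embeddings of the two views; row k of z_i and
    row k of z_j are a positive pair. Returns the mean loss over
    all 2N anchors.
    """
    z = np.concatenate([z_i, z_j], axis=0)              # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # unit-normalize
    sim = (z @ z.T) / tau                               # cosine sim / tau

    n = z_i.shape[0]
    # An anchor must never count itself as a candidate (the 1[k != i] term),
    # so mask the diagonal before the softmax.
    np.fill_diagonal(sim, -np.inf)

    # The positive of anchor i is i+N, and of anchor i+N is i.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Loss = -log softmax(sim)[positive] per anchor.
    # (A production version would subtract the row max for stability.)
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()
```

When the two views are identical, every positive pair has the maximum cosine similarity of 1, so the loss is lower than for randomly paired views.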
## Why Temperature Scaling Matters
- Gradient Sharpening: A small $\tau$ (e.g., 0.07) makes the softmax distribution “sharper,” focusing the gradient on the most similar negative samples (the hardest ones).
- Feature Uniformity: NT-Xent encourages embeddings to be uniformly distributed on the unit hypersphere, preventing “feature collapse” where all samples map to the same vector.
- Robustness in Finance: Financial data is extremely noisy. If $\tau$ is too small, the model may overfit to "noise-induced similarity," treating spurious correlations as hard negatives. If $\tau$ is too large, the softmax becomes nearly uniform, gradients weaken, and learning slows.
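The sharpening effect is easy to see numerically. In the hypothetical scenario below, one negative is hard (cosine 0.8) and three are easy (cosine 0.1); the similarity values are made up for illustration:

```python
import numpy as np

def negative_weights(sims, tau):
    """Softmax weights over negative similarities at temperature tau.

    These weights are proportional to each negative's contribution
    to the gradient of the NT-Xent denominator.
    """
    w = np.exp(np.asarray(sims, dtype=float) / tau)
    return w / w.sum()

sims = [0.8, 0.1, 0.1, 0.1]  # one hard negative, three easy ones

for tau in (0.07, 0.5):
    w = negative_weights(sims, tau)
    print(f"tau={tau}: hard-negative weight = {w[0]:.3f}")
```

At $\tau = 0.07$ nearly all of the weight lands on the hard negative, while at $\tau = 0.5$ it is spread far more evenly, which is exactly the trade-off described above.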
## Project Structure
```
168_nt_xent_trading/
├── README.md            # English overview
├── README.ru.md         # Russian overview
├── docs/ru/theory.md    # Mathematical deep-dive
├── python/
│   ├── model.py         # Base CNN encoder
│   ├── nt_xent_loss.py  # NT-Xent implementation
│   └── train.py         # Temperature sweep experiments
└── rust/src/
    └── lib.rs           # High-speed NT-Xent for production
```