
RLHF Book

Nathan Lambert’s open textbook on Reinforcement Learning from Human Feedback. Free online at rlhfbook.com, with source on GitHub (1.8k stars) and a Manning print edition due April 2026. It is a gentle introduction for readers with a quantitative background: not a blog dump, but an actual book that covers the full optimization pipeline.

Why this resource is load-bearing

Most RLHF knowledge is scattered across papers, blog posts, and Twitter threads. Lambert stitches it into one linear progression:

  1. Origins & foundations — economics, philosophy, optimal control. Where the preference-learning idea actually came from.
  2. Fundamentals — problem formulation, preference data collection, math framework.
  3. Core pipeline — instruction tuning (IFT) → reward model training → rejection sampling → PPO → direct alignment (DPO).
  4. Advanced — synthetic data, evaluation, tool use, RLVR (reinforcement learning with verifiable rewards, the reasoning-model trick).
  5. Open questions — product applications, future directions.
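
To make the core pipeline in item 3 concrete, here is a minimal sketch of three of its pieces on scalar inputs: the pairwise (Bradley-Terry) loss commonly used for reward model training, best-of-n selection as the simplest form of rejection sampling, and the standard DPO loss. The function names and the toy inputs are illustrative assumptions, not code from the book's companion library, and a real implementation would operate on batched tensors from a language model rather than on plain floats.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    The loss shrinks as the reward model scores the preferred
    completion higher than the rejected one.
    """
    return -log_sigmoid(r_chosen - r_rejected)


def best_of_n(completions, reward_fn):
    """Simplest rejection sampling: keep the highest-reward completion."""
    return max(completions, key=reward_fn)


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default, chosen only for illustration
) -> float:
    """DPO loss for one preference pair, given summed sequence log-probs.

    The margin compares how much the policy has moved toward the chosen
    completion, relative to a frozen reference model, against how much
    it has moved toward the rejected one.
    """
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_shift - rejected_shift))


if __name__ == "__main__":
    # Toy reward function (an assumption): longer strings score higher.
    print(best_of_n(["a", "bbb", "cc"], reward_fn=len))
    print(reward_model_loss(r_chosen=2.0, r_rejected=0.0))
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

Note the structural similarity: the reward model loss and the DPO loss are the same log-sigmoid of a margin. DPO replaces the explicit reward difference with a scaled difference of policy-versus-reference log-probability shifts, which is why it needs no separate reward model.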

The book is a living document: v0 (April 2025) → v2 (February 2026), with a reorganization, editor feedback, and expanded content along the way. It comes with a companion code library, a video course, Kindle and PDF support, and a Discord community.

What to take from it

Connections

How we use this

Solo founders don’t train foundation models, so the book is a literacy tool, not a manual.

Link: rlhfbook.com · github.com/natolambert/rlhf-book