Harness Engineering — development in the age of agents

2026-04-07 · summary · harness, agents, ai-dev, context-engineering, methodology

Key Takeaways

Harness Engineering — designing environments where AI agents do reliable work. Humans steer, agents execute. Synthesized from three sources: OpenAI experiment (Ryan Lopopolo), Mitchell Hashimoto’s adoption journey, and Birgitta Böckeler’s analysis (Thoughtworks/Martin Fowler).

Core loop: when an agent makes a mistake, fix the harness (not the prompt, not the model). Fix CLAUDE.md, add a linter, write a structural test. The repository IS the agent’s brain.
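The "fix the harness" move can be as small as encoding the failed expectation as a structural test that runs forever after. A minimal stdlib-only sketch (the `domain/` vs `infrastructure/` layering rule and file layout are hypothetical, not from the sources):

```python
import ast

def imports_of(source: str) -> set[str]:
    """Collect the top-level module names imported by a Python source string."""
    mods: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            mods.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            mods.add(node.module.split(".")[0])
    return mods

# Hypothetical invariant: domain code must never reach into infrastructure.
FORBIDDEN_IN_DOMAIN = {"infrastructure"}

def check_domain_purity(files: dict[str, str]) -> list[str]:
    """Return one violation message per domain file that imports a forbidden layer."""
    violations = []
    for path, source in files.items():
        if path.startswith("domain/"):
            bad = imports_of(source) & FORBIDDEN_IN_DOMAIN
            if bad:
                violations.append(f"{path} imports forbidden layer(s): {sorted(bad)}")
    return violations
```

Wired into CI as `assert not check_domain_purity(...)`, the rule corrects every future agent run, not just the one that made the mistake.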

Three components (Böckeler/Fowler):

  1. Context Engineering — CLAUDE.md as a table of contents (~100 lines, linking deeper), docs/ as the system of record, progressive disclosure. One big CLAUDE.md is an anti-pattern — it pollutes context and rots instantly.
  2. Architectural Constraints — layered domain architecture, directed dependencies validated by custom linters, parse at boundary (Zod/Pydantic), structural tests (ArchUnit-style), taste invariants.
  3. Garbage Collection — golden principles in repo, recurring cleanup agents, doc-gardening agent, quality grades per domain. Tech debt as credit — pay continuously in small installments.
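"Parse at the boundary" means untrusted input is converted into a typed value exactly once, at the edge, so interior code never re-validates. Zod and Pydantic are the tools named above; the same idea in a dependency-free Python sketch (the `Order` shape is hypothetical):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Order:
    """A parsed, trusted order — constructing one implies the data was valid."""
    order_id: str
    quantity: int

def parse_order(raw: dict) -> Order:
    """Boundary parser: raise on bad input, return a typed value otherwise."""
    order_id = raw.get("order_id")
    if not isinstance(order_id, str) or not order_id:
        raise ValueError("order_id must be a non-empty string")
    quantity = raw.get("quantity")
    if not isinstance(quantity, int) or isinstance(quantity, bool) or quantity <= 0:
        raise ValueError("quantity must be a positive int")
    return Order(order_id=order_id, quantity=quantity)
```

Interior functions then accept `Order`, never `dict` — an agent cannot skip validation because the type system won't let unparsed data past the boundary.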

6 adoption steps (Hashimoto): drop the chatbot → reproduce your own work with an agent → run end-of-day agents → outsource the slam dunks → engineer the harness → always have an agent running.

OpenAI numbers: 3→7 engineers, ~1M lines of code, ~1,500 PRs, 3.5 PRs per engineer per day, zero human-written code, ~10x speedup. Longest single Codex run: 6+ hours.

Predictions: the harness as the new service template, tech-stack convergence toward AI-friendliness, topology convergence, and a split into two worlds (greenfield built with a harness vs. retrofits onto legacy).

Connections

Raw source: 1-methodology/harness-engineering.md