Agent Memory Architecture
AI agents forget everything between sessions. Context windows are finite. RAG retrieves documents but doesn’t capture decisions, preferences, or evolving relationships. This is the core problem of agent memory, and several mature approaches now exist to solve it.
This post synthesizes patterns from five memory-focused systems I’ve built or studied: SGR Agents spec (the cognitive foundation), Moltis, OpenClaw Workspace Memory, MemPalace, and Solograph. Each takes a different approach. Together they map the full design space.
The Four Types of Memory
Every agent memory system worth studying converges on the same cognitive taxonomy, mirroring how human cognition organizes knowledge (Generative Agents, Stanford 2023). Names vary, but functions stay consistent:
1. Working Memory (short-term)
The agent’s scratchpad during a task: messages, intermediate results, current plan. Cleared or archived when the task completes.
- SGR: The NextStep schema itself. `reasoning` + `planning` fields are the working memory, visible on every step. Structured JSON makes the thought process inspectable.
- MemPalace: L0 + L1 layers (~170 tokens always loaded). Identity + critical facts, the minimum context the agent always has.
- Moltis: Current session transcript, auto-exported to `memory/sessions/` on close.
- Solograph: Active Claude Code session context (task state, tool results, conversation). Becomes episodic memory when the session ends.
Key insight: Working memory should be structured, not a raw conversation log. The SGR approach (reasoning/planning/function as explicit JSON fields) makes the agent’s thought process inspectable and debuggable. This aligns with harness engineering: agent logic should be transparent.
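To make the structured-scratchpad idea concrete, here is a minimal sketch of an SGR-style step as explicit fields rather than a raw transcript. The field names (`reasoning`, `planning`, `function`) follow the post; the exact schema and example values are illustrative, not the SGR spec itself.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class NextStep:
    reasoning: str   # why the agent believes this is the right move
    planning: list   # remaining steps, revised on every turn
    function: str    # the tool/action to invoke next

step = NextStep(
    reasoning="User asked for auth setup; a precedent exists in the session log.",
    planning=["read precedent trace", "adapt config", "run tests"],
    function="read_file",
)

# The whole step serializes to inspectable, debuggable JSON:
print(json.dumps(asdict(step), indent=2))
```

Because every turn emits this structure, you can diff steps, log them, and assert on them in tests instead of grepping a conversation log.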
2. Semantic Memory (long-term facts)
Durable knowledge about the world, the user, the domain. Timeless or slowly changing. Retrieved by similarity or entity lookup.
- SGR: Dict-based, no external DB. `memory["semantic"] = { ... }` loaded into the system prompt at init. Summarize, don’t dump.
- OpenClaw: `bank/world.md` for objective facts, `bank/entities/*.md` for entity pages. Markdown source of truth + derived SQLite/FTS5 index. Inspired by Hindsight and MemGPT/Letta.
- MemPalace: `hall_facts` corridor within each wing. Verbatim storage (96.6% on LongMemEval) beats compression (84.2%).
- Moltis: Hybrid search (vector + FTS5 keyword), with optional LLM reranking (70% LLM score + 30% original).
- Solograph: `kb_search` for the knowledge base with MLX embeddings (multilingual-e5-small, RU+EN). Facts, principles, patterns stored as markdown files.
Key insight: Structure beats embeddings for retrieval. MemPalace’s spatial hierarchy (wing > room > hall) improves recall by 34% over flat vector search. Solograph’s graph layer adds relationship traversal on top. But a simple dict in the system prompt works fine to start (SGR v1 approach).
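The "simple dict in the system prompt" starting point can be sketched in a few lines. The categories and the summarization style are illustrative assumptions, not the SGR v1 code itself.

```python
# Semantic memory as a plain dict, rendered into the system prompt at init.
# "Summarize, don't dump": short labeled sections, not raw logs.
memory = {
    "semantic": {
        "user": "Prefers Rust; works in UTC+2; dislikes verbose replies.",
        "project": "Solo blog engine, SQLite-backed, deployed on a VPS.",
    },
    "procedural": {
        "writing": "3-act structure: setup, tension, resolution.",
    },
    "rules": ["Never push to main without tests."],
}

def build_system_prompt(mem: dict) -> str:
    """Render each memory category as a short labeled section."""
    sections = []
    for category, items in mem.items():
        if isinstance(items, dict):
            body = "\n".join(f"- {k}: {v}" for k, v in items.items())
        else:
            body = "\n".join(f"- {v}" for v in items)
        sections.append(f"## {category}\n{body}")
    return "You are the user's assistant.\n\n" + "\n\n".join(sections)

prompt = build_system_prompt(memory)
```

No database, no embeddings, and the whole memory is reviewable in one screen, which is exactly the point at this stage.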
3. Episodic Memory (experience log)
What happened, when, in what order. Decision history, interaction logs, past results. Time-bound and growing. Context graphs call these “decision traces.”
- OpenClaw: Daily logs (`memory/YYYY-MM-DD.md`) + `bank/experience.md` curated by reflection. The `## Retain` section extracts 2-5 narrative, self-contained facts from each day, tagged with type (`W` world, `B` experience, `O` opinion) and entities (`@Peter`).
- MemPalace: `hall_events` + `hall_discoveries` corridors. Specialist agent diaries capture domain-specific history.
- Solograph: `session_search` makes every past Claude Code session searchable. Each session = a decision trajectory. “What did I do last time I set up auth?” returns the full trace with reasoning.
- Generative Agents: Memory stream of timestamped observations, with retrieval scored by recency × relevance × importance.
Key insight: Episodic ≠ semantic. Episodic memory is time-ordered (an event sequence); semantic memory is timeless (a generalized fact). The OpenClaw `## Retain` pattern extracts narrative facts from the raw log and tags them with type, entities, and confidence. Decision traces compound: each captured trajectory improves future retrieval.
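A retained fact in this style can be modeled as a small record: a self-contained statement plus type tag, entity mentions, and confidence. The field names and the recall query below are my assumptions based on the pattern described, not OpenClaw’s actual schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class RetainedFact:
    text: str                 # self-contained narrative statement
    kind: str                 # "W" world, "B" experience, "O" opinion
    entities: list            # entity mentions like "@Peter"
    confidence: float = 1.0   # opinions carry c in [0, 1]
    day: date = field(default_factory=date.today)

facts = [
    RetainedFact("Peter prefers short, direct status updates.", "O",
                 ["@Peter"], confidence=0.7),
    RetainedFact("Deployed v2.1 of ProjectX; rollback plan documented.", "B",
                 ["@ProjectX"]),
]

# Entity-centric recall over the retained facts:
peter_facts = [f for f in facts if "@Peter" in f.entities]
```

Because each fact is self-contained, recall can return it without dragging in the surrounding day’s log.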
4. Procedural Memory (how-to)
Methods, templates, algorithms, workflows. “How to do X” rather than “what X is.”
- SGR: `memory["procedural"]` stores story structure templates (3-act method) and writing instructions. Loaded as recommendations in the system prompt.
- MemPalace: `hall_advice` corridor with accumulated recommendations and patterns.
- Solograph: `codegraph_query` + `codegraph_explain` expose an AST-based code graph (tree-sitter) across all projects. Cypher queries on code structure reveal implementation patterns: “How is auth implemented?” returns structural patterns, not just files.
- YAML workflow templates: Predefined agent sequences (video trailers, marketing campaigns). Codified procedural memory. The agent follows a proven template instead of reasoning from scratch.
Key insight: Procedural memory is the most undervalued type. A well-defined workflow template multiplies consistency. CLI-first design makes procedural knowledge testable. Schemas first, logic second applies here too: define the procedure schema before implementing.
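A workflow template as codified procedural memory can be sketched as data plus a tiny executor. The template name, step structure, and `dispatch` hook are all hypothetical; the point is that the sequence is declared once and replayed, not re-reasoned each run.

```python
# Procedural memory as a declarative template the agent follows step by step.
TRAILER_WORKFLOW = {
    "name": "video_trailer",
    "steps": [
        {"agent": "scriptwriter", "task": "draft 30s script from brief"},
        {"agent": "editor",       "task": "cut script to 3 beats"},
        {"agent": "compositor",   "task": "assemble clips per beat"},
    ],
}

def run_workflow(template: dict, dispatch) -> list:
    """Execute each step in order; dispatch(agent, task) is the harness hook."""
    results = []
    for step in template["steps"]:
        results.append(dispatch(step["agent"], step["task"]))
    return results

# Dry-run with a stub dispatcher to test the procedure without real agents:
log = run_workflow(TRAILER_WORKFLOW, lambda agent, task: f"{agent}: {task}")
```

Keeping the template as plain data is what makes the procedure testable from a CLI: you can dry-run it, diff it, and version it like any other artifact.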
Three Operational Loops
Capture → Retrieve → Apply (Context Graphs pattern)
From Foundation Capital’s context graphs thesis:
- Capture — record agent decisions into a graph (what + why)
- Retrieve — before new task, search for similar precedents
- Apply — adapt found patterns to current situation
Each successful action improves future ones, creating a compound learning flywheel. Implemented in Solograph: session_search for precedent retrieval, kb_search for pattern matching, codegraph_query for structural precedents.
Retain → Recall → Reflect (OpenClaw/Hindsight pattern)
This loop blends the Letta/MemGPT control loop with the Hindsight memory substrate:
- Retain — normalize raw logs into narrative, self-contained facts with type tags (`W` world, `B` experience, `O` opinion) and entity mentions (`@Peter`, `@ProjectX`)
- Recall — queries over the derived index: lexical (FTS5), entity-centric, temporal, opinion-with-confidence
- Reflect — scheduled job that updates entity pages, adjusts opinion confidence based on reinforcement/contradiction, proposes edits to core memory
Opinion evolution: each belief has statement + confidence c ∈ [0,1] + evidence links. New facts update confidence by small deltas; big jumps require strong repeated contradiction.
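The small-delta update rule can be sketched directly. The delta size and clamping are illustrative parameters, not values from OpenClaw or Hindsight.

```python
def update_confidence(c: float, supports: bool, delta: float = 0.05) -> float:
    """Nudge confidence c in [0, 1] toward 1.0 on support, 0.0 on contradiction."""
    c = c + delta if supports else c - delta
    return max(0.0, min(1.0, c))  # clamp: confidence never leaves [0, 1]

c = 0.8
for _ in range(3):  # three independent contradicting observations
    c = update_confidence(c, supports=False)
# The belief is weakened (c drops to ~0.65) but not discarded: a single
# contradiction cannot flip a well-supported opinion.
```

The design choice here is deliberate asymmetry between evidence and belief: evidence arrives in units, belief moves in small steps, so big jumps require strong repeated contradiction.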
Retrieve → Reflect → Plan (Generative Agents pattern)
Stanford’s Generative Agents introduced the three-layer cognitive architecture:
- Retrieve — score memories by recency × relevance × importance
- Reflect — periodically synthesize observations into higher-level insights (“What are the top 3 things I’ve learned about X?”)
- Plan — create day/hour plans grounded in reflections and current goals
This is the most research-validated loop. The paper showed it produces believable long-term agent behavior across 25 interacting agents.
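The retrieval scoring step can be sketched as a weighted combination of the three signals. The decay rate, weights, and memory fields below are illustrative assumptions; the paper tunes these differently.

```python
import time

def score(memory: dict, relevance: float, now: float,
          decay: float = 0.995, weights=(1.0, 1.0, 1.0)) -> float:
    """Generative Agents-style retrieval score: recency + relevance + importance."""
    hours_old = (now - memory["created_at"]) / 3600
    recency = decay ** hours_old          # exponential recency decay
    importance = memory["importance"]     # scored once at write time, e.g. 0-1
    w_rec, w_rel, w_imp = weights
    return w_rec * recency + w_rel * relevance + w_imp * importance

now = time.time()
fresh = {"created_at": now - 3600, "importance": 0.3}           # 1 hour old
stale = {"created_at": now - 30 * 24 * 3600, "importance": 0.3}  # 1 month old

# With equal relevance and importance, the fresh memory outranks the stale one:
assert score(fresh, 0.5, now) > score(stale, 0.5, now)
```

Because recency decays but importance does not, a month-old high-importance memory can still beat a fresh trivial one, which is what makes the ranking feel believable over long horizons.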
Forgetting: The Half-Life Problem
Not all memories should persist forever. Decisions expire as policies change, teams change, and context shifts. Context graphs call this the “half-life of decisions.”
Approaches to Forgetting
| System | Mechanism |
|---|---|
| [MemPalace](/wiki/mempalace-agent-memory) | Knowledge graph with `valid_from` / `ended` dates. `kg.invalidate()` marks facts as expired |
| OpenClaw | Opinion confidence decay. Contradicting evidence reduces the `c` score. Reflect job proposes removal |
| Moltis | Session auto-cleanup by age/count limits. Embedding-cache LRU eviction |
| Generative Agents | Recency score decays exponentially. Old memories naturally rank lower in retrieval |
| SuCo | Summarize-Condense-Distill: compress old memories into summaries, discard originals |
Key insight: Active forgetting > passive accumulation. Without decay, memory becomes noise. The simplest implementation: valid_from/ended timestamps on every fact, with periodic cleanup. Agent self-discipline applies here. Thresholds and limits prevent memory bloat.
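The "simplest implementation" mentioned above can be sketched as temporal validity on every fact: expired facts stay on disk but become invisible to retrieval. The fact structure and helper are hypothetical, loosely modeled on the MemPalace mechanism.

```python
from datetime import datetime

# Every fact carries valid_from / ended; invalidation sets ended, never deletes.
facts = [
    {"text": "Deploys go through Jenkins.",
     "valid_from": datetime(2024, 1, 1), "ended": datetime(2025, 3, 1)},
    {"text": "Deploys go through GitHub Actions.",
     "valid_from": datetime(2025, 3, 1), "ended": None},
]

def live_facts(facts: list, at: datetime) -> list:
    """Return only facts valid at the given moment; expired ones are invisible."""
    return [f for f in facts
            if f["valid_from"] <= at and (f["ended"] is None or at < f["ended"])]

current = live_facts(facts, datetime(2025, 6, 1))
# Only the GitHub Actions fact survives the filter; the Jenkins-era fact is
# still queryable for historical "what did we believe then?" questions.
```

Keeping the ended fact rather than deleting it is what lets the agent answer temporal questions without resurrecting stale knowledge into current retrieval.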
Trust Hierarchy: Not All Memory is Equal
A common failure mode: the agent “remembers something” but you can’t tell if it’s trustworthy. Memory systems that treat everything as equal (recent chat, old decisions, daily notes, canonical specs) produce agents that confidently cite stale or wrong information.
The Zero Harness approach introduces an explicit trust hierarchy for memory:
| Layer | Trust Level | Example | Update Frequency |
|---|---|---|---|
| Source of truth | Canonical | source_of_truth.md, schemas, configs | Rarely, deliberately |
| Handoff | High | handoff.md (current state, decisions, blockers) | Every session end |
| Daily notes | Medium | Daily log, task notes, runtime snapshots | Daily |
| Memory entries | Variable | Structured facts, observations | Ongoing |
| Chat history | Low | Raw conversation, ephemeral context | Per session |
Source of truth > volume. The agent knowing which file is canonical matters more than remembering everything. When memory contradicts source of truth, source of truth wins.
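The "source of truth wins" rule is just trust-ordered conflict resolution, which can be sketched in a few lines. Layer names mirror the table above; the resolution function itself is a hypothetical illustration.

```python
# Highest-trust layer that has an answer wins; lower layers are ignored.
TRUST_ORDER = ["source_of_truth", "handoff", "daily_notes", "memory", "chat"]

def resolve(answers: dict) -> tuple:
    """Return (layer, answer) from the most trusted layer that answered."""
    for layer in TRUST_ORDER:
        if layer in answers:
            return layer, answers[layer]
    raise LookupError("no layer answered")

layer, answer = resolve({
    "chat": "I think we deploy with Docker Compose?",       # low trust, recent
    "source_of_truth": "Deploys run via GitHub Actions.",   # canonical
})
# source_of_truth outranks chat history even though chat is more recent.
```

Note that recency loses to trust here by design: the most recent mention is often the least reliable one.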
Handoff as save game. Not abstract “long-term memory” but a short, living file that captures: what’s happening now, what decisions were made, what’s broken, what’s left to do, which files are load-bearing. Like a game save point. OpenClaw’s `## Retain` section, GSD-2’s `.gsd/STATE.md`, and Claude Code’s session summaries all implement this pattern.
Memory should help act, not just recall. Useful memory units aren’t just facts but actionable artifacts: task notes, daily plans, fixed decisions, integration statuses, runtime snapshots, environment self-maps. Good agent memory is an operational surface you act from, not a museum of recollections.
This connects to harness engineering: the harness defines which artifacts are source of truth (CLAUDE.md, schemas, tests), and the memory system respects that hierarchy.
Lexical-First Retrieval
Most agent memory systems default to vector/semantic search. In practice, engineering queries are exact and operational: “what did we decide about heartbeat?”, “where is the runtime snapshot?”, “which cron is failing?”, “what file is canonical?”
For these, lexical search (BM25/FTS5) outperforms embeddings:
- Exact file names, identifiers, integration names: lexical wins
- Specific formulations, decisions, promises: lexical wins
- “Find something conceptually similar”: embeddings win
- “What does X relate to?”: graph queries win
Practical stack, ordered by reliability:
- Lexical first (BM25 / FTS5): exact names, identifiers, operational queries
- Graph queries (Cypher / FalkorDB): relationships, dependencies, cross-references
- Vector search (embeddings): conceptual similarity, fuzzy matching
- LLM reranking: optional final pass for relevance scoring
Multiple systems converge on this independently: Moltis uses hybrid (FTS5 + vector + optional LLM reranking), MemPalace found structural filtering beats raw embeddings by 34%, Solograph combines FTS5 + graph + vector.
Key insight: don’t start with “semantic magic.” Start with the simplest retrieval that handles your actual queries. Add embeddings when lexical search fails, which happens less often than you’d expect.
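A lexical-first setup needs nothing beyond Python's standard library: SQLite's FTS5 module ships compiled into most Python builds. The table, column, and example rows below are illustrative.

```python
import sqlite3

# Lexical-first retrieval: an FTS5 virtual table with BM25 ranking built in.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE mem USING fts5(text)")
db.executemany("INSERT INTO mem(text) VALUES (?)", [
    ("Decided heartbeat interval is 30s; see runtime snapshot in ops/state.md.",),
    ("Nightly backup cron is failing on the VPS since the disk filled up.",),
    ("CLAUDE.md is the canonical harness file for this repo.",),
])

# An exact operational query ("what did we decide about heartbeat?"):
rows = db.execute(
    "SELECT text FROM mem WHERE mem MATCH ? ORDER BY rank", ("heartbeat",)
).fetchall()
# rows[0][0] is the heartbeat decision; no embeddings involved.
```

When queries like this start failing (conceptual paraphrases, no shared keywords), that is the signal to layer embeddings on top, not before.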
Progressive Context Loading
All five systems converge on the same context engineering pattern: don’t load everything, load progressively.
| Layer | What | When | Size |
|---|---|---|---|
| L0 | Identity / role | Always | ~50 tokens |
| L1 | Core facts / preferences | Always | ~100-200 tokens |
| L2 | Topic-relevant facts | On demand | Variable |
| L3 | Deep search results | Explicit query | Variable |
- MemPalace: Explicit L0-L3 layers
- SGR: Dict memory loaded into the system prompt (L1), retrieval via tools later (L2-L3)
- OpenClaw: `memory.md` as always-loaded core (L1), derived index for recall (L2-L3)
- Moltis: File watching + auto-sync as L1, hybrid search as L2-L3
- Solograph: CLAUDE.md as L0-L1 (table of contents, ~100 lines per harness engineering), `kb_search` / `session_search` as L2-L3
This is context engineering in practice: treating the context window as a memory budget. Harness keeps L0-L1 stable and tested; L2-L3 are dynamic.
Storage Architecture Comparison
| System | Storage | Search | Embeddings | Scale |
|---|---|---|---|---|
| [SGR v1](/wiki/schema-guided-reasoning) | Python dict (in-memory) | None | None | <100 facts |
| Moltis | SQLite + FTS5 | Hybrid (vector + keyword) + LLM reranking | Local GGUF / OpenAI / Ollama | <5K chunks |
| OpenClaw | Markdown + SQLite index | FTS5 + optional embeddings | Optional | <10K facts |
| [MemPalace](/wiki/mempalace-agent-memory) | ChromaDB + SQLite KG | Spatial filtering + vector | Local | 22K+ memories |
| [Solograph](/wiki/project-solograph) | FalkorDB (graph) + SQLite (vector) + filesystem | Cypher graph queries + vector + session search | MLX multilingual-e5-small | Multi-project |
Solograph as a Full Memory System
Solograph implements all four memory types on a unified graph + file foundation:
- Working memory: active session context (current Claude Code conversation, task state, tool results)
- Semantic memory: `kb_search` with vector embeddings (MLX, RU+EN). Markdown files indexed with embeddings. Local-first, no cloud dependency
- Episodic memory: `session_search` makes every past session a searchable decision trajectory. “What did I do last time I set up auth?” returns the full trace
- Procedural memory: `codegraph_query` + `codegraph_explain` expose an AST-based code graph (tree-sitter) across all projects. Cypher queries on code structure reveal patterns, not just files
- Long-term file storage: markdown files in wiki/ and the knowledge base. Git-backed, human-readable, versioned. No database lock-in; rebuild the index from files anytime
The graph layer (FalkorDB) adds what flat vector search lacks: relationship traversal. `codegraph_shared` finds shared packages across projects. `source_related` finds related content by tag overlap. These are graph queries, not similarity search, which is the key difference from standard RAG patterns.
15 MCP tools provide the interface. Agents call `session_search`, `kb_search`, and `codegraph_query` naturally.
Practical Recommendations
Starting Simple (SGR v1 approach)
If you’re building your first agent, start with dict-based memory in the system prompt. No database, no embeddings. Three categories: semantic, procedural, rules. Load at init, don’t mutate during session. This alone gets you far. Schemas first: define the memory structure before adding persistence.
Adding Persistence (Moltis/OpenClaw approach)
When you need cross-session recall: Markdown files as the source of truth + a derived SQLite/FTS5 index. Rebuild the index from Markdown at any time. Add vector search when FTS5 isn’t enough. Keep the `## Retain` discipline for extracting structured facts. Local-first, offline-first.
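The "derived index, rebuildable anytime" property is worth sketching, since it is what keeps Markdown authoritative. The directory layout and one-row-per-file chunking below are illustrative assumptions.

```python
import sqlite3
from pathlib import Path

def rebuild_index(memory_dir: Path) -> sqlite3.Connection:
    """Throw away the old index and rebuild it from the Markdown files.

    The files are the source of truth; the FTS5 index is a disposable cache.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE notes USING fts5(path, body)")
    for md in sorted(memory_dir.glob("**/*.md")):
        db.execute("INSERT INTO notes(path, body) VALUES (?, ?)",
                   (str(md), md.read_text(encoding="utf-8")))
    return db

# Usage: rebuild on startup or after any bulk edit; no migration needed.
# db = rebuild_index(Path("memory"))
```

Because the index is derived, a corrupted or stale database is never a data-loss event, only a rebuild.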
Scaling Up (MemPalace/Solograph approach)
For multi-agent systems with rich history: ChromaDB or FalkorDB with spatial/graph hierarchy. Knowledge graph for entity relationships. Session search for full audit trail. Progressive context loading (L0-L3). Active forgetting with temporal validity. Small bets: try one memory type first, add others incrementally.
The Compound Learning Goal
The ultimate goal isn’t remembering but compounding. Every decision trace should make the next decision better. This requires:
- Capturing why, not just what
- Precedent retrieval before new tasks
- Active decay of stale knowledge
- Structured reflection (not just accumulation)
Research & References
Papers
- MemGPT / Letta — MemGPT: Towards LLMs as Operating Systems (Oct 2023). Virtual context management: paging facts in/out of LLM context like an OS manages memory. Core > archival > recall architecture.
- Generative Agents — Generative Agents: Interactive Simulacra of Human Behavior (Apr 2023, Stanford). 25 agents with memory streams, reflection, and planning. Introduced retrieve-reflect-plan.
- Hindsight — Hindsight: Posterization for LLM Conversations (Apr 2024). Separates observed vs believed vs summarized. Confidence-bearing opinions that evolve with evidence.
- LongMemEval — LongMemEval: Benchmarking Long-Term Memory (Oct 2024). Benchmark where MemPalace scored 96.6%. Tests multi-session reasoning, temporal reasoning, knowledge updates.
- SuCo — Summarize, Condense, and Distill (Nov 2024). Memory compaction for long-running agents. Formalizes the forgetting problem.
- RAISE — Retrieval-Augmented In-Context Learning (Jan 2024). Entity-centric retrieval from semi-structured sources.
- Cognitive Architectures for Language Agents — CoALA (Sep 2023). Survey of memory, action, and decision-making in language agents. Proposes the working/episodic/semantic/procedural taxonomy used here.
Industry
- Foundation Capital: Context Graphs (Dec 2025). “System of record for decisions” thesis
- Anthropic: Effective Context Engineering. Context window as working memory budget
- Manus: Context Engineering Lessons. KV-cache locality, filesystem memory
- OpenHands: Context Condensation. Bounded conversation memory
- Anthropic: Harness Design for Long-Running Apps. Task state and evaluator design
Tools & Systems
- MemPalace: GitHub. Spatial memory, 96.6% LongMemEval, MIT (27K stars)
- Solograph: GitHub. Graph + vector MCP server, 15 tools, FalkorDB + MLX
- QMD. Mini CLI search engine for docs and knowledge bases (20K stars). BM25 + vector + LLM reranking, all local. Optional Moltis backend
- Moltis. Rust-based memory with hybrid search + LLM reranking
- OpenClaw. Markdown source-of-truth + derived index, Retain/Recall/Reflect
- Letta. MemGPT implementation, core/archival/recall memory management
- LangGraph. Graph-based agent execution framework
Knowledge Patterns
- Karpathy LLM Wiki. The pattern our wiki builds on. Three-layer architecture: raw sources (immutable) > wiki (LLM-maintained, interlinked) > schema (conventions). Key insight: “The knowledge is compiled once and kept current, not re-derived on every query.” Memory as a persistent, compounding artifact where synthesis accumulates rather than being rediscovered. The LLM handles bookkeeping humans abandon: cross-references, consistency, contradiction detection. This is how our knowledge base works: raw sources in `0-principles/` and `1-methodology/` > wiki pages synthesized by LLM > Solograph indexes everything for retrieval
Related Wiki Pages
- context-graphs-summary — agent trajectories and organizational memory
- decision-traces-compound — why capturing decision rationale compounds
- context-engineering — progressive disclosure and context budgeting
- rag-patterns — 7 RAG approaches, spatial filtering as additional pattern
- harness-engineering-summary — agent mistake → fix the harness
- agent-mistake-fix-harness — the core harness loop
- agent-self-discipline — drift detection and complexity thresholds
- schema-guided-reasoning — schemas first, logic second
- sgr-deep-dive — complete SGR guide with code examples
- privacy-as-architecture — local-first memory design
- codegraph-guide — Solograph’s code intelligence layer
Synthesized from: SGR Agents Spec v1 (Sep 2025), OpenClaw Workspace Memory Research (Feb 2026), Moltis Memory System (Mar 2026), MemPalace (Apr 2026), Solograph (ongoing), Foundation Capital Context Graphs (Jan 2026), and 7 research papers.