
Agent Memory Architecture

AI agents forget everything between sessions. Context windows are finite. RAG retrieves documents but doesn’t capture decisions, preferences, or evolving relationships. This is the core problem of agent memory, and several mature approaches now exist to solve it.

This post synthesizes patterns from five memory-focused systems I’ve built or studied: SGR Agents spec (the cognitive foundation), Moltis, OpenClaw Workspace Memory, MemPalace, and Solograph. Each takes a different approach. Together they map the full design space.


The Four Types of Memory

Every agent memory system worth studying converges on the same cognitive taxonomy, mirroring how human cognition organizes knowledge (Generative Agents, Stanford 2023). Names vary, but functions stay consistent:

1. Working Memory (short-term)

The agent’s scratchpad during a task: messages, intermediate results, current plan. Cleared or archived when the task completes.

Key insight: Working memory should be structured, not a raw conversation log. The SGR approach (reasoning/planning/function as explicit JSON fields) makes the agent’s thought process inspectable and debuggable. This aligns with harness engineering: agent logic should be transparent.
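A minimal sketch of what one such structured step could look like, assuming a dict with reasoning/planning/function fields (field contents here are illustrative, not the SGR spec verbatim):

```python
# Hypothetical SGR-style working-memory step: instead of a raw chat log,
# each turn is a structured, inspectable record.
step = {
    "reasoning": "User asked for a refund; policy allows it within 30 days.",
    "planning": ["look up order date", "check policy", "issue refund"],
    "function": {"name": "lookup_order", "args": {"order_id": "A-1042"}},
}

def validate_step(step: dict) -> bool:
    """Reject malformed steps so working memory stays structured."""
    return (
        isinstance(step.get("reasoning"), str)
        and isinstance(step.get("planning"), list)
        and isinstance(step.get("function"), dict)
        and "name" in step.get("function", {})
    )
```

Because each step is data, it can be validated, logged, and diffed like any other artifact.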

2. Semantic Memory (long-term facts)

Durable knowledge about the world, the user, the domain. Timeless or slowly changing. Retrieved by similarity or entity lookup.

Key insight: Structure beats embeddings for retrieval. MemPalace’s spatial hierarchy (wing > room > hall) improves recall by 34% over flat vector search. Solograph’s graph layer adds relationship traversal on top. But a simple dict in the system prompt works fine to start (SGR v1 approach).
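The "simple dict in the system prompt" starting point can be sketched like this (all facts and key names are illustrative):

```python
# Minimal semantic memory: a plain dict rendered into the system prompt
# at init, in the spirit of the SGR v1 approach.
semantic_memory = {
    "user": {"name": "Peter", "timezone": "Europe/Lisbon"},
    "project": {"deploy_branch": "main", "ci": "GitHub Actions"},
}

def render_memory(mem: dict) -> str:
    """Flatten the dict into lines the agent reads at session start."""
    lines = ["Known facts:"]
    for topic, facts in mem.items():
        for key, value in facts.items():
            lines.append(f"- {topic}.{key} = {value}")
    return "\n".join(lines)
```

No database, no embeddings: retrieval is just "it's already in the prompt," which is exactly right at small scale.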

3. Episodic Memory (experience log)

What happened, when, in what order. Decision history, interaction logs, past results. Time-bound and growing. Context graphs call these “decision traces.”

Key insight: Episodic ≠ semantic. Episodic memory is time-ordered (an event sequence); semantic memory is timeless (generalized facts). The OpenClaw ## Retain pattern extracts narrative facts from the raw log and tags them with type, entities, and confidence. Decision traces compound: each captured trajectory improves future retrieval.
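A single retained fact in this style might look like the record below (the exact field names are an assumption; the type tags and @-entity convention are from the OpenClaw pattern):

```python
# One retained fact: a narrative, self-contained statement tagged with
# type, entity mentions, and confidence (field names are illustrative).
fact = {
    "text": "@Peter decided to keep FTS5 as the primary index for @ProjectX.",
    "type": "W",                       # W world / B experience / O opinion
    "entities": ["@Peter", "@ProjectX"],
    "confidence": 0.9,
    "ts": "2026-02-14T10:30:00Z",
}
```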

4. Procedural Memory (how-to)

Methods, templates, algorithms, workflows. “How to do X” rather than “what X is.”

Key insight: Procedural memory is the most undervalued type. A well-defined workflow template multiplies consistency. CLI-first design makes procedural knowledge testable. Schemas first, logic second applies here too: define the procedure schema before implementing.
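"Schemas first, logic second" can be made concrete: represent the procedure as data before writing any executor (field names below are assumptions for illustration):

```python
# A procedure as data: the template is testable before any logic exists.
procedure = {
    "name": "release_checklist",
    "preconditions": ["CI green", "changelog updated"],
    "steps": ["run tests", "bump version", "tag release", "deploy"],
}

def runnable(proc: dict, satisfied: set[str]) -> bool:
    """A procedure is runnable only when all preconditions are satisfied."""
    return set(proc["preconditions"]) <= satisfied
```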


Three Operational Loops

Capture → Retrieve → Apply (Context Graphs pattern)

From Foundation Capital’s context graphs thesis:

  1. Capture — record agent decisions into a graph (what + why)
  2. Retrieve — before new task, search for similar precedents
  3. Apply — adapt found patterns to current situation

Each successful action improves future ones, creating a compound learning flywheel. Implemented in Solograph: session_search for precedent retrieval, kb_search for pattern matching, codegraph_query for structural precedents.
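A toy sketch of the capture and retrieve halves of this loop, using tag overlap as a stand-in for real precedent search (the Solograph tools named above are not reproduced here):

```python
# Capture → Retrieve sketch: decisions stored with their rationale,
# then surfaced as precedents for the next task by tag overlap.
decisions: list[dict] = []

def capture(what: str, why: str, tags: set[str]) -> None:
    """Record a decision together with its rationale (the what + why)."""
    decisions.append({"what": what, "why": why, "tags": tags})

def retrieve(task_tags: set[str]) -> list[dict]:
    """Rank stored precedents by tag overlap with the new task."""
    return sorted(decisions,
                  key=lambda d: len(d["tags"] & task_tags), reverse=True)
```

The "apply" step is the agent adapting the top precedent; the flywheel comes from capture being cheap enough to happen on every task.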

Retain → Recall → Reflect (OpenClaw/Hindsight pattern)

Blending Letta/MemGPT control loop with Hindsight memory substrate:

  1. Retain — normalize raw logs into narrative, self-contained facts with type tags (W world, B experience, O opinion) and entity mentions (@Peter, @ProjectX)
  2. Recall — queries over derived index: lexical (FTS5), entity-centric, temporal, opinion-with-confidence
  3. Reflect — scheduled job that updates entity pages, adjusts opinion confidence based on reinforcement/contradiction, proposes edits to core memory

Opinion evolution: each belief has statement + confidence c ∈ [0,1] + evidence links. New facts update confidence by small deltas; big jumps require strong repeated contradiction.
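The small-delta update rule can be written down directly; the delta size and clamping are assumptions, the pattern (small nudges, big jumps only via repetition) is from the Reflect loop above:

```python
def update_confidence(c: float, supports: bool, delta: float = 0.05) -> float:
    """Nudge an opinion's confidence c ∈ [0,1] by a small delta per fact.

    One observation moves c a little; a big swing requires repeated
    reinforcement or contradiction. Delta size is illustrative.
    """
    c = c + delta if supports else c - delta
    return min(1.0, max(0.0, c))
```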

Retrieve → Reflect → Plan (Generative Agents pattern)

Stanford’s Generative Agents introduced the three-layer cognitive architecture:

  1. Retrieve — score memories by recency × relevance × importance
  2. Reflect — periodically synthesize observations into higher-level insights (“What are the top 3 things I’ve learned about X?”)
  3. Plan — create day/hour plans grounded in reflections and current goals

This is the most research-validated loop. The paper showed it produces believable long-term agent behavior across 25 interacting agents.
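The scoring step can be sketched as follows. Note the hedge: the paper combines normalized scores additively; the multiplicative form and decay rate below follow this post's shorthand and are illustrative only:

```python
# Retrieval scoring in the Generative Agents spirit: recency decays
# exponentially, then combines with relevance and importance.
def retrieval_score(last_access_hours: float, relevance: float,
                    importance: float, decay: float = 0.995) -> float:
    recency = decay ** last_access_hours   # 1.0 when just accessed
    return recency * relevance * importance
```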


Forgetting: The Half-Life Problem

Not all memories should persist forever. Decisions expire as policies change, teams change, and context shifts. Context graphs call this the “half-life of decisions.”

Approaches to Forgetting

| System | Mechanism |
|---|---|
| [MemPalace](/wiki/mempalace-agent-memory) | Knowledge graph with valid_from / ended dates; kg.invalidate() marks facts as expired |
| OpenClaw | Opinion confidence decay: contradicting evidence reduces the c score; the Reflect job proposes removal |
| Moltis | Session auto-cleanup by age/count limits; embedding cache LRU eviction |
| Generative Agents | Recency score decays exponentially; old memories naturally rank lower in retrieval |
| SuCo | Summarize-Condense-Distill: compress old memories into summaries, discard originals |

Key insight: Active forgetting > passive accumulation. Without decay, memory becomes noise. The simplest implementation: valid_from/ended timestamps on every fact, with periodic cleanup. Agent self-discipline applies here. Thresholds and limits prevent memory bloat.
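The simplest implementation is small enough to show in full; the schema below is illustrative, with kg.invalidate()-style expiry done by setting an ended timestamp rather than deleting:

```python
import sqlite3

# Active forgetting via temporal validity: every fact carries
# valid_from / ended timestamps; reads filter to currently-live facts.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE facts (
    id INTEGER PRIMARY KEY, text TEXT,
    valid_from REAL NOT NULL, ended REAL)""")

def invalidate(fact_id: int, now: float) -> None:
    """Mark a fact expired without deleting it (audit trail survives)."""
    db.execute("UPDATE facts SET ended = ? WHERE id = ?", (now, fact_id))

def live_facts(now: float) -> list[str]:
    rows = db.execute(
        "SELECT text FROM facts WHERE valid_from <= ?"
        " AND (ended IS NULL OR ended > ?)", (now, now))
    return [r[0] for r in rows]
```

A periodic cleanup job can then hard-delete facts whose ended timestamp is old enough.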


Trust Hierarchy: Not All Memory is Equal

A common failure mode: the agent “remembers something” but you can’t tell if it’s trustworthy. Memory systems that treat everything as equal (recent chat, old decisions, daily notes, canonical specs) produce agents that confidently cite stale or wrong information.

The Zero Harness approach introduces an explicit trust hierarchy for memory:

| Layer | Trust Level | Example | Update Frequency |
|---|---|---|---|
| Source of truth | Canonical | source_of_truth.md, schemas, configs | Rarely, deliberately |
| Handoff | High | handoff.md (current state, decisions, blockers) | Every session end |
| Daily notes | Medium | Daily log, task notes, runtime snapshots | Daily |
| Memory entries | Variable | Structured facts, observations | Ongoing |
| Chat history | Low | Raw conversation, ephemeral context | Per session |

Source of truth > volume. The agent knowing which file is canonical matters more than remembering everything. When memory contradicts source of truth, source of truth wins.

Handoff as save game. Not abstract “long-term memory” but a short, living file that captures: what’s happening now, what decisions were made, what’s broken, what’s left to do, which files are load-bearing. Like a game save point. OpenClaw’s ## Retain section, GSD-2’s .gsd/STATE.md, and Claude Code’s session summaries all implement this pattern.

Memory should help act, not just recall. Useful memory units aren’t just facts but actionable artifacts: task notes, daily plans, fixed decisions, integration statuses, runtime snapshots, environment self-maps. Good agent memory is an operational surface you act from, not a museum of recollections.

This connects to harness engineering: the harness defines which artifacts are source of truth (CLAUDE.md, schemas, tests), and the memory system respects that hierarchy.


Lexical-First Retrieval

Most agent memory systems default to vector/semantic search. In practice, engineering queries are exact and operational: “what did we decide about heartbeat?”, “where is the runtime snapshot?”, “which cron is failing?”, “what file is canonical?”

For these, lexical search (BM25/FTS5) outperforms embeddings.

Practical stack, ordered by reliability:

  1. Lexical first (BM25 / FTS5): exact names, identifiers, operational queries
  2. Graph queries (Cypher / FalkorDB): relationships, dependencies, cross-references
  3. Vector search (embeddings): conceptual similarity, fuzzy matching
  4. LLM reranking: optional final pass for relevance scoring

Multiple systems converge on this independently: Moltis uses hybrid (FTS5 + vector + optional LLM reranking), MemPalace found structural filtering beats raw embeddings by 34%, Solograph combines FTS5 + graph + vector.

Key insight: don’t start with “semantic magic.” Start with the simplest retrieval that handles your actual queries. Add embeddings when lexical search fails, which happens less often than you’d expect.
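Step 1 of the stack fits in a few lines with SQLite's built-in FTS5 (assuming an FTS5-enabled SQLite build, which standard CPython ships on most platforms; the stored facts are made up):

```python
import sqlite3

# Lexical-first retrieval: exact, operational queries against an FTS5
# index, no embeddings involved.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE mem USING fts5(text)")
db.executemany("INSERT INTO mem VALUES (?)", [
    ("Decided on a 30s heartbeat for the worker queue.",),
    ("Runtime snapshot lives in the daily notes.",),
])
hits = db.execute(
    "SELECT text FROM mem WHERE mem MATCH ? ORDER BY rank",
    ("heartbeat",),
).fetchall()
```

A query like "what did we decide about heartbeat?" reduces to a MATCH on the distinctive term, which is exactly where lexical search shines.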


Progressive Context Loading

All five systems converge on the same context engineering pattern: don’t load everything, load progressively.

| Layer | What | When | Size |
|---|---|---|---|
| L0 | Identity / role | Always | ~50 tokens |
| L1 | Core facts / preferences | Always | ~100-200 tokens |
| L2 | Topic-relevant facts | On demand | Variable |
| L3 | Deep search results | Explicit query | Variable |

This is context engineering in practice: treating the context window as a memory budget. Harness keeps L0-L1 stable and tested; L2-L3 are dynamic.
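The loading loop itself is simple; a sketch under the assumption that fetch is a callback returning (text, token_count) for each layer (layer names mirror the table above):

```python
# Progressive context loading: assemble layers in order until the
# token budget would be exceeded, then stop.
LAYERS = ["L0_identity", "L1_core_facts", "L2_topic_facts", "L3_deep_search"]

def build_context(fetch, budget: int) -> str:
    parts, used = [], 0
    for layer in LAYERS:
        text, tokens = fetch(layer)
        if used + tokens > budget:
            break  # respect the memory budget; deeper layers stay out
        parts.append(text)
        used += tokens
    return "\n\n".join(parts)
```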


Storage Architecture Comparison

| System | Storage | Search | Embeddings | Scale |
|---|---|---|---|---|
| [SGR v1](/wiki/schema-guided-reasoning) | Python dict (in-memory) | None | None | <100 facts |
| Moltis | SQLite + FTS5 | Hybrid (vector + keyword) + LLM reranking | Local GGUF / OpenAI / Ollama | <5K chunks |
| OpenClaw | Markdown + SQLite index | FTS5 + optional embeddings | Optional | <10K facts |
| [MemPalace](/wiki/mempalace-agent-memory) | ChromaDB + SQLite KG | Spatial filtering + vector | Local | 22K+ memories |
| [Solograph](/wiki/project-solograph) | FalkorDB (graph) + SQLite (vector) + filesystem | Cypher graph queries + vector + session search | MLX multilingual-e5-small | Multi-project |

Solograph as a Full Memory System

Solograph implements all four memory types on a unified graph + file foundation:

The graph layer (FalkorDB) adds what flat vector search lacks: relationship traversal. codegraph_shared finds shared packages across projects. source_related finds related content by tag overlap. These are graph queries, not similarity search, which is the key difference from standard RAG patterns.

15 MCP tools provide the interface. Agents call session_search, kb_search, codegraph_query naturally.


Practical Recommendations

Starting Simple (SGR v1 approach)

If you’re building your first agent, start with dict-based memory in the system prompt. No database, no embeddings. Three categories: semantic, procedural, rules. Load at init, don’t mutate during session. This alone gets you far. Schemas first: define the memory structure before adding persistence.

Adding Persistence (Moltis/OpenClaw approach)

When you need cross-session recall: Markdown files as source of truth + SQLite/FTS5 derived index. Rebuild index from Markdown anytime. Add vector search when FTS5 isn’t enough. Keep the ## Retain discipline for extracting structured facts. Local-first, offline-first.

Scaling Up (MemPalace/Solograph approach)

For multi-agent systems with rich history: ChromaDB or FalkorDB with spatial/graph hierarchy. Knowledge graph for entity relationships. Session search for full audit trail. Progressive context loading (L0-L3). Active forgetting with temporal validity. Small bets: try one memory type first, add others incrementally.

The Compound Learning Goal

The ultimate goal isn't remembering but compounding: every decision trace should make the next decision better. That means running the loops above consistently, capturing decisions with their rationale, retrieving them as precedents, and applying them to new work.

Synthesized from: SGR Agents Spec v1 (Sep 2025), OpenClaw Workspace Memory Research (Feb 2026), Moltis Memory System (Mar 2026), MemPalace (Apr 2026), Solograph (ongoing), Foundation Capital Context Graphs (Jan 2026), and 7 research papers.
