Agent Memory Architecture
AI agents forget everything between sessions. Context windows are finite. RAG retrieves documents but doesn’t capture decisions, preferences, or evolving relationships. This is the core problem of agent memory, and several mature approaches now exist to solve it.
This post synthesizes patterns from five memory-focused systems I’ve built or studied: SGR Agents spec (the cognitive foundation), Moltis, OpenClaw Workspace Memory, MemPalace, and Solograph. Each takes a different approach. Together they map the full design space.
The Four Types of Memory
Every agent memory system worth studying converges on the same cognitive taxonomy, mirroring how human cognition organizes knowledge (Generative Agents, Stanford 2023). Names vary, but functions stay consistent:
1. Working Memory (short-term)
The agent’s scratchpad during a task: messages, intermediate results, current plan. Cleared or archived when the task completes.
- SGR: The NextStep schema itself. `reasoning` + `planning` fields are the working memory, visible on every step. Structured JSON makes the thought process inspectable.
- MemPalace: L0 + L1 layers (~170 tokens always loaded). Identity + critical facts, the minimum context the agent always has.
- Moltis: Current session transcript, auto-exported to `memory/sessions/` on close.
- Solograph: Active Claude Code session context (task state, tool results, conversation). Becomes episodic memory when the session ends.
Key insight: Working memory should be structured, not a raw conversation log. The SGR approach (reasoning/planning/function as explicit JSON fields) makes the agent’s thought process inspectable and debuggable. This aligns with harness engineering: agent logic should be transparent.
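To make the structured-scratchpad idea concrete, here is a minimal sketch of an SGR-style step as explicit fields rather than a raw transcript. The field names (`reasoning`, `planning`, `function`) follow the post; the exact schema and example values are illustrative, not the SGR spec itself.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class NextStep:
    reasoning: str   # why the agent believes this is the right move
    planning: list   # remaining steps, revised on every turn
    function: str    # the tool/action to invoke next

step = NextStep(
    reasoning="User asked for auth setup; a precedent exists in the session log.",
    planning=["read precedent trace", "adapt config", "run tests"],
    function="read_file",
)

# The whole step serializes to inspectable, debuggable JSON:
print(json.dumps(asdict(step), indent=2))
```

Because every turn emits this structure, you can diff steps, log them, and assert on them in tests instead of grepping a conversation log.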
2. Semantic Memory (long-term facts)
Durable knowledge about the world, the user, the domain. Timeless or slowly changing. Retrieved by similarity or entity lookup.
- SGR: Dict-based, no external DB. `memory["semantic"] = { ... }` loaded into the system prompt at init. Summarize, don’t dump.
- OpenClaw: `bank/world.md` for objective facts, `bank/entities/*.md` for entity pages. Markdown source of truth + derived SQLite/FTS5 index. Inspired by Hindsight and MemGPT/Letta.
- MemPalace: `hall_facts` corridor within each wing. Verbatim storage (96.6% on LongMemEval) beats compression (84.2%).
- Moltis: Hybrid search (vector + FTS5 keyword), with optional LLM reranking (70% LLM score + 30% original).
- Solograph: `kb_search` for the knowledge base with MLX embeddings (multilingual-e5-small, RU+EN). Facts, principles, patterns stored as markdown files.
Key insight: Structure beats embeddings for retrieval. MemPalace’s spatial hierarchy (wing > room > hall) improves recall by 34% over flat vector search. Solograph’s graph layer adds relationship traversal on top. But a simple dict in the system prompt works fine to start (SGR v1 approach).
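The "simple dict in the system prompt" starting point can be sketched in a few lines. The categories and the summarization style are illustrative assumptions, not the SGR v1 code itself.

```python
# Semantic memory as a plain dict, rendered into the system prompt at init.
# "Summarize, don't dump": short labeled sections, not raw logs.
memory = {
    "semantic": {
        "user": "Prefers Rust; works in UTC+2; dislikes verbose replies.",
        "project": "Solo blog engine, SQLite-backed, deployed on a VPS.",
    },
    "procedural": {
        "writing": "3-act structure: setup, tension, resolution.",
    },
    "rules": ["Never push to main without tests."],
}

def build_system_prompt(mem: dict) -> str:
    """Render each memory category as a short labeled section."""
    sections = []
    for category, items in mem.items():
        if isinstance(items, dict):
            body = "\n".join(f"- {k}: {v}" for k, v in items.items())
        else:
            body = "\n".join(f"- {v}" for v in items)
        sections.append(f"## {category}\n{body}")
    return "You are the user's assistant.\n\n" + "\n\n".join(sections)

prompt = build_system_prompt(memory)
```

No database, no embeddings, and the whole memory is reviewable in one screen, which is exactly the point at this stage.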
3. Episodic Memory (experience log)
What happened, when, in what order. Decision history, interaction logs, past results. Time-bound and growing. Context graphs call these “decision traces.”
- OpenClaw: Daily logs (`memory/YYYY-MM-DD.md`) + `bank/experience.md` curated by reflection. The `## Retain` section extracts 2-5 narrative, self-contained facts from each day, tagged with type (`W` world, `B` experience, `O` opinion) and entities (`@Peter`).
- MemPalace: `hall_events` + `hall_discoveries` corridors. Specialist agent diaries capture domain-specific history.
- Solograph: `session_search` makes every past Claude Code session searchable. Each session = a decision trajectory. “What did I do last time I set up auth?” returns the full trace with reasoning.
- Generative Agents: Memory stream of timestamped observations, with retrieval scored by recency × relevance × importance.
Key insight: Episodic ≠ semantic. Episodic memory is time-ordered (an event sequence); semantic memory is timeless (a generalized fact). The OpenClaw `## Retain` pattern extracts narrative facts from the raw log and tags them with type, entities, and confidence. Decision traces compound: each captured trajectory improves future retrieval.
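A retained fact in this style can be modeled as a small record: a self-contained statement plus type tag, entity mentions, and confidence. The field names and the recall query below are my assumptions based on the pattern described, not OpenClaw’s actual schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class RetainedFact:
    text: str                 # self-contained narrative statement
    kind: str                 # "W" world, "B" experience, "O" opinion
    entities: list            # entity mentions like "@Peter"
    confidence: float = 1.0   # opinions carry c in [0, 1]
    day: date = field(default_factory=date.today)

facts = [
    RetainedFact("Peter prefers short, direct status updates.", "O",
                 ["@Peter"], confidence=0.7),
    RetainedFact("Deployed v2.1 of ProjectX; rollback plan documented.", "B",
                 ["@ProjectX"]),
]

# Entity-centric recall over the retained facts:
peter_facts = [f for f in facts if "@Peter" in f.entities]
```

Because each fact is self-contained, recall can return it without dragging in the surrounding day’s log.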
4. Procedural Memory (how-to)
Methods, templates, algorithms, workflows. “How to do X” rather than “what X is.”
- SGR: `memory["procedural"]` stores story structure templates (3-act method) and writing instructions. Loaded as recommendations in the system prompt.
- MemPalace: `hall_advice` corridor with accumulated recommendations and patterns.
- Solograph: `codegraph_query` + `codegraph_explain` expose an AST-based code graph (tree-sitter) across all projects. Cypher queries on code structure reveal implementation patterns: “How is auth implemented?” returns structural patterns, not just files.
- YAML workflow templates: Predefined agent sequences (video trailers, marketing campaigns). Codified procedural memory. The agent follows a proven template instead of reasoning from scratch.
Key insight: Procedural memory is the most undervalued type. A well-defined workflow template multiplies consistency. CLI-first design makes procedural knowledge testable. Schemas first, logic second applies here too: define the procedure schema before implementing.
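A workflow template as codified procedural memory can be sketched as data plus a tiny executor. The template name, step structure, and `dispatch` hook are all hypothetical; the point is that the sequence is declared once and replayed, not re-reasoned each run.

```python
# Procedural memory as a declarative template the agent follows step by step.
TRAILER_WORKFLOW = {
    "name": "video_trailer",
    "steps": [
        {"agent": "scriptwriter", "task": "draft 30s script from brief"},
        {"agent": "editor",       "task": "cut script to 3 beats"},
        {"agent": "compositor",   "task": "assemble clips per beat"},
    ],
}

def run_workflow(template: dict, dispatch) -> list:
    """Execute each step in order; dispatch(agent, task) is the harness hook."""
    results = []
    for step in template["steps"]:
        results.append(dispatch(step["agent"], step["task"]))
    return results

# Dry-run with a stub dispatcher to test the procedure without real agents:
log = run_workflow(TRAILER_WORKFLOW, lambda agent, task: f"{agent}: {task}")
```

Keeping the template as plain data is what makes the procedure testable from a CLI: you can dry-run it, diff it, and version it like any other artifact.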
Three Operational Loops
Capture → Retrieve → Apply (Context Graphs pattern)
From Foundation Capital’s context graphs thesis:
- Capture — record agent decisions into a graph (what + why)
- Retrieve — before new task, search for similar precedents
- Apply — adapt found patterns to current situation
Each successful action improves future ones, creating a compound learning flywheel. Implemented in Solograph: session_search for precedent retrieval, kb_search for pattern matching, codegraph_query for structural precedents.
Retain → Recall → Reflect (OpenClaw/Hindsight pattern)
This loop blends the Letta/MemGPT control loop with the Hindsight memory substrate:
- Retain — normalize raw logs into narrative, self-contained facts with type tags (`W` world, `B` experience, `O` opinion) and entity mentions (`@Peter`, `@ProjectX`)
- Recall — queries over the derived index: lexical (FTS5), entity-centric, temporal, opinion-with-confidence
- Reflect — scheduled job that updates entity pages, adjusts opinion confidence based on reinforcement/contradiction, proposes edits to core memory
Opinion evolution: each belief has statement + confidence c ∈ [0,1] + evidence links. New facts update confidence by small deltas; big jumps require strong repeated contradiction.
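The small-delta update rule can be sketched directly. The delta size and clamping are illustrative parameters, not values from OpenClaw or Hindsight.

```python
def update_confidence(c: float, supports: bool, delta: float = 0.05) -> float:
    """Nudge confidence c in [0, 1] toward 1.0 on support, 0.0 on contradiction."""
    c = c + delta if supports else c - delta
    return max(0.0, min(1.0, c))  # clamp: confidence never leaves [0, 1]

c = 0.8
for _ in range(3):  # three independent contradicting observations
    c = update_confidence(c, supports=False)
# The belief is weakened (c drops to ~0.65) but not discarded: a single
# contradiction cannot flip a well-supported opinion.
```

The design choice here is deliberate asymmetry between evidence and belief: evidence arrives in units, belief moves in small steps, so big jumps require strong repeated contradiction.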
Retrieve → Reflect → Plan (Generative Agents pattern)
Stanford’s Generative Agents introduced the three-layer cognitive architecture:
- Retrieve — score memories by recency × relevance × importance
- Reflect — periodically synthesize observations into higher-level insights (“What are the top 3 things I’ve learned about X?”)
- Plan — create day/hour plans grounded in reflections and current goals
This is the most research-validated loop. The paper showed it produces believable long-term agent behavior across 25 interacting agents.
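The retrieval scoring step can be sketched as a weighted combination of the three signals. The decay rate, weights, and memory fields below are illustrative assumptions; the paper tunes these differently.

```python
import time

def score(memory: dict, relevance: float, now: float,
          decay: float = 0.995, weights=(1.0, 1.0, 1.0)) -> float:
    """Generative Agents-style retrieval score: recency + relevance + importance."""
    hours_old = (now - memory["created_at"]) / 3600
    recency = decay ** hours_old          # exponential recency decay
    importance = memory["importance"]     # scored once at write time, e.g. 0-1
    w_rec, w_rel, w_imp = weights
    return w_rec * recency + w_rel * relevance + w_imp * importance

now = time.time()
fresh = {"created_at": now - 3600, "importance": 0.3}           # 1 hour old
stale = {"created_at": now - 30 * 24 * 3600, "importance": 0.3}  # 1 month old

# With equal relevance and importance, the fresh memory outranks the stale one:
assert score(fresh, 0.5, now) > score(stale, 0.5, now)
```

Because recency decays but importance does not, a month-old high-importance memory can still beat a fresh trivial one, which is what makes the ranking feel believable over long horizons.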
Forgetting: The Half-Life Problem
Not all memories should persist forever. Decisions expire as policies change, teams change, and context shifts. Context graphs call this the “half-life of decisions.”
Approaches to Forgetting
| System | Mechanism |
|---|---|
| [MemPalace](/wiki/mempalace-agent-memory) | Knowledge graph with `valid_from` / `ended` dates. `kg.invalidate()` marks facts as expired |
| OpenClaw | Opinion confidence decay. Contradicting evidence reduces the `c` score. Reflect job proposes removal |
| Moltis | Session auto-cleanup by age/count limits. Embedding-cache LRU eviction |
| Generative Agents | Recency score decays exponentially. Old memories naturally rank lower in retrieval |
| SuCo | Summarize-Condense-Distill: compress old memories into summaries, discard originals |
Key insight: Active forgetting > passive accumulation. Without decay, memory becomes noise. The simplest implementation: valid_from/ended timestamps on every fact, with periodic cleanup. Agent self-discipline applies here. Thresholds and limits prevent memory bloat.
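The "simplest implementation" mentioned above can be sketched as temporal validity on every fact: expired facts stay on disk but become invisible to retrieval. The fact structure and helper are hypothetical, loosely modeled on the MemPalace mechanism.

```python
from datetime import datetime

# Every fact carries valid_from / ended; invalidation sets ended, never deletes.
facts = [
    {"text": "Deploys go through Jenkins.",
     "valid_from": datetime(2024, 1, 1), "ended": datetime(2025, 3, 1)},
    {"text": "Deploys go through GitHub Actions.",
     "valid_from": datetime(2025, 3, 1), "ended": None},
]

def live_facts(facts: list, at: datetime) -> list:
    """Return only facts valid at the given moment; expired ones are invisible."""
    return [f for f in facts
            if f["valid_from"] <= at and (f["ended"] is None or at < f["ended"])]

current = live_facts(facts, datetime(2025, 6, 1))
# Only the GitHub Actions fact survives the filter; the Jenkins-era fact is
# still queryable for historical "what did we believe then?" questions.
```

Keeping the ended fact rather than deleting it is what lets the agent answer temporal questions without resurrecting stale knowledge into current retrieval.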
Trust Hierarchy: Not All Memory is Equal
A common failure mode: the agent “remembers something” but you can’t tell if it’s trustworthy. Memory systems that treat everything as equal (recent chat, old decisions, daily notes, canonical specs) produce agents that confidently cite stale or wrong information.
The Zero Harness approach introduces an explicit trust hierarchy for memory:
| Layer | Trust Level | Example | Update Frequency |
|---|---|---|---|
| Source of truth | Canonical | source_of_truth.md, schemas, configs | Rarely, deliberately |
| Handoff | High | handoff.md (current state, decisions, blockers) | Every session end |
| Daily notes | Medium | Daily log, task notes, runtime snapshots | Daily |
| Memory entries | Variable | Structured facts, observations | Ongoing |
| Chat history | Low | Raw conversation, ephemeral context | Per session |
Source of truth > volume. The agent knowing which file is canonical matters more than remembering everything. When memory contradicts source of truth, source of truth wins.
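The "source of truth wins" rule is just trust-ordered conflict resolution, which can be sketched in a few lines. Layer names mirror the table above; the resolution function itself is a hypothetical illustration.

```python
# Highest-trust layer that has an answer wins; lower layers are ignored.
TRUST_ORDER = ["source_of_truth", "handoff", "daily_notes", "memory", "chat"]

def resolve(answers: dict) -> tuple:
    """Return (layer, answer) from the most trusted layer that answered."""
    for layer in TRUST_ORDER:
        if layer in answers:
            return layer, answers[layer]
    raise LookupError("no layer answered")

layer, answer = resolve({
    "chat": "I think we deploy with Docker Compose?",       # low trust, recent
    "source_of_truth": "Deploys run via GitHub Actions.",   # canonical
})
# source_of_truth outranks chat history even though chat is more recent.
```

Note that recency loses to trust here by design: the most recent mention is often the least reliable one.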
Handoff as save game. Not abstract “long-term memory” but a short, living file that captures: what’s happening now, what decisions were made, what’s broken, what’s left to do, which files are load-bearing. Like a game save point. OpenClaw’s `## Retain` section, GSD-2’s `.gsd/STATE.md`, and Claude Code’s session summaries all implement this pattern.
Memory should help act, not just recall. Useful memory units aren’t just facts but actionable artifacts: task notes, daily plans, fixed decisions, integration statuses, runtime snapshots, environment self-maps. Good agent memory is an operational surface you act from, not a museum of recollections.
This connects to harness engineering: the harness defines which artifacts are source of truth (CLAUDE.md, schemas, tests), and the memory system respects that hierarchy.
Lexical-First Retrieval
Most agent memory systems default to vector/semantic search. In practice, engineering queries are exact and operational: “what did we decide about heartbeat?”, “where is the runtime snapshot?”, “which cron is failing?”, “what file is canonical?”
For these, lexical search (BM25/FTS5) outperforms embeddings:
- Exact file names, identifiers, integration names: lexical wins
- Specific formulations, decisions, promises: lexical wins
- “Find something conceptually similar”: embeddings win
- “What does X relate to?”: graph queries win
Practical stack, ordered by reliability:
- Lexical first (BM25 / FTS5): exact names, identifiers, operational queries
- Graph queries (Cypher / FalkorDB): relationships, dependencies, cross-references
- Vector search (embeddings): conceptual similarity, fuzzy matching
- LLM reranking: optional final pass for relevance scoring
Multiple systems converge on this independently: Moltis uses hybrid (FTS5 + vector + optional LLM reranking), MemPalace found structural filtering beats raw embeddings by 34%, Solograph combines FTS5 + graph + vector.
Key insight: don’t start with “semantic magic.” Start with the simplest retrieval that handles your actual queries. Add embeddings when lexical search fails, which happens less often than you’d expect.
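A lexical-first setup needs nothing beyond Python's standard library: SQLite's FTS5 module ships compiled into most Python builds. The table, column, and example rows below are illustrative.

```python
import sqlite3

# Lexical-first retrieval: an FTS5 virtual table with BM25 ranking built in.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE mem USING fts5(text)")
db.executemany("INSERT INTO mem(text) VALUES (?)", [
    ("Decided heartbeat interval is 30s; see runtime snapshot in ops/state.md.",),
    ("Nightly backup cron is failing on the VPS since the disk filled up.",),
    ("CLAUDE.md is the canonical harness file for this repo.",),
])

# An exact operational query ("what did we decide about heartbeat?"):
rows = db.execute(
    "SELECT text FROM mem WHERE mem MATCH ? ORDER BY rank", ("heartbeat",)
).fetchall()
# rows[0][0] is the heartbeat decision; no embeddings involved.
```

When queries like this start failing (conceptual paraphrases, no shared keywords), that is the signal to layer embeddings on top, not before.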
Progressive Context Loading
All five systems converge on the same context engineering pattern: don’t load everything, load progressively.
| Layer | What | When | Size |
|---|---|---|---|
| L0 | Identity / role | Always | ~50 tokens |
| L1 | Core facts / preferences | Always | ~100-200 tokens |
| L2 | Topic-relevant facts | On demand | Variable |
| L3 | Deep search results | Explicit query | Variable |
- MemPalace: Explicit L0-L3 layers
- SGR: Dict memory loaded into the system prompt (L1), retrieval via tools later (L2-L3)
- OpenClaw: `memory.md` as always-loaded core (L1), derived index for recall (L2-L3)
- Moltis: File watching + auto-sync as L1, hybrid search as L2-L3
- Solograph: CLAUDE.md as L0-L1 (table of contents, ~100 lines per harness engineering), `kb_search` / `session_search` as L2-L3
This is context engineering in practice: treating the context window as a memory budget. Harness keeps L0-L1 stable and tested; L2-L3 are dynamic.
Storage Architecture Comparison
| System | Storage | Search | Embeddings | Scale |
|---|---|---|---|---|
| [SGR v1](/wiki/schema-guided-reasoning) | Python dict (in-memory) | None | None | <100 facts |
| Moltis | SQLite + FTS5 | Hybrid (vector + keyword) + LLM reranking | Local GGUF / OpenAI / Ollama | <5K chunks |
| OpenClaw | Markdown + SQLite index | FTS5 + optional embeddings | Optional | <10K facts |
| [MemPalace](/wiki/mempalace-agent-memory) | ChromaDB + SQLite KG | Spatial filtering + vector | Local | 22K+ memories |
| [Solograph](/wiki/project-solograph) | FalkorDB (graph) + SQLite (vector) + filesystem | Cypher graph queries + vector + session search | MLX multilingual-e5-small | Multi-project |
Solograph as a Full Memory System
Solograph implements all four memory types on a unified graph + file foundation:
- Working memory: active session context (current Claude Code conversation, task state, tool results)
- Semantic memory: `kb_search` with vector embeddings (MLX, RU+EN). Markdown files indexed with embeddings. Local-first, no cloud dependency
- Episodic memory: `session_search` makes every past session a searchable decision trajectory. “What did I do last time I set up auth?” returns the full trace
- Procedural memory: `codegraph_query` + `codegraph_explain` expose an AST-based code graph (tree-sitter) across all projects. Cypher queries on code structure reveal patterns, not just files
- Long-term file storage: markdown files in wiki/ and the knowledge base. Git-backed, human-readable, versioned. No database lock-in; rebuild the index from files anytime
The graph layer (FalkorDB) adds what flat vector search lacks: relationship traversal. `codegraph_shared` finds shared packages across projects. `source_related` finds related content by tag overlap. These are graph queries, not similarity search, which is the key difference from standard RAG patterns.
15 MCP tools provide the interface. Agents call `session_search`, `kb_search`, and `codegraph_query` naturally.
Practical Recommendations
Starting Simple (SGR v1 approach)
If you’re building your first agent, start with dict-based memory in the system prompt. No database, no embeddings. Three categories: semantic, procedural, rules. Load at init, don’t mutate during session. This alone gets you far. Schemas first: define the memory structure before adding persistence.
Adding Persistence (Moltis/OpenClaw approach)
When you need cross-session recall: Markdown files as the source of truth + a derived SQLite/FTS5 index. Rebuild the index from Markdown at any time. Add vector search when FTS5 isn’t enough. Keep the `## Retain` discipline for extracting structured facts. Local-first, offline-first.
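The "derived index, rebuildable anytime" property is worth sketching, since it is what keeps Markdown authoritative. The directory layout and one-row-per-file chunking below are illustrative assumptions.

```python
import sqlite3
from pathlib import Path

def rebuild_index(memory_dir: Path) -> sqlite3.Connection:
    """Throw away the old index and rebuild it from the Markdown files.

    The files are the source of truth; the FTS5 index is a disposable cache.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE notes USING fts5(path, body)")
    for md in sorted(memory_dir.glob("**/*.md")):
        db.execute("INSERT INTO notes(path, body) VALUES (?, ?)",
                   (str(md), md.read_text(encoding="utf-8")))
    return db

# Usage: rebuild on startup or after any bulk edit; no migration needed.
# db = rebuild_index(Path("memory"))
```

Because the index is derived, a corrupted or stale database is never a data-loss event, only a rebuild.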
Scaling Up (MemPalace/Solograph approach)
For multi-agent systems with rich history: ChromaDB or FalkorDB with spatial/graph hierarchy. Knowledge graph for entity relationships. Session search for full audit trail. Progressive context loading (L0-L3). Active forgetting with temporal validity. Small bets: try one memory type first, add others incrementally.
The Compound Learning Goal
The ultimate goal isn’t remembering but compounding. Every decision trace should make the next decision better. This requires:
- Capturing why, not just what
- Precedent retrieval before new tasks
- Active decay of stale knowledge
- Structured reflection (not just accumulation)
Research & References
Papers
- MemGPT / Letta — MemGPT: Towards LLMs as Operating Systems (Oct 2023). Virtual context management: paging facts in/out of LLM context like an OS manages memory. Core > archival > recall architecture.
- Generative Agents — Generative Agents: Interactive Simulacra of Human Behavior (Apr 2023, Stanford). 25 agents with memory streams, reflection, and planning. Introduced retrieve-reflect-plan.
- Hindsight — Hindsight: Posterization for LLM Conversations (Apr 2024). Separates observed vs believed vs summarized. Confidence-bearing opinions that evolve with evidence.
- LongMemEval — LongMemEval: Benchmarking Long-Term Memory (Oct 2024). Benchmark where MemPalace scored 96.6%. Tests multi-session reasoning, temporal reasoning, knowledge updates.
- SuCo — Summarize, Condense, and Distill (Nov 2024). Memory compaction for long-running agents. Formalizes the forgetting problem.
- RAISE — Retrieval-Augmented In-Context Learning (Jan 2024). Entity-centric retrieval from semi-structured sources.
- Cognitive Architectures for Language Agents — CoALA (Sep 2023). Survey of memory, action, and decision-making in language agents. Proposes the working/episodic/semantic/procedural taxonomy used here.
Industry
- Foundation Capital: Context Graphs (Dec 2025). “System of record for decisions” thesis
- Anthropic: Effective Context Engineering. Context window as working memory budget
- Manus: Context Engineering Lessons. KV-cache locality, filesystem memory
- OpenHands: Context Condensation. Bounded conversation memory
- Anthropic: Harness Design for Long-Running Apps. Task state and evaluator design
Tools & Systems
- MemPalace: GitHub. Spatial memory, 96.6% LongMemEval, MIT (27K stars)
- Solograph: GitHub. Graph + vector MCP server, 15 tools, FalkorDB + MLX
- QMD. Mini CLI search engine for docs and knowledge bases (20K stars). BM25 + vector + LLM reranking, all local. Optional Moltis backend
- Moltis. Rust-based memory with hybrid search + LLM reranking
- OpenClaw. Markdown source-of-truth + derived index, Retain/Recall/Reflect
- Letta. MemGPT implementation, core/archival/recall memory management
- LangGraph. Graph-based agent execution framework
Knowledge Patterns
- Karpathy LLM Wiki. The pattern our wiki builds on. Three-layer architecture: raw sources (immutable) > wiki (LLM-maintained, interlinked) > schema (conventions). Key insight: “The knowledge is compiled once and kept current, not re-derived on every query.” Memory as a persistent, compounding artifact where synthesis accumulates rather than being rediscovered. The LLM handles bookkeeping humans abandon: cross-references, consistency, contradiction detection. This is how our knowledge base works: raw sources in `0-principles/` and `1-methodology/` > wiki pages synthesized by LLM > Solograph indexes everything for retrieval
Related Wiki Pages
- context-graphs-summary — agent trajectories and organizational memory
- decision-traces-compound — why capturing decision rationale compounds
- context-engineering — progressive disclosure and context budgeting
- rag-patterns — 7 RAG approaches, spatial filtering as additional pattern
- harness-engineering-summary — agent mistake → fix the harness
- agent-mistake-fix-harness — the core harness loop
- agent-self-discipline — drift detection and complexity thresholds
- schema-guided-reasoning — schemas first, logic second
- sgr-deep-dive — complete SGR guide with code examples
- privacy-as-architecture — local-first memory design
- codegraph-guide — Solograph’s code intelligence layer
Synthesized from: SGR Agents Spec v1 (Sep 2025), OpenClaw Workspace Memory Research (Feb 2026), Moltis Memory System (Mar 2026), MemPalace (Apr 2026), Solograph (ongoing), Foundation Capital Context Graphs (Jan 2026), and 7 research papers.