Memory & Compaction

Persistent memory across conversations and automatic context window management.

The memory system gives CoPaw persistence across conversations: it automatically manages the context window and writes key information to files for long-term storage.

The memory system provides two core capabilities:

  • Context Management — Automatically compresses conversations into concise summaries before the context window overflows
  • Long-term Memory Management — Writes key information to Markdown files via file tools, with semantic search for recall at any time

The memory design is inspired by the OpenClaw memory architecture and implemented by ReMe.


Architecture Overview

Long-term memory management includes the following capabilities:

| Capability | Description |
|---|---|
| Memory Persistence | Writes key information to Markdown files via file tools (read / write / edit); files are the source of truth |
| File Watching | Monitors file changes via watchfile, asynchronously updating the local database (semantic index & vector index) |
| Semantic Search | Recalls relevant memories by meaning, using vector embeddings + BM25 hybrid search |
| File Reading | Reads the corresponding memory Markdown files directly via file tools, loading on demand to keep the context lean |

Memory File Structure

Memories are stored as plain Markdown files, operated directly by the Agent via file tools. The default workspace uses a two-level structure:

MEMORY.md (Long-term Memory, Optional)

Stores long-lasting, rarely changing key information.

  • Location: {working_dir}/MEMORY.md
  • Purpose: Stores decisions, preferences, and persistent facts
  • Updates: Written by the Agent via write / edit file tools
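
For illustration, MEMORY.md might look like this (hypothetical content and layout; the Agent is free to organize the file differently):

```markdown
# MEMORY.md

## Decisions
- Project uses Python 3.12
- Chose JWT over Sessions for statelessness

## Preferences
- Prefers the pytest framework
```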

memory/YYYY-MM-DD.md (Daily Log)

One page per day, appended with the day's work and interactions.

  • Location: {working_dir}/memory/YYYY-MM-DD.md
  • Purpose: Records daily notes and runtime context
  • Updates: Appended by the Agent via write / edit file tools; automatically triggered when conversations become too long and need summarization
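
A daily log page might look like this (hypothetical content):

```markdown
# 2025-02-13

- Fixed login bug today
- Deployed v2.1
- Auto-summary: compacted conversation summary appended here when the context grew too long
```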

When to Write Memory?

| Information Type | Write Target | Method | Example |
|---|---|---|---|
| Decisions, preferences, persistent facts | MEMORY.md | write / edit tools | "Project uses Python 3.12", "Prefers the pytest framework" |
| Daily notes, runtime context | memory/YYYY-MM-DD.md | write / edit tools | "Fixed login bug today", "Deployed v2.1" |
| Auto-summary on context overflow | memory/YYYY-MM-DD.md | Auto-triggered (summary_memory) | When context tokens exceed the threshold, the system automatically writes a conversation summary to the daily log |
| User says "remember this" | Write to a file immediately | write tool | Do not keep it only in the conversation context! |

Memory Configuration

LLM Configuration

The Memory Manager uses the same LLM parameters as the global configuration: it automatically reads the active LLM config (api_key, base_url, model) from providers.json. The language of memory-related prompts follows the agents.language field in config.json (zh = Chinese; any other value = English).
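
Loosely, the relevant fields might look like the sketches below (hypothetical shapes and values; consult the actual files for the real schema). First providers.json (active LLM config):

```json
{
  "api_key": "sk-...",
  "base_url": "https://api.example.com/v1",
  "model": "your-model-name"
}
```

And config.json (prompt language selection):

```json
{
  "agents": {
    "language": "zh"
  }
}
```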

Embedding Configuration

Configure the Embedding service via the following environment variables for vector semantic search:

| Environment Variable | Description | Default |
|---|---|---|
| EMBEDDING_API_KEY | API key for the Embedding service | (empty; vector search is disabled if not set) |
| EMBEDDING_BASE_URL | URL of the Embedding service | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| EMBEDDING_MODEL_NAME | Embedding model name | text-embedding-v4 |
| EMBEDDING_DIMENSIONS | Vector dimensions used to initialize the vector database | 1024 |
| EMBEDDING_CACHE_ENABLED | Whether to enable the embedding cache | true |
| EMBEDDING_MAX_CACHE_SIZE | Maximum number of embedding cache entries | 2000 |
| EMBEDDING_MAX_INPUT_LENGTH | Maximum input length per embedding request | 8192 |
| EMBEDDING_MAX_BATCH_SIZE | Maximum batch size for embedding requests | 10 |
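
For example, to enable vector search against the default DashScope-compatible endpoint (the API key is a placeholder):

```bash
export EMBEDDING_API_KEY="sk-..."   # setting this enables vector semantic search
export EMBEDDING_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export EMBEDDING_MODEL_NAME="text-embedding-v4"
export EMBEDDING_DIMENSIONS=1024    # dimensions used to initialize the vector database
```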

Search Mode Configuration

| Environment Variable | Description | Default |
|---|---|---|
| FTS_ENABLED | Whether to enable BM25 full-text search | true |

Search mode behavior:

| Vector Search (EMBEDDING_API_KEY set) | Full-text Search (FTS_ENABLED=true) | Actual Search Mode |
|---|---|---|
| Yes | Yes | Vector + BM25 hybrid search (recommended; best results) |
| Yes | No | Vector semantic search only |
| No | Yes | BM25 full-text search only (poor results in some scenarios) |
| No | No | Not allowed; at least one search mode must be enabled |

Recommended: Configure EMBEDDING_API_KEY and keep FTS_ENABLED=true to use Vector + BM25 hybrid search for optimal recall.

Underlying Database

Configure the memory storage backend via the MEMORY_STORE_BACKEND environment variable:

| Environment Variable | Description | Default |
|---|---|---|
| MEMORY_STORE_BACKEND | Memory storage backend: auto, local, chroma, or sqlite | auto |

Storage backend options:

| Backend | Description |
|---|---|
| auto | Auto-select: uses local on Windows and chroma on other systems |
| local | Local file storage; no extra dependencies, best compatibility |
| chroma | Chroma vector database with efficient vector retrieval; may crash (core dump) in some Windows environments |
| sqlite | SQLite database + vector extension; may freeze or crash on macOS 14 and below |

Recommended: Use the default auto mode, which automatically selects the most stable backend for your platform.


Searching Memory

The Agent has two ways to retrieve past memories:

| Method | Tool | Use Case | Example |
|---|---|---|---|
| Semantic search | memory_search | Unsure which file contains the info; fuzzy recall by intent | "Previous discussion about the deployment process" |
| Direct read | read_file | Known date or file path; precise lookup | Read memory/2025-02-13.md |

Hybrid Search Explained

Memory search uses Vector + BM25 hybrid search by default; the two methods cover each other's blind spots.

Vector Semantic Search

Vector search maps text into a high-dimensional vector space and measures semantic distance via cosine similarity, recalling content that has similar meaning but different wording:

| Query | Recalled Memory | Why It Matches |
|---|---|---|
| "Database choice for the project" | "Finally decided to replace MySQL with PostgreSQL" | Semantically related: both discuss database technology choices |
| "How to reduce unnecessary rebuilds" | "Configured incremental compilation to avoid full builds" | Semantic equivalence: reducing rebuilds = incremental compilation |
| "Performance issue discussed last time" | "Optimized P99 latency from 800ms to 200ms" | Semantic association: performance issue = latency optimization |

However, vector search is weaker on precise, high-signal tokens, as embedding models tend to capture overall semantics rather than exact matches of individual tokens.

BM25 Full-text Search

BM25 full-text search relies on term-frequency statistics and substring matching. It excels at precise token hits but is weaker on semantic understanding (synonyms, paraphrasing):

| Query | BM25 Hits | BM25 Misses |
|---|---|---|
| handleWebSocketReconnect | Memory fragments containing that function name | "WebSocket disconnection reconnection handling logic" |
| ECONNREFUSED | Log entries containing that error code | "Database connection refused" |

Scoring logic: the query is split into terms, each term's hit ratio in the target text is counted, and a bonus is awarded for a complete phrase match:

```
base_score   = hit_terms / total_query_terms        # range [0, 1]
phrase_bonus = 0.2   # only when a multi-word query matches as a complete phrase
score        = min(1.0, base_score + phrase_bonus)  # capped at 1.0
```

Example: Query "database connection timeout" hits a passage containing only "database" and "timeout" → base_score = 2/3 ≈ 0.67, no complete phrase match → score = 0.67
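
A minimal Python sketch of this scoring rule (illustrative only; the actual implementation's tokenization and matching may differ):

```python
def phrase_score(query: str, text: str) -> float:
    """Score a memory chunk against a query: per-term hit ratio plus a phrase bonus."""
    terms = query.lower().split()
    haystack = text.lower()

    hit_terms = sum(1 for term in terms if term in haystack)   # substring hits
    base_score = hit_terms / len(terms)                        # range [0, 1]

    # Bonus only when a multi-word query appears as a complete phrase
    phrase_bonus = 0.2 if len(terms) > 1 and " ".join(terms) in haystack else 0.0
    return min(1.0, base_score + phrase_bonus)                 # capped at 1.0

# "database" and "timeout" hit, "connection" misses -> 2/3 ≈ 0.67, no phrase bonus
print(phrase_score("database connection timeout", "the database call failed with a timeout"))
```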

To handle ChromaDB's case-sensitive $contains behavior, the search automatically generates multiple case variants for each term (original, lowercase, capitalized, uppercase) to improve recall.
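
The variant expansion is simple enough to sketch (hypothetical helper, shown only to illustrate the four variants):

```python
def case_variants(term: str) -> set[str]:
    """Case variants generated per term to work around case-sensitive $contains."""
    return {term, term.lower(), term.capitalize(), term.upper()}

print(case_variants("WebSocket"))
# {'WebSocket', 'websocket', 'Websocket', 'WEBSOCKET'} (set order may vary)
```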

Hybrid Search Fusion

Uses both vector and BM25 recall signals simultaneously, performing weighted fusion on results (default vector weight 0.7, BM25 weight 0.3):

  1. Expand candidate pool: Multiply the desired result count by candidate_multiplier (default 3x, capped at 200); each path retrieves more candidates independently
  2. Independent scoring: Vector and BM25 each return scored result lists
  3. Weighted merging: Deduplicate and fuse by chunk's unique identifier (path + start_line + end_line)
    • Recalled by vector only → final_score = vector_score × 0.7
    • Recalled by BM25 only → final_score = bm25_score × 0.3
    • Recalled by both → final_score = vector_score × 0.7 + bm25_score × 0.3
  4. Sort and truncate: Sort by final_score descending, return top-N results
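
A minimal sketch of steps 2–4 (illustrative, not the actual implementation; assumes each retrieval path returns (chunk_id, score) pairs):

```python
def fuse_results(vector_hits, bm25_hits, top_n, w_vector=0.7, w_bm25=0.3):
    """Weighted merge of two scored result lists, deduplicated by chunk id."""
    # By step 1, each path has already retrieved top_n * 3 candidates (capped at 200)
    fused: dict[tuple, float] = {}
    for chunk_id, score in vector_hits:
        fused[chunk_id] = fused.get(chunk_id, 0.0) + score * w_vector
    for chunk_id, score in bm25_hits:
        fused[chunk_id] = fused.get(chunk_id, 0.0) + score * w_bm25
    # Step 4: sort by fused score descending, return the top-N
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

chunk = ("memory/2025-02-13.md", 1, 5)   # path + start_line + end_line
print(fuse_results([(chunk, 0.85)], [(chunk, 1.0)], top_n=1))
# fused score ≈ 0.895 (0.85 × 0.7 + 1.0 × 0.3), matching row 1 of the example below
```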

Example: Query "handleWebSocketReconnect disconnection reconnect"

| Memory Fragment | Vector Score | BM25 Score | Fused Score | Rank |
|---|---|---|---|---|
| "handleWebSocketReconnect function handles WebSocket disconnection reconnect" | 0.85 | 1.0 | 0.85 × 0.7 + 1.0 × 0.3 = 0.895 | 1 |
| "Logic for automatic retry after network disconnection" | 0.78 | 0.0 | 0.78 × 0.7 = 0.546 | 2 |
| "Fixed null pointer exception in handleWebSocketReconnect" | 0.40 | 0.5 | 0.40 × 0.7 + 0.5 × 0.3 = 0.430 | 3 |

Summary: Using any single search method alone has blind spots. Hybrid search lets the two signals complement each other, delivering reliable recall whether you're asking in natural language or searching for exact terms.



Compaction

Background: Why Do We Need Compaction?

Imagine the LLM's context window as a backpack with limited capacity. Every conversation turn, every tool call result adds something to the backpack. As the conversation goes on, the backpack gets fuller and fuller...

What happens when the backpack is full?

  • Conversation interrupted - Unable to continue the exchange
  • Quality degradation - The AI starts "forgetting"
  • API errors - Outright failure

Compaction is the magic that helps you "tidy up your backpack" — packing old items into a small box (summary), freeing up space for new things!

What Is Compaction?

Compaction is like writing meeting minutes: condensing a lengthy discussion into key takeaways, while keeping recent conversation content unchanged.

After compaction, subsequent requests use:

  • Compacted summary (replacing old messages)
  • Recent messages (kept as-is)

The compacted summary is persisted, so you don't need to worry about losing it!

The compaction mechanism is inspired by OpenClaw and implemented by ReMe.

Configuration

Environment Variables

| Environment Variable | Default | Description |
|---|---|---|
| COPAW_MEMORY_COMPACT_THRESHOLD | 100000 | Token threshold that triggers auto-compaction (the capacity warning line) |
| COPAW_MEMORY_COMPACT_KEEP_RECENT | 3 | Number of recent messages kept as-is after compaction |
| COPAW_MEMORY_COMPACT_RATIO | 0.7 | Threshold ratio for triggering compaction, relative to the context window |
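
For example (the values shown are just the documented defaults):

```bash
export COPAW_MEMORY_COMPACT_THRESHOLD=100000   # token threshold that triggers auto-compaction
export COPAW_MEMORY_COMPACT_KEEP_RECENT=3      # recent messages kept as-is after compaction
export COPAW_MEMORY_COMPACT_RATIO=0.7          # trigger ratio relative to the context window
```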

When Does Compaction Trigger?

CoPaw offers two compaction modes: automatic and manual.

Auto Compaction (When Approaching the Context Threshold)

CoPaw acts like a thoughtful butler, checking how much space is left in the "backpack" before each conversation turn. When the token count of compactable messages exceeds the threshold, it automatically tidies up for you!

Context structure during a conversation:

| Region | Description | Handling |
|---|---|---|
| System Prompt | The AI's "persona guide" | Always retained, never compacted |
| Compactable Messages | Historical conversation log | Token count is tracked; compacted into a summary once the threshold is exceeded |
| Recent Messages | The last N messages | Kept as-is (N set by COPAW_MEMORY_COMPACT_KEEP_RECENT) |
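
A minimal sketch of the per-turn check, assuming hypothetical count_tokens and summarize helpers (the real trigger logic also involves COPAW_MEMORY_COMPACT_RATIO, omitted here):

```python
def maybe_compact(messages, count_tokens, summarize,
                  threshold=100_000, keep_recent=3):
    """Sketch of auto-compaction: summarize old history, keep the system
    prompt and the most recent messages untouched."""
    system_prompt, history = messages[0], messages[1:]
    compactable, recent = history[:-keep_recent], history[-keep_recent:]

    if sum(count_tokens(m) for m in compactable) <= threshold:
        return messages                    # enough room left; nothing to do

    summary = summarize(compactable)       # LLM-written "meeting minutes"
    # The summary is also persisted (e.g. appended to memory/YYYY-MM-DD.md)
    return [system_prompt, summary, *recent]
```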

Manual Compaction (/compact Command)

Sometimes you want to proactively clean out the backpack yourself. No problem! Send the magic spell:

```
/compact
```

After execution, you'll see a response like this:

```
**Compact Complete!**

- Messages compacted: 12
**Compressed Summary:**
<compacted summary content>
- Summary task started in background
```

Response breakdown:

  • Messages compacted - How many messages were compacted
  • Compressed Summary - The generated summary content
  • Summary task - A background task also starts to store the summary into long-term memory

Compaction Content: What's in the Summary?

The compacted summary is like a project handover document, containing all the key information needed to continue working:

| Section | Content | Example |
|---|---|---|
| Goals | What the user wants to accomplish | "Build a user login system" |
| Constraints & Preferences | Requirements the user mentioned | "Use TypeScript, no frameworks" |
| Progress | Completed / in-progress / blocked tasks | "Login API done, registration API in progress" |
| Key Decisions | Decisions made and their rationale | "Chose JWT over Sessions for statelessness" |
| Next Steps | What to do next | "Implement password reset feature" |
| Key Context | Data needed to continue work | "Main file is at src/auth.ts" |

Tip: Compaction preserves exact file paths, function names, and error messages, ensuring the AI doesn't "lose its memory" and context transitions seamlessly!