
Observational Memory

Observational Memory enables Lango to maintain context across long conversations by automatically compressing conversation history into observations and reflections.

Overview

graph LR
    A[Messages] -->|Token threshold reached| B[Observer]
    B -->|Compressed note| C[Observations]
    C -->|Token threshold reached| D[Reflector]
    D -->|Condensed summary| E[Reflections]
    E -->|Reflection threshold reached| D

As conversations grow, raw message history consumes increasing amounts of the LLM context window. Observational Memory solves this by:

  1. Observing -- Compressing message batches into concise observation notes
  2. Reflecting -- Condensing accumulated observations into higher-level reflections
  3. Injecting -- Adding the most recent observations and reflections into the LLM context
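
At a high level, this cycle can be pictured as a small state machine that runs after every turn. The sketch below is illustrative only -- the names (Store, Tick, observe, reflect) are hypothetical, not Lango's actual API -- but the threshold values match the defaults documented under Configuration.

package memory

// Observation notes and reflections produced by the compression passes.
type Observation struct {
	Text   string
	Tokens int
}

type Reflection struct {
	Text       string
	Generation int
}

// Store holds the per-session memory state.
type Store struct {
	PendingMessages []string // raw messages not yet observed
	Observations    []Observation
	Reflections     []Reflection
}

// Tick runs after each turn: observe once pending message tokens cross the
// threshold, then reflect once accumulated observation tokens cross theirs.
func (s *Store) Tick(
	countTokens func([]string) int,
	observe func([]string) Observation,
	reflect func([]Observation) Reflection,
) {
	if countTokens(s.PendingMessages) >= 1000 { // messageTokenThreshold
		s.Observations = append(s.Observations, observe(s.PendingMessages))
		s.PendingMessages = nil
	}

	texts := make([]string, len(s.Observations))
	for i, o := range s.Observations {
		texts[i] = o.Text
	}
	if countTokens(texts) >= 2000 { // observationTokenThreshold
		s.Reflections = append(s.Reflections, reflect(s.Observations))
		s.Observations = nil // sources are deleted after reflection
	}
}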

Components

Observer

The Observer monitors conversation token count and produces compressed observation notes when the message token threshold is reached.

What it captures:

  • Key decisions made
  • User intent and goals
  • Important facts and context
  • Task progress and outcomes
  • Action items and next steps

What it omits:

  • Verbatim tool outputs and code blocks
  • Redundant greetings
  • Technical details that can be re-derived

Each observation is a concise paragraph (2-5 sentences) capturing the essential information from a batch of messages.
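
A minimal sketch of an observation pass is shown below. The prompt wording and the LLM interface are assumptions made for illustration; they are not Lango's actual observer prompt or client API.

package memory

import (
	"context"
	"fmt"
	"strings"
)

// LLM is an assumed minimal completion interface.
type LLM interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// Observe compresses a batch of raw messages into one short observation note.
func Observe(ctx context.Context, llm LLM, batch []string) (string, error) {
	prompt := fmt.Sprintf(
		"Compress the conversation below into a 2-5 sentence note.\n"+
			"Keep: decisions, user intent and goals, key facts, task progress, action items.\n"+
			"Omit: verbatim tool output, code blocks, greetings, details that can be re-derived.\n\n%s",
		strings.Join(batch, "\n"))
	return llm.Complete(ctx, prompt)
}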

Reflector

The Reflector condenses accumulated observations into higher-level reflections when the observation token threshold is reached.

  • Merges overlapping information across observations
  • Resolves contradictions (prefers later observations)
  • Creates a coherent narrative summary (3-8 sentences)
  • Deletes the source observations afterwards to save storage

The Reflector also supports meta-reflections -- condensing multiple reflections into a single higher-generation reflection -- so conversations can continue indefinitely.
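
The sketch below shows the shape of a reflection pass under the same assumed LLM interface as above; the prompt and names are illustrative, not the real Reflector.

package memory

import (
	"context"
	"fmt"
	"strings"
)

type LLM interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// Reflect condenses a set of observation notes (or, for meta-reflection,
// earlier reflections) into one 3-8 sentence narrative. The caller deletes
// the source notes once the reflection is stored.
func Reflect(ctx context.Context, llm LLM, notes []string, generation int) (string, error) {
	prompt := fmt.Sprintf(
		"Merge the notes below into a single coherent 3-8 sentence summary (generation %d).\n"+
			"Deduplicate overlapping points; where notes conflict, prefer the later note.\n\n%s",
		generation, strings.Join(notes, "\n---\n"))
	return llm.Complete(ctx, prompt)
}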

Async Buffer

Observation and reflection tasks are queued in an async buffer for background processing. This prevents LLM calls for compression from blocking the main conversation flow.
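
The sketch below shows one way such a buffer can be structured in Go -- a buffered channel drained by a single background worker. It is illustrative only; Lango's actual buffer, capacity, and retry behavior may differ.

package memory

import "context"

// Task is a queued observation or reflection job.
type Task func(ctx context.Context)

type AsyncBuffer struct {
	tasks chan Task
}

// NewAsyncBuffer starts one background worker so compression LLM calls
// never block the main conversation loop.
func NewAsyncBuffer(ctx context.Context) *AsyncBuffer {
	b := &AsyncBuffer{tasks: make(chan Task, 64)}
	go func() {
		for {
			select {
			case <-ctx.Done():
				return
			case t := <-b.tasks:
				t(ctx)
			}
		}
	}()
	return b
}

// Enqueue never blocks the caller; if the queue is full the task is dropped
// and picked up again at the next threshold check (an assumption, not
// necessarily Lango's policy).
func (b *AsyncBuffer) Enqueue(t Task) {
	select {
	case b.tasks <- t:
	default:
	}
}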

Token Counter

A built-in token estimator tracks message token usage to determine when thresholds are reached. This drives the automatic triggering of observations and reflections.
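
The exact estimator is internal; a common approximation is on the order of four characters per token, as in the sketch below (an assumption for illustration, not Lango's actual counter).

package memory

import "unicode/utf8"

// EstimateTokens gives a rough token count using the ~4 characters/token heuristic.
func EstimateTokens(text string) int {
	n := utf8.RuneCountInString(text)
	if n == 0 {
		return 0
	}
	return n/4 + 1
}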

Context Injection

Observations and reflections are injected into the LLM context as part of the Knowledge System's 8-layer architecture:

  • Layer 7 (Observations) -- Most recent compressed observations
  • Layer 8 (Reflections) -- Condensed higher-level reflections

Context Limits

By default, the following limits apply to context injection:

Limit                        Default  Description
Max reflections in context   5        Maximum reflections injected into the LLM prompt
Max observations in context  20       Maximum observations injected into the LLM prompt
Max message token budget     8000     Token budget for recent messages in context

Set any limit to 0 for unlimited injection (not recommended).
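
In effect, the limits select the most recent items of each kind, as in the sketch below (hypothetical names; the real layer assembly also handles the other knowledge layers).

package memory

type Context struct {
	Reflections  []string
	Observations []string
	Messages     []string
}

// BuildContext keeps the most recent items within each limit; a limit of 0 means unlimited.
func BuildContext(refl, obs, msgs []string, maxRefl, maxObs, msgBudget int,
	countTokens func(string) int) Context {

	c := Context{Reflections: tail(refl, maxRefl), Observations: tail(obs, maxObs)}

	// Walk messages newest-first until the token budget is spent.
	used := 0
	for i := len(msgs) - 1; i >= 0; i-- {
		t := countTokens(msgs[i])
		if msgBudget > 0 && used+t > msgBudget {
			break
		}
		used += t
		c.Messages = append([]string{msgs[i]}, c.Messages...)
	}
	return c
}

func tail(items []string, max int) []string {
	if max <= 0 || len(items) <= max {
		return items
	}
	return items[len(items)-max:]
}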

Configuration

Settings: lango settings → Observational Memory

{
  "observationalMemory": {
    "enabled": true,
    "provider": "",
    "model": "",
    "messageTokenThreshold": 1000,
    "observationTokenThreshold": 2000,
    "maxMessageTokenBudget": 8000,
    "maxReflectionsInContext": 5,
    "maxObservationsInContext": 20
  }
}

Key                        Type    Default  Description
enabled                    bool    false    Enable the observational memory system
provider                   string  ""       LLM provider for observer/reflector (empty = agent default)
model                      string  ""       Model ID for observer/reflector (empty = agent default)
messageTokenThreshold      int     1000     Token count of new messages before triggering an observation
observationTokenThreshold  int     2000     Token count of observations before triggering a reflection
maxMessageTokenBudget      int     8000     Maximum token budget for recent messages in context
maxReflectionsInContext    int     5        Max reflections injected into LLM context (0 = unlimited)
maxObservationsInContext   int     20       Max observations injected into LLM context (0 = unlimited)

Dedicated Model

You can use a smaller, faster model for observation/reflection to reduce cost and latency. Set provider and model to a lightweight option like Gemini Flash or GPT-5.2.
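
For example, the settings might look like the snippet below. The provider and model strings are placeholders, not values taken from Lango's documentation -- use whatever identifiers your configured providers accept.

{
  "observationalMemory": {
    "enabled": true,
    "provider": "google",
    "model": "gemini-2.5-flash"
  }
}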

CLI Commands

Manage observational memory via the CLI:

# List observations and reflections for the current session
lango memory list

# Show memory system status (counts, token usage)
lango memory status

# Clear all observations and reflections
lango memory clear

How It Works

Observation Trigger

Message 1  ─┐
Message 2   │  Token count < threshold → accumulate
Message 3   │
Message 4  ─┘  Token count >= 1000 → trigger Observer
              Observation Note (2-5 sentences)

Reflection Trigger

Observation 1  ─┐
Observation 2   │  Token count < threshold → accumulate
Observation 3  ─┘  Token count >= 2000 → trigger Reflector
                  Reflection (3-8 sentences)
                  Observations deleted
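
Both triggers reduce to a comparison against the configured threshold (default values shown; the function names below are hypothetical):

package memory

// shouldObserve reports whether pending message tokens have reached messageTokenThreshold.
func shouldObserve(pendingMessageTokens int) bool {
	return pendingMessageTokens >= 1000
}

// shouldReflect reports whether accumulated observation tokens have reached observationTokenThreshold.
func shouldReflect(observationTokens int) bool {
	return observationTokens >= 2000
}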

Meta-Reflection

When reflections themselves accumulate, the Reflector can produce meta-reflections:

Reflection 1 (gen 1)  ─┐
Reflection 2 (gen 1)   │ → Reflector → Reflection (gen 2)
Reflection 3 (gen 1)  ─┘

Each generation captures a broader summary, enabling context maintenance for arbitrarily long conversations.
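
One way to track this is a generation counter on each reflection record, as in the hypothetical sketch below; the record shape is an assumption, not Lango's internal schema.

package memory

// Reflection records which generation produced it:
// 1 = built from observations, 2+ = built from earlier reflections.
type Reflection struct {
	Text       string
	Generation int
}

// MetaReflect folds several reflections into one reflection of the next generation.
func MetaReflect(sources []Reflection, summarize func([]Reflection) string) Reflection {
	maxGen := 0
	for _, r := range sources {
		if r.Generation > maxGen {
			maxGen = r.Generation
		}
	}
	return Reflection{Text: summarize(sources), Generation: maxGen + 1}
}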