Embedding & RAG¶
Lango supports vector embeddings for semantic search and Retrieval-Augmented Generation (RAG). When enabled, relevant knowledge entries are retrieved via vector similarity and injected into the agent's context.
Embedding Providers¶
| Provider | Config Type | Default Model | Default Dimensions | Notes |
|---|---|---|---|---|
| OpenAI | openai | text-embedding-3-small | 1536 | Also supports text-embedding-3-large |
| Google | google | text-embedding-004 | 768 | Via Google Generative AI API |
| Local | local | nomic-embed-text | 768 | Ollama-compatible, no API key required |
Setup¶
Interactive Setup¶
The easiest way to configure embedding is through the onboarding wizard:
lango onboard
Select Embedding & RAG from the setup menu.
Config File¶
Using a Cloud Provider¶
Reference an existing entry from the providers map via providerID:
Settings: lango settings → Embedding & RAG
{
"providers": {
"my-openai": {
"type": "openai",
"apiKey": "${OPENAI_API_KEY}"
}
},
"embedding": {
"providerID": "my-openai",
"model": "text-embedding-3-small",
"dimensions": 1536,
"rag": {
"enabled": true,
"maxResults": 5
}
}
}
Using Local (Ollama) Embeddings¶
For local embeddings, set provider to local instead of using providerID:
Settings: lango settings → Embedding & RAG
{
"embedding": {
"provider": "local",
"model": "nomic-embed-text",
"dimensions": 768,
"local": {
"baseUrl": "http://localhost:11434/v1"
},
"rag": {
"enabled": true,
"maxResults": 5
}
}
}
Local Setup
Make sure Ollama is running and the embedding model is pulled:
ollama serve
ollama pull nomic-embed-text
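To confirm the local endpoint is reachable, you can send a test request to the OpenAI-compatible embeddings route under the baseUrl used above. Recent Ollama versions accept the standard OpenAI embeddings request shape; the input text here is just an example:

```bash
curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "hello world"}'
```

A JSON response containing an embedding array indicates the model is pulled and ready to serve vectors.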
RAG (Retrieval-Augmented Generation)¶
When embedding.rag.enabled is true, Lango performs semantic retrieval on every agent turn:
- The user's message is embedded into a vector
- The vector is compared against stored embeddings across collections
- The most similar entries are retrieved and injected into the agent's context
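The Go sketch below illustrates the ranking step only; it is not Lango's internal API, and the Entry type and function names are illustrative assumptions. It uses cosine distance (1 minus cosine similarity), the same measure the maxDistance option below refers to.

```go
package ragsketch

import (
	"math"
	"sort"
)

// Entry is a stored embedding with its source text (illustrative type, not Lango's).
type Entry struct {
	Text   string
	Vector []float32
}

// cosineDistance returns 1 - cosine similarity: 0 for identical direction,
// larger values for less similar vectors.
func cosineDistance(a, b []float32) float32 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	return float32(1 - dot/(math.Sqrt(na)*math.Sqrt(nb)))
}

// Retrieve ranks stored entries by distance to the query vector and keeps the
// closest maxResults; these are the entries injected into the agent's context.
func Retrieve(queryVec []float32, entries []Entry, maxResults int) []Entry {
	type scored struct {
		e    Entry
		dist float32
	}
	ranked := make([]scored, 0, len(entries))
	for _, e := range entries {
		ranked = append(ranked, scored{e, cosineDistance(queryVec, e.Vector)})
	}
	sort.Slice(ranked, func(i, j int) bool { return ranked[i].dist < ranked[j].dist })

	out := make([]Entry, 0, maxResults)
	for _, s := range ranked {
		if len(out) == maxResults {
			break
		}
		out = append(out, s.e)
	}
	return out
}
```

In the real system the entries come from the collections described next, and the cap corresponds to rag.maxResults.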
Collections¶
RAG searches across these embedding collections:
| Collection | Source |
|---|---|
| knowledge | Knowledge store entries |
| observation | Observational memory observations |
| reflection | Observational memory reflections |
| learning | Learning engine entries |
You can restrict which collections are searched:
Settings: lango settings → Embedding & RAG
{
"embedding": {
"rag": {
"enabled": true,
"maxResults": 5,
"collections": ["knowledge", "learning"]
}
}
}
Leave collections empty to search all collections.
Distance Filtering¶
Set maxDistance to filter out low-relevance results:
Settings: lango settings → Embedding & RAG
{
"embedding": {
"rag": {
"enabled": true,
"maxResults": 5,
"maxDistance": 0.8
}
}
}
Set to 0.0 (default) to disable distance filtering.
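Continuing the illustrative Go sketch from the RAG section above (names remain assumptions, not Lango's API), the filter keeps only results within the threshold, and a threshold of 0 bypasses filtering entirely:

```go
// ScoredEntry pairs retrieved text with its cosine distance from the query
// (0 = identical direction, larger = less similar). Illustrative type only.
type ScoredEntry struct {
	Text     string
	Distance float32
}

// filterByDistance keeps only entries within maxDistance of the query.
// A maxDistance of 0 disables filtering, matching the default above.
func filterByDistance(results []ScoredEntry, maxDistance float32) []ScoredEntry {
	if maxDistance == 0 {
		return results
	}
	kept := make([]ScoredEntry, 0, len(results))
	for _, r := range results {
		if r.Distance <= maxDistance {
			kept = append(kept, r)
		}
	}
	return kept
}
```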
Configuration Reference¶
Settings: lango settings → Embedding & RAG
{
"embedding": {
"providerID": "",
"provider": "",
"model": "",
"dimensions": 0,
"local": {
"baseUrl": "http://localhost:11434/v1",
"model": ""
},
"rag": {
"enabled": false,
"maxResults": 5,
"collections": [],
"maxDistance": 0.0
}
}
}
| Key | Type | Default | Description |
|---|---|---|---|
| providerID | string | "" | References a key in the providers map |
| provider | string | "" | Set to "local" for Ollama embeddings |
| model | string | varies | Embedding model identifier |
| dimensions | int | varies | Vector dimensionality |
| local.baseUrl | string | http://localhost:11434/v1 | Ollama API endpoint |
| local.model | string | "" | Override model for local provider |
| rag.enabled | bool | false | Enable RAG context injection |
| rag.maxResults | int | 5 | Maximum results to inject per query |
| rag.collections | []string | [] | Collections to search (empty = all) |
| rag.maxDistance | float32 | 0.0 | Maximum cosine distance (0.0 = disabled) |
Embedding Cache¶
Lango includes an automatic in-memory embedding cache to reduce redundant API calls:
| Parameter | Value |
|---|---|
| TTL | 5 minutes |
| Max entries | 100 |
| Eviction | Expired entries first, then oldest |
The cache is transparent and requires no configuration. It applies to both query embeddings and content embeddings.
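As an illustration of that eviction policy (not Lango's code; the type and method names are assumptions), a minimal TTL cache with those parameters could look like this:

```go
package cachesketch

import "time"

// entry holds a cached vector and the time it was stored.
type entry struct {
	vec     []float32
	created time.Time
}

// EmbeddingCache is an illustrative in-memory cache matching the parameters
// above: entries expire after ttl and the map holds at most maxEntries items.
type EmbeddingCache struct {
	ttl        time.Duration
	maxEntries int
	items      map[string]entry
}

func New(ttl time.Duration, maxEntries int) *EmbeddingCache {
	return &EmbeddingCache{ttl: ttl, maxEntries: maxEntries, items: map[string]entry{}}
}

// Get returns the cached vector for text, treating expired entries as misses.
func (c *EmbeddingCache) Get(text string) ([]float32, bool) {
	e, ok := c.items[text]
	if !ok || time.Since(e.created) > c.ttl {
		delete(c.items, text)
		return nil, false
	}
	return e.vec, true
}

// Put stores a vector, evicting one entry first if the cache is full.
func (c *EmbeddingCache) Put(text string, vec []float32) {
	if _, exists := c.items[text]; !exists && len(c.items) >= c.maxEntries {
		c.evict()
	}
	c.items[text] = entry{vec: vec, created: time.Now()}
}

// evict removes an expired entry if one exists, otherwise the oldest entry.
func (c *EmbeddingCache) evict() {
	var oldestKey string
	var oldestTime time.Time
	for k, e := range c.items {
		if time.Since(e.created) > c.ttl {
			delete(c.items, k) // expired entries go first
			return
		}
		if oldestKey == "" || e.created.Before(oldestTime) {
			oldestKey, oldestTime = k, e.created
		}
	}
	if oldestKey != "" {
		delete(c.items, oldestKey)
	}
}
```

With the parameters in the table above, this would correspond to New(5*time.Minute, 100): repeat lookups for the same text within the TTL window return the cached vector instead of triggering another provider call.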
Verification¶
Use lango doctor to verify your embedding configuration:
lango doctor
The doctor checks:
- Provider configuration is valid
- API key is set (for cloud providers)
- Embedding model is accessible
- Vector store is operational
Related¶
- Knowledge System -- Knowledge entries are embedded for RAG retrieval
- Observational Memory -- Observations and reflections are embedded
- Knowledge Graph -- Graph RAG combines vector + graph retrieval