
Embedding & RAG

Lango supports vector embeddings for semantic search and Retrieval-Augmented Generation (RAG). When enabled, relevant stored entries are retrieved via vector similarity and injected into the agent's context.

Embedding Providers

| Provider | Config Type | Default Model | Default Dimensions | Notes |
|----------|-------------|---------------|--------------------|-------|
| OpenAI | openai | text-embedding-3-small | 1536 | Also supports text-embedding-3-large |
| Google | google | text-embedding-004 | 768 | Via the Google Generative AI API |
| Local | local | nomic-embed-text | 768 | Ollama-compatible; no API key required |

Setup

Interactive Setup

The easiest way to configure embedding is through the onboarding wizard:

lango onboard

Select Embedding & RAG from the setup menu.

Config File

Using a Cloud Provider

Reference an existing entry from the providers map via providerID:

Settings: lango settings → Embedding & RAG

{
  "providers": {
    "my-openai": {
      "type": "openai",
      "apiKey": "${OPENAI_API_KEY}"
    }
  },
  "embedding": {
    "providerID": "my-openai",
    "model": "text-embedding-3-small",
    "dimensions": 1536,
    "rag": {
      "enabled": true,
      "maxResults": 5
    }
  }
}
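
The google provider type follows the same shape. In the sketch below, the provider key my-google and the GOOGLE_API_KEY environment variable name are illustrative choices, not names mandated by Lango; the model and dimensions are the defaults from the provider table:

Settings: lango settings → Embedding & RAG

{
  "providers": {
    "my-google": {
      "type": "google",
      "apiKey": "${GOOGLE_API_KEY}"
    }
  },
  "embedding": {
    "providerID": "my-google",
    "model": "text-embedding-004",
    "dimensions": 768,
    "rag": {
      "enabled": true,
      "maxResults": 5
    }
  }
}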

Using Local (Ollama) Embeddings

For local embeddings, set provider to local instead of using providerID:

Settings: lango settings → Embedding & RAG

{
  "embedding": {
    "provider": "local",
    "model": "nomic-embed-text",
    "dimensions": 768,
    "local": {
      "baseUrl": "http://localhost:11434/v1"
    },
    "rag": {
      "enabled": true,
      "maxResults": 5
    }
  }
}

Local Setup

Make sure Ollama is running and the embedding model is pulled:

ollama serve
ollama pull nomic-embed-text
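
Before pointing Lango at the endpoint, you can sanity-check it against Ollama's OpenAI-compatible embeddings route (the input text here is arbitrary):

curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "hello world"}'

A successful response contains an embedding vector, confirming both that the server is reachable at the configured baseUrl and that the model is pulled.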

RAG (Retrieval-Augmented Generation)

When embedding.rag.enabled is true, Lango performs semantic retrieval on every agent turn:

  1. The user's message is embedded into a vector
  2. The vector is compared against stored embeddings across collections
  3. The most similar entries are retrieved and injected into the agent's context
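
Conceptually, steps 2 and 3 reduce to a nearest-neighbor scan over stored vectors. The Go sketch below illustrates that idea under stated assumptions; the Entry type and helper names are invented for the example and are not Lango's actual implementation:

package rag

import (
	"math"
	"sort"
)

// Entry is an illustrative stored embedding; the type and its fields
// are assumptions for this sketch, not Lango's data model.
type Entry struct {
	Text   string
	Vector []float64
}

// cosineDistance returns 1 - cosine similarity of two equal-length vectors.
func cosineDistance(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// retrieve sorts the store by distance to the query vector and keeps the
// maxResults closest entries (steps 2-3 above). Note that it reorders store.
func retrieve(query []float64, store []Entry, maxResults int) []Entry {
	sort.Slice(store, func(i, j int) bool {
		return cosineDistance(query, store[i].Vector) < cosineDistance(query, store[j].Vector)
	})
	if len(store) > maxResults {
		return store[:maxResults]
	}
	return store
}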

Collections

RAG searches across these embedding collections:

| Collection | Source |
|------------|--------|
| knowledge | Knowledge store entries |
| observation | Observational memory observations |
| reflection | Observational memory reflections |
| learning | Learning engine entries |

You can restrict which collections are searched:

Settings: lango settings → Embedding & RAG

{
  "embedding": {
    "rag": {
      "enabled": true,
      "maxResults": 5,
      "collections": ["knowledge", "learning"]
    }
  }
}

Leave collections empty to search all collections.

Distance Filtering

Set maxDistance to filter out low-relevance results:

Settings: lango settings → Embedding & RAG

{
  "embedding": {
    "rag": {
      "enabled": true,
      "maxResults": 5,
      "maxDistance": 0.8
    }
  }
}

Set maxDistance to 0.0 (the default) to disable distance filtering.
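
Cosine distance is commonly defined as 1 − cosine similarity, so smaller values mean closer matches; with maxDistance set to 0.8, only entries with cosine similarity above 0.2 survive. Continuing the illustrative Go sketch from the RAG section (same assumed Entry type and cosineDistance helper), the cutoff behaves like this:

// filterByDistance drops entries whose cosine distance from the query
// exceeds maxDistance; 0.0 disables filtering, matching the documented default.
func filterByDistance(query []float64, entries []Entry, maxDistance float64) []Entry {
	if maxDistance == 0.0 {
		return entries
	}
	kept := entries[:0]
	for _, e := range entries {
		if cosineDistance(query, e.Vector) <= maxDistance {
			kept = append(kept, e)
		}
	}
	return kept
}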

Configuration Reference

Settings: lango settings → Embedding & RAG

{
  "embedding": {
    "providerID": "",
    "provider": "",
    "model": "",
    "dimensions": 0,
    "local": {
      "baseUrl": "http://localhost:11434/v1",
      "model": ""
    },
    "rag": {
      "enabled": false,
      "maxResults": 5,
      "collections": [],
      "maxDistance": 0.0
    }
  }
}

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| providerID | string | "" | References a key in the providers map |
| provider | string | "" | Set to "local" for Ollama embeddings |
| model | string | varies | Embedding model identifier |
| dimensions | int | varies | Vector dimensionality |
| local.baseUrl | string | http://localhost:11434/v1 | Ollama API endpoint |
| local.model | string | "" | Override model for the local provider |
| rag.enabled | bool | false | Enable RAG context injection |
| rag.maxResults | int | 5 | Maximum results to inject per query |
| rag.collections | []string | [] | Collections to search (empty = all) |
| rag.maxDistance | float32 | 0.0 | Maximum cosine distance (0.0 = disabled) |
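
As a reading aid, the reference corresponds to a nesting like the following illustrative Go structs; the struct and field names are assumptions, and only the JSON keys and types come from the table above:

package config

// EmbeddingConfig is an illustrative mapping of the embedding section;
// it is a sketch based on the reference table, not Lango's source.
type EmbeddingConfig struct {
	ProviderID string      `json:"providerID"`
	Provider   string      `json:"provider"`
	Model      string      `json:"model"`
	Dimensions int         `json:"dimensions"`
	Local      LocalConfig `json:"local"`
	RAG        RAGConfig   `json:"rag"`
}

type LocalConfig struct {
	BaseURL string `json:"baseUrl"`
	Model   string `json:"model"`
}

type RAGConfig struct {
	Enabled     bool     `json:"enabled"`
	MaxResults  int      `json:"maxResults"`
	Collections []string `json:"collections"`
	MaxDistance float32  `json:"maxDistance"`
}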

Embedding Cache

Lango includes an automatic in-memory embedding cache to reduce redundant API calls:

| Parameter | Value |
|-----------|-------|
| TTL | 5 minutes |
| Max entries | 100 |
| Eviction | Expired entries first, then oldest |

The cache is transparent and requires no configuration. It applies to both query embeddings and content embeddings.
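
As a mental model, the documented policy (5-minute TTL, 100-entry cap, expired-first-then-oldest eviction) behaves like the illustrative Go sketch below; the names and structure are assumptions, not Lango's implementation:

package cache

import "time"

// entry pairs a cached vector with its insertion time.
type entry struct {
	vector  []float64
	created time.Time
}

// embeddingCache sketches the documented policy: 5-minute TTL, at most
// 100 entries, evicting expired entries first and the oldest otherwise.
type embeddingCache struct {
	ttl     time.Duration
	maxSize int
	entries map[string]entry
}

func newEmbeddingCache() *embeddingCache {
	return &embeddingCache{ttl: 5 * time.Minute, maxSize: 100, entries: map[string]entry{}}
}

// get returns a cached vector, treating expired entries as misses.
func (c *embeddingCache) get(key string) ([]float64, bool) {
	e, ok := c.entries[key]
	if !ok || time.Since(e.created) > c.ttl {
		delete(c.entries, key)
		return nil, false
	}
	return e.vector, true
}

// put stores a vector, evicting first if the cache is at capacity.
func (c *embeddingCache) put(key string, vector []float64) {
	if len(c.entries) >= c.maxSize {
		c.evict()
	}
	c.entries[key] = entry{vector: vector, created: time.Now()}
}

// evict removes all expired entries; if none were expired, it removes
// the single oldest entry instead.
func (c *embeddingCache) evict() {
	var oldestKey string
	var oldest time.Time
	removed := false
	for k, e := range c.entries {
		if time.Since(e.created) > c.ttl {
			delete(c.entries, k)
			removed = true
			continue
		}
		if oldestKey == "" || e.created.Before(oldest) {
			oldestKey, oldest = k, e.created
		}
	}
	if !removed && oldestKey != "" {
		delete(c.entries, oldestKey)
	}
}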

Verification

Use lango doctor to verify your embedding configuration:

lango doctor

The doctor checks:

  • Provider configuration is valid
  • API key is set (for cloud providers)
  • Embedding model is accessible
  • Vector store is operational