Embeddings

How CodePilot generates vector embeddings from code chunks using Ollama and nomic-embed-text.

Embeddings are the bridge between human-readable code and machine-searchable vectors. CodePilot converts each code chunk into a 768-dimensional vector using Ollama's nomic-embed-text model.

What Are Embeddings?

An embedding is a fixed-length numerical vector that represents the semantic meaning of text. Similar texts produce similar vectors, enabling similarity-based search even when the exact words differ.

For example, these queries would retrieve similar results despite different wording:

"How is authentication implemented?"
"Show me the login flow"
"Where is the auth middleware?"
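
As a rough illustration of "similar texts produce similar vectors", the sketch below computes cosine similarity between toy vectors. The three-dimensional vectors and their names are made up for readability; real nomic-embed-text vectors have 768 dimensions, and `cosineSimilarity` is a hypothetical helper, not part of CodePilot.

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0 by convention.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for real embeddings (assumed values, for illustration only).
const authQuery = [0.9, 0.1, 0.0];
const loginQuery = [0.8, 0.2, 0.1];
const unrelated = [0.0, 0.1, 0.9];

// The two auth-related vectors point in nearly the same direction,
// so they score higher than the unrelated pair.
console.log(
  cosineSimilarity(authQuery, loginQuery) > cosineSimilarity(authQuery, unrelated)
); // true
```

A score near 1 means the vectors point in almost the same direction; a score near 0 means they are unrelated.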

Embedding Model

CodePilot uses nomic-embed-text via Ollama:

Property	Value
Model	`nomic-embed-text`
Dimensions	768
Max Tokens	8,192
Size	~274 MB
Provider	Ollama (local)

Why nomic-embed-text?

nomic-embed-text offers a strong balance of quality, speed, and memory usage for code embedding. It runs locally via Ollama with no API keys or network calls required.

Embedding Generation

Each code chunk is sent to Ollama's embedding API:

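// Base URL of the local Ollama server. The original snippet does not show
// where OLLAMA_URL is defined; 11434 is Ollama's default port.
const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";

async function generateEmbedding(text: string): Promise<number[]> {
  const response = await fetch(`${OLLAMA_URL}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "nomic-embed-text",
      prompt: text,
    }),
  });

  // fetch does not reject on HTTP error statuses, so check explicitly.
  if (!response.ok) {
    throw new Error(`Ollama embedding request failed: ${response.status} ${response.statusText}`);
  }

  const data = (await response.json()) as { embedding: number[] };
  return data.embedding; // number[] of length 768
}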

Batch Processing

During ingestion, embeddings are generated in batches to optimize throughput:

Chunks are grouped into batches
Each batch is sent sequentially to Ollama (to avoid overloading the model)
Generated embeddings are stored alongside chunk metadata in PostgreSQL
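
The steps above could be sketched roughly as follows. This is a minimal sketch, not CodePilot's actual ingestion code: `toBatches`, `embedInBatches`, the `embed` and `store` callbacks, and the default batch size of 16 are all assumptions made for illustration.

```typescript
// Hypothetical callback types: `embed` turns text into a vector,
// `store` persists one finished batch (e.g. to PostgreSQL).
type EmbedFn = (text: string) => Promise<number[]>;
type EmbeddedChunk = { text: string; embedding: number[] };
type StoreFn = (batch: EmbeddedChunk[]) => Promise<void>;

// Split an array into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Process batches strictly one after another. Within a batch, chunks are
// also embedded one at a time, so only one request is in flight at once.
async function embedInBatches(
  chunks: string[],
  embed: EmbedFn,
  store: StoreFn,
  batchSize = 16 // assumed value; tune for your hardware
): Promise<number[][]> {
  const all: number[][] = [];
  for (const batch of toBatches(chunks, batchSize)) {
    const embedded: EmbeddedChunk[] = [];
    for (const text of batch) {
      embedded.push({ text, embedding: await embed(text) });
    }
    await store(embedded); // persist before moving on to the next batch
    all.push(...embedded.map((e) => e.embedding));
  }
  return all;
}
```

Storing after each batch bounds how many vectors are held in memory at once and means a failure partway through loses at most one batch of work.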

Storage

Embeddings are stored in PostgreSQL using the pgvector extension. Each code embedding row contains:

CREATE TABLE "CodeEmbedding" (
  "id"            TEXT PRIMARY KEY,
  "chunkId"       TEXT NOT NULL REFERENCES "CodeChunk"("id"),
  "embedding"     vector(768) NOT NULL,
  "repositoryId"  TEXT NOT NULL REFERENCES "Repository"("id"),
  "createdAt"     TIMESTAMP DEFAULT NOW()
);
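
To write a row into this table from application code, the embedding has to be serialized into pgvector's text input format (`[v1,v2,...]`). The following is a sketch under stated assumptions: `toPgVectorLiteral` and `buildInsert` are hypothetical helpers, and the parameterized statement assumes a PostgreSQL client that accepts `$1`-style placeholders.

```typescript
const EMBEDDING_DIMENSIONS = 768; // matches vector(768) in the table definition

// Serialize an embedding to pgvector's text input format, e.g. "[0.1,0.2,0.3]".
function toPgVectorLiteral(
  embedding: number[],
  dimensions: number = EMBEDDING_DIMENSIONS
): string {
  if (embedding.length !== dimensions) {
    throw new Error(`Expected ${dimensions} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((v) => Number.isFinite(v))) {
    throw new Error("Embedding contains NaN or infinite values");
  }
  return `[${embedding.join(",")}]`;
}

// Build a parameterized INSERT for one row. The ::vector cast lets the
// literal be passed as an ordinary string parameter.
function buildInsert(id: string, chunkId: string, repositoryId: string, embedding: number[]) {
  return {
    text:
      'INSERT INTO "CodeEmbedding" ("id", "chunkId", "embedding", "repositoryId") ' +
      "VALUES ($1, $2, $3::vector, $4)",
    values: [id, chunkId, toPgVectorLiteral(embedding), repositoryId],
  };
}
```

Validating the dimension count before the insert surfaces a model mismatch as a clear application error rather than a database error.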

An HNSW index enables fast approximate nearest-neighbor search:

CREATE INDEX ON "CodeEmbedding"
  USING hnsw ("embedding" vector_cosine_ops);

See Vector Database for details on indexing and search.

Query Embeddings

When a user asks a question, the query text goes through the same embedding process:

User types: "How does the ingestion pipeline work?"
The query is embedded using nomic-embed-text → 768-dimensional vector
pgvector performs cosine similarity search against all stored code embeddings
The top-k most similar chunks are returned

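Because the query and the code chunks are embedded with the same model, their vectors live in the same vector space, so the cosine similarity between a query vector and a chunk vector is a meaningful measure of semantic closeness.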
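
The search step could look roughly like the sketch below. It relies on pgvector's `<=>` operator, which returns cosine distance, so similarity is `1 - distance`. `buildSimilarityQuery` and the default `topK` of 5 are assumptions for illustration, not CodePilot's actual query code.

```typescript
interface SimilarityQuery {
  text: string;
  values: [string, number];
}

// Build a parameterized top-k query against the "CodeEmbedding" table.
// Ordering by cosine distance ascending returns the most similar chunks first.
function buildSimilarityQuery(queryEmbedding: number[], topK = 5): SimilarityQuery {
  if (!Number.isInteger(topK) || topK < 1) {
    throw new Error("topK must be a positive integer");
  }
  const text = [
    'SELECT "chunkId", 1 - ("embedding" <=> $1::vector) AS "similarity"',
    'FROM "CodeEmbedding"',
    'ORDER BY "embedding" <=> $1::vector',
    "LIMIT $2",
  ].join("\n");
  // The query vector is passed in pgvector's text format: "[v1,v2,...]".
  return { text, values: [`[${queryEmbedding.join(",")}]`, topK] };
}
```

The returned `text` and `values` can be handed to any PostgreSQL client that supports `$1`-style parameters, with the query embedding coming from the same embedding call used during ingestion.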

Performance Considerations

Memory: the nomic-embed-text model is ~274 MB, so expect at least that much RAM to be used while it is loaded
Speed: Embedding generation takes ~10–50 ms per chunk, depending on chunk length
Batching: Batches are sent to Ollama one at a time to avoid running out of memory (OOM) on large repositories
Caching: Once generated, embeddings are stored persistently and only regenerated when source code changes
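
One common way to decide whether a chunk's embedding needs regenerating is to compare a hash of the current chunk text with a hash saved at embedding time. This is only a sketch of that idea: the source does not say how CodePilot detects changes, the table shown earlier has no hash column, and `contentHash` and `needsReembedding` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of the chunk text, hex-encoded (64 characters).
function contentHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// True when there is no stored hash yet, or when the text has changed
// since the stored hash was computed.
function needsReembedding(chunkText: string, storedHash: string | undefined): boolean {
  return storedHash !== contentHash(chunkText);
}

// Usage: skip the Ollama call for chunks whose text is unchanged.
const previous = contentHash("function login() {}");
console.log(needsReembedding("function login() {}", previous)); // false
console.log(needsReembedding("function login(user) {}", previous)); // true
```

Hashing is far cheaper than an embedding request, so this check costs little even on a full re-ingestion.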