Embeddings
How CodePilot generates vector embeddings from code chunks using Ollama and nomic-embed-text.
Embeddings are the bridge between human-readable code and machine-searchable vectors. CodePilot converts each code chunk into a 768-dimensional vector using Ollama's nomic-embed-text model.
What Are Embeddings?
An embedding is a fixed-length numerical vector that represents the semantic meaning of text. Similar texts produce similar vectors, enabling similarity-based search even when the exact words differ.
For example, these queries would retrieve similar results despite different wording:
- "How is authentication implemented?"
- "Show me the login flow"
- "Where is the auth middleware?"
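To make "similar vectors" concrete, closeness between two embeddings is commonly measured with cosine similarity. This is also the metric CodePilot's pgvector index is configured for (`vector_cosine_ops`). The following is a minimal TypeScript sketch. The function name and the 3-dimensional toy vectors are illustrative assumptions, not real nomic-embed-text output.

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; higher means the texts are more semantically similar.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical 3-dimensional toy vectors (real nomic-embed-text vectors have 768 dimensions).
const auth = [0.9, 0.1, 0.0];
const login = [0.8, 0.2, 0.1];
const css = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(auth, login) > cosineSimilarity(auth, css)); // true
```

In this sketch, the "auth" and "login" vectors point in nearly the same direction, so they score far higher than the unrelated "css" vector. This is the property that lets differently worded queries retrieve the same code.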
Embedding Model
CodePilot uses nomic-embed-text via Ollama:
| Property | Value |
|---|---|
| Model | nomic-embed-text |
| Dimensions | 768 |
| Max Tokens | 8,192 |
| Size | ~274 MB |
| Provider | Ollama (local) |
Why nomic-embed-text?
nomic-embed-text offers a strong balance of quality, speed, and memory usage for code embedding. It runs locally via Ollama, so it requires no API keys and no calls to external services.
Embedding Generation
Each code chunk is sent to Ollama's embedding API:
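```typescript
// OLLAMA_URL is configured elsewhere in CodePilot; here it is assumed to come
// from the environment, defaulting to Ollama's standard local port.
const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";

async function generateEmbedding(text: string): Promise<number[]> {
  const response = await fetch(`${OLLAMA_URL}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "nomic-embed-text",
      prompt: text,
    }),
  });
  // Fail loudly on HTTP errors (e.g. the model has not been pulled) instead of
  // returning undefined from an error payload.
  if (!response.ok) {
    throw new Error(`Ollama embedding request failed: ${response.status} ${await response.text()}`);
  }
  const data = (await response.json()) as { embedding: number[] };
  return data.embedding; // number[768]
}
```
Batch Processing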
During ingestion, embeddings are generated in batches to optimize throughput:
- Chunks are grouped into batches
- Each batch is sent sequentially to Ollama (to avoid overloading the model)
- Generated embeddings are stored alongside chunk metadata in PostgreSQL
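The batching steps above can be sketched as follows. This is a minimal illustration, not CodePilot's actual code. `toBatches`, `embedInBatches`, the `persist` callback, and the default batch size of 32 are all hypothetical. The sketch assumes two things: chunks are embedded one at a time, matching the sequential processing described here, and batching mainly serves to group database writes, so each batch is persisted with a single call. In practice, `embed` would be the `generateEmbedding` function above.

```typescript
// Hypothetical chunk shapes; the real CodeChunk/CodeEmbedding models may differ.
interface Chunk {
  id: string;
  content: string;
}

interface EmbeddedChunk {
  chunkId: string;
  embedding: number[];
}

// Split an array into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("Batch size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed all chunks, persisting one batch at a time. Returns the number of rows stored.
async function embedInBatches(
  chunks: Chunk[],
  embed: (text: string) => Promise<number[]>,
  persist: (rows: EmbeddedChunk[]) => Promise<void>,
  batchSize = 32, // assumed value, not taken from CodePilot
): Promise<number> {
  let stored = 0;
  for (const batch of toBatches(chunks, batchSize)) {
    const rows: EmbeddedChunk[] = [];
    for (const chunk of batch) {
      // Awaiting each call keeps at most one request in flight to Ollama.
      rows.push({ chunkId: chunk.id, embedding: await embed(chunk.content) });
    }
    // Assumption: one database write per batch rather than one per chunk.
    await persist(rows);
    stored += rows.length;
  }
  return stored;
}
```

Awaiting each request before sending the next trades raw speed for predictable memory use on the Ollama side. Grouping the writes keeps database round-trips proportional to the number of batches rather than the number of chunks.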
Storage
Embeddings are stored in PostgreSQL using the pgvector extension. Each code embedding row contains:
```sql
CREATE TABLE "CodeEmbedding" (
  "id" TEXT PRIMARY KEY,
  "chunkId" TEXT NOT NULL REFERENCES "CodeChunk"("id"),
  "embedding" vector(768) NOT NULL,
  "repositoryId" TEXT NOT NULL REFERENCES "Repository"("id"),
  "createdAt" TIMESTAMP DEFAULT NOW()
);
```
An HNSW index enables fast approximate nearest-neighbor search:
```sql
CREATE INDEX ON "CodeEmbedding"
USING hnsw ("embedding" vector_cosine_ops);
```
See Vector Database for details on indexing and search.
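Writing a row into the `CodeEmbedding` table means turning a `number[]` into something pgvector accepts. pgvector parses vectors from text literals of the form `[x1,x2,...]`. The following is a minimal sketch, assuming a node-postgres-style client with `$1, $2, ...` placeholders. The helper names `toVectorLiteral` and `buildInsert` are hypothetical, and the caller is assumed to supply the row `id`.

```typescript
// Format an embedding as a pgvector text literal, e.g. [0.1,0.2,0.3].
function toVectorLiteral(embedding: number[]): string {
  if (embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("Embedding contains NaN or Infinity");
  }
  return `[${embedding.join(",")}]`;
}

// Build a parameterized INSERT for the CodeEmbedding table (placeholder style assumed).
function buildInsert(id: string, chunkId: string, repositoryId: string, embedding: number[]) {
  const EXPECTED_DIMS = 768; // must match vector(768) in the schema
  if (embedding.length !== EXPECTED_DIMS) {
    throw new Error(`Expected ${EXPECTED_DIMS} dimensions, got ${embedding.length}`);
  }
  return {
    text:
      'INSERT INTO "CodeEmbedding" ("id", "chunkId", "embedding", "repositoryId") ' +
      "VALUES ($1, $2, $3::vector, $4)",
    values: [id, chunkId, toVectorLiteral(embedding), repositoryId],
  };
}
```

Checking the dimension in application code surfaces a model mismatch with a clear message. Without the check, Postgres would reject the row with a less specific error, because the column is declared as `vector(768)`.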
Query Embeddings
When a user asks a question, the query text goes through the same embedding process:
- User types: "How does the ingestion pipeline work?"
- The query is embedded with nomic-embed-text into a 768-dimensional vector
- pgvector runs a cosine-similarity search over the stored code embeddings, using the HNSW index
- The top-k most similar chunks are returned
Because the query and the code chunks are embedded with the same model, their vectors live in the same 768-dimensional space, so a small cosine distance between them reflects semantic similarity.
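The search step can be expressed as a single SQL query using pgvector's `<=>` operator, which returns cosine distance (1 minus cosine similarity). The following is a minimal sketch that builds the parameterized query. The function name `buildTopKQuery`, the default `k = 10`, and the filter on `"repositoryId"` are assumptions for illustration and are not taken from CodePilot.

```typescript
// Build a parameterized top-k cosine search over "CodeEmbedding".
// Assumptions: results are scoped to one repository, and the client uses $n placeholders.
function buildTopKQuery(queryEmbedding: number[], repositoryId: string, k = 10) {
  if (queryEmbedding.length !== 768) {
    throw new Error(`Expected 768 dimensions, got ${queryEmbedding.length}`);
  }
  if (!Number.isInteger(k) || k < 1) {
    throw new Error("k must be a positive integer");
  }
  // pgvector parses vectors from text literals like "[0.1,0.2,...]".
  const vectorLiteral = `[${queryEmbedding.join(",")}]`;
  return {
    // Ordering by cosine distance ascending puts the most similar chunks first,
    // matching the vector_cosine_ops operator class used by the HNSW index.
    text:
      'SELECT "chunkId", 1 - ("embedding" <=> $1::vector) AS similarity ' +
      'FROM "CodeEmbedding" ' +
      'WHERE "repositoryId" = $2 ' +
      'ORDER BY "embedding" <=> $1::vector ' +
      "LIMIT $3",
    values: [vectorLiteral, repositoryId, k],
  };
}
```

Ordering by the distance expression itself, rather than by the computed `similarity` alias, keeps the query in the form pgvector's index scans are designed for.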
Performance Considerations
- Memory: nomic-embed-text requires ~274 MB of RAM when loaded
- Speed: Embedding generation takes ~10–50 ms per chunk, depending on length
- Batching: Chunks are processed sequentially to avoid out-of-memory (OOM) errors on large repositories
- Caching: Once generated, embeddings are stored persistently and only regenerated when source code changes
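This section does not say how CodePilot detects that source code has changed. One common approach is to store a content hash per chunk and re-embed only when the hash differs. The following is a minimal sketch of that approach, not CodePilot's confirmed mechanism. `StoredChunk`, `contentHash`, `hashContent`, and `needsReembedding` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record of what was embedded last time for a given chunk.
interface StoredChunk {
  contentHash: string;
}

// SHA-256 hex digest of the chunk's source text.
function hashContent(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Re-embed only when the chunk is new or its content has changed since the last run.
function needsReembedding(content: string, stored: StoredChunk | undefined): boolean {
  return stored === undefined || stored.contentHash !== hashContent(content);
}
```

Hashing is far cheaper than a 10–50 ms embedding call, so with this approach, re-ingesting a mostly unchanged repository would spend its time on the few chunks that actually differ.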