Repository Chat
How CodePilot uses Retrieval-Augmented Generation (RAG) to answer questions about codebases.
Repository Chat is the primary interface for interacting with an indexed codebase. It uses Retrieval-Augmented Generation (RAG) to provide accurate, context-aware answers grounded in the actual source code.
How RAG Works
RAG combines information retrieval with language model generation in four steps:
Embed the Query
The user's question is converted to a 768-dimensional vector with nomic-embed-text, the same model used during ingestion, so that query and chunk vectors live in the same embedding space.
const queryEmbedding = await generateEmbedding(userQuestion);
Retrieve Relevant Code
pgvector ranks the stored chunks by cosine distance to the query vector (the <=> operator), so the most similar chunks come first:
SELECT cc."content", cc."filePath", cc."startLine", cc."endLine"
FROM "CodeEmbedding" ce
JOIN "CodeChunk" cc ON cc."id" = ce."chunkId"
WHERE ce."repositoryId" = $1
ORDER BY ce."embedding" <=> $2::vector
LIMIT 10;
The top-10 most similar chunks are retrieved, providing the LLM with relevant context.
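To make the ranking concrete, the sketch below reimplements in memory what the `<=>` operator and `LIMIT` do inside PostgreSQL. It is illustrative only: `Chunk`, `cosineDistance`, and `topK` are hypothetical names, the vectors are 3-dimensional toy data rather than 768-dimensional embeddings, and a real deployment would leave this work to pgvector.

```typescript
// Illustrative only: pgvector evaluates this inside PostgreSQL.
// `Chunk`, `cosineDistance`, and `topK` are hypothetical names, not CodePilot APIs.
interface Chunk {
  filePath: string;
  embedding: number[];
}

// Cosine distance = 1 - cosine similarity, the quantity pgvector's <=> operator returns.
// Note: a zero-length vector would produce NaN here; real embeddings are non-zero.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory counterpart of: ORDER BY embedding <=> query LIMIT k
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .slice() // copy so the caller's array is not reordered
    .sort((x, y) => cosineDistance(query, x.embedding) - cosineDistance(query, y.embedding))
    .slice(0, k);
}

const toyChunks: Chunk[] = [
  { filePath: "auth/login.ts", embedding: [1, 0, 0] },
  { filePath: "db/schema.ts", embedding: [0, 1, 0] },
  { filePath: "auth/session.ts", embedding: [0.9, 0.1, 0] },
];

// The two chunks pointing in nearly the same direction as the query rank first.
const nearest = topK([1, 0, 0], toyChunks, 2).map((c) => c.filePath);
console.log(nearest);
```

Sorting every chunk is linear-logarithmic in the number of chunks, which is fine for a toy example; pgvector can avoid the full scan when an approximate index is defined on the embedding column.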
Build the Prompt
Retrieved chunks are composed into a structured prompt:
const prompt = `You are an AI code assistant. Answer the user's question
based on the following code context from their repository.
## Code Context
${retrievedChunks.map(chunk =>
`### ${chunk.filePath} (lines ${chunk.startLine}-${chunk.endLine})
\`\`\`
${chunk.content}
\`\`\``
).join("\n\n")}
## Question
${userQuestion}
## Instructions
- Answer based on the provided code context
- Reference specific files and line numbers
- If the context doesn't contain enough information, say so
- Provide code examples when helpful`;
Generate a Response
The prompt is sent to Ollama's qwen2.5-coder:3b model for response generation:
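const response = await fetch(`${OLLAMA_URL}/api/generate`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5-coder:3b",
    prompt: prompt,
    stream: false, // return one complete JSON object instead of a token stream
  }),
});
// Surface HTTP errors instead of trying to parse an error body as a completion
if (!response.ok) {
  throw new Error(`Ollama request failed with status ${response.status}`);
}
const data = await response.json();
return data.response;
Chat Models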
| Model | Purpose | Size |
|---|---|---|
| qwen2.5-coder:3b | Chat responses and code review | ~1.9 GB |
| qwen2.5:7b | Repository overview and health analysis | ~4.4 GB |
The chat model (qwen2.5-coder:3b) is optimized for code understanding and generation. It is configured via the OLLAMA_CHAT_MODEL environment variable.
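As a rough sketch of how that variable could feed the generation request, the helper below reads `OLLAMA_CHAT_MODEL` from an environment map and falls back to the documented chat model. The names `resolveChatModel` and `buildGenerateRequest` are hypothetical, and the fallback behavior is an assumption rather than confirmed CodePilot behavior.

```typescript
// Assumption: the documented chat model doubles as the fallback when the variable is unset.
const DEFAULT_CHAT_MODEL = "qwen2.5-coder:3b";

// Hypothetical helper; in a Node.js service the argument would be process.env.
function resolveChatModel(env: Record<string, string | undefined>): string {
  const configured = (env["OLLAMA_CHAT_MODEL"] ?? "").trim();
  return configured.length > 0 ? configured : DEFAULT_CHAT_MODEL;
}

// Request body for Ollama's /api/generate endpoint, mirroring the request shown earlier.
function buildGenerateRequest(prompt: string, env: Record<string, string | undefined>) {
  return { model: resolveChatModel(env), prompt, stream: false };
}

const requestBody = buildGenerateRequest("How is authentication implemented?", {
  OLLAMA_CHAT_MODEL: "qwen2.5:7b",
});
console.log(requestBody.model);
```

Taking the environment as a parameter, rather than reading a global inside the function, keeps the helper easy to test with different configurations.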
AI Code Review
CodePilot also uses the RAG pipeline for automated pull request reviews:
- A pull_request webhook event triggers a review job
- The PR diff is parsed to identify changed files
- Relevant existing code chunks are retrieved for context
- The LLM generates a code review based on the diff + surrounding context
- The review is posted as a GitHub review comment via the GitHub API
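The diff-parsing step can be sketched as follows, assuming the diff arrives in git's unified format. `changedFilesFromDiff` is a hypothetical name, and the parser is deliberately minimal: it ignores renames, binary files, and quoted paths, which a production parser would need to handle.

```typescript
// Hypothetical helper: extracts the paths of added or modified files from a unified diff.
// Deleted files are skipped because there is no new content to review.
function changedFilesFromDiff(diff: string): string[] {
  const files: string[] = [];
  const lines = diff.split(/\r?\n/);
  for (let i = 1; i < lines.length; i++) {
    // A file header is a "--- old" line immediately followed by a "+++ new" line.
    if (!/^--- /.test(lines[i - 1])) continue;
    const match = /^\+\+\+ (?:b\/)?(.+)$/.exec(lines[i]);
    if (match === null) continue;
    const path = match[1].trim();
    if (path === "/dev/null") continue; // the file was deleted
    if (files.indexOf(path) === -1) files.push(path);
  }
  return files;
}

const sampleDiff = [
  "diff --git a/src/auth.ts b/src/auth.ts",
  "--- a/src/auth.ts",
  "+++ b/src/auth.ts",
  "@@ -1,2 +1,3 @@",
  " import { db } from './db';",
  "+import { hash } from './hash';",
  "diff --git a/old.ts b/old.ts",
  "--- a/old.ts",
  "+++ /dev/null",
  "@@ -1 +0,0 @@",
  "-export {};",
].join("\n");

const changedFiles = changedFilesFromDiff(sampleDiff);
console.log(changedFiles);
```

The resulting paths could then drive the retrieval step, for example by restricting or boosting chunks whose `filePath` matches a changed file.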
Automated Reviews
AI code reviews are triggered automatically when a pull request is opened or updated. The review includes suggestions for improvements, potential bugs, and style issues.
Repository Intelligence
Beyond chat, CodePilot generates structured intelligence:
Repository Overview
Uses qwen2.5:7b to generate:
- Project summary and purpose
- Architecture description
- Tech stack identification
- Main modules and entry points
- Design patterns used
Repository Health Score
AI-generated assessment including:
- Code quality strengths
- Identified issues and risks
- Scored recommendations for improvement
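If the model is asked to return the health assessment as JSON, the output should be validated before it is stored or displayed, since LLM responses do not always match the requested shape. The sketch below assumes a hypothetical schema (`strengths`, `issues`, `recommendations` with a numeric `score`); the actual fields CodePilot uses are not documented here.

```typescript
// Hypothetical schema for a health report; field names are assumptions.
interface Recommendation {
  title: string;
  score: number;
}

interface HealthReport {
  strengths: string[];
  issues: string[];
  recommendations: Recommendation[];
}

// Parses raw model output and rejects anything that does not match the schema.
function parseHealthReport(raw: string): HealthReport {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Health report must be a JSON object");
  }
  const obj = data as Record<string, unknown>;

  const stringArray = (key: string): string[] => {
    const value = obj[key];
    if (!Array.isArray(value) || !value.every((v) => typeof v === "string")) {
      throw new Error(`"${key}" must be an array of strings`);
    }
    return value as string[];
  };

  const recs = obj["recommendations"];
  if (!Array.isArray(recs)) {
    throw new Error('"recommendations" must be an array');
  }
  const recommendations = recs.map((entry): Recommendation => {
    if (typeof entry !== "object" || entry === null) {
      throw new Error("Each recommendation must be an object");
    }
    const title = (entry as Record<string, unknown>)["title"];
    const score = (entry as Record<string, unknown>)["score"];
    if (typeof title !== "string" || typeof score !== "number") {
      throw new Error("Each recommendation needs a string title and a numeric score");
    }
    return { title, score };
  });

  // Present the highest-scored recommendations first.
  recommendations.sort((a, b) => b.score - a.score);
  return { strengths: stringArray("strengths"), issues: stringArray("issues"), recommendations };
}

const report = parseHealthReport(
  '{"strengths":["Clear module boundaries"],"issues":["No tests for auth"],' +
    '"recommendations":[{"title":"Add linting","score":3},{"title":"Add auth tests","score":8}]}'
);
console.log(report.recommendations[0].title);
```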
Example Queries
Here are some effective queries you can ask about any indexed repository:
- "How is authentication implemented in this project?"
- "What database models exist and how are they related?"
- "Show me the main API routes and their handlers"
- "What design patterns are used in the frontend?"
- "How does error handling work across the application?"
- "What are the main dependencies and what are they used for?"