Repository Chat

How CodePilot uses Retrieval-Augmented Generation (RAG) to answer questions about codebases.

Repository Chat is the primary interface for interacting with an indexed codebase. It uses Retrieval-Augmented Generation (RAG) to provide accurate, context-aware answers grounded in the actual source code.

How RAG Works

RAG combines information retrieval with language model generation in four steps:

Embed the Query

The user's question is converted to a 768-dimensional vector with nomic-embed-text, the same model used during ingestion, so that query vectors and chunk vectors are directly comparable.

const queryEmbedding = await generateEmbedding(userQuestion);
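The source does not show how generateEmbedding is implemented. The following is a minimal sketch, assuming the embedding model is served through Ollama's /api/embeddings endpoint, which accepts a model and a prompt and returns an embedding array. The injected fetchFn parameter, the ollamaUrl default, and the parseEmbedding helper are assumptions added for illustration and testability.

```typescript
// Hypothetical sketch; not CodePilot's actual implementation.
const EMBEDDING_MODEL = "nomic-embed-text";
const EMBEDDING_DIMENSIONS = 768;

// Minimal subset of the fetch API that this sketch relies on.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Validate the response body and return the vector.
function parseEmbedding(body: unknown): number[] {
  if (typeof body !== "object" || body === null) {
    throw new Error("Embedding response must be a JSON object");
  }
  const embedding = (body as { embedding?: unknown }).embedding;
  if (
    !Array.isArray(embedding) ||
    embedding.length !== EMBEDDING_DIMENSIONS ||
    !embedding.every((value) => typeof value === "number")
  ) {
    throw new Error(`Expected a ${EMBEDDING_DIMENSIONS}-dimensional embedding`);
  }
  return embedding as number[];
}

async function generateEmbedding(
  text: string,
  fetchFn: FetchLike,
  ollamaUrl: string = "http://localhost:11434", // assumption: local Ollama default port
): Promise<number[]> {
  const response = await fetchFn(`${ollamaUrl}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: EMBEDDING_MODEL, prompt: text }),
  });
  if (!response.ok) {
    throw new Error(`Embedding request failed: ${response.status}`);
  }
  return parseEmbedding(await response.json());
}
```

In application code the global fetch can be passed as fetchFn. Checking the vector length up front catches a mismatched embedding model before the query reaches the database.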

Retrieve Relevant Code

pgvector ranks the stored chunk embeddings by cosine distance to the query vector (the <=> operator), so ordering ascending returns the most similar code chunks first:

SELECT cc."content", cc."filePath", cc."startLine", cc."endLine"
FROM "CodeEmbedding" ce
JOIN "CodeChunk" cc ON cc."id" = ce."chunkId"
WHERE ce."repositoryId" = $1
ORDER BY ce."embedding" <=> $2::vector
LIMIT 10;

The top-10 most similar chunks are retrieved, providing the LLM with relevant context.
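To make the ordering concrete: cosine distance is 1 minus cosine similarity, so the smallest distance marks the closest chunk. The sketch below reproduces that ranking in memory on toy 3-dimensional vectors. The names cosineDistance, topK, and toVectorLiteral and the sample data are illustrative and not part of CodePilot.

```typescript
// Cosine distance: 0 for identical direction, 1 for orthogonal, 2 for opposite.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedChunk {
  filePath: string;
  embedding: number[];
}

// In-memory equivalent of ORDER BY distance ASC LIMIT k.
function topK(query: number[], chunks: EmbeddedChunk[], k: number): EmbeddedChunk[] {
  return [...chunks]
    .sort(
      (x, y) =>
        cosineDistance(query, x.embedding) - cosineDistance(query, y.embedding),
    )
    .slice(0, k);
}

// pgvector accepts vectors in the text form '[v1,v2,...]', which is the shape
// a parameter cast with ::vector expects.
function toVectorLiteral(vector: number[]): string {
  return `[${vector.join(",")}]`;
}

const sampleChunks: EmbeddedChunk[] = [
  { filePath: "src/db.ts", embedding: [0, 1, 0] },
  { filePath: "src/ui.tsx", embedding: [-1, 0, 0] },
  { filePath: "src/auth.ts", embedding: [0.9, 0.1, 0] },
];

// Ranks src/auth.ts first, then src/db.ts.
const closest = topK([1, 0, 0], sampleChunks, 2).map((chunk) => chunk.filePath);
```

At scale the database does this work with an index; the in-memory version only illustrates what the ORDER BY clause computes.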

Build the Prompt

Retrieved chunks are composed into a structured prompt:

const prompt = `You are an AI code assistant. Answer the user's question
based on the following code context from their repository.

## Code Context

${retrievedChunks.map(chunk =>
  `### ${chunk.filePath} (lines ${chunk.startLine}-${chunk.endLine})
\`\`\`
${chunk.content}
\`\`\``
).join("\n\n")}

## Question

${userQuestion}

## Instructions

- Answer based on the provided code context
- Reference specific files and line numbers
- If the context doesn't contain enough information, say so
- Provide code examples when helpful`;
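Wrapping the template in a typed function makes the prompt easy to unit test. The sketch below assumes a RetrievedChunk shape that mirrors the columns selected by the similarity query. The function names and the explicit marker for an empty retrieval result are assumptions, not CodePilot's code.

```typescript
interface RetrievedChunk {
  content: string;
  filePath: string;
  startLine: number;
  endLine: number;
}

// Built programmatically to avoid escaping backticks inside template literals.
const FENCE = "`".repeat(3);

function formatChunk(chunk: RetrievedChunk): string {
  return [
    `### ${chunk.filePath} (lines ${chunk.startLine}-${chunk.endLine})`,
    FENCE,
    chunk.content,
    FENCE,
  ].join("\n");
}

function buildPrompt(chunks: RetrievedChunk[], question: string): string {
  // Assumption: state explicitly when retrieval found nothing, so the model
  // can follow the "say so" instruction instead of guessing.
  const context =
    chunks.length > 0
      ? chunks.map(formatChunk).join("\n\n")
      : "(no matching code chunks were retrieved)";

  return [
    "You are an AI code assistant. Answer the user's question",
    "based on the following code context from their repository.",
    "",
    "## Code Context",
    "",
    context,
    "",
    "## Question",
    "",
    question,
    "",
    "## Instructions",
    "",
    "- Answer based on the provided code context",
    "- Reference specific files and line numbers",
    "- If the context doesn't contain enough information, say so",
    "- Provide code examples when helpful",
  ].join("\n");
}
```

Including the file path and line range in each chunk heading is what lets the model cite specific locations in its answer.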

Generate a Response

The prompt is sent to the qwen2.5-coder:3b model, served by Ollama, for response generation:

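const response = await fetch(`${OLLAMA_URL}/api/generate`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5-coder:3b",
    prompt,
    stream: false, // return a single JSON object instead of a token stream
  }),
});

// Surface HTTP errors instead of silently returning undefined
if (!response.ok) {
  throw new Error(`Ollama request failed with status ${response.status}`);
}

const data = await response.json();
return data.response; // the generated answer text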

Chat Models

Model	Purpose	Size
`qwen2.5-coder:3b`	Chat responses and code review	~1.9 GB
`qwen2.5:7b`	Repository overview and health analysis	~4.4 GB

The chat model (qwen2.5-coder:3b) is optimized for code understanding and generation. It is configured via the OLLAMA_CHAT_MODEL environment variable.
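One way to read that setting is sketched below. It assumes that an unset or blank OLLAMA_CHAT_MODEL falls back to qwen2.5-coder:3b; the fallback behavior and the resolveChatModel helper are assumptions, not confirmed by the source.

```typescript
const DEFAULT_CHAT_MODEL = "qwen2.5-coder:3b";

// Takes the environment as a parameter so it can be tested without touching
// the real process environment.
function resolveChatModel(env: Record<string, string | undefined>): string {
  const configured = (env["OLLAMA_CHAT_MODEL"] ?? "").trim();
  return configured.length > 0 ? configured : DEFAULT_CHAT_MODEL;
}

// Typical call site in Node.js: resolveChatModel(process.env)
```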

AI Code Review

CodePilot also uses the RAG pipeline for automated pull request reviews:

A pull_request webhook event triggers a review job
The PR diff is parsed to identify changed files
Relevant existing code chunks are retrieved for context
The LLM generates a code review based on the diff + surrounding context
The review is posted as a GitHub review comment via the GitHub API
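The diff-parsing step above can be sketched as follows, assuming the PR diff arrives as a unified git diff. The changedFilesFromDiff helper is hypothetical. It reads the "+++ b/<path>" header lines, so deleted files (whose header is "+++ /dev/null") are skipped. An added source line that itself begins with "++ b/" would be misread as a header, which a production parser would need to handle.

```typescript
// Extract the paths of files that exist after the change in a unified git diff.
function changedFilesFromDiff(diff: string): string[] {
  const files = new Set<string>();
  for (const rawLine of diff.split("\n")) {
    const line = rawLine.replace(/\r$/, ""); // tolerate CRLF line endings
    const match = /^\+\+\+ b\/(.+)$/.exec(line);
    if (match) {
      files.add(match[1]);
    }
  }
  return Array.from(files);
}

const exampleDiff = [
  "diff --git a/src/auth.ts b/src/auth.ts",
  "--- a/src/auth.ts",
  "+++ b/src/auth.ts",
  "@@ -1,2 +1,3 @@",
  " import x from 'x';",
  "+import y from 'y';",
  "diff --git a/old.ts b/old.ts",
  "--- a/old.ts",
  "+++ /dev/null",
].join("\n");

// Yields ["src/auth.ts"]; the deleted old.ts is skipped.
const changedFiles = changedFilesFromDiff(exampleDiff);
```

The resulting paths can then be used to select which existing chunks to retrieve as review context.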

Automated Reviews

AI code reviews are triggered automatically when a pull request is opened or updated. The review includes suggestions for improvements, potential bugs, and style issues.
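In GitHub's webhook payloads, a newly opened pull request arrives with the action "opened", and new commits pushed to an existing pull request arrive with the action "synchronize". A minimal sketch of the trigger check, assuming the event name comes from the X-GitHub-Event header; the shouldTriggerReview helper and the trimmed payload type are illustrative.

```typescript
// Only the payload fields this sketch needs.
interface PullRequestWebhookPayload {
  action: string;
  number: number;
}

// Assumption: "opened or updated" maps to these two webhook actions.
const REVIEW_ACTIONS: readonly string[] = ["opened", "synchronize"];

function shouldTriggerReview(
  eventName: string,
  payload: PullRequestWebhookPayload,
): boolean {
  return eventName === "pull_request" && REVIEW_ACTIONS.includes(payload.action);
}
```

Other actions on the same event, such as "closed" or "labeled", are ignored by this check, which avoids re-reviewing a pull request whose code has not changed.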

Repository Intelligence

Beyond chat, CodePilot generates structured intelligence:

Repository Overview

CodePilot uses qwen2.5:7b to generate:

Project summary and purpose
Architecture description
Tech stack identification
Main modules and entry points
Design patterns used
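The source does not specify the format in which the overview is stored. As a sketch, assuming the model is asked to reply in JSON, the five items above could map to a typed structure that is validated before use. All field names and the parseOverview helper are hypothetical.

```typescript
// Hypothetical shape; field names are illustrative and mirror the list above.
interface RepositoryOverview {
  summary: string;
  architecture: string;
  techStack: string[];
  modules: string[];
  designPatterns: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Parse and validate model output, since an LLM may return malformed JSON.
function parseOverview(raw: string): RepositoryOverview {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Overview must be a JSON object");
  }
  const { summary, architecture, techStack, modules, designPatterns } =
    data as Record<string, unknown>;
  if (
    typeof summary !== "string" ||
    typeof architecture !== "string" ||
    !isStringArray(techStack) ||
    !isStringArray(modules) ||
    !isStringArray(designPatterns)
  ) {
    throw new Error("Overview is missing required fields");
  }
  return { summary, architecture, techStack, modules, designPatterns };
}
```

Validating at this boundary keeps malformed model output from reaching the UI or the database.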

Repository Health Score

An AI-generated assessment that includes:

Code quality strengths
Identified issues and risks
Scored recommendations for improvement

Example Queries

Some effective questions to ask about any indexed repository:

"How is authentication implemented in this project?"
"What database models exist and how are they related?"
"Show me the main API routes and their handlers"
"What design patterns are used in the frontend?"
"How does error handling work across the application?"
"What are the main dependencies and what are they used for?"
