
Repository Chat

How CodePilot uses Retrieval-Augmented Generation (RAG) to answer questions about codebases.

Repository Chat is the primary interface for interacting with an indexed codebase. It uses Retrieval-Augmented Generation (RAG) to answer questions with context retrieved from the repository itself, so answers are grounded in the actual source code rather than in the model's general knowledge.

How RAG Works

RAG combines information retrieval with language model generation. Answering a question takes four steps:

Embed the Query

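The user's question is converted to a 768-dimensional vector using the same nomic-embed-text model used during ingestion. Using the same model places the query and the stored code chunks in the same vector space, so their distances are directly comparable.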

const queryEmbedding = await generateEmbedding(userQuestion);
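generateEmbedding itself is not shown on this page. The following is a minimal sketch of what it could look like, assuming it calls Ollama's /api/embed endpoint (request { model, input }, response { embeddings: number[][] }) at Ollama's default address, http://localhost:11434. The FetchLike type, the baseUrl and fetchFn parameters, and the length check are illustrative additions, not CodePilot's confirmed code.

```typescript
// Minimal shape of the fetch API used below, so the sketch type-checks
// without DOM typings and can be exercised with a fake in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const EMBEDDING_DIMENSIONS = 768; // nomic-embed-text output size

async function generateEmbedding(
  text: string,
  baseUrl = "http://localhost:11434", // assumption: Ollama's default address
  fetchFn: FetchLike = (globalThis as unknown as { fetch: FetchLike }).fetch,
): Promise<number[]> {
  // /api/embed takes { model, input } and returns one vector per input
  const res = await fetchFn(`${baseUrl}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", input: text }),
  });
  if (!res.ok) throw new Error(`Embedding request failed: ${res.status}`);

  const data = (await res.json()) as { embeddings?: number[][] };
  const vector = data.embeddings?.[0];
  // Reject anything that would not fit a vector(768) column
  if (!vector || vector.length !== EMBEDDING_DIMENSIONS) {
    throw new Error("Unexpected embedding shape");
  }
  return vector;
}
```

Injecting fetchFn keeps the function testable without a running Ollama server; on Node 18+ the global fetch is used by default.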

Retrieve Relevant Code

pgvector's <=> operator computes cosine distance (1 minus cosine similarity), so sorting by it in ascending order returns the most similar code chunks first:

SELECT cc."content", cc."filePath", cc."startLine", cc."endLine"
FROM "CodeEmbedding" ce
JOIN "CodeChunk" cc ON cc."id" = ce."chunkId"
WHERE ce."repositoryId" = $1
ORDER BY ce."embedding" <=> $2::vector
LIMIT 10;

The 10 closest chunks are returned, together with their file paths and line ranges, and passed to the LLM as context.
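To make the ranking concrete, here is a sketch of what the query computes: the cosine distance behind <=>, an in-memory equivalent of ORDER BY ... LIMIT 10, and a helper that formats an embedding as the bracketed text literal (for example [0.1,0.2,0.3]) that pgvector accepts for a ::vector cast. The helper names are hypothetical; in CodePilot the database performs the search.

```typescript
// Format an embedding as a pgvector text literal, e.g. "[0.1,0.2,0.3]".
// A client could bind this string to $2 before the ::vector cast.
function toPgVectorLiteral(embedding: number[]): string {
  if (embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("pgvector does not accept NaN or Infinity");
  }
  return `[${embedding.join(",")}]`;
}

// Cosine distance, as computed by pgvector's <=> operator: 1 - cos(a, b).
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory equivalent of: ORDER BY embedding <=> $2 LIMIT k
function topK<T extends { embedding: number[] }>(query: number[], rows: T[], k = 10): T[] {
  return rows
    .map((row) => ({ row, distance: cosineDistance(row.embedding, query) }))
    .sort((x, y) => x.distance - y.distance) // smallest distance = most similar
    .slice(0, k)
    .map(({ row }) => row);
}
```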

Build the Prompt

Retrieved chunks are composed into a structured prompt:

const prompt = `You are an AI code assistant. Answer the user's question
based on the following code context from their repository.

## Code Context

${retrievedChunks.map(chunk =>
  `### ${chunk.filePath} (lines ${chunk.startLine}-${chunk.endLine})
\`\`\`
${chunk.content}
\`\`\``
).join("\n\n")}

## Question

${userQuestion}

## Instructions

- Answer based on the provided code context
- Reference specific files and line numbers
- If the context doesn't contain enough information, say so
- Provide code examples when helpful`;
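The template above inlines every retrieved chunk. The sketch below pulls the chunk formatting into typed helpers, with a RetrievedChunk interface mirroring the columns selected by the SQL query. The maxChars budget is a hypothetical addition, not something this page says CodePilot does: it keeps chunks in retrieval order and stops before the context would exceed the limit.

```typescript
// One row returned by the retrieval query (columns from the SELECT above).
interface RetrievedChunk {
  filePath: string;
  startLine: number;
  endLine: number;
  content: string;
}

// Three backticks, built at runtime so the fence is easy to embed here.
const FENCE = "`".repeat(3);

// Render one chunk as a headed, fenced block, matching the template above.
function formatChunk(chunk: RetrievedChunk): string {
  return `### ${chunk.filePath} (lines ${chunk.startLine}-${chunk.endLine})\n${FENCE}\n${chunk.content}\n${FENCE}`;
}

// Join chunks in retrieval order, stopping before the joined text would
// exceed maxChars (a hypothetical budget to protect the context window).
function buildContext(chunks: RetrievedChunk[], maxChars = 12000): string {
  const parts: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const block = formatChunk(chunk);
    if (used + block.length > maxChars) break;
    parts.push(block);
    used += block.length + 2; // account for the "\n\n" separator
  }
  return parts.join("\n\n");
}
```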

Generate a Response

The prompt is sent to Ollama's qwen2.5-coder:3b model for response generation:

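// Send the prompt to Ollama's /api/generate endpoint
const response = await fetch(`${OLLAMA_URL}/api/generate`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    // Defaults to qwen2.5-coder:3b; overridable via OLLAMA_CHAT_MODEL
    model: process.env.OLLAMA_CHAT_MODEL ?? "qwen2.5-coder:3b",
    prompt,
    stream: false, // return the whole answer as a single JSON object
  }),
});

if (!response.ok) {
  throw new Error(`Ollama request failed with status ${response.status}`);
}

// With stream: false, the generated text is in the "response" field
const data = await response.json();
return data.response;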
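The request above sets stream: false, so Ollama returns one JSON object. With stream: true, Ollama instead sends newline-delimited JSON objects, each carrying a fragment of text in response, with done set to true on the last one. A minimal sketch of accumulating that output, assuming the bytes have already been decoded to strings (for example with TextDecoder); the class name is hypothetical:

```typescript
// Accumulates text from Ollama's streaming /api/generate output
// (stream: true), which arrives as newline-delimited JSON objects.
class OllamaStreamAccumulator {
  private buffer = "";
  text = "";
  done = false;

  push(chunk: string): void {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is "" or an incomplete line; keep it for the next push.
    this.buffer = lines.pop() ?? "";
    for (const line of lines) {
      if (line.trim() === "") continue;
      const msg = JSON.parse(line) as { response?: string; done?: boolean };
      this.text += msg.response ?? "";
      if (msg.done) this.done = true;
    }
  }
}
```

Buffering the trailing partial line matters because network chunks do not align with line boundaries; streaming would let the chat UI show an answer as it is generated rather than after it completes.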

Chat Models

Model              Purpose                                   Size
qwen2.5-coder:3b   Chat responses and code review            ~1.9 GB
qwen2.5:7b         Repository overview and health analysis   ~4.4 GB

The chat model (qwen2.5-coder:3b) is optimized for code understanding and generation. It is configured via the OLLAMA_CHAT_MODEL environment variable.
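A minimal sketch of resolving the chat model from the environment. The fallback to qwen2.5-coder:3b when OLLAMA_CHAT_MODEL is unset or blank is an assumption based on the model table; getChatModel is a hypothetical name.

```typescript
// Assumed default, taken from the model table on this page.
const DEFAULT_CHAT_MODEL = "qwen2.5-coder:3b";

// Read OLLAMA_CHAT_MODEL, treating a missing or blank value as unset.
function getChatModel(env: Record<string, string | undefined>): string {
  const value = env["OLLAMA_CHAT_MODEL"]?.trim();
  return value ? value : DEFAULT_CHAT_MODEL;
}

// Usage in Node: getChatModel(process.env)
```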

AI Code Review

CodePilot also uses the RAG pipeline for automated pull request reviews:

  1. A pull_request webhook event triggers a review job
  2. The PR diff is parsed to identify changed files
  3. Relevant existing code chunks are retrieved for context
  4. The LLM generates a code review based on the diff + surrounding context
  5. The review is posted as a GitHub review comment via the GitHub API
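Step 2 can be sketched as a small unified-diff parser that extracts each changed file and the new-side line range of each hunk, which step 3 could use to look up surrounding code. This is an illustrative sketch, not CodePilot's parser: it assumes git-style "+++ b/path" headers and skips deleted files as well as files without a +++ header (such as binary changes).

```typescript
// One changed file and the new-side line range of each of its hunks.
interface ChangedFile {
  path: string;
  ranges: Array<{ start: number; end: number }>;
}

function parseUnifiedDiff(diff: string): ChangedFile[] {
  const files: ChangedFile[] = [];
  let current: ChangedFile | null = null;
  for (const line of diff.split("\n")) {
    if (line.startsWith("+++ ")) {
      const target = line.slice(4).trim();
      // "+++ /dev/null" means the file was deleted: nothing to review on the new side
      current = target === "/dev/null" ? null : { path: target.replace(/^b\//, ""), ranges: [] };
      if (current) files.push(current);
      continue;
    }
    // Hunk header: @@ -oldStart[,oldLen] +newStart[,newLen] @@ (lengths default to 1)
    const hunk = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@/.exec(line);
    if (hunk && current) {
      const start = Number(hunk[1]);
      const length = hunk[2] ? Number(hunk[2]) : 1;
      if (length > 0) current.ranges.push({ start, end: start + length - 1 });
    }
  }
  return files;
}
```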

Automated Reviews

AI code reviews are triggered automatically when a pull request is opened or updated. Each review flags potential bugs and style issues and suggests improvements.
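GitHub identifies the event type in the X-GitHub-Event header and the specific change in the payload's action field: opened when a pull request is created, and synchronize when new commits are pushed to its branch. A sketch of the trigger check, assuming CodePilot treats exactly these two actions as "opened or updated" (reopened, for example, is not included here); shouldReview and the trimmed payload type are hypothetical.

```typescript
// Actions assumed to mean "opened or updated".
const REVIEW_ACTIONS = new Set(["opened", "synchronize"]);

// Only the payload fields this check needs.
interface PullRequestPayload {
  action?: string;
  pull_request?: { number: number };
}

// eventName is the value of the X-GitHub-Event header.
function shouldReview(eventName: string, payload: PullRequestPayload): boolean {
  return (
    eventName === "pull_request" &&
    payload.action !== undefined &&
    REVIEW_ACTIONS.has(payload.action) &&
    payload.pull_request !== undefined
  );
}
```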

Repository Intelligence

Beyond chat, CodePilot generates two structured reports for each indexed repository:

Repository Overview

Uses qwen2.5:7b to generate:

  • Project summary and purpose
  • Architecture description
  • Tech stack identification
  • Main modules and entry points
  • Design patterns used
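This page does not specify the overview's output format. One hypothetical approach is to ask qwen2.5:7b for JSON using the format: "json" option of Ollama's /api/generate and to validate the result before storing it. The RepositoryOverview field names mirror the list above and are assumptions, as are the helper names.

```typescript
// Hypothetical shape, one field per item in the list above.
interface RepositoryOverview {
  summary: string;
  architecture: string;
  techStack: string[];
  mainModules: string[];
  designPatterns: string[];
}

// Body for POST /api/generate. format: "json" constrains the output to JSON;
// the prompt still has to name the expected keys.
function overviewRequest(repoContext: string) {
  return {
    model: "qwen2.5:7b",
    format: "json",
    stream: false,
    prompt: `Describe this repository as JSON with keys summary, architecture, techStack, mainModules, designPatterns.\n\n${repoContext}`,
  };
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Validate the model's text output before trusting it.
function parseOverview(raw: string): RepositoryOverview {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Overview is not a JSON object");
  }
  const { summary, architecture, techStack, mainModules, designPatterns } =
    data as Record<string, unknown>;
  if (
    typeof summary !== "string" ||
    typeof architecture !== "string" ||
    !isStringArray(techStack) ||
    !isStringArray(mainModules) ||
    !isStringArray(designPatterns)
  ) {
    throw new Error("Overview JSON has missing or malformed fields");
  }
  return { summary, architecture, techStack, mainModules, designPatterns };
}
```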

Repository Health Score

An AI-generated assessment that includes:

  • Code quality strengths
  • Identified issues and risks
  • Scored recommendations for improvement
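The health report's structure is not specified on this page. A hypothetical shape, with a helper that returns the highest-scored recommendations; the 0 to 10 score scale and all names are assumptions.

```typescript
interface Recommendation {
  title: string;
  score: number; // assumed scale: 0 (low priority) to 10 (high priority)
}

interface HealthReport {
  strengths: string[];
  issues: string[];
  recommendations: Recommendation[];
}

// Return the n highest-scored recommendations without mutating the report.
function topRecommendations(report: HealthReport, n = 3): Recommendation[] {
  return [...report.recommendations]
    .filter((r) => Number.isFinite(r.score)) // drop unparseable scores
    .sort((a, b) => b.score - a.score)
    .slice(0, n);
}
```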

Example Queries

Here are some effective queries you can ask about any indexed repository:

  • "How is authentication implemented in this project?"
  • "What database models exist and how are they related?"
  • "Show me the main API routes and their handlers"
  • "What design patterns are used in the frontend?"
  • "How does error handling work across the application?"
  • "What are the main dependencies and what are they used for?"
