File Parsing

How CodePilot uses ts-morph AST analysis to parse TypeScript and JavaScript files into structured code elements.

File parsing is the stage where raw source files are transformed into structured code elements that can be meaningfully chunked and embedded. CodePilot uses different parsing strategies depending on the file type.

Parsing Strategies

| File Type | Parser | Output |
| --- | --- | --- |
| TypeScript (.ts, .tsx) | ts-morph (AST) | Functions, classes, components, hooks, types, exports |
| JavaScript (.js, .jsx) | ts-morph (AST) | Functions, classes, components, exports |
| Markdown (.md, .mdx) | Section-based | Heading-delimited sections |
| JSON (.json) | Full-file | Complete file content |
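
The mapping in the table can be read as a lookup keyed on file extension. The following is a minimal sketch of that idea, not CodePilot's actual code: `ParserKind`, `PARSER_BY_EXTENSION`, and `selectParser` are hypothetical names introduced for illustration, and any extension not listed in the table is assumed to go to the line-based fallback described under Fallback Parsing.

```typescript
// Hypothetical labels for the four strategies; the names are illustrative only.
type ParserKind = "ast" | "markdown-sections" | "full-file" | "line-fallback";

// Extension-to-strategy table, mirroring the rows of the table above.
const PARSER_BY_EXTENSION: Record<string, ParserKind> = {
  ".ts": "ast",
  ".tsx": "ast",
  ".js": "ast",
  ".jsx": "ast",
  ".md": "markdown-sections",
  ".mdx": "markdown-sections",
  ".json": "full-file",
};

function selectParser(filePath: string): ParserKind {
  // Take the text after the last dot, but only if that dot is in the file name
  // itself and not in a parent directory such as "my.dir/Makefile".
  const dot = filePath.lastIndexOf(".");
  const slash = Math.max(filePath.lastIndexOf("/"), filePath.lastIndexOf("\\"));
  const ext = dot > slash ? filePath.slice(dot).toLowerCase() : "";
  // Anything without a dedicated parser falls through to the line-based chunker.
  return PARSER_BY_EXTENSION[ext] ?? "line-fallback";
}
```

With this sketch, `selectParser("src/App.tsx")` yields `"ast"`, while a file such as `styles.css` or `Makefile` yields `"line-fallback"`.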

AST Parsing with ts-morph

For TypeScript and JavaScript files, CodePilot creates a ts-morph Project and analyzes the AST to extract meaningful code elements:

import { Project, SyntaxKind, ts } from "ts-morph";

const project = new Project({
  compilerOptions: {
    allowJs: true, // accept .js and .jsx files alongside TypeScript
    jsx: ts.JsxEmit.Preserve,
  },
});

// filePath is the path of the file being parsed
const sourceFile = project.addSourceFileAtPath(filePath);

Extracted Elements
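
As a rough illustration of how elements such as functions, hooks, classes, and types might be pulled out of a ts-morph source file, here is a minimal sketch. It is not CodePilot's actual extractor: the `CodeElement` shape, the `extractElements` helper, and the `useXxx` naming heuristic for hooks are all assumptions made for this example, and component detection is left out because it would need additional JSX heuristics.

```typescript
import { Node, Project, SyntaxKind, ts } from "ts-morph";

// Hypothetical element shape for illustration; not CodePilot's actual type.
interface CodeElement {
  kind: "function" | "hook" | "class" | "type";
  name: string;
  startLine: number;
  endLine: number;
  exported: boolean;
}

// Assumed heuristic: a function named useXxx is treated as a React hook.
const isHookName = (name: string): boolean => /^use[A-Z]/.test(name);

function extractElements(fileName: string, code: string): CodeElement[] {
  // In-memory project so the sketch runs without touching the disk.
  const project = new Project({
    useInMemoryFileSystem: true,
    compilerOptions: { allowJs: true, jsx: ts.JsxEmit.Preserve },
  });
  const sourceFile = project.createSourceFile(fileName, code);
  const elements: CodeElement[] = [];

  const add = (
    kind: CodeElement["kind"],
    name: string | undefined,
    node: Node,
    exported: boolean,
  ): void => {
    if (!name) return; // skip anonymous declarations
    elements.push({
      kind,
      name,
      startLine: node.getStartLineNumber(),
      endLine: node.getEndLineNumber(),
      exported,
    });
  };

  // Top-level function declarations.
  for (const fn of sourceFile.getFunctions()) {
    const name = fn.getName();
    add(name && isHookName(name) ? "hook" : "function", name, fn, fn.isExported());
  }

  // Arrow functions assigned to top-level variables, e.g. `const add = () => ...`.
  for (const decl of sourceFile.getVariableDeclarations()) {
    if (!decl.getInitializerIfKind(SyntaxKind.ArrowFunction)) continue;
    const name = decl.getName();
    const exported = decl.getVariableStatement()?.isExported() ?? false;
    add(isHookName(name) ? "hook" : "function", name, decl, exported);
  }

  // Classes, interfaces, and type aliases.
  for (const cls of sourceFile.getClasses()) add("class", cls.getName(), cls, cls.isExported());
  for (const iface of sourceFile.getInterfaces()) add("type", iface.getName(), iface, iface.isExported());
  for (const alias of sourceFile.getTypeAliases()) add("type", alias.getName(), alias, alias.isExported());

  return elements;
}
```

Recording start and end lines alongside each name is one way to let a later chunking stage slice the original source text for each element; whether CodePilot stores line ranges in this form is an assumption.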

Markdown Parsing

Markdown files are parsed using a section-based strategy. Content is split at heading boundaries:

# Introduction          ← Section 1
Some content here...

## Installation         ← Section 2
Step-by-step guide...

### Prerequisites       ← Section 3
List of requirements...

Each section becomes a separate chunk, preserving the heading hierarchy for context.
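
The heading-based split can be sketched as follows. This is an illustrative implementation under stated assumptions, not the one CodePilot ships: `MarkdownSection` and `splitMarkdownSections` are hypothetical names, only ATX-style (`#`) headings are recognized, and text before the first heading is kept as a section with an empty heading path.

```typescript
// Hypothetical section shape; headingPath carries the ancestor headings for context.
interface MarkdownSection {
  headingPath: string[]; // e.g. ["Introduction", "Installation"]
  level: number;         // 1-6 for headings, 0 for text before the first heading
  content: string;
}

function splitMarkdownSections(markdown: string): MarkdownSection[] {
  const sections: MarkdownSection[] = [];
  const path: { level: number; title: string }[] = [];
  let current: MarkdownSection | null = null;
  let inFence = false;

  for (const line of markdown.split(/\r?\n/)) {
    // Track fenced code blocks so a "# comment" inside code is not read as a heading.
    if (/^\s*(```|~~~)/.test(line)) inFence = !inFence;

    const match = inFence ? null : /^(#{1,6})\s+(.+?)\s*$/.exec(line);
    if (match) {
      const level = match[1].length;
      // Pop headings at the same or a deeper level; what remains are the ancestors.
      while (path.length > 0 && path[path.length - 1].level >= level) path.pop();
      path.push({ level, title: match[2] });
      current = { headingPath: path.map((h) => h.title), level, content: "" };
      sections.push(current);
    } else {
      if (!current) {
        if (line.trim() === "") continue; // ignore leading blank lines
        current = { headingPath: [], level: 0, content: "" };
        sections.push(current);
      }
      current.content += line + "\n";
    }
  }

  for (const section of sections) section.content = section.content.trim();
  return sections;
}
```

Applied to the example above, the third section would carry the heading path `["Introduction", "Installation", "Prerequisites"]`, which is one way to preserve the hierarchy when the section is embedded on its own.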

JSON Parsing

JSON files (like package.json, tsconfig.json) are captured as complete files since they are typically small and meaningful as whole units.

Fallback Parsing

Files that don't match any specific parser use a line-based fallback chunker that splits content into fixed-size segments with overlapping context windows. See Chunking Strategy for details.
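
A line-based chunker with overlapping windows can be sketched as below. The `LineChunk` type, the `chunkByLines` function, and the default sizes of 40 lines with 10 lines of overlap are assumptions chosen for illustration; the values CodePilot actually uses are covered in Chunking Strategy.

```typescript
// Hypothetical chunk shape; line numbers are 1-based and inclusive.
interface LineChunk {
  startLine: number;
  endLine: number;
  text: string;
}

// chunkSize and overlap defaults are illustrative, not CodePilot's real settings.
function chunkByLines(content: string, chunkSize = 40, overlap = 10): LineChunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  if (content === "") return [];

  const lines = content.split(/\r?\n/);
  const step = chunkSize - overlap; // how far each window advances
  const chunks: LineChunk[] = [];

  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + chunkSize, lines.length);
    chunks.push({
      startLine: start + 1,
      endLine: end,
      text: lines.slice(start, end).join("\n"),
    });
    if (end === lines.length) break; // the last window reached the end of the file
  }
  return chunks;
}
```

Each window repeats the final `overlap` lines of the previous one, so a statement that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.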
