File Parsing
How CodePilot uses ts-morph AST analysis to parse TypeScript and JavaScript files into structured code elements.
File parsing is the stage where raw source files are transformed into structured code elements that can be meaningfully chunked and embedded. CodePilot uses different parsing strategies depending on the file type.
Parsing Strategies
| File Type | Parser | Output |
|---|---|---|
| TypeScript (.ts, .tsx) | ts-morph (AST) | Functions, classes, components, hooks, types, exports |
| JavaScript (.js, .jsx) | ts-morph (AST) | Functions, classes, components, exports |
| Markdown (.md, .mdx) | Section-based | Heading-delimited sections |
| JSON (.json) | Full-file | Complete file content |
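The mapping in the table can be sketched as a lookup keyed by file extension. This is a minimal illustration, not CodePilot's actual code: `ParserKind`, `PARSER_BY_EXTENSION`, and `selectParser` are hypothetical names, and the real dispatch logic may differ.

```typescript
// Hypothetical names for illustration; not taken from CodePilot's source.
type ParserKind = "ast" | "markdown" | "json" | "fallback";

// Extension-to-parser mapping, mirroring the table above.
const PARSER_BY_EXTENSION: Record<string, ParserKind> = {
  ".ts": "ast",
  ".tsx": "ast",
  ".js": "ast",
  ".jsx": "ast",
  ".md": "markdown",
  ".mdx": "markdown",
  ".json": "json",
};

function selectParser(filePath: string): ParserKind {
  // Take the file name after the last path separator (POSIX or Windows).
  const lastSeparator = Math.max(filePath.lastIndexOf("/"), filePath.lastIndexOf("\\"));
  const fileName = filePath.slice(lastSeparator + 1);
  // A leading dot (e.g. ".gitignore") is not treated as an extension.
  const dot = fileName.lastIndexOf(".");
  const extension = dot > 0 ? fileName.slice(dot).toLowerCase() : "";
  // Anything not listed in the table goes to the line-based fallback.
  return PARSER_BY_EXTENSION[extension] ?? "fallback";
}
```

For example, `selectParser("src/App.tsx")` selects the AST parser, while a file with no listed extension, such as `Dockerfile`, is routed to the fallback.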
AST Parsing with ts-morph
For TypeScript and JavaScript files, CodePilot creates a ts-morph Project and analyzes the AST to extract meaningful code elements:
import { Project, SyntaxKind } from "ts-morph";
const project = new Project({
  compilerOptions: {
    allowJs: true,
    jsx: 1, // JsxEmit.Preserve
  },
});
const sourceFile = project.addSourceFileAtPath(filePath);
Extracted Elements
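As a rough sketch of what extraction from a source file might look like, the example below collects top-level functions, classes, interfaces, and type aliases using ts-morph's `getFunctions()`, `getClasses()`, `getInterfaces()`, and `getTypeAliases()`. The `CodeElement` shape and the `extractElements` helper are assumptions made for illustration; CodePilot's own element model is not shown here. The sketch also does not detect components or hooks written as arrow functions assigned to variables, which would need additional handling.

```typescript
import { Project, SourceFile } from "ts-morph";

// Hypothetical element shape; CodePilot's real type may differ.
interface CodeElement {
  kind: "function" | "class" | "interface" | "type";
  name: string;
  exported: boolean;
  startLine: number;
  endLine: number;
}

// Collect top-level declarations from a ts-morph SourceFile.
function extractElements(sourceFile: SourceFile): CodeElement[] {
  const declarations = [
    ...sourceFile.getFunctions().map((node) => ({ kind: "function" as const, node })),
    ...sourceFile.getClasses().map((node) => ({ kind: "class" as const, node })),
    ...sourceFile.getInterfaces().map((node) => ({ kind: "interface" as const, node })),
    ...sourceFile.getTypeAliases().map((node) => ({ kind: "type" as const, node })),
  ];

  return declarations.map(({ kind, node }) => ({
    kind,
    // Default-exported functions and classes may have no name.
    name: node.getName() ?? "(anonymous)",
    exported: node.isExported(),
    startLine: node.getStartLineNumber(),
    endLine: node.getEndLineNumber(),
  }));
}

// Usage with an in-memory file, so no file on disk is required.
const demoProject = new Project({ useInMemoryFileSystem: true });
const demoFile = demoProject.createSourceFile(
  "example.ts",
  "export function add(a: number, b: number) { return a + b; }\nclass Counter {}\n",
);
console.log(extractElements(demoFile));
```

Recording line ranges alongside each element is one way to let a later chunking stage map an element back to its source text.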
Markdown Parsing
Markdown files are parsed using a section-based strategy. Content is split at heading boundaries:
# Introduction ← Section 1
Some content here...
## Installation ← Section 2
Step-by-step guide...
### Prerequisites ← Section 3
List of requirements...
Each section becomes a separate chunk, preserving the heading hierarchy for context.
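One way to implement heading-based splitting is to walk the file line by line and keep a stack of the headings currently in scope. The sketch below is an assumption about how this could work, not CodePilot's implementation: `MarkdownSection` and `splitMarkdownSections` are hypothetical names, only ATX headings (`#` through `######`) are recognized, and lines inside fenced code blocks are never treated as headings.

```typescript
// Hypothetical section shape for illustration.
interface MarkdownSection {
  headingPath: string[]; // ancestor headings plus this section's own heading
  level: number;         // 1-6 for headings, 0 for content before the first heading
  content: string;
}

function splitMarkdownSections(markdown: string): MarkdownSection[] {
  const sections: MarkdownSection[] = [];
  const stack: { level: number; title: string }[] = [];
  let level = 0;
  let body: string[] = [];
  let inFence = false;

  // Emit the section collected so far (skips an empty preamble).
  const flush = (): void => {
    const content = body.join("\n").trim();
    if (stack.length > 0 || content.length > 0) {
      sections.push({ headingPath: stack.map((h) => h.title), level, content });
    }
  };

  for (const line of markdown.split(/\r?\n/)) {
    // Toggle fence state so "# comment" lines inside code are not headings.
    if (/^\s*(```|~~~)/.test(line)) inFence = !inFence;
    const heading = inFence ? null : /^(#{1,6})\s+(.+?)\s*$/.exec(line);
    if (heading) {
      flush();
      body = [];
      level = heading[1].length;
      // Drop headings at the same or deeper level, then push the new one.
      while (stack.length > 0 && stack[stack.length - 1].level >= level) stack.pop();
      stack.push({ level, title: heading[2] });
    } else {
      body.push(line);
    }
  }
  flush();
  return sections;
}
```

Applied to the example above, the third section would carry the heading path `Introduction`, `Installation`, `Prerequisites`, which is what lets a chunk retain its place in the hierarchy.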
JSON Parsing
JSON files (like package.json, tsconfig.json) are captured as complete files since they are typically small and meaningful as whole units.
Fallback Parsing
Files that don't match any specific parser use a line-based fallback chunker that splits content into fixed-size segments with overlapping context windows. See Chunking Strategy for details.
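A line-based chunker with overlap can be sketched as a sliding window over the file's lines. The function name `chunkByLines`, the `LineChunk` shape, and the default sizes (40 lines per chunk, 5 lines of overlap) are assumptions chosen for illustration; the actual values are covered in Chunking Strategy.

```typescript
// Hypothetical chunk shape for illustration.
interface LineChunk {
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
  text: string;
}

// chunkSize and overlap defaults are assumed values, not CodePilot's settings.
function chunkByLines(content: string, chunkSize = 40, overlap = 5): LineChunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("chunkSize must be positive and overlap must be in [0, chunkSize)");
  }

  const lines = content.split(/\r?\n/);
  const chunks: LineChunk[] = [];
  const step = chunkSize - overlap; // how far the window advances each time

  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + chunkSize, lines.length);
    chunks.push({
      startLine: start + 1,
      endLine: end,
      text: lines.slice(start, end).join("\n"),
    });
    // Stop once the window reaches the end, to avoid a trailing overlap-only chunk.
    if (end === lines.length) break;
  }
  return chunks;
}
```

Each chunk repeats the last `overlap` lines of the previous one, so a statement that straddles a chunk boundary still appears intact in at least one chunk when it is shorter than the overlap.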