Codebase intelligence as an MCP server. Native Rust. Sub-second indexing. Zero runtime dependencies.
CodeGraph builds a complete semantic graph of your codebase — every function, class, import, and call relationship across 32 programming languages — and makes it instantly available to AI coding agents through the Model Context Protocol (MCP).
When Claude Code, Codex, or any MCP-compatible agent enters your project, CodeGraph gives it an immediate, deep understanding of your entire codebase: what calls what, what depends on what, what breaks if you change something. Not file-level grep — graph-aware, semantically-ranked, token-budgeted context.
v0.2.5 adds git integration, security scanning (OWASP/CWE), call graph analysis, data flow analysis, a YAML configuration system, structured logging, and robust multi-byte character handling.
Two steps. That's it.
Pick one:
curl -fsSL https://raw.githubusercontent.com/suatkocar/codegraph/main/install.sh | bashbrew tap suatkocar/codegraph && brew install codegraphnpx @suatkocar/codegraph initcargo build --release --no-default-featurescodegraph initThis single command indexes your codebase, registers the MCP server, installs lifecycle hooks, generates CLAUDE.md, and sets up git hooks.
Open Claude Code and your project is already graph-aware.
The SessionStart hook triggers an incremental re-index (~12ms when nothing changed). Your graph is always fresh.
The UserPromptSubmit hook searches the graph for context relevant to your message and injects it automatically. The agent sees the right code before it even starts thinking.
The PreToolUse hook injects relevant codebase context before Edit, Write, Read, Grep, Glob, and Bash calls. The agent knows which symbols are in the file before it even opens it.
The PostToolUse hook re-indexes the modified file instantly. The graph stays in sync with changes as they happen.
The PostToolUseFailure hook provides corrective context — if a function was renamed or moved, CodeGraph tells the agent where to find it now.
The SubagentStart hook injects a compact project overview (top symbols, frameworks, file counts) into every subagent. They start with full codebase awareness from turn one.
The PreCompact hook saves a PageRank summary of the most important symbols. Structural awareness survives even when conversation history gets compressed.
The Stop hook checks for graph inconsistencies (high unresolved reference ratio) and can suggest the agent continue to address them before stopping.
The TaskCompleted hook runs a quality gate — re-indexes the project and reports any new dead code or unresolved references introduced during the task.
The SessionEnd hook runs a final incremental re-index and logs session-level diagnostics (file count, node count, edge count, unresolved refs).
| Metric | Without CodeGraph | With CodeGraph | Improvement |
|---|---|---|---|
| Tokens per task | ~5,000 | ~1,550 | 68% fewer |
| Files examined | 11 (all) | 3 avg | 73% fewer |
| Index 54 files | N/A | 230ms | Sub-second |
| Incremental no-op | N/A | 13ms | Instant |
| CPU utilization | N/A | 600%+ | Parallel parsing via rayon |
Measured on a real 11-file TypeScript project with ground truth evaluation:
| Task Query | Baseline Tokens | CodeGraph Tokens | Reduction |
|---|---|---|---|
| Authentication login | 4,962 | 1,997 | 59.8% |
| Database connection | 4,962 | 1,305 | 73.7% |
| User repository | 4,962 | 2,108 | 57.5% |
| API routes handlers | 4,962 | 935 | 81.2% |
| Password hashing | 4,962 | 1,522 | 69.3% |
| Average | 68.3% |
| Category | Precision | Recall | F1 Score |
|---|---|---|---|
| Caller detection | 1.00 | 1.00 | 1.00 |
| Dead code detection | 0.75 | 1.00 | 0.86 |
| Search relevance | 0.27 | 0.58 | 0.37 |
Caller detection achieves perfect precision and recall — CodeGraph never misses a caller and never hallucinates one.
| Language | Extensions | Language | Extensions |
|---|---|---|---|
| TypeScript | .ts |
Haskell | .hs .lhs |
| TSX | .tsx |
Elixir | .ex .exs |
| JavaScript | .js .mjs .cjs |
Groovy | .groovy .gradle |
| JSX | .jsx |
PowerShell | .ps1 .psm1 |
| Python | .py |
Perl | .pl .pm |
| Go | .go |
Clojure | .clj .cljs |
| Rust | .rs |
Julia | .jl |
| Java | .java |
R | .R .r |
| C | .c .h |
Erlang | .erl .hrl |
| C++ | .cpp .cc .hpp |
Elm | .elm |
| C# | .cs |
Fortran | .f90 .f95 |
| PHP | .php |
Nix | .nix |
| Ruby | .rb |
Bash | .sh .bash |
| Swift | .swift |
Scala | .scala .sc |
| Kotlin | .kt .kts |
Dart | .dart |
| Verilog | .v .sv |
Zig | .zig |
| Lua | .lua |
All grammars are statically linked at compile time via native tree-sitter 0.25. No WASM, no runtime downloads, no initialization delay.
| Tool | Purpose |
|---|---|
codegraph_query |
Hybrid keyword + semantic search (FTS5 + sqlite-vec + RRF) |
codegraph_dependencies |
Forward dependency traversal (recursive CTEs) |
codegraph_callers |
Reverse call graph |
codegraph_callees |
Forward call graph |
codegraph_impact |
Blast radius analysis with risk classification |
codegraph_structure |
Project overview with PageRank-ranked symbols |
codegraph_tests |
Test coverage discovery |
codegraph_context |
LLM context assembly (4-tier token budget) |
codegraph_node |
Direct symbol lookup with relationships |
codegraph_diagram |
Mermaid diagram generation |
codegraph_dead_code |
Find unused symbols |
codegraph_frameworks |
Detect project frameworks (18+) |
codegraph_languages |
Language breakdown statistics |
| Tool | Purpose |
|---|---|
codegraph_blame |
Line-by-line git blame |
codegraph_file_history |
File commit history |
codegraph_recent_changes |
Recent repository commits |
codegraph_commit_diff |
Commit diff details |
codegraph_symbol_history |
Symbol modification history |
codegraph_branch_info |
Branch status and tracking info |
codegraph_modified_files |
Working tree changes (staged/unstaged) |
codegraph_hotspots |
Churn-based hotspot detection |
codegraph_contributors |
Contributor statistics |
| Tool | Purpose |
|---|---|
codegraph_scan_security |
YAML rule-based vulnerability scan |
codegraph_check_owasp |
OWASP Top 10 2021 scan |
codegraph_check_cwe |
CWE Top 25 scan |
codegraph_explain_vulnerability |
CWE explanation + remediation guidance |
codegraph_suggest_fix |
Fix suggestion for findings |
codegraph_find_injections |
SQL/XSS/command injection via taint analysis |
codegraph_taint_sources |
Identify taint sources in code |
codegraph_security_summary |
Comprehensive risk assessment |
codegraph_trace_taint |
Data flow tracing from source to sink |
| Tool | Purpose |
|---|---|
codegraph_stats |
Index statistics (nodes, edges, files) |
codegraph_circular_imports |
Cycle detection (Tarjan SCC) |
codegraph_project_tree |
Directory tree with symbol counts |
codegraph_find_references |
Cross-reference search |
codegraph_export_map |
Module export listing |
codegraph_import_graph |
Import graph visualization |
codegraph_file |
File symbol listing |
| Tool | Purpose |
|---|---|
codegraph_find_path |
Shortest call path between two functions (BFS) |
codegraph_complexity |
Cyclomatic + cognitive complexity per function |
codegraph_data_flow |
Variable def-use chains |
codegraph_dead_stores |
Assignments never read |
codegraph_find_uninitialized |
Variables used before initialization |
codegraph_reaching_defs |
Reaching definition analysis |
CodeGraph includes a built-in security scanner with YAML-based rules:
- 4 bundled rule sets: OWASP Top 10, CWE Top 25, cryptographic weaknesses, secret detection
- 50+ rules covering SQL injection, XSS, command injection, hardcoded secrets, weak crypto, and more
- Taint analysis: Source-to-sink data flow tracking for injection vulnerabilities
- Custom rules: Write your own YAML rules with regex patterns, severity, CWE/OWASP mappings
codegraph scan-security # Full scan with all rules
codegraph check-owasp # OWASP Top 10 only
codegraph check-cwe # CWE Top 25 onlyCodeGraph supports layered YAML configuration:
# .codegraph.yaml (project-level) or ~/.config/codegraph/config.yaml (user-level)
version: "1.0"
preset: balanced # minimal | balanced | full | security-focused
tools:
overrides:
codegraph_dead_code:
enabled: false
reason: "Too noisy for this project"
performance:
exclude_tests: true4 presets: minimal (15 tools), balanced (30 tools), full (all 44), security-focused
Auto editor detection: Claude Code → full, VS Code → balanced, Zed → minimal
Environment overrides: CODEGRAPH_PRESET, CODEGRAPH_DISABLED_TOOLS
Source Files ──→ tree-sitter ──→ Extractor ──→ SQLite DB
(32 langs) (native parse) (nodes+edges) ├── FTS5 (keyword index)
├── sqlite-vec (vector index)
└── edges (graph structure)
↓
Query Engine
├── Hybrid Search (BM25 + cosine + RRF)
├── Graph Traversal (recursive CTEs)
├── PageRank (symbol importance)
└── Context Assembly (4-tier budget)
↓
MCP Server (stdio)
├── 44 tools via rmcp
└── 10 Claude Code hooks
src/
main.rs CLI entry point (16 commands, clap derive)
mcp/server.rs MCP server — 44 tools via rmcp #[tool] macros
db/schema.rs SQLite schema — FTS5 + sqlite-vec + unresolved_refs
indexer/
parser.rs 32 tree-sitter grammars, statically linked
extractor.rs AST → nodes, edges, qualified names for all languages
pipeline.rs Parallel indexing with rayon + incremental SHA-256 hashing
embedder.rs Jina v2 Base Code embeddings (768-dim, ONNX)
graph/
store.rs CRUD operations with prepare_cached
traversal.rs Dependency/caller/callee traversal via recursive CTEs
ranking.rs PageRank, personalized PageRank, blast radius
search.rs Hybrid FTS5 + vector search, RRF fusion (k=60)
complexity.rs Cyclomatic + cognitive complexity analysis
dataflow.rs Def-use chains, reaching definitions, dead stores
context/
assembler.rs 4-tier token-budgeted LLM context assembly
budget.rs Token estimation, truncation, signature extraction
resolution/
imports.rs Cross-file import resolution + path alias support
routes.rs Framework-specific route/component resolvers
frameworks.rs Framework detection (18+ frameworks from manifests)
dead_code.rs Unused symbol detection via edge analysis
git/
blame.rs Git blame integration
history.rs File/symbol history, commit diffs
analysis.rs Hotspots, contributors, branch info
security/
scanner.rs Directory/file scanning engine
rules.rs YAML rule parser + bundled rule loader
taint.rs Source-to-sink taint analysis
config/
schema.rs Configuration data model + validation
loader.rs Layered config loading (user → project → env → CLI)
preset.rs 4 presets with tool/category filtering
observability/
mod.rs Structured logging (tracing), path validation, secret redaction
eval/
harness.rs Evaluation framework (precision/recall/F1)
token_benchmark.rs Token reduction measurement vs baseline
cli/
installer.rs Interactive installer with ASCII banner + progress bars
hooks/
install.rs .mcp.json + .claude/settings.json + shell scripts
handlers.rs 10 runtime handlers with catch_unwind safety
git_hooks.rs Git post-commit hook (idempotent, marker-based)
claude_template.rs CLAUDE.md generation with tool instructions
codegraph init <dir> Full interactive setup (banner, prompts, progress bars)
codegraph init <dir> --yes Non-interactive setup (CI/scripting)
codegraph index <dir> Index a codebase (incremental by default)
codegraph index <dir> --force Force full re-index
codegraph serve Start MCP server (stdio transport)
codegraph query <text> Search the code graph
codegraph impact <target> Blast radius analysis
codegraph stats Show index statistics
codegraph dead-code Find potentially unused symbols
codegraph frameworks <dir> Detect frameworks and libraries
codegraph languages Language breakdown
codegraph install-hooks <dir> Install Claude Code hooks
codegraph git-hooks install Install git post-commit hook
codegraph git-hooks uninstall Remove git post-commit hook
# Prerequisites: Rust 1.75+, C compiler (for tree-sitter + SQLite)
# Full build with embeddings
cargo build --release
# Without embeddings (keyword-only search, leaner binary)
cargo build --release --no-default-features
# Run the test suite (2065 tests)
cargo test# Remove the binary
rm ~/.local/bin/codegraph
# Remove project data (run inside each project you initialized)
rm -rf .codegraph/
codegraph git-hooks uninstall
# Remove Claude Code integration files (optional, check before deleting)
rm .mcp.json
rm -rf .claude/- Sync core, async only at the MCP boundary. tree-sitter and rusqlite are synchronous. Tokio is used only for the rmcp stdio transport.
- Native tree-sitter, not WASM. 32 grammars statically linked. No initialization delay, no runtime downloads.
- Code-specific embeddings. Jina v2 Base Code (768-dim) understands programming language semantics, not just natural language.
- Feature-gated embeddings. Build with
--no-default-featuresfor a leaner binary that does keyword-only search. - Hooks never panic. Every handler uses
catch_unwindand always returns valid JSON. CodeGraph never blocks your agent. - Idempotent everything. Running
inittwice produces the same result. Hooks are marker-based. Config merges are additive. - Qualified names by containment.
Class.methodnames are derived from line-range enclosure — works across all 32 languages without language-specific logic. - Structured logging. Uses the
tracingcrate withRUST_LOGsupport. Path traversal protection and secret redaction on all MCP tool inputs/outputs.
MIT
Built with native Rust, tree-sitter, and a deep belief that AI coding agents deserve better context than grep.
