Language-neutral code index for AI agents: precise navigation without full-project search.
CodeMap builds a deterministic, AST-based index of your codebase so AI agents (Claude Code, Cursor, Codex, etc.) can find call chains, route mappings, and cross-file relationships without grepping the entire project. Indexing is static, fast, and reproducible, with no LLM in the index path.
Status: 0.3.1 stable. Installable from PyPI as codemap-core
plus 17 codemap-<lang> plugins + 2 framework / output plugins
(codemap-mybatis, codemap-aimemory, added in 0.3.0; 0.3.1
adds the codemap llm config CLI).
In a hurry? The INSTALL.md guide is the definitive walkthrough: it covers pipx / uv tool / pip, language-plugin injection, offline distribution, troubleshooting, and a verbatim clean-machine validation log.
- Core principles
- Installation
- Verify
- Commands
- Configuration
- Built-in indexers and bridges
- Architecture
- Writing a plugin
- Performance
- Documentation
- Contributing
- License
- Static analysis first, LLM as consumer: the index is deterministic and reproducible.
- Layered defense, confidence-graded: admit uncertainty rather than hallucinate.
- Cross-asset bridging is the core value: non-source assets (XML, YAML, IDL) bridge to code via the same protocol as languages.
- Evolvable path: CLI → MCP Server → Agent CLI, each step independently valuable.
- Ecosystem-compatible: SCIP for symbols, MCP for tools.
- Language-neutral: no language or framework is privileged; all indexers and bridges register through the same plugin protocol (see ADR-L001).
# Recommended: pipx provides environment isolation + a system-wide
# `codemap` command
pipx install codemap-core
# Plain pip (preferably into a venv)
pip install codemap-core
# Or with uv
uv tool install codemap-core

# `--watch` mode needs watchdog
pip install "codemap-core[watch]"
pipx install "codemap-core[watch]"
# Development tools (tests, lint, mypy, import-linter, benchmarks)
pip install "codemap-core[dev]"

Each non-Python language indexer ships as an independent PyPI
distribution. To add a language to a pipx-installed codemap, use
pipx inject so the plugin lands in the same isolated venv as the
main CLI:
# All 17 languages in one shot
pipx inject codemap codemap-typescript codemap-javascript codemap-vue \
codemap-java codemap-jsp codemap-go \
codemap-rust codemap-swift codemap-kotlin \
codemap-ruby codemap-php codemap-sql \
codemap-bash codemap-c codemap-cpp \
codemap-csharp codemap-scala

Plain pip (when codemap-core is installed via pip, not pipx):
pip install codemap-typescript codemap-javascript codemap-vue \
codemap-java codemap-jsp codemap-go codemap-rust \
codemap-swift codemap-kotlin codemap-ruby codemap-php \
codemap-sql codemap-bash codemap-c codemap-cpp \
codemap-csharp codemap-scala

Or one at a time when you only need a single language:
pipx inject codemap codemap-typescript   # or pip install codemap-typescript

Each plugin declares codemap-core as a dependency, so pip will pull
the engine if you don't already have it. After installation, codemap doctor
lists every installed plugin alongside the built-in indexers on
identical terms; see Writing a plugin for the design.
git clone https://github.com/qxbyte/codemap.git
cd codemap
# Editable install with all dev tooling
pip install -e ".[dev,watch]"
# Optionally install language plugins in editable mode
pip install -e plugins/codemap-typescript
pip install -e plugins/codemap-java
pip install -e plugins/codemap-go
pip install -e plugins/codemap-rust
pip install -e plugins/codemap-swift
pip install -e plugins/codemap-kotlin
pip install -e plugins/codemap-ruby
pip install -e plugins/codemap-php
pip install -e plugins/codemap-sql
pip install -e plugins/codemap-bash
pip install -e plugins/codemap-c
pip install -e plugins/codemap-cpp
pip install -e plugins/codemap-csharp
pip install -e plugins/codemap-scala

For users who want unreleased changes from main or to pin to a
specific commit, the git URL form still works:
# Track main
pip install git+https://github.com/qxbyte/codemap.git
pipx install git+https://github.com/qxbyte/codemap.git
# Pin to a commit
pip install git+https://github.com/qxbyte/codemap.git@2c3ed45
# A specific language plugin from a subdirectory
pip install "git+https://github.com/qxbyte/codemap.git#subdirectory=plugins/codemap-typescript"

| Item | Requirement |
|---|---|
| Python | ≥ 3.11 (the project develops on 3.13) |
| OS | macOS / Linux (Windows may need polling fallback for --watch) |
| Network | Required at install time to fetch tree-sitter-typescript etc. |
codemap --version          # → 0.1.0
codemap --help             # list global flags + subcommands
codemap doctor             # show registered indexers, bridges, and `.codemap/` state

A successful install with the TypeScript plugin should look like:
$ codemap doctor
CodeMap 0.1.0
project_root: /your/path
Registered indexers
│ name          │ version │ languages  │ file_patterns │
│ _example_lang │ 0.1.0   │ example    │ *.example │
│ python        │ 0.1.0   │ python     │ *.py, *.pyi │
│ typescript    │ 0.1.0   │ typescript │ *.ts, *.tsx │
│ java          │ 0.1.0   │ java       │ *.java │
│ go            │ 0.1.0   │ go         │ *.go │
│ rust          │ 0.1.0   │ rust       │ *.rs │
│ swift         │ 0.1.0   │ swift      │ *.swift │
│ kotlin        │ 0.1.0   │ kotlin     │ *.kt, *.kts │
│ ruby          │ 0.1.0   │ ruby       │ *.rb │
│ php           │ 0.1.0   │ php        │ *.php │
│ sql           │ 0.1.0   │ sql        │ *.sql, *.ddl │
│ bash          │ 0.1.0   │ bash       │ *.sh, *.bash, *.bats │
│ c             │ 0.1.0   │ c          │ *.c, *.h │
│ cpp           │ 0.1.0   │ cpp        │ *.cpp, *.cc, *.cxx, *.hpp, *.hh, *.hxx │
│ csharp        │ 0.1.0   │ csharp     │ *.cs, *.csx │
│ scala         │ 0.1.0   │ scala      │ *.scala, *.sc │
Registered bridges
│ name                │ version │ requires │
│ http_route          │ 0.1.0   │ -        │
│ python_cross_module │ 0.1.0   │ -        │
Full reference: docs/cli.md.
# Index a project (writes .codemap/)
codemap index /path/to/project
codemap index . --rebuild # discard old index
codemap index . --incremental # re-parse only files whose sha256 changed
codemap index . --watch # stay running and re-index on changes
codemap index . --dry-run # report what would be indexed, no write
# Diagnose
codemap doctor # plugins + index health
codemap diagnostics --severity error # show recorded warnings / errors
codemap config show # merged effective configuration
# Query
codemap search login -n 5
codemap get '<symbol-id>'
codemap callers '<symbol-id>' --depth 2
codemap callees '<symbol-id>'
codemap trace --from '<id>' --depth 5
codemap trace --from '<id>' --to '<id>' # shortest path
codemap routes # HTTP routes from the http_route bridge
# Knowledge recall, 0.3.5+ (codemap-aimemory plugin)
# Scans .ai-memory/knowledge/*.yml (written by specode-distill / task-swarm)
# and ranks by token overlap; returns top-K relevant knowledge.
# Designed to be called by specode at the start of the requirements phase.
codemap recall '<query>' # default top-k 5, yaml output
codemap recall '<query>' -p /abs/project -k 10 -o json # explicit project + json
codemap recall '<query>' -t rules,pitfalls # filter categories
codemap recall --from-spec requirements.md # 0.3.6+: use spec file as query
codemap recall '<query>' --with-content # 0.4.0+: include rule/pit/case core fields
# Every result carries `freshness_score`/`ranked_score`/`stale` since 0.4.0;
# fresher hits outrank stale ones at the same token score (180-day half-life + code-churn decay).
# With `codemap-semantic-index` plugin installed (P1-3, since v0.4.2), recall
# automatically does hybrid token+embedding ranking with RRF fusion.
# Semantic recall (requires opt-in `codemap-semantic-index` plugin, P1-3)
codemap embed install # interactive picker; downloads default Qwen3-Embedding-0.6B (1.2GB)
codemap embed # incremental embed of knowledge-base/*.md
codemap embed --rebuild # force full rebuild
codemap embed backend set --provider qwen --api-key sk-xxx # switch to cloud Qwen embedding
# Machine-readable output: all commands take --json
codemap --json callers '<symbol-id>'
# Optional LLM enrichment (codemap-aimemory plugin, 0.3.0+)
codemap llm config set api-key sk-xxx # persist to ~/.config/codemap/llm.yaml
codemap llm config set base-url https://api.deepseek.com/v1
codemap llm config set model deepseek-chat
codemap llm config show # masked-key view + value source
codemap enrich . # fills .ai-memory/enrichment/*.yml
codemap enrich . --dry-run                 # count fn/method symbols, no API call

Exit codes follow sysexits.h (ADR-005); see
docs/cli.md for the table.
codemap index produces two parallel directories at the project root:
<project>/
├── .codemap/     ← deterministic, machine-friendly index (queried by `codemap …`)
└── .ai-memory/   ← four-layer-memory-model L1 layout (consumed by AI agents)
| File | Contents |
|---|---|
| symbols.json | All symbols keyed by SymbolID. Each entry: kind, language, file, range, signature, annotations, confidence, extra (per-language metadata: pending_calls, http_route, supertypes, imports, params, return_type, change_count_90d, …). |
| edges.json | Directed relations: calls / extends / implements / overrides / references / routes_to / maps_to / imports / accesses_table. Each carries confidence ∈ {high, medium, low}. |
| routes.json | HTTP routes minted by the http_route bridge from extra["http_route"]. |
| aliases.json | Synthetic intermediate → real symbol links (e.g. route → handler). |
| manifest.json | Project root, codemap_version, registered indexers + bridges + their versions, per-file sha256 / mtime / language. |
| diagnostics.json | Indexer / bridge warnings collected during the run (severity + code + message + producer). |
| .lock | Cross-process write lock; do not edit. |
Written by codemap-aimemory (L0+L1, every codemap index) and
optionally by sibling tools
(specode-distill for L2/L3, task-swarm for the auto-ingested
cases + pitfalls). AI agents read this tree directly. Stable
entity_id slugs are derived from the SCIP SymbolID
(e.g. fn-calcPrice / cls-OrderService / tbl-sf_coupon).
.ai-memory/
├── project.yml                  ← L0 (codemap-aimemory 0.3.2+)
│                                  tech stack / dependencies / git remote /
│                                  top dirs / configs (best-effort autodetect)
│
├── entities/                    ← L1 (codemap-aimemory 0.3.0+)
│   ├── functions.yml              fn-/cls- entities with calls / called_by /
│   │                              related_tables / signature / line_range /
│   │                              confidence / change_count_90d /
│   │                              business_meaning
│   ├── tables.yml                 tbl-* table entities
│   ├── files.yml                  file-* file entries
│   └── modules.yml                mod-* per-file aggregates (0.3.3+):
│                                  {id, path, language, fn_count, cls_count,
│                                  functions[], classes[]}
│
├── relations/                   ← L1
│   ├── call-graph.yml             `{from, to, type=calls, confidence}`
│   ├── table-relations.yml        `{from, to, type=accesses_table, confidence}`
│   └── rule-constraints.yml       empty placeholder (L2 owns the channel)
│
├── enrichment/                  ← L1 OPTIONAL: LLM-generated overlays
│   └── <sha1[:12]>.yml            `{symbol_id, business_meaning,
│                                  related_rules, confidence:"llm",
│                                  source_model, generated_at}`
│
├── _global/                     ← L1↔L2/L3 lookup (codemap-aimemory 0.3.4+)
│   └── entities.yml               Cross-walk: every entity_id (code or
│                                  knowledge) with `source` ∈
│                                  {code, knowledge, both} +
│                                  `knowledge_refs` (which knowledge yml
│                                  mention this entity). Backs `codemap recall`.
│
├── _semantic/                   ← P1-3, OPTIONAL: written by codemap-semantic-index
│   ├── chunks.json                chunked text + metadata (model-independent)
│   ├── vectors.npy                (n_chunks, 1024) float32 (model-specific)
│   ├── model_id.txt               active backend fingerprint
│   └── manifest.json              text_hash → chunk_id (drives incremental embed)
│
└── knowledge/                   ← L2 + L3 (NOT written by codemap itself;
                                   produced by specode-distill / task-swarm;
                                   codemap-aimemory reads it to build
                                   _global/entities.yml and to power recall)
    ├── rules/      rule-*.yml     L2 business rules / mechanisms
    ├── business/   biz-*.yml      L2 business processes / UI features
    ├── modules/    mod-*.yml      L2 module maps (table / call_chain)
    ├── cases/      case-*.yml     L3 historical implementation cases
    └── pitfalls/   pit-*.yml      L3 reusable failure / fix lessons
Two-hop fan-out: when a Java method maps_to a sql_mapping that
accesses_table T, T automatically lands on the method's
related_tables. So fn-selectByUser.related_tables = [tbl-sf_coupon]
without the agent needing to follow the chain itself.
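
The two-hop fan-out can be pictured as a small join over the edge list. The following is only a sketch under assumptions: edges are taken to be dicts with `from`, `to`, and `type` keys, the intermediate ID `sql-selectByUser` is hypothetical, and including directly accessed tables alongside the two-hop ones is a choice made for the example, not confirmed behaviour.

```python
def related_tables(edges, symbol):
    """Tables reached directly, or through one `maps_to` hop, from `symbol`."""
    maps_to = {}
    accesses = {}
    for edge in edges:
        if edge["type"] == "maps_to":
            maps_to.setdefault(edge["from"], []).append(edge["to"])
        elif edge["type"] == "accesses_table":
            accesses.setdefault(edge["from"], []).append(edge["to"])

    tables = []
    # Direct access first (ASSUMED to count as related), then the two-hop path.
    for table in accesses.get(symbol, []):
        if table not in tables:
            tables.append(table)
    for mapping in maps_to.get(symbol, []):        # hop 1: method -> sql_mapping
        for table in accesses.get(mapping, []):    # hop 2: sql_mapping -> table
            if table not in tables:
                tables.append(table)
    return tables


# Hypothetical edges mirroring the fn-selectByUser example.
EDGES = [
    {"from": "fn-selectByUser", "to": "sql-selectByUser", "type": "maps_to"},
    {"from": "sql-selectByUser", "to": "tbl-sf_coupon", "type": "accesses_table"},
]
```

With these edges, `related_tables(EDGES, "fn-selectByUser")` yields the single table `tbl-sf_coupon`, matching the example in the text.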
codemap-aimemory owns L0+L1; L2+L3 (knowledge/) come from sibling
tools in the pluginhub family.
The integration is one-way and loose: codemap doesn't import the
others, just reads their yml output when present:
| Layer | Writer | When |
|---|---|---|
| L0 project.yml | codemap-aimemory (this) | every codemap index |
| L1 entities/*, relations/*, enrichment/* | codemap-aimemory (this) | every codemap index (enrichment is opt-in via codemap enrich) |
| L1↔L2/L3 _global/entities.yml | codemap-aimemory (this) | every codemap index, mining knowledge/*.yml if present |
| L1.5 _semantic/* (chunks + vectors) | codemap-semantic-index (opt-in plugin, P1-3) | explicit codemap embed |
| L2/L3 knowledge/rules,business,modules,cases,pitfalls/*.yml | specode-distill (pluginhub plugin, specode 3.0+) | user runs `/specode:specode-distill <slug>` or accepts the prompt at end of specode's acceptance phase |
| L3 knowledge/cases/case-*.yml + knowledge/pitfalls/pit-*.yml | task-swarm (pluginhub plugin, 0.6+) | every successful task_swarm.py resolve |
When codemap-semantic-index is installed, codemap recall automatically switches to hybrid (token + embedding) ranking: the two rankings are fused with Reciprocal Rank Fusion (k=60), and the fused score is then multiplied by freshness_score. Embedding hits that token recall missed surface naturally. Without the plugin installed, recall remains token-only, so there is no behaviour change for users who don't want embeddings.
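
For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch of the idea: each ranking contributes `1 / (k + rank)` per document, the contributions are summed, and the sum is scaled by a freshness factor. The function name, the input shapes, and the tie-break by ID are assumptions made for illustration; this is not the plugin's actual code.

```python
def rrf_fuse(token_ranked, embedding_ranked, freshness, k=60):
    """Fuse two ranked ID lists with RRF, then scale each score by its freshness factor.

    `freshness` maps id -> multiplier in (0, 1]; missing ids default to 1.0 (an assumption).
    """
    scores = {}
    for ranking in (token_ranked, embedding_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)

    ranked = {doc_id: score * freshness.get(doc_id, 1.0) for doc_id, score in scores.items()}
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(ranked, key=lambda doc_id: (-ranked[doc_id], doc_id))


# Hypothetical knowledge ids: "pit-b" appears in both rankings, so it gets two contributions.
token_hits = ["rule-a", "pit-b"]
embedding_hits = ["pit-b", "case-c"]
order = rrf_fuse(token_hits, embedding_hits, freshness={"rule-a": 0.5})
```

Because RRF only looks at ranks, the token score and the embedding similarity never need to be put on a common scale, which is the usual reason for choosing it over a weighted sum.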
Each specode-distill / task-swarm write also produces a twin
markdown file under <project_root>/knowledge-base/<category>/<id>.md
(same stem as the yml). The twin md preserves narrative / ascii flow
charts / wikilink-style tables that field-level yml necessarily
flattens, and is intended as the high-quality slicing source for a
future embedding indexer. codemap itself doesn't read knowledge-base/
today. codemap recall operates on the yml side; the md exists to
serve human reading and future P1-3 semantic search.
Use codemap recall '<query>' to query the union (code-side entity
hits + token overlap against every knowledge/*.yml). This is what
specode 2.1+ calls from its requirements phase to inject "known
constraints / historical pitfalls" context before drafting a new spec.
See docs/integration.md (coming) for the full agent-side workflow.
No part of knowledge/ is required for codemap to function. On a
project that has never run specode-distill or task-swarm,
_global/entities.yml simply lists code entities with
source: code, and codemap recall returns matched code entities
with empty knowledge: [].
The core index is always LLM-free: codemap index never calls any
LLM. Only the optional codemap enrich command in codemap-aimemory
writes the enrichment/ overlay, and only when you invoke it. The
existence of an API key is the on/off switch: without one, codemap enrich
exits with a clear error and no network call is made.
Three configuration sources, with built-in defaults as the fallback; the first non-empty value wins:
- CLI flag: --api-key, --base-url, --model, --backend
- Environment variable: CODEMAP_LLM_API_KEY (also ANTHROPIC_API_KEY, OPENAI_API_KEY); CODEMAP_LLM_BASE_URL (also OPENAI_BASE_URL, ANTHROPIC_BASE_URL); CODEMAP_LLM_MODEL; CODEMAP_LLM_BACKEND
- Persistent file config: ~/.config/codemap/llm.yaml (managed by codemap llm config set/unset/show; written chmod 600)
- Built-in defaults: backend openai, model gpt-4o-mini
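
The "first non-empty wins" rule can be sketched as a simple fall-through lookup. The code below is illustrative only: the `resolve` helper, the key names, and the order in which alternative environment variables are tried (CODEMAP_* first, then the others as listed) are assumptions, not the actual implementation.

```python
import os

# Defaults and env-var names come from the list above; the lookup order within
# each env-var group is an ASSUMPTION for this sketch.
DEFAULTS = {"backend": "openai", "model": "gpt-4o-mini"}
ENV_VARS = {
    "api_key": ["CODEMAP_LLM_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY"],
    "base_url": ["CODEMAP_LLM_BASE_URL", "OPENAI_BASE_URL", "ANTHROPIC_BASE_URL"],
    "model": ["CODEMAP_LLM_MODEL"],
    "backend": ["CODEMAP_LLM_BACKEND"],
}


def resolve(key, cli_flags, file_config, environ=None):
    """Return (value, source) for `key`, taking the first non-empty source."""
    environ = os.environ if environ is None else environ
    if cli_flags.get(key):
        return cli_flags[key], "cli"
    for name in ENV_VARS.get(key, []):
        if environ.get(name):
            return environ[name], f"env:{name}"
    if file_config.get(key):
        return file_config[key], "file"
    if key in DEFAULTS:
        return DEFAULTS[key], "default"
    return None, "unset"
```

Returning the source next to the value mirrors what `codemap llm config show` is described as reporting (a masked key plus where each value came from).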
| Provider | Model example | Base URL |
|---|---|---|
| OpenAI | gpt-4o-mini | https://api.openai.com/v1 (default) |
| DeepSeek | deepseek-chat | https://api.deepseek.com/v1 |
| Zhipu GLM | glm-4-flash | https://open.bigmodel.cn/api/paas/v4/ |
| MiniMax | abab6.5s-chat | https://api.minimax.chat/v1 |
| Moonshot AI Kimi | moonshot-v1-8k | https://api.moonshot.cn/v1 |
| Alibaba Tongyi | qwen-plus | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| Xiaomi MiMo | mimo-large | (per vendor docs; OpenAI-compatible) |
| Ollama (local) | llama3 | http://localhost:11434/v1 (use --backend ollama; key not needed) |
| Anthropic native | claude-sonnet-4-5 | (use --backend anthropic; requires anthropic SDK via pip install codemap-aimemory[llm]) |
Example with DeepSeek:
codemap llm config set base-url https://api.deepseek.com/v1
codemap llm config set api-key sk-xxx
codemap llm config set model deepseek-chat
codemap enrich .

Project-level configuration lives at .codemap/config.yaml (committed
or git-ignored, your choice). A user-level override at
~/.config/codemap/config.yaml is layered on top of built-in defaults,
and the project file is layered on top of that. CLI flags win over all
three.
# .codemap/config.yaml
storage:
  backend: json          # json | sqlite (sqlite reserved for a future sprint)
index:
  ignore: []             # extra fnmatch patterns on names + project-relative paths
  max_file_bytes: 10485760
  follow_symlinks: false
indexers:
  enabled: all           # "all" or an explicit list of indexer names
  disabled: []           # subtractive
bridges:
  enabled: all
  disabled: []

Full reference: docs/configuration.md.
Run codemap config show to inspect the merged result and see which
file contributed each value.
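
The layering described above (built-in defaults, then the user file, then the project file, then CLI flags) amounts to a recursive dictionary merge. This is a sketch under assumptions: the helper names are hypothetical, and treating lists as replaced wholesale instead of concatenated is a guess; docs/configuration.md is the authority on the real merge rules.

```python
def deep_merge(base, override):
    """Return a new dict where `override` wins; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # scalars and lists are replaced (ASSUMED behaviour)
    return merged


def effective_config(defaults, user, project, cli):
    """Apply layers from lowest to highest precedence."""
    result = {}
    for layer in (defaults, user, project, cli):
        result = deep_merge(result, layer)
    return result


# Hypothetical layers using keys from the sample config.yaml.
defaults = {"storage": {"backend": "json"}, "index": {"ignore": [], "follow_symlinks": False}}
user = {"index": {"follow_symlinks": True}}
project = {"index": {"ignore": ["dist/*"]}}
merged = effective_config(defaults, user, project, cli={})
```

Here the project file's `ignore` list and the user file's `follow_symlinks` both survive, because they touch different keys inside the same `index` section.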
| Indexer | Files | Provided by | Status |
|---|---|---|---|
| python | `*.py`, `*.pyi` | main repo | first-class, dogfooded |
| typescript | `*.ts`, `*.tsx` | plugins/codemap-typescript/ | independent plugin |
| java | `*.java` | plugins/codemap-java/ | independent plugin |
| go | `*.go` | plugins/codemap-go/ | independent plugin |
| rust | `*.rs` | plugins/codemap-rust/ | independent plugin |
| swift | `*.swift` | plugins/codemap-swift/ | independent plugin |
| kotlin | `*.kt`, `*.kts` | plugins/codemap-kotlin/ | independent plugin |
| ruby | `*.rb` | plugins/codemap-ruby/ | independent plugin |
| php | `*.php` | plugins/codemap-php/ | independent plugin |
| sql | `*.sql`, `*.ddl` | plugins/codemap-sql/ | independent plugin (DDL only) |
| bash | `*.sh`, `*.bash`, `*.bats` | plugins/codemap-bash/ | independent plugin |
| c | `*.c`, `*.h` | plugins/codemap-c/ | independent plugin |
| cpp | `*.cpp`, `*.cc`, `*.cxx`, `*.hpp`, `*.hh`, `*.hxx` | plugins/codemap-cpp/ | independent plugin |
| csharp | `*.cs`, `*.csx` | plugins/codemap-csharp/ | independent plugin |
| scala | `*.scala`, `*.sc` | plugins/codemap-scala/ | independent plugin |
| _example_lang | `*.example` | main repo | reference / smoke |
| Bridge | Purpose |
|---|---|
| http_route | Mints scip-route intermediates from Symbol.extra["http_route"] and ["http_calls"] metadata; links client callers to server handlers regardless of language |
| python_cross_module | Resolves synthetic `scip-python . . . <module>/<leaf>.` targets emitted by the Python indexer to concrete local symbols when the file is in the index |

New language? You never need to PR the main repository; see Writing a plugin.
cli → core ← indexers
       │         │
       └── io ───┘
            │
           mcp
- core: pure business logic, Pydantic data models, SymbolID (SCIP format), call-graph algorithms (walk_chain, shortest_path)
- io: persistence adapters (JSON today, SQLite reserved for scale)
- indexers: pluggable language/asset indexers, discovered via the codemap.indexers entry-point group
- bridges: pluggable cross-language resolvers, discovered via the codemap.bridges entry-point group
- cli: Typer command surface
- mcp: MCP server, later sprint
Strict import-linter contracts (pyproject.toml) enforce the
dependency direction cli → core ← indexers, cli → core ← io on
every PR.
CodeMap's indexers and bridges are plugin-first. Adding a new language is
a separate PyPI package; the main repo is never touched. The
codemap-typescript package under plugins/ is the reference
implementation:
# your-plugin/pyproject.toml
[project.entry-points."codemap.indexers"]
yourlang = "codemap_yourlang:YourLangIndexer"

That one line is the only coupling. After pip install your-plugin
your indexer appears in codemap doctor on identical terms.
Step-by-step guide: docs/plugin-guide.md.
Reference: plugins/codemap-typescript/.
Baseline numbers (median, M-series single core, indexing the CodeMap repo itself, 437 symbols / 1232 edges):
| Bench | Median | Target (design §21) |
|---|---|---|
| full index | 73 ms | ≤ 3 s |
| callers | 4.7 µs | ≤ 50 ms |
| callees | 26 µs | ≤ 50 ms |
| walk_chain depth 10 | 72 µs | ≤ 200 ms |

Re-run locally with pytest -m bench -o addopts="". PRs that regress
any median by ≥ 20 % are blocked by CI (ADR-010). Full table and
methodology: docs/performance.md.
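
The 20 % rule is a relative comparison of medians. A minimal sketch of such a gate follows; the function name and the dict-of-seconds input format are assumptions, and how CI actually stores and compares baselines is described in ADR-010, not here.

```python
def regressions(baseline, current, threshold=0.20):
    """Return {bench: relative_change} for benches whose median slowed by >= threshold."""
    flagged = {}
    for bench, base_median in baseline.items():
        new_median = current.get(bench)
        if new_median is None or base_median <= 0:
            continue  # nothing to compare against
        change = (new_median - base_median) / base_median
        if change >= threshold:
            flagged[bench] = change
    return flagged


# Baseline medians from the table above, in seconds; the "current" values are made up.
baseline = {"full index": 73e-3, "callers": 4.7e-6}
current = {"full index": 80e-3, "callers": 6.0e-6}
slow = regressions(baseline, current)
```

In this made-up run only `callers` is flagged: it slowed by roughly 28 %, while `full index` slowed by under 10 %.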
| File | Topic |
|---|---|
| docs/cli.md | Every command, flag, JSON envelope, exit code |
| docs/configuration.md | All config keys + merge order |
| docs/plugin-guide.md | How to write an indexer / bridge plugin |
| docs/performance.md | Baseline numbers + ADR-010 regression policy |
| docs/indexers/python.md | Python indexer details |
| docs/bridges/http_route.md | HTTP route bridge contract |
| docs/adr/ | Architecture decision records (1-12 + L001) |
| CHANGELOG.md | Release notes |
See CONTRIBUTING.md. The key invariant: no
language is a first-class citizen. Proposals that special-case any
ecosystem will be asked to refactor into the generic plugin protocol
(ADR-L001).
CI gates every PR through ruff, mypy --strict, import-linter,
pytest --cov 80%, and the benchmark suite.
MIT; see LICENSE.