From c463dae2245dc97695e7a3dbbf3b61480457453e Mon Sep 17 00:00:00 2001 From: Satya K Date: Fri, 10 Apr 2026 23:22:47 +0530 Subject: [PATCH 1/8] feat(skill): add acquire-codebase-knowledge skill documentation --- skills/acquire-codebase-knowledge/SKILL.md | 149 +++++++++++++++++++++ 1 file changed, 149 insertions(+) create mode 100644 skills/acquire-codebase-knowledge/SKILL.md diff --git a/skills/acquire-codebase-knowledge/SKILL.md b/skills/acquire-codebase-knowledge/SKILL.md new file mode 100644 index 000000000..53b541540 --- /dev/null +++ b/skills/acquire-codebase-knowledge/SKILL.md @@ -0,0 +1,149 @@ +--- +name: acquire-codebase-knowledge +description: 'Use this skill when the user explicitly asks to map, document, or onboard into an existing codebase. Trigger for prompts like "map this codebase", "document this architecture", "onboard me to this repo", or "create codebase docs". Do not trigger for routine feature implementation, bug fixes, or narrow code edits unless the user asks for repository-level discovery.' +license: MIT +compatibility: 'Cross-platform. Preferred: run scripts/scan.sh with Bash. On Windows without Bash, run equivalent PowerShell discovery commands and keep the same output contract. Requires git and standard shell tooling.' +metadata: + version: "1.2" +argument-hint: 'Optional: specific area to focus on, e.g. "architecture only", "testing and concerns"' +--- + +# Acquire Codebase Knowledge + +Produces seven populated documents in `docs/codebase/` covering everything needed to work effectively on the project. Only document what is verifiable from files or terminal output — never infer or assume. + +## Output Contract (Required) + +Before finishing, all of the following must be true: + +1. Exactly these files exist in `docs/codebase/`: `STACK.md`, `STRUCTURE.md`, `ARCHITECTURE.md`, `CONVENTIONS.md`, `INTEGRATIONS.md`, `TESTING.md`, `CONCERNS.md`. +2. Every claim is traceable to source files, config, or terminal output. +3. 
Unknowns are marked as `[TODO]`; intent-dependent decisions are marked `[ASK USER]`. +4. Every document includes a short "evidence" list with concrete file paths. +5. Final response includes numbered `[ASK USER]` questions and intent-vs-reality divergences. + +## Workflow + +Copy and track this checklist: + +``` +- [ ] Phase 1: Run scan, read intent documents +- [ ] Phase 2: Investigate each documentation area +- [ ] Phase 3: Populate all seven docs in docs/codebase/ +- [ ] Phase 4: Validate docs, present findings, resolve all [ASK USER] items +``` + +## Focus Area Mode + +If the user supplies a focus area (for example: "architecture only" or "testing and concerns"): + +1. Always run Phase 1 in full. +2. Fully complete focus-area documents first. +3. For non-focus documents not yet analyzed, keep required sections present and mark unknowns as `[TODO]`. +4. Still run the Phase 4 validation loop on all seven documents before final output. + +### Phase 1: Scan and Read Intent + +1. Run from the project root: + ```bash + bash scripts/scan.sh --output docs/codebase/.codebase-scan.txt + ``` + Windows fallback (if Bash is unavailable): + ```powershell + Get-ChildItem -Recurse -File | Select-Object -First 200 FullName | Out-File docs/codebase/.codebase-scan.txt + ``` +2. Search for `PRD`, `TRD`, `README`, `ROADMAP`, `SPEC`, `DESIGN` files and read them. +3. Summarise the stated project intent before reading any source code. + +### Phase 2: Investigate + +Use the scan output to answer questions for each of the seven templates. Load [`references/inquiry-checkpoints.md`](references/inquiry-checkpoints.md) for the full per-template question list. + +If the stack is ambiguous (multiple manifest files, unfamiliar file types, no `package.json`), load [`references/stack-detection.md`](references/stack-detection.md). + +### Phase 3: Populate Templates + +Copy each template from `assets/templates/` into `docs/codebase/`. Fill in this order: + +1. 
[STACK.md](assets/templates/STACK.md) — language, runtime, frameworks, all dependencies +2. [STRUCTURE.md](assets/templates/STRUCTURE.md) — directory layout, entry points, key files +3. [ARCHITECTURE.md](assets/templates/ARCHITECTURE.md) — layers, patterns, data flow +4. [CONVENTIONS.md](assets/templates/CONVENTIONS.md) — naming, formatting, error handling, imports +5. [INTEGRATIONS.md](assets/templates/INTEGRATIONS.md) — external APIs, databases, auth, monitoring +6. [TESTING.md](assets/templates/TESTING.md) — frameworks, file organization, mocking strategy +7. [CONCERNS.md](assets/templates/CONCERNS.md) — tech debt, bugs, security risks, perf bottlenecks + +Use `[TODO]` for anything that cannot be determined from code. Use `[ASK USER]` where the right answer requires team intent. + +### Phase 4: Validate, Repair, Verify + +Run this mandatory validation loop before finalizing: + +1. Validate each doc against `references/inquiry-checkpoints.md`. +2. For each non-trivial claim, confirm at least one evidence reference exists. +3. If any required section is missing or unsupported: + - Fix the document. + - Re-run validation. +4. Repeat until all seven docs pass. + +Then present a summary of all seven documents, list every `[ASK USER]` item as a numbered question, and highlight any Intent vs. Reality divergences from Phase 1. + +Validation pass criteria: + +- No unsupported claims. +- No empty required sections. +- Unknowns use `[TODO]` rather than assumptions. +- Team-intent gaps are explicitly marked `[ASK USER]`. + +--- + +## Gotchas + +**Monorepos:** Root `package.json` may have no source — check for `workspaces`, `packages/`, or `apps/` directories. Each workspace may have independent dependencies and conventions. Map each sub-package separately. + +**Outdated README:** README often describes intended architecture, not the current one. Cross-reference with actual file structure before treating any README claim as fact. 
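The monorepo gotcha above can be checked mechanically. A sketch assuming a Node-style layout — the directory names are illustrative, and grepping `package.json` is a heuristic, not a JSON parse:

```shell
# Flag monorepo signals: a root "workspaces" field and nested manifests.
if [ -f package.json ] && grep -q '"workspaces"' package.json; then
  echo "root package.json declares workspaces"
fi
for dir in packages apps libs services; do
  [ -d "$dir" ] || continue
  # Each manifest listed here is a sub-package to map separately.
  find "$dir" -maxdepth 2 -name package.json -not -path '*/node_modules/*'
done
```

Any path printed by the loop should get its own row in `STACK.md` rather than being folded into the root package.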
+ +**TypeScript path aliases:** `tsconfig.json` `paths` config means imports like `@/foo` don't map directly to the filesystem. Map aliases to real paths before documenting structure. + +**Generated/compiled output:** Never document patterns from `dist/`, `build/`, `generated/`, `.next/`, `out/`, or `__pycache__/`. These are artefacts — document source conventions only. + +**`.env.example` reveals required config:** Secrets are never committed. Read `.env.example`, `.env.template`, or `.env.sample` to discover required environment variables. + +**`devDependencies` ≠ production stack:** Only `dependencies` (or equivalent, e.g. `[tool.poetry.dependencies]`) runs in production. Document linters, formatters, and test frameworks separately as dev tooling. + +**Test TODOs ≠ production debt:** TODOs inside `test/`, `tests/`, `__tests__/`, or `spec/` are coverage gaps, not production technical debt. Separate them in `CONCERNS.md`. + +**High-churn files = fragile areas:** Files appearing most in recent git history have the highest modification rate and likely hidden complexity. Always note them in `CONCERNS.md`. + +--- + +## Anti-Patterns + +| ❌ Don't | ✅ Do instead | +|---------|--------------| +| "Uses Clean Architecture with Domain/Data layers." (when no such directories exist) | State only what directory structure actually shows. | +| "This is a Next.js project." (without checking `package.json`) | Check `dependencies` first. State what's actually there. | +| Guess the database from a variable name like `dbUrl` | Check manifest for `pg`, `mysql2`, `mongoose`, `prisma`, etc. | +| Document `dist/` or `build/` naming patterns as conventions | Source files only. 
| + +--- + +## Bundled Assets + +| Asset | When to load | +|-------|-------------| +| [`scripts/scan.sh`](scripts/scan.sh) | Phase 1 — run first, before reading any code | +| [`references/inquiry-checkpoints.md`](references/inquiry-checkpoints.md) | Phase 2 — load for per-template investigation questions | +| [`references/stack-detection.md`](references/stack-detection.md) | Phase 2 — only if stack is ambiguous | +| [`assets/templates/STACK.md`](assets/templates/STACK.md) | Phase 3 step 1 | +| [`assets/templates/STRUCTURE.md`](assets/templates/STRUCTURE.md) | Phase 3 step 2 | +| [`assets/templates/ARCHITECTURE.md`](assets/templates/ARCHITECTURE.md) | Phase 3 step 3 | +| [`assets/templates/CONVENTIONS.md`](assets/templates/CONVENTIONS.md) | Phase 3 step 4 | +| [`assets/templates/INTEGRATIONS.md`](assets/templates/INTEGRATIONS.md) | Phase 3 step 5 | +| [`assets/templates/TESTING.md`](assets/templates/TESTING.md) | Phase 3 step 6 | +| [`assets/templates/CONCERNS.md`](assets/templates/CONCERNS.md) | Phase 3 step 7 | + +Template usage mode: + +- Default mode: complete only the "Core Sections (Required)" in each template. +- Extended mode: add optional sections only when the repo complexity justifies them. 
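The high-churn signal from the Gotchas section can be computed directly from git history. A sketch — the 90-day window and top-20 cutoff are illustrative defaults, not part of this skill's contract:

```shell
# Rank files by commit count over the last 90 days; the top entries are
# candidates for the fragile/high-churn table in CONCERNS.md.
git log --since="90 days ago" --name-only --pretty=format: \
  | grep -v '^$' \
  | sort | uniq -c | sort -rn | head -20
```

Run it from the repository root; renamed files are counted under the path they had at each commit.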
From 00e11813e8fb60d7884b85b8e8a5b029416dffc9 Mon Sep 17 00:00:00 2001 From: Satya K Date: Fri, 10 Apr 2026 23:23:01 +0530 Subject: [PATCH 2/8] feat(templates): add architecture, concerns, conventions, integrations, stack, structure, and testing documentation templates --- .../assets/templates/ARCHITECTURE.md | 49 ++++++++++++++++ .../assets/templates/CONCERNS.md | 56 ++++++++++++++++++ .../assets/templates/CONVENTIONS.md | 52 +++++++++++++++++ .../assets/templates/INTEGRATIONS.md | 48 ++++++++++++++++ .../assets/templates/STACK.md | 56 ++++++++++++++++++ .../assets/templates/STRUCTURE.md | 44 ++++++++++++++ .../assets/templates/TESTING.md | 57 +++++++++++++++++++ 7 files changed, 362 insertions(+) create mode 100644 skills/acquire-codebase-knowledge/assets/templates/ARCHITECTURE.md create mode 100644 skills/acquire-codebase-knowledge/assets/templates/CONCERNS.md create mode 100644 skills/acquire-codebase-knowledge/assets/templates/CONVENTIONS.md create mode 100644 skills/acquire-codebase-knowledge/assets/templates/INTEGRATIONS.md create mode 100644 skills/acquire-codebase-knowledge/assets/templates/STACK.md create mode 100644 skills/acquire-codebase-knowledge/assets/templates/STRUCTURE.md create mode 100644 skills/acquire-codebase-knowledge/assets/templates/TESTING.md diff --git a/skills/acquire-codebase-knowledge/assets/templates/ARCHITECTURE.md b/skills/acquire-codebase-knowledge/assets/templates/ARCHITECTURE.md new file mode 100644 index 000000000..26f575e2b --- /dev/null +++ b/skills/acquire-codebase-knowledge/assets/templates/ARCHITECTURE.md @@ -0,0 +1,49 @@ +# Architecture + +## Core Sections (Required) + +### 1) Architectural Style + +- Primary style: [layered/feature/event-driven/other] +- Why this classification: [short evidence-backed rationale] +- Primary constraints: [2-3 constraints that shape design] + +### 2) System Flow + +```text +[entry] -> [processing] -> [domain logic] -> [data/integration] -> [response/output] +``` + +Describe the flow in 4-6 
steps using file-backed evidence. + +### 3) Layer/Module Responsibilities + +| Layer or module | Owns | Must not own | Evidence | +|-----------------|------|--------------|----------| +| [name] | [responsibility] | [non-responsibility] | [file] | + +### 4) Reused Patterns + +| Pattern | Where found | Why it exists | +|---------|-------------|---------------| +| [singleton/repository/adapter/etc] | [path] | [reason] | + +### 5) Known Architectural Risks + +- [Risk 1 + impact] +- [Risk 2 + impact] + +### 6) Evidence + +- [path/to/entrypoint] +- [path/to/main-layer-files] +- [path/to/data-or-integration-layer] + +## Extended Sections (Optional) + +Add only when needed: + +- Startup or initialization order details +- Async/event topology diagrams +- Anti-pattern catalog with refactoring paths +- Failure-mode analysis and resilience posture diff --git a/skills/acquire-codebase-knowledge/assets/templates/CONCERNS.md b/skills/acquire-codebase-knowledge/assets/templates/CONCERNS.md new file mode 100644 index 000000000..d41e13ab3 --- /dev/null +++ b/skills/acquire-codebase-knowledge/assets/templates/CONCERNS.md @@ -0,0 +1,56 @@ +# Codebase Concerns + +## Core Sections (Required) + +### 1) Top Risks (Prioritized) + +| Severity | Concern | Evidence | Impact | Suggested action | +|----------|---------|----------|--------|------------------| +| [high/med/low] | [issue] | [file or scan output] | [impact] | [next action] | + +### 2) Technical Debt + +List the most important debt items only. 
+ +| Debt item | Why it exists | Where | Risk if ignored | Suggested fix | +|-----------|---------------|-------|-----------------|---------------| +| [item] | [reason] | [path] | [risk] | [fix] | + +### 3) Security Concerns + +| Risk | OWASP category (if applicable) | Evidence | Current mitigation | Gap | +|------|--------------------------------|----------|--------------------|-----| +| [risk] | [A01/A03/etc or N/A] | [path] | [what exists] | [what is missing] | + +### 4) Performance and Scaling Concerns + +| Concern | Evidence | Current symptom | Scaling risk | Suggested improvement | +|---------|----------|-----------------|-------------|-----------------------| +| [issue] | [path/metric] | [symptom] | [risk] | [action] | + +### 5) Fragile/High-Churn Areas + +| Area | Why fragile | Churn signal | Safe change strategy | +|------|-------------|-------------|----------------------| +| [path] | [reason] | [recent churn evidence] | [approach] | + +### 6) `[ASK USER]` Questions + +Add unresolved intent-dependent questions as a numbered list. + +1. 
[ASK USER] [question] + +### 7) Evidence + +- [scan output section reference] +- [path/to/code-file] +- [path/to/config-or-history-evidence] + +## Extended Sections (Optional) + +Add only when needed: + +- Full bug inventory +- Component-level remediation roadmap +- Cost/effort estimates by concern +- Dependency-risk and ownership mapping diff --git a/skills/acquire-codebase-knowledge/assets/templates/CONVENTIONS.md b/skills/acquire-codebase-knowledge/assets/templates/CONVENTIONS.md new file mode 100644 index 000000000..5a29453cf --- /dev/null +++ b/skills/acquire-codebase-knowledge/assets/templates/CONVENTIONS.md @@ -0,0 +1,52 @@ +# Coding Conventions + +## Core Sections (Required) + +### 1) Naming Rules + +| Item | Rule | Example | Evidence | +|------|------|---------|----------| +| Files | [RULE] | [EXAMPLE] | [FILE] | +| Functions/methods | [RULE] | [EXAMPLE] | [FILE] | +| Types/interfaces | [RULE] | [EXAMPLE] | [FILE] | +| Constants/env vars | [RULE] | [EXAMPLE] | [FILE] | + +### 2) Formatting and Linting + +- Formatter: [TOOL + CONFIG FILE] +- Linter: [TOOL + CONFIG FILE] +- Most relevant enforced rules: [RULE_1], [RULE_2], [RULE_3] +- Run commands: [COMMANDS] + +### 3) Import and Module Conventions + +- Import grouping/order: [RULE] +- Alias vs relative import policy: [RULE] +- Public exports/barrel policy: [RULE] + +### 4) Error and Logging Conventions + +- Error strategy by layer: [SHORT SUMMARY] +- Logging style and required context fields: [SUMMARY] +- Sensitive-data redaction rules: [SUMMARY] + +### 5) Testing Conventions + +- Test file naming/location rule: [RULE] +- Mocking strategy norm: [RULE] +- Coverage expectation: [RULE or TODO] + +### 6) Evidence + +- [path/to/lint-config] +- [path/to/format-config] +- [path/to/representative-source-file] + +## Extended Sections (Optional) + +Add only for large or inconsistent codebases: + +- Layer-specific error handling matrix +- Language-specific strictness options +- Repo-specific commit/branching 
conventions +- Known convention violations to clean up diff --git a/skills/acquire-codebase-knowledge/assets/templates/INTEGRATIONS.md b/skills/acquire-codebase-knowledge/assets/templates/INTEGRATIONS.md new file mode 100644 index 000000000..f62039ff8 --- /dev/null +++ b/skills/acquire-codebase-knowledge/assets/templates/INTEGRATIONS.md @@ -0,0 +1,48 @@ +# External Integrations + +## Core Sections (Required) + +### 1) Integration Inventory + +| System | Type (API/DB/Queue/etc) | Purpose | Auth model | Criticality | Evidence | +|--------|---------------------------|---------|------------|-------------|----------| +| [name] | [type] | [purpose] | [auth] | [high/med/low] | [file] | + +### 2) Data Stores + +| Store | Role | Access layer | Key risk | Evidence | +|-------|------|--------------|----------|----------| +| [db/cache/etc] | [role] | [module] | [risk] | [file] | + +### 3) Secrets and Credentials Handling + +- Credential sources: [env/secrets manager/config] +- Hardcoding checks: [result] +- Rotation or lifecycle notes: [known/unknown] + +### 4) Reliability and Failure Behavior + +- Retry/backoff behavior: [implemented/none/partial] +- Timeout policy: [where configured] +- Circuit-breaker or fallback behavior: [if any] + +### 5) Observability for Integrations + +- Logging around external calls: [yes/no + where] +- Metrics/tracing coverage: [yes/no + where] +- Missing visibility gaps: [list] + +### 6) Evidence + +- [path/to/integration-wrapper] +- [path/to/config-or-env-template] +- [path/to/monitoring-or-logging-config] + +## Extended Sections (Optional) + +Add only when needed: + +- Endpoint-by-endpoint catalog +- Auth flow sequence diagrams +- SLA/SLO per integration +- Region/failover topology notes diff --git a/skills/acquire-codebase-knowledge/assets/templates/STACK.md b/skills/acquire-codebase-knowledge/assets/templates/STACK.md new file mode 100644 index 000000000..2520677c3 --- /dev/null +++ b/skills/acquire-codebase-knowledge/assets/templates/STACK.md 
@@ -0,0 +1,56 @@ +# Technology Stack + +## Core Sections (Required) + +### 1) Runtime Summary + +| Area | Value | Evidence | +|------|-------|----------| +| Primary language | [VALUE] | [FILE_PATH] | +| Runtime + version | [VALUE] | [FILE_PATH] | +| Package manager | [VALUE] | [FILE_PATH] | +| Module/build system | [VALUE] | [FILE_PATH] | + +### 2) Production Frameworks and Dependencies + +List only high-impact production dependencies (frameworks, data, transport, auth). + +| Dependency | Version | Role in system | Evidence | +|------------|---------|----------------|----------| +| [NAME] | [VERSION] | [ROLE] | [FILE_PATH] | + +### 3) Development Toolchain + +| Tool | Purpose | Evidence | +|------|---------|----------| +| [TOOL] | [LINT/FORMAT/TEST/BUILD] | [FILE_PATH] | + +### 4) Key Commands + +```bash +[install command] +[build command] +[test command] +[lint command] +``` + +### 5) Environment and Config + +- Config sources: [LIST FILES] +- Required env vars: [VAR_1], [VAR_2], [TODO] +- Deployment/runtime constraints: [SHORT NOTE] + +### 6) Evidence + +- [path/to/manifest] +- [path/to/runtime-config] +- [path/to/build-or-ci-config] + +## Extended Sections (Optional) + +Add only when needed for complex repos: + +- Full dependency taxonomy by category +- Detailed compiler/runtime flags +- Environment matrix (dev/stage/prod) +- Process manager and container runtime details diff --git a/skills/acquire-codebase-knowledge/assets/templates/STRUCTURE.md b/skills/acquire-codebase-knowledge/assets/templates/STRUCTURE.md new file mode 100644 index 000000000..89e9c28f8 --- /dev/null +++ b/skills/acquire-codebase-knowledge/assets/templates/STRUCTURE.md @@ -0,0 +1,44 @@ +# Codebase Structure + +## Core Sections (Required) + +### 1) Top-Level Map + +List only meaningful top-level directories and files. 
+ +| Path | Purpose | Evidence | +|------|---------|----------| +| [path/] | [purpose] | [source] | + +### 2) Entry Points + +- Main runtime entry: [FILE] +- Secondary entry points (worker/cli/jobs): [FILES or NONE] +- How entry is selected (script/config): [NOTE] + +### 3) Module Boundaries + +| Boundary | What belongs here | What must not be here | +|----------|-------------------|------------------------| +| [module/layer] | [responsibility] | [forbidden logic] | + +### 4) Naming and Organization Rules + +- File naming pattern: [kebab/camel/Pascal + examples] +- Directory organization pattern: [feature/layer/domain] +- Import aliasing or path conventions: [RULE] + +### 5) Evidence + +- [path/to/root-tree-source] +- [path/to/entry-config] +- [path/to/key-module] + +## Extended Sections (Optional) + +Add only when repository complexity requires it: + +- Subdirectory deep maps by feature/layer +- Middleware/boot order details +- Generated-vs-source layout boundaries +- Monorepo workspace-level structure maps diff --git a/skills/acquire-codebase-knowledge/assets/templates/TESTING.md b/skills/acquire-codebase-knowledge/assets/templates/TESTING.md new file mode 100644 index 000000000..8e0e7028e --- /dev/null +++ b/skills/acquire-codebase-knowledge/assets/templates/TESTING.md @@ -0,0 +1,57 @@ +# Testing Patterns + +## Core Sections (Required) + +### 1) Test Stack and Commands + +- Primary test framework: [NAME + VERSION] +- Assertion/mocking tools: [TOOLS] +- Commands: + +```bash +[run all tests] +[run unit tests] +[run integration/e2e tests] +[run coverage] +``` + +### 2) Test Layout + +- Test file placement pattern: [co-located/tests folder/etc] +- Naming convention: [pattern] +- Setup files and where they run: [paths] + +### 3) Test Scope Matrix + +| Scope | Covered? 
| Typical target | Notes | +|-------|----------|----------------|-------| +| Unit | [yes/no] | [modules/services] | [notes] | +| Integration | [yes/no] | [API/data boundaries] | [notes] | +| E2E | [yes/no] | [user flows] | [notes] | + +### 4) Mocking and Isolation Strategy + +- Main mocking approach: [module/class/network] +- Isolation guarantees: [what is reset and when] +- Common failure mode in tests: [short note] + +### 5) Coverage and Quality Signals + +- Coverage tool + threshold: [value or TODO] +- Current reported coverage: [value or TODO] +- Known gaps/flaky areas: [list] + +### 6) Evidence + +- [path/to/test-config] +- [path/to/representative-test-file] +- [path/to/ci-or-coverage-config] + +## Extended Sections (Optional) + +Add only when needed: + +- Framework-specific suite patterns +- Detailed mock recipes per dependency type +- Historical flaky test catalog +- Test performance bottlenecks and optimization ideas From fde9c8a3f1996c19691ac388f49099618d5b5146 Mon Sep 17 00:00:00 2001 From: Satya K Date: Fri, 10 Apr 2026 23:23:23 +0530 Subject: [PATCH 3/8] feat(references): add inquiry checkpoints and stack detection documentation --- .../references/inquiry-checkpoints.md | 70 ++++++++++ .../references/stack-detection.md | 131 ++++++++++++++++++ 2 files changed, 201 insertions(+) create mode 100644 skills/acquire-codebase-knowledge/references/inquiry-checkpoints.md create mode 100644 skills/acquire-codebase-knowledge/references/stack-detection.md diff --git a/skills/acquire-codebase-knowledge/references/inquiry-checkpoints.md b/skills/acquire-codebase-knowledge/references/inquiry-checkpoints.md new file mode 100644 index 000000000..02430e76f --- /dev/null +++ b/skills/acquire-codebase-knowledge/references/inquiry-checkpoints.md @@ -0,0 +1,70 @@ +# Inquiry Checkpoints + +Per-template investigation questions for Phase 2 of the acquire-codebase-knowledge workflow. 
For each template area, look for answers in the scan output first, then read source files to fill gaps. + +--- + +## 1. STACK.md — Tech Stack + +- What is the primary language and exact version? (check `.nvmrc`, `go.mod`, `pyproject.toml`, Docker `FROM` line) +- What package manager is used? (`npm`, `yarn`, `pnpm`, `go mod`, `pip`, `uv`) +- What are the core runtime frameworks? (web server, ORM, DI container) +- What do `dependencies` (production) vs `devDependencies` (dev tooling) contain? +- Is there a Docker image and what base image does it use? +- What are the key scripts in `package.json` / `Makefile` / `pyproject.toml`? + +## 2. STRUCTURE.md — Directory Layout + +- Where does source code live? (usually `src/`, `lib/`, or project root for Go) +- What are the entry points? (check `main` in `package.json`, `scripts.start`, `cmd/main.go`, `app.py`) +- What is the stated purpose of each top-level directory? +- Are there non-obvious directories (e.g., `eng/`, `platform/`, `infra/`)? +- Are there hidden config directories (`.github/`, `.vscode/`, `.husky/`)? +- What naming conventions do directories follow? (camelCase, kebab-case, domain-based vs layer-based) + +## 3. ARCHITECTURE.md — Patterns + +- Is the code organized by layer (controllers → services → repos) or by feature? +- What is the primary data flow? Trace one request or command from entry to data store. +- Are there singletons, dependency injection patterns, or explicit initialization order requirements? +- Are there background workers, queues, or event-driven components? +- What design patterns appear repeatedly? (Factory, Repository, Decorator, Strategy) + +## 4. CONVENTIONS.md — Coding Standards + +- What is the file naming convention? (check 10+ files — camelCase, kebab-case, PascalCase) +- What is the function and variable naming convention? +- Are private methods/fields prefixed (e.g., `_methodName`, `#field`)? +- What linter and formatter are configured? 
(check `.eslintrc`, `.prettierrc`, `golangci.yml`) +- What are the TypeScript strictness settings? (`strict`, `noImplicitAny`, etc.) +- How are errors handled at each layer? (throw vs. return structured error) +- What logging library is used and what is the log message format? +- How are imports organized? (barrel exports, path aliases, grouping rules) + +## 5. INTEGRATIONS.md — External Services + +- What external APIs are called? (search for `axios.`, `fetch(`, `http.Get(`, base URLs in constants) +- How are credentials stored and accessed? (`.env`, secrets manager, env vars) +- What databases are connected? (check manifest for `pg`, `mongoose`, `prisma`, `typeorm`, `sqlalchemy`) +- Is there an API gateway, service mesh, or proxy between the app and external services? +- What monitoring or observability tools are used? (APM, Prometheus, logging pipeline) +- Are there message queues or event buses? (Kafka, RabbitMQ, SQS, Pub/Sub) + +## 6. TESTING.md — Test Setup + +- What test runner is configured? (check `scripts.test` in `package.json`, `pytest.ini`, `go test`) +- Where are test files located? (alongside source, in `tests/`, in `__tests__/`) +- What assertion library is used? (Jest expect, Chai, pytest assert) +- How are external dependencies mocked? (jest.mock, dependency injection, fixtures) +- Are there integration tests that hit real services vs. unit tests with mocks? +- Is there a coverage threshold enforced? (check `jest.config.js`, `.nycrc`, `pyproject.toml`) + +## 7. CONCERNS.md — Known Issues + +- How many TODOs/FIXMEs/HACKs are in production code? (see scan output) +- Which files have the highest git churn in the last 90 days? (see scan output) +- Are there any files over 500 lines that mix multiple responsibilities? +- Do any services make sequential calls that could be parallelized? +- Are there hardcoded values (URLs, IDs, magic numbers) that should be config? +- What security risks exist? 
(missing input validation, raw error messages exposed to clients, missing auth checks)
+- Are there performance patterns that don't scale? (N+1 queries, in-memory caches in multi-instance setups)
diff --git a/skills/acquire-codebase-knowledge/references/stack-detection.md b/skills/acquire-codebase-knowledge/references/stack-detection.md
new file mode 100644
index 000000000..01ccfd7db
--- /dev/null
+++ b/skills/acquire-codebase-knowledge/references/stack-detection.md
@@ -0,0 +1,131 @@
+# Stack Detection Reference
+
+Load this file when the tech stack is ambiguous — e.g., multiple manifest files present, unfamiliar file extensions, or no obvious `package.json` / `go.mod`.
+
+---
+
+## Manifest File → Ecosystem
+
+| File | Ecosystem | Key fields to read |
+|------|-----------|--------------------|
+| `package.json` | Node.js / JavaScript / TypeScript | `dependencies`, `devDependencies`, `scripts`, `main`, `type`, `engines` |
+| `go.mod` | Go | Module path, Go version, `require` block |
+| `requirements.txt` | Python (pip) | Package list with pinned versions |
+| `Pipfile` | Python (pipenv) | `[packages]`, `[dev-packages]`, `[requires]` python version |
+| `pyproject.toml` | Python (poetry / uv / hatch) | `[tool.poetry.dependencies]`, `[project]`, `[build-system]` |
+| `setup.py` / `setup.cfg` | Python (setuptools, legacy) | `install_requires`, `python_requires` |
+| `Cargo.toml` | Rust | `[dependencies]`, `[[bin]]`, `[lib]` |
+| `pom.xml` | Java / Kotlin (Maven) | `<dependencies>`, `<properties>`, `<packaging>`, `<modules>` |
+| `build.gradle` / `build.gradle.kts` | Java / Kotlin (Gradle) | `dependencies {}`, `sourceCompatibility` |
+| `composer.json` | PHP | `require`, `require-dev` |
+| `Gemfile` | Ruby | `gem` declarations, `ruby` version constraint |
+| `mix.exs` | Elixir | `deps/0`, `elixir: "~> X.Y"` |
+| `pubspec.yaml` | Dart / Flutter | `dependencies`, `dev_dependencies`, `environment.sdk` |
+| `*.csproj` | .NET / C# | `<TargetFramework>`, `<PackageReference>` |
+| `*.sln` | .NET solution | References multiple `.csproj` projects |
+| `deno.json` / `deno.jsonc` | Deno (TypeScript runtime) | `imports`, `tasks` |
+| `bun.lockb` | Bun (JavaScript runtime) | Binary lockfile — check `package.json` for deps |
+
+---
+
+## Language Runtime Version Detection
+
+| Language | Where to find the version |
+|----------|--------------------------|
+| Node.js | `.nvmrc`, `.node-version`, `engines.node` in `package.json`, Docker `FROM node:X` |
+| Python | `.python-version`, `requires-python` under `[project]` in `pyproject.toml`, Docker `FROM python:X` |
+| Go | First line of `go.mod` (`go 1.21`) |
+| Java | `<maven.compiler.source>` in `pom.xml`, `sourceCompatibility` in `build.gradle`, Docker `FROM eclipse-temurin:X` |
+| Ruby | `.ruby-version`, `Gemfile` `ruby 'X.Y.Z'` |
+| Rust | `rust-toolchain.toml`, `rust-toolchain` file |
+| .NET | `<TargetFramework>` in `.csproj` (e.g., `net8.0`) |
+
+---
+
+## Framework Detection (Node.js / TypeScript)
+
+| Dependency in `package.json` | Framework |
+|-----------------------------|-----------|
+| `express` | Express.js (minimal HTTP server) |
+| `fastify` | Fastify (high-performance HTTP server) |
+| `next` | Next.js (SSR/SSG React — check for `pages/` or `app/` directory) |
+| `nuxt` | Nuxt.js (SSR/SSG Vue) |
+| `@nestjs/core` | NestJS (opinionated Node.js framework with DI) |
+| `koa` | Koa (middleware-focused, no built-in router) |
+| `@hapi/hapi` | Hapi |
+| `@trpc/server` | tRPC (type-safe API without REST/GraphQL schemas) |
+| `routing-controllers` | routing-controllers (decorator-based Express wrapper) |
+| `typeorm` | TypeORM (SQL ORM with decorators) |
+| `prisma` | Prisma (type-safe ORM, check `prisma/schema.prisma`) |
+| `mongoose` | Mongoose (MongoDB ODM) |
+| `sequelize` | Sequelize (SQL ORM) |
+| `drizzle-orm` | Drizzle (lightweight SQL ORM) |
+| `react` without `next` | Vanilla React SPA (check for `react-router-dom`) |
+| `vue` without `nuxt` | Vanilla Vue SPA |
+
+---
+
+## Framework Detection (Python)
+
+| Package | Framework |
+|---------|-----------|
+| `fastapi` | FastAPI (async REST, auto OpenAPI docs) |
+| `flask` | Flask (minimal WSGI web framework) | +| `django` | Django (batteries-included, check `settings.py`) | +| `starlette` | Starlette (ASGI, often used as FastAPI base) | +| `aiohttp` | aiohttp (async HTTP client and server) | +| `sqlalchemy` | SQLAlchemy (SQL ORM; check for `alembic` migrations) | +| `alembic` | Alembic (SQLAlchemy migration tool) | +| `pydantic` | Pydantic (data validation; core to FastAPI) | +| `celery` | Celery (distributed task queue) | + +--- + +## Monorepo Detection + +Check these signals in order: + +1. `pnpm-workspace.yaml` — pnpm workspaces +2. `lerna.json` — Lerna monorepo +3. `nx.json` — Nx monorepo (also check `workspace.json`) +4. `turbo.json` — Turborepo +5. `rush.json` — Rush (Microsoft monorepo manager) +6. `moon.yml` — Moon +7. `package.json` with `"workspaces": [...]` — npm/yarn workspaces +8. Presence of `packages/`, `apps/`, `libs/`, or `services/` directories with their own `package.json` + +If monorepo is detected: each workspace may have **independent** dependencies and conventions. Map each sub-package separately in `STACK.md` and note the monorepo structure in `STRUCTURE.md`. + +--- + +## TypeScript Path Alias Detection + +If `tsconfig.json` has a `paths` key, imports with non-relative prefixes are aliases. Map them before documenting structure. + +```json +// tsconfig.json example +"paths": { + "@/*": ["./src/*"], + "@components/*": ["./src/components/*"], + "@utils/*": ["./src/utils/*"] +} +``` + +Imports like `import { foo } from '@/utils/bar'` resolve to `src/utils/bar`. Document as `src/utils/bar`, not `@/utils/bar`. 
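The alias table can be extracted mechanically. A sketch in shell — it assumes one alias per line with a single-target array, as in the sample above, and does not handle JSONC comments that `tsconfig.json` may legally contain:

```shell
# Print "alias -> real path prefix" pairs from tsconfig.json, skipping
# silently if the file is absent.
if [ -f tsconfig.json ]; then
  sed -n 's/^[[:space:]]*"\(@[^"]*\)": \["\([^"]*\)"\].*/\1 -> \2/p' tsconfig.json
fi
```

Use the printed real paths, not the aliases, when documenting structure.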
+ +--- + +## Docker Base Image → Runtime + +If no manifest file is present but a `Dockerfile` exists, the `FROM` line reveals the runtime: + +| FROM line pattern | Runtime | +|------------------|---------| +| `FROM node:X` | Node.js X | +| `FROM python:X` | Python X | +| `FROM golang:X` | Go X | +| `FROM eclipse-temurin:X` | Java X (Eclipse Temurin JDK) | +| `FROM mcr.microsoft.com/dotnet/aspnet:X` | .NET X | +| `FROM ruby:X` | Ruby X | +| `FROM rust:X` | Rust X | +| `FROM alpine` (alone) | Check what's installed via `RUN apk add` | From 3d9327e3e6fcf67487f4bcc3bfe4f873a65a80a1 Mon Sep 17 00:00:00 2001 From: Satya K Date: Fri, 10 Apr 2026 23:23:38 +0530 Subject: [PATCH 4/8] feat(scan): add script to collect project discovery information for acquire-codebase-knowledge skill --- .../scripts/scan.sh | 297 ++++++++++++++++++ 1 file changed, 297 insertions(+) create mode 100644 skills/acquire-codebase-knowledge/scripts/scan.sh diff --git a/skills/acquire-codebase-knowledge/scripts/scan.sh b/skills/acquire-codebase-knowledge/scripts/scan.sh new file mode 100644 index 000000000..a24f3456d --- /dev/null +++ b/skills/acquire-codebase-knowledge/scripts/scan.sh @@ -0,0 +1,297 @@ +#!/usr/bin/env bash +# scan.sh — Collect project discovery information for the acquire-codebase-knowledge skill. +# Run from the project root directory. 
+#
+# Usage: bash scripts/scan.sh [OPTIONS]
+#
+# Options:
+#   --output FILE   Write output to FILE instead of stdout
+#   --help          Show this message and exit
+#
+# Exit codes:
+#   0  Success
+#   1  Usage error
+
+set -euo pipefail
+
+SCRIPT_NAME="$(basename "$0")"
+OUTPUT_FILE=""
+TREE_LIMIT=200
+TREE_MAX_DEPTH=3
+TODO_LIMIT=60
+MANIFEST_PREVIEW_LINES=80
+RECENT_COMMITS_LIMIT=20
+CHURN_LIMIT=20
+
+# --- Argument parsing ---
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --help)
+      cat <<EOF
+Usage: bash $SCRIPT_NAME [OPTIONS]
+
+Collect project discovery information and print it to stdout.
+
+Options:
+  --output FILE   Write output to FILE instead of stdout
+  --help          Show this message and exit
+EOF
+      exit 0
+      ;;
+    --output)
+      shift
+      if [[ $# -eq 0 ]]; then
+        echo "Error: --output requires a FILE argument" >&2
+        echo "Usage: bash $SCRIPT_NAME --output FILE" >&2
+        exit 1
+      fi
+      OUTPUT_FILE="$1"
+      ;;
+    *)
+      echo "Error: Unknown option: $1" >&2
+      echo "Usage: bash $SCRIPT_NAME [--output FILE] [--help]" >&2
+      exit 1
+      ;;
+  esac
+  shift
+done
+
+# --- Redirect stdout to file if requested ---
+if [[ -n "$OUTPUT_FILE" ]]; then
+  output_dir="$(dirname "$OUTPUT_FILE")"
+  if [[ "$output_dir" != "." ]]; then
+    mkdir -p "$output_dir"
+  fi
+  exec > "$OUTPUT_FILE"
+  echo "Writing output to: $OUTPUT_FILE" >&2
+fi
+
+# --- Directories to exclude from all searches ---
+EXCLUDE_DIRS=(
+  "node_modules" ".git" "dist" "build" "out" ".next" ".nuxt"
+  "__pycache__" ".venv" "venv" ".tox" "target" "vendor"
+  "coverage" ".nyc_output" "generated" ".cache" ".turbo"
+  ".yarn" ".pnp" "bin" "obj"
+)
+
+build_find_command() {
+  local depth="$1"
+  local -n out_ref=$2
+
+  out_ref=(find . -maxdepth "$depth" "(")
+  for dir in "${EXCLUDE_DIRS[@]}"; do
+    out_ref+=(-name "$dir" -o)
+  done
+  unset 'out_ref[${#out_ref[@]}-1]'
+  out_ref+=(")" -prune -o -type f -print)
+}
+
+print_limited_file() {
+  local file_path="$1"
+  local limit="$2"
+  local total
+  total=$(wc -l < "$file_path" | tr -d ' ')
+
+  if [[ "$total" -eq 0 ]]; then
+    echo "None found."
+    return
+  fi
+
+  head -n "$limit" "$file_path"
+  if [[ "$total" -gt "$limit" ]]; then
+    echo "[TRUNCATED] Showing first $limit of $total lines."
+ fi +} + +tmp_files=() +cleanup() { + if [[ ${#tmp_files[@]} -gt 0 ]]; then + rm -f "${tmp_files[@]}" + fi +} +trap cleanup EXIT + +# ============================================================ +echo "=== DIRECTORY TREE (max depth $TREE_MAX_DEPTH, source files only) ===" +tree_tmp="$(mktemp)" +tmp_files+=("$tree_tmp") +find_cmd=() +build_find_command "$TREE_MAX_DEPTH" find_cmd +"${find_cmd[@]}" 2>/dev/null | sed 's|^\./||' | sort > "$tree_tmp" || true +print_limited_file "$tree_tmp" "$TREE_LIMIT" + +echo "" +echo "=== STACK DETECTION (manifest files) ===" +MANIFESTS=( + "package.json" "package-lock.json" "yarn.lock" "pnpm-lock.yaml" + "go.mod" "go.sum" + "requirements.txt" "Pipfile" "Pipfile.lock" "pyproject.toml" "setup.py" "setup.cfg" + "Cargo.toml" "Cargo.lock" + "pom.xml" "build.gradle" "build.gradle.kts" "settings.gradle" "settings.gradle.kts" + "composer.json" "composer.lock" + "Gemfile" "Gemfile.lock" + "mix.exs" "mix.lock" + "pubspec.yaml" + "*.csproj" "*.sln" "global.json" + "deno.json" "deno.jsonc" + "bun.lockb" +) +found_any_manifest=0 +shopt -s nullglob +for pattern in "${MANIFESTS[@]}"; do + for f in $pattern; do + if [[ -f "$f" ]]; then + echo "" + echo "--- $f ---" + head -n "$MANIFEST_PREVIEW_LINES" "$f" + line_count=$(wc -l < "$f" | tr -d ' ') + if [[ "$line_count" -gt "$MANIFEST_PREVIEW_LINES" ]]; then + echo "[TRUNCATED] Showing first $MANIFEST_PREVIEW_LINES of $line_count lines." + fi + found_any_manifest=1 + fi + done +done +shopt -u nullglob +if [[ $found_any_manifest -eq 0 ]]; then + echo "No recognized manifest files found in project root." 
+fi + +echo "" +echo "=== ENTRY POINTS ===" +ENTRY_CANDIDATES=( + "src/index.ts" "src/index.js" "src/index.mjs" + "src/main.ts" "src/main.js" "src/main.py" + "src/app.ts" "src/app.js" + "src/server.ts" "src/server.js" + "main.go" "cmd/main.go" + "main.py" "app.py" "server.py" "run.py" + "index.ts" "index.js" "app.ts" "app.js" + "lib/index.ts" "lib/index.js" +) +found_any_entry=0 +for f in "${ENTRY_CANDIDATES[@]}"; do + if [[ -f "$f" ]]; then + echo "Found: $f" + found_any_entry=1 + fi +done +if [[ $found_any_entry -eq 0 ]]; then + echo "No common entry points found. Check 'main' or 'scripts.start' in manifest files above." +fi + +echo "" +echo "=== LINTING AND FORMATTING CONFIG ===" +LINT_FILES=( + ".eslintrc" ".eslintrc.json" ".eslintrc.js" ".eslintrc.cjs" ".eslintrc.yml" ".eslintrc.yaml" + "eslint.config.js" "eslint.config.mjs" "eslint.config.cjs" + ".prettierrc" ".prettierrc.json" ".prettierrc.js" ".prettierrc.yml" + "prettier.config.js" "prettier.config.mjs" + ".editorconfig" + "tsconfig.json" "tsconfig.base.json" "tsconfig.build.json" + ".golangci.yml" ".golangci.yaml" + "setup.cfg" ".flake8" ".pylintrc" "mypy.ini" + ".rubocop.yml" "phpcs.xml" "phpstan.neon" + "biome.json" "biome.jsonc" +) +found_any_lint=0 +for f in "${LINT_FILES[@]}"; do + if [[ -f "$f" ]]; then + echo "Found: $f" + found_any_lint=1 + fi +done +if [[ $found_any_lint -eq 0 ]]; then + echo "No linting or formatting config files found in project root." +fi + +echo "" +echo "=== ENVIRONMENT VARIABLE TEMPLATES ===" +ENV_TEMPLATES=(".env.example" ".env.template" ".env.sample" ".env.defaults" ".env.local.example") +found_any_env=0 +for f in "${ENV_TEMPLATES[@]}"; do + if [[ -f "$f" ]]; then + echo "--- $f ---" + cat "$f" + found_any_env=1 + fi +done +if [[ $found_any_env -eq 0 ]]; then + echo "No .env.example or .env.template found. Environment variables must be inferred from code." 
+fi + +echo "" +echo "=== TODO / FIXME / HACK (production code only, test dirs excluded) ===" +SOURCE_EXTS=( + "*.ts" "*.tsx" "*.js" "*.jsx" "*.mjs" "*.cjs" + "*.py" "*.go" "*.java" "*.kt" "*.rb" "*.php" + "*.rs" "*.cs" "*.cpp" "*.c" "*.h" "*.ex" "*.exs" +) +ext_args=() +for ext in "${SOURCE_EXTS[@]}"; do ext_args+=("--include=$ext"); done +grep_excludes=() +for dir in "${EXCLUDE_DIRS[@]}" "test" "tests" "__tests__" "spec" "__mocks__" "fixtures"; do + grep_excludes+=("--exclude-dir=$dir") +done + +todo_tmp="$(mktemp)" +tmp_files+=("$todo_tmp") +grep -rn "${grep_excludes[@]}" "${ext_args[@]}" \ + -e 'TODO' -e 'FIXME' -e 'HACK' \ + . 2>/dev/null > "$todo_tmp" || true +print_limited_file "$todo_tmp" "$TODO_LIMIT" + +echo "" +echo "=== GIT RECENT COMMITS (last 20) ===" +if git rev-parse --git-dir > /dev/null 2>&1; then + git log --oneline -n "$RECENT_COMMITS_LIMIT" +else + echo "Not a git repository or no commits yet." +fi + +echo "" +echo "=== HIGH-CHURN FILES (last 90 days, top 20) ===" +if git rev-parse --git-dir > /dev/null 2>&1; then + churn_tmp="$(mktemp)" + tmp_files+=("$churn_tmp") + git log --since="90 days ago" --name-only --pretty=format: 2>/dev/null \ + | grep -v "^$" | sort | uniq -c | sort -rn > "$churn_tmp" || true + print_limited_file "$churn_tmp" "$CHURN_LIMIT" +else + echo "Not a git repository." 
+fi
+
+echo ""
+echo "=== MONOREPO SIGNALS ==="
+MONOREPO_FILES=("pnpm-workspace.yaml" "lerna.json" "nx.json" "rush.json" "turbo.json" "moon.yml")
+found_monorepo=0
+for f in "${MONOREPO_FILES[@]}"; do
+  if [[ -f "$f" ]]; then
+    echo "Monorepo tool detected: $f"
+    found_monorepo=1
+  fi
+done
+for d in "packages" "apps" "libs" "services" "modules"; do
+  if [[ -d "$d" ]]; then
+    echo "Sub-package directory found: $d/"
+    found_monorepo=1
+  fi
+done
+# Also check package.json workspaces field
+if [[ -f "package.json" ]] && grep -q '"workspaces"' package.json 2>/dev/null; then
+  echo "package.json has 'workspaces' field (npm/yarn workspaces monorepo)"
+  found_monorepo=1
+fi
+if [[ $found_monorepo -eq 0 ]]; then
+  echo "No monorepo signals detected."
+fi
+
+echo ""
+echo "=== SCAN COMPLETE ==="

From 77a0a17cf1d6395466601c8d54f4115db534025d Mon Sep 17 00:00:00 2001
From: Satya K
Date: Fri, 10 Apr 2026 23:37:26 +0530
Subject: [PATCH 5/8] feat(skills): add acquire-codebase-knowledge skill for
 codebase mapping and documentation

---
 docs/README.skills.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/README.skills.md b/docs/README.skills.md
index 780afeb8f..3d4a03df9 100644
--- a/docs/README.skills.md
+++ b/docs/README.skills.md
@@ -26,6 +26,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to
 
 | Name | Description | Bundled Assets |
 | ---- | ----------- | -------------- |
+| [acquire-codebase-knowledge](../skills/acquire-codebase-knowledge/SKILL.md) | Use this skill when the user explicitly asks to map, document, or onboard into an existing codebase. Trigger for prompts like "map this codebase", "document this architecture", "onboard me to this repo", or "create codebase docs". Do not trigger for routine feature implementation, bug fixes, or narrow code edits unless the user asks for repository-level discovery. | `assets/templates`<br>`references/inquiry-checkpoints.md`<br>`references/stack-detection.md`<br>`scripts/scan.sh` |
 | [add-educational-comments](../skills/add-educational-comments/SKILL.md) | Add educational comments to the file specified, or prompt asking for file to comment if one is not provided. | None |
 | [agent-governance](../skills/agent-governance/SKILL.md) | Patterns and techniques for adding governance, safety, and trust controls to AI agent systems. Use this skill when:<br>- Building AI agents that call external tools (APIs, databases, file systems)<br>- Implementing policy-based access controls for agent tool usage<br>- Adding semantic intent classification to detect dangerous prompts<br>- Creating trust scoring systems for multi-agent workflows<br>- Building audit trails for agent actions and decisions<br>- Enforcing rate limits, content filters, or tool restrictions on agents<br>- Working with any agent framework (PydanticAI, CrewAI, OpenAI Agents, LangChain, AutoGen) | None |
 | [agent-owasp-compliance](../skills/agent-owasp-compliance/SKILL.md) | Check any AI agent codebase against the OWASP Agentic Security Initiative (ASI) Top 10 risks.<br>Use this skill when:<br>- Evaluating an agent system's security posture before production deployment<br>- Running a compliance check against OWASP ASI 2026 standards<br>- Mapping existing security controls to the 10 agentic risks<br>- Generating a compliance report for security review or audit<br>- Comparing agent framework security features against the standard<br>- Any request like "is my agent OWASP compliant?", "check ASI compliance", or "agentic security audit" | None |

From 2538230ec2d4a3b04799bd4565fc6212697d1fae Mon Sep 17 00:00:00 2001
From: Satya K
Date: Fri, 10 Apr 2026 23:57:44 +0530
Subject: [PATCH 6/8] feat(scan): enhance scan script with absolute path
 handling and improved output variable validation

---
 skills/acquire-codebase-knowledge/SKILL.md | 16 +++---
 .../scripts/scan.sh                        | 50 +++++++++++++++----
 2 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/skills/acquire-codebase-knowledge/SKILL.md b/skills/acquire-codebase-knowledge/SKILL.md
index 53b541540..2a0108832 100644
--- a/skills/acquire-codebase-knowledge/SKILL.md
+++ b/skills/acquire-codebase-knowledge/SKILL.md
@@ -44,13 +44,17 @@ If the user supplies a focus area (for example: "architecture only" or "testing
 
 ### Phase 1: Scan and Read Intent
 
-1. Run from the project root:
+1. Set `SKILL_ROOT` to the absolute path of the skill folder (the directory containing this `SKILL.md`), then run the bundled script from the **target project root**:
    ```bash
-   bash scripts/scan.sh --output docs/codebase/.codebase-scan.txt
-   ```
-   Windows fallback (if Bash is unavailable):
-   ```powershell
-   Get-ChildItem -Recurse -File | Select-Object -First 200 FullName | Out-File docs/codebase/.codebase-scan.txt
+   export SKILL_ROOT="/absolute/path/to/skills/acquire-codebase-knowledge"
+   bash "$SKILL_ROOT/scripts/scan.sh" --output docs/codebase/.codebase-scan.txt
+   ```
+   Keep your working directory as the target repository root so the scan covers that codebase, not the skill folder itself.
+
+   **Windows fallback (limited — use only if Bash is unavailable):** The following produces a file listing only and does **not** include manifest previews, git churn, TODO/FIXME counts, or environment variable templates. Downstream phases will have reduced context.
+ ```powershell + New-Item -ItemType Directory -Force -Path docs/codebase | Out-Null + Get-ChildItem -Recurse -File | Select-Object -First 200 FullName | Out-File docs/codebase/.codebase-scan.txt ``` 2. Search for `PRD`, `TRD`, `README`, `ROADMAP`, `SPEC`, `DESIGN` files and read them. 3. Summarise the stated project intent before reading any source code. diff --git a/skills/acquire-codebase-knowledge/scripts/scan.sh b/skills/acquire-codebase-knowledge/scripts/scan.sh index a24f3456d..15d26eb3f 100644 --- a/skills/acquire-codebase-knowledge/scripts/scan.sh +++ b/skills/acquire-codebase-knowledge/scripts/scan.sh @@ -64,11 +64,15 @@ EOF done # --- Redirect stdout to file if requested --- +OUTPUT_FILE_ABS="" if [[ -n "$OUTPUT_FILE" ]]; then output_dir="$(dirname "$OUTPUT_FILE")" if [[ "$output_dir" != "." ]]; then mkdir -p "$output_dir" fi + # Resolve to absolute path before exec replaces stdout, so downstream + # find/grep calls can explicitly exclude the output file from results. + OUTPUT_FILE_ABS="$(cd "$(dirname "$OUTPUT_FILE")" && pwd)/$(basename "$OUTPUT_FILE")" exec > "$OUTPUT_FILE" echo "Writing output to: $OUTPUT_FILE" >&2 fi @@ -83,14 +87,35 @@ EXCLUDE_DIRS=( build_find_command() { local depth="$1" - local -n out_ref=$2 + local out_var="$2" + local dir quoted assignment + local -a cmd - out_ref=(find . -maxdepth "$depth" "(") + # Validate the output variable name to prevent code injection via eval. + if [[ ! "$out_var" =~ ^[a-zA-Z_][a-zA-Z0-9_]*$ ]]; then + echo "Error: invalid output variable name: $out_var" >&2 + exit 1 + fi + + cmd=(find . -maxdepth "$depth" "(") for dir in "${EXCLUDE_DIRS[@]}"; do - out_ref+=(-name "$dir" -o) + cmd+=(-name "$dir" -o) + done + unset 'cmd[${#cmd[@]}-1]' + cmd+=(" )" -prune -o -type f -print) + + # Exclude the output file from results if one was requested. 
+  if [[ -n "$OUTPUT_FILE_ABS" ]]; then
+    # Insert the exclusion before -print; once -print has run, the path has
+    # already been emitted and a trailing -not -path would have no effect.
+    # Match both the absolute form and find's "./relative" form.
+    cmd=("${cmd[@]:0:${#cmd[@]}-1}" -not -path "$OUTPUT_FILE_ABS" -not -path "./${OUTPUT_FILE#./}" -print)
+  fi
+
+  assignment="$out_var=("
+  for quoted in "${cmd[@]}"; do
+    printf -v quoted '%q' "$quoted"
+    assignment+=" $quoted"
   done
-  unset 'out_ref[${#out_ref[@]}-1]'
-  out_ref+=(")" -prune -o -type f -print)
+  assignment+=" )"
+  eval "$assignment"
 }
 
 print_limited_file() {
@@ -150,10 +175,15 @@ for pattern in "${MANIFESTS[@]}"; do
     if [[ -f "$f" ]]; then
       echo ""
       echo "--- $f ---"
-      head -n "$MANIFEST_PREVIEW_LINES" "$f"
-      line_count=$(wc -l < "$f" | tr -d ' ')
-      if [[ "$line_count" -gt "$MANIFEST_PREVIEW_LINES" ]]; then
-        echo "[TRUNCATED] Showing first $MANIFEST_PREVIEW_LINES of $line_count lines."
+      # bun.lockb is a binary lockfile — printing it produces garbage characters.
+      if [[ "$f" == "bun.lockb" ]]; then
+        echo "[Binary lockfile — see package.json for dependency details.]"
+      else
+        head -n "$MANIFEST_PREVIEW_LINES" "$f"
+        line_count=$(wc -l < "$f" | tr -d ' ')
+        if [[ "$line_count" -gt "$MANIFEST_PREVIEW_LINES" ]]; then
+          echo "[TRUNCATED] Showing first $MANIFEST_PREVIEW_LINES of $line_count lines."
+        fi
       fi
       found_any_manifest=1
     fi
@@ -224,7 +254,7 @@ for f in "${ENV_TEMPLATES[@]}"; do
   fi
 done
 if [[ $found_any_env -eq 0 ]]; then
-  echo "No .env.example or .env.template found. Environment variables must be inferred from code."
+  echo "No .env.example or .env.template found. Identify required environment variables by searching the code and config for environment variable reads."
fi echo "" From 0c5a1eec4b8634a2aceaba5f90444678b3754595 Mon Sep 17 00:00:00 2001 From: Satya K Date: Mon, 13 Apr 2026 13:56:39 +0530 Subject: [PATCH 7/8] feat(scan): replace bash script with Python script for project discovery information collection --- skills/acquire-codebase-knowledge/SKILL.md | 43 +- .../scripts/scan.py | 712 ++++++++++++++++++ .../scripts/scan.sh | 327 -------- 3 files changed, 744 insertions(+), 338 deletions(-) create mode 100644 skills/acquire-codebase-knowledge/scripts/scan.py delete mode 100644 skills/acquire-codebase-knowledge/scripts/scan.sh diff --git a/skills/acquire-codebase-knowledge/SKILL.md b/skills/acquire-codebase-knowledge/SKILL.md index 2a0108832..b449afdb7 100644 --- a/skills/acquire-codebase-knowledge/SKILL.md +++ b/skills/acquire-codebase-knowledge/SKILL.md @@ -2,9 +2,16 @@ name: acquire-codebase-knowledge description: 'Use this skill when the user explicitly asks to map, document, or onboard into an existing codebase. Trigger for prompts like "map this codebase", "document this architecture", "onboard me to this repo", or "create codebase docs". Do not trigger for routine feature implementation, bug fixes, or narrow code edits unless the user asks for repository-level discovery.' license: MIT -compatibility: 'Cross-platform. Preferred: run scripts/scan.sh with Bash. On Windows without Bash, run equivalent PowerShell discovery commands and keep the same output contract. Requires git and standard shell tooling.' +compatibility: 'Cross-platform. Requires Python 3.8+ and git. Run scripts/scan.py from the target project root.' metadata: - version: "1.2" + version: "1.3" + enhancements: + - Multi-language manifest detection (25+ languages supported) + - CI/CD pipeline detection (10+ platforms) + - Container & orchestration detection + - Code metrics by language + - Security & compliance config detection + - Performance testing markers argument-hint: 'Optional: specific area to focus on, e.g. 
"architecture only", "testing and concerns"' --- @@ -44,18 +51,17 @@ If the user supplies a focus area (for example: "architecture only" or "testing ### Phase 1: Scan and Read Intent -1. Set `SKILL_ROOT` to the absolute path of the skill folder (the directory containing this `SKILL.md`), then run the bundled script from the **target project root**: +1. Run the scan script from the target project root: ```bash - export SKILL_ROOT="/absolute/path/to/skills/acquire-codebase-knowledge" - bash "$SKILL_ROOT/scripts/scan.sh" --output docs/codebase/.codebase-scan.txt + python3 "$SKILL_ROOT/scripts/scan.py" --output docs/codebase/.codebase-scan.txt ``` - Keep your working directory as the target repository root so the scan covers that codebase, not the skill folder itself. + Where `$SKILL_ROOT` is the absolute path to the skill folder. Works on Windows, macOS, and Linux. - **Windows fallback (limited — use only if Bash is unavailable):** The following produces a file listing only and does **not** include manifest previews, git churn, TODO/FIXME counts, or environment variable templates. Downstream phases will have reduced context. - ```powershell - New-Item -ItemType Directory -Force -Path docs/codebase | Out-Null - Get-ChildItem -Recurse -File | Select-Object -First 200 FullName | Out-File docs/codebase/.codebase-scan.txt + **Quick start:** If you have the path inline: + ```bash + python3 /absolute/path/to/skills/acquire-codebase-knowledge/scripts/scan.py --output docs/codebase/.codebase-scan.txt ``` + 2. Search for `PRD`, `TRD`, `README`, `ROADMAP`, `SPEC`, `DESIGN` files and read them. 3. Summarise the stated project intent before reading any source code. 
@@ -132,11 +138,25 @@ Validation pass criteria:
 
 ---
 
+## Enhanced Scan Output Sections
+
+The `scan.py` script now produces the following sections in addition to the original output:
+
+- **CODE METRICS** — Total files, lines of code by language, largest files (complexity signals)
+- **CI/CD PIPELINES** — Detected GitHub Actions, GitLab CI, Jenkins, CircleCI, etc.
+- **CONTAINERS & ORCHESTRATION** — Docker, Docker Compose, Kubernetes, Vagrant configs
+- **SECURITY & COMPLIANCE** — Snyk, Dependabot, SECURITY.md, SBOM, security policies
+- **PERFORMANCE & TESTING** — Benchmark configs, profiling markers, load testing tools
+
+Use these sections during Phase 2 to inform investigation questions and identify tool-specific patterns.
+
+---
+
 ## Bundled Assets
 
 | Asset | When to load |
 |-------|-------------|
-| [`scripts/scan.sh`](scripts/scan.sh) | Phase 1 — run first, before reading any code |
+| [`scripts/scan.py`](scripts/scan.py) | Phase 1 — run first, before reading any code (Python 3.8+ required) |
 | [`references/inquiry-checkpoints.md`](references/inquiry-checkpoints.md) | Phase 2 — load for per-template investigation questions |
 | [`references/stack-detection.md`](references/stack-detection.md) | Phase 2 — only if stack is ambiguous |
 | [`assets/templates/STACK.md`](assets/templates/STACK.md) | Phase 3 step 1 |
diff --git a/skills/acquire-codebase-knowledge/scripts/scan.py b/skills/acquire-codebase-knowledge/scripts/scan.py
new file mode 100644
index 000000000..15e17a28b
--- /dev/null
+++ b/skills/acquire-codebase-knowledge/scripts/scan.py
@@ -0,0 +1,712 @@
+#!/usr/bin/env python3
+"""
+scan.py — Collect project discovery information for the acquire-codebase-knowledge skill.
+Run from the project root directory.
+ +Usage: python3 scan.py [OPTIONS] + +Options: + --output FILE Write output to FILE instead of stdout + --help Show this message and exit + +Exit codes: + 0 Success + 1 Usage error +""" + +import os +import sys +import argparse +import subprocess +import json +from pathlib import Path +from typing import List, Set +import re + +TREE_LIMIT = 200 +TREE_MAX_DEPTH = 3 +TODO_LIMIT = 60 +MANIFEST_PREVIEW_LINES = 80 +RECENT_COMMITS_LIMIT = 20 +CHURN_LIMIT = 20 + +EXCLUDE_DIRS = { + "node_modules", ".git", "dist", "build", "out", ".next", ".nuxt", + "__pycache__", ".venv", "venv", ".tox", "target", "vendor", + "coverage", ".nyc_output", "generated", ".cache", ".turbo", + ".yarn", ".pnp", "bin", "obj" +} + +MANIFESTS = [ + # JavaScript/Node.js + "package.json", "package-lock.json", "yarn.lock", "pnpm-lock.yaml", "bun.lockb", + "deno.json", "deno.jsonc", + # Python + "requirements.txt", "Pipfile", "Pipfile.lock", "pyproject.toml", "setup.py", "setup.cfg", + "poetry.lock", "pdm.lock", "uv.lock", + # Go + "go.mod", "go.sum", + # Rust + "Cargo.toml", "Cargo.lock", + # Java/Kotlin + "pom.xml", "build.gradle", "build.gradle.kts", "settings.gradle", "settings.gradle.kts", + "gradle.properties", + # PHP/Composer + "composer.json", "composer.lock", + # Ruby + "Gemfile", "Gemfile.lock", "*.gemspec", + # Elixir + "mix.exs", "mix.lock", + # Dart/Flutter + "pubspec.yaml", "pubspec.lock", + # .NET/C# + "*.csproj", "*.sln", "*.slnx", "global.json", "packages.config", + # Swift + "Package.swift", "Package.resolved", + # Scala + "build.sbt", "scala-cli.yml", + # Haskell + "*.cabal", "stack.yaml", "cabal.project", "cabal.project.local", + # OCaml + "dune-project", "opam", "opam.lock", + # Nim + "*.nimble", "nim.cfg", + # Crystal + "shard.yml", "shard.lock", + # R + "DESCRIPTION", "renv.lock", + # Julia + "Project.toml", "Manifest.toml", + # Build systems + "CMakeLists.txt", "Makefile", "GNUmakefile", + "SConstruct", "build.xml", + "BUILD", "BUILD.bazel", "WORKSPACE", "bazel.lock", + 
"justfile", ".justfile", "Taskfile.yml", + "tox.ini", "Vagrantfile" +] + +ENTRY_CANDIDATES = [ + # JavaScript/Node.js/TypeScript + "src/index.ts", "src/index.js", "src/index.mjs", + "src/main.ts", "src/main.js", "src/main.py", + "src/app.ts", "src/app.js", + "src/server.ts", "src/server.js", + "index.ts", "index.js", "app.ts", "app.js", + "lib/index.ts", "lib/index.js", + # Go + "main.go", "cmd/main.go", "cmd/*/main.go", + # Python + "main.py", "app.py", "server.py", "run.py", "cli.py", + "src/main.py", "src/__main__.py", + # .NET/C# + "Program.cs", "src/Program.cs", "Main.cs", + # Java + "Main.java", "Application.java", "App.java", + "src/main/java/Main.java", + # Kotlin + "Main.kt", "Application.kt", "App.kt", + # Rust + "src/main.rs", "src/lib.rs", + # Swift + "main.swift", "Package.swift", "Sources/main.swift", + # Ruby + "app.rb", "main.rb", "lib/app.rb", + # PHP + "index.php", "app.php", "public/index.php", + # Go + "cmd/*/main.go", + # Scala + "src/main/scala/Main.scala", + # Haskell + "Main.hs", "app/Main.hs", + # Clojure + "src/core.clj", "-main.clj", + # Elixir + "lib/application.ex", "mix.exs", +] + +LINT_FILES = [ + ".eslintrc", ".eslintrc.json", ".eslintrc.js", ".eslintrc.cjs", ".eslintrc.yml", ".eslintrc.yaml", + "eslint.config.js", "eslint.config.mjs", "eslint.config.cjs", + ".prettierrc", ".prettierrc.json", ".prettierrc.js", ".prettierrc.yml", + "prettier.config.js", "prettier.config.mjs", + ".editorconfig", + "tsconfig.json", "tsconfig.base.json", "tsconfig.build.json", + ".golangci.yml", ".golangci.yaml", + "setup.cfg", ".flake8", ".pylintrc", "mypy.ini", + ".rubocop.yml", "phpcs.xml", "phpstan.neon", + "biome.json", "biome.jsonc" +] + +ENV_TEMPLATES = [".env.example", ".env.template", ".env.sample", ".env.defaults", ".env.local.example"] + +SOURCE_EXTS = [ + "ts", "tsx", "js", "jsx", "mjs", "cjs", + "py", "go", "java", "kt", "rb", "php", + "rs", "cs", "cpp", "c", "h", "ex", "exs", + "swift", "scala", "clj", "cljs", "lua", + "vim", "vim", "hs", 
"ml", "ml", "nim", "cr", + "r", "jl", "groovy", "gradle", "xml", "json" +] + +MONOREPO_FILES = ["pnpm-workspace.yaml", "lerna.json", "nx.json", "rush.json", "turbo.json", "moon.yml"] +MONOREPO_DIRS = ["packages", "apps", "libs", "services", "modules"] + +CI_CD_CONFIGS = { + ".github/workflows": "GitHub Actions", + ".gitlab-ci.yml": "GitLab CI", + "Jenkinsfile": "Jenkins", + ".circleci/config.yml": "CircleCI", + ".travis.yml": "Travis CI", + "azure-pipelines.yml": "Azure Pipelines", + "appveyor.yml": "AppVeyor", + ".drone.yml": "Drone CI", + ".woodpecker.yml": "Woodpecker CI", + "bitbucket-pipelines.yml": "Bitbucket Pipelines" +} + +CONTAINER_FILES = [ + "Dockerfile", "docker-compose.yml", "docker-compose.yaml", + ".dockerignore", "Dockerfile.*", + "k8s", "kustomization.yaml", "Chart.yaml", + "Vagrantfile", "podman-compose.yml" +] + +SECURITY_CONFIGS = [ + ".snyk", "security.txt", "SECURITY.md", + ".dependabot.yml", ".whitesource", + "sbom.json", "sbom.spdx", ".bandit.yaml" +] + +PERFORMANCE_MARKERS = [ + "benchmark", "bench", "perf.data", ".prof", + "k6.js", "locustfile.py", "jmeter.jmx" +] + + +def parse_args(): + """Parse command-line arguments.""" + parser = argparse.ArgumentParser( + description="Scan the current directory (project root) and output discovery information " + "for the acquire-codebase-knowledge skill.", + add_help=True + ) + parser.add_argument( + "--output", + type=str, + help="Write output to FILE instead of stdout" + ) + return parser.parse_args() + + +def should_exclude(path: Path) -> bool: + """Check if a path should be excluded from scanning.""" + return any(part in EXCLUDE_DIRS for part in path.parts) + + +def get_directory_tree(max_depth: int = TREE_MAX_DEPTH) -> List[str]: + """Get directory tree up to max_depth.""" + files = [] + + def walk(path: Path, depth: int): + if depth > max_depth or should_exclude(path): + return + try: + for item in sorted(path.iterdir()): + if should_exclude(item): + continue + rel_path = 
item.relative_to(Path.cwd()) + files.append(str(rel_path)) + if item.is_dir(): + walk(item, depth + 1) + except (PermissionError, OSError): + pass + + walk(Path.cwd(), 0) + return files[:TREE_LIMIT] + + +def find_manifest_files() -> List[str]: + """Find manifest files matching patterns.""" + found = [] + for pattern in MANIFESTS: + if "*" in pattern: + # Handle glob patterns + for path in Path.cwd().glob(pattern): + if path.is_file() and not should_exclude(path): + found.append(path.name) + else: + path = Path.cwd() / pattern + if path.is_file(): + found.append(pattern) + return sorted(set(found)) + + +def read_file_preview(filepath: Path, max_lines: int = MANIFEST_PREVIEW_LINES) -> str: + """Read file with line limit.""" + try: + with open(filepath, 'r', encoding='utf-8', errors='replace') as f: + lines = f.readlines() + + if not lines: + return "None found." + + preview = ''.join(lines[:max_lines]) + if len(lines) > max_lines: + preview += f"\n[TRUNCATED] Showing first {max_lines} of {len(lines)} lines." 
+ return preview + except Exception as e: + return f"[Error reading file: {e}]" + + +def find_entry_points() -> List[str]: + """Find entry point candidates.""" + found = [] + for candidate in ENTRY_CANDIDATES: + if Path(candidate).exists(): + found.append(candidate) + return found + + +def find_lint_config() -> List[str]: + """Find linting and formatting config files.""" + found = [] + for filename in LINT_FILES: + if Path(filename).exists(): + found.append(filename) + return found + + +def find_env_templates() -> List[tuple]: + """Find environment variable templates.""" + found = [] + for filename in ENV_TEMPLATES: + path = Path(filename) + if path.exists(): + found.append((filename, path)) + return found + + +def search_todos() -> List[str]: + """Search for TODO/FIXME/HACK comments.""" + todos = [] + patterns = ["TODO", "FIXME", "HACK"] + exclude_dirs_str = "|".join(EXCLUDE_DIRS | {"test", "tests", "__tests__", "spec", "__mocks__", "fixtures"}) + + try: + for root, dirs, files in os.walk(Path.cwd()): + # Remove excluded directories from dirs to prevent os.walk from descending + dirs[:] = [d for d in dirs if d not in EXCLUDE_DIRS and d not in {"test", "tests", "__tests__", "spec", "__mocks__", "fixtures"}] + + for file in files: + # Check file extension + ext = Path(file).suffix.lstrip('.') + if ext not in SOURCE_EXTS: + continue + + filepath = Path(root) / file + try: + with open(filepath, 'r', encoding='utf-8', errors='replace') as f: + for line_num, line in enumerate(f, 1): + for pattern in patterns: + if pattern in line: + rel_path = filepath.relative_to(Path.cwd()) + todos.append(f"{rel_path}:{line_num}: {line.strip()}") + except Exception: + pass + except Exception: + pass + + return todos[:TODO_LIMIT] + + +def get_git_commits() -> List[str]: + """Get recent git commits.""" + try: + result = subprocess.run( + ["git", "log", "--oneline", "-n", str(RECENT_COMMITS_LIMIT)], + capture_output=True, + text=True, + cwd=Path.cwd() + ) + if result.returncode == 0: + 
return result.stdout.strip().split('\n') if result.stdout.strip() else []
+        return []
+    except Exception:
+        return []
+
+
+def get_git_churn() -> List[str]:
+    """Get high-churn files from last 90 days."""
+    try:
+        result = subprocess.run(
+            ["git", "log", "--since=90 days ago", "--name-only", "--pretty=format:"],
+            capture_output=True,
+            text=True,
+            cwd=Path.cwd()
+        )
+        if result.returncode == 0:
+            files = [f.strip() for f in result.stdout.split('\n') if f.strip()]
+            # Count occurrences
+            from collections import Counter
+            counts = Counter(files)
+            churn = sorted(counts.items(), key=lambda x: x[1], reverse=True)
+            return [f"{count:4d} {filename}" for filename, count in churn[:CHURN_LIMIT]]
+        return []
+    except Exception:
+        return []
+
+
+def is_git_repo() -> bool:
+    """Check if current directory is a git repository."""
+    try:
+        result = subprocess.run(
+            ["git", "rev-parse", "--git-dir"],
+            capture_output=True,
+            cwd=Path.cwd(),
+            timeout=2
+        )
+        # A nonzero exit code means git ran but this is not a repository.
+        return result.returncode == 0
+    except Exception:
+        return False
+
+
+def detect_monorepo() -> List[str]:
+    """Detect monorepo signals."""
+    signals = []
+
+    for filename in MONOREPO_FILES:
+        if Path(filename).exists():
+            signals.append(f"Monorepo tool detected: {filename}")
+
+    for dirname in MONOREPO_DIRS:
+        if Path(dirname).is_dir():
+            signals.append(f"Sub-package directory found: {dirname}/")
+
+    # Check package.json workspaces
+    if Path("package.json").exists():
+        try:
+            with open("package.json", 'r') as f:
+                content = f.read()
+            if '"workspaces"' in content:
+                signals.append("package.json has 'workspaces' field (npm/yarn workspaces monorepo)")
+        except Exception:
+            pass
+
+    return signals
+
+
+def detect_ci_cd_pipelines() -> List[str]:
+    """Detect CI/CD pipeline configurations."""
+    pipelines = []
+
+    for config_path, pipeline_name in CI_CD_CONFIGS.items():
+        path = Path(config_path)
+        if path.is_file():
+            pipelines.append(f"CI/CD: {pipeline_name}")
+        elif path.is_dir():
+            # Check for workflow files in directory
+            try:
+                if 
list(path.glob("*.yml")) or list(path.glob("*.yaml")): + pipelines.append(f"CI/CD: {pipeline_name}") + except Exception: + pass + + return pipelines + + +def detect_containers() -> List[str]: + """Detect containerization and orchestration configs.""" + containers = [] + + for config in CONTAINER_FILES: + path = Path(config) + if path.is_file(): + if "Dockerfile" in config: + containers.append("Container: Docker found") + elif "docker-compose" in config: + containers.append("Orchestration: Docker Compose found") + elif config.endswith(".yaml") or config.endswith(".yml"): + containers.append(f"Container/Orchestration: {config}") + elif path.is_dir(): + if config in ["k8s", "kubernetes"]: + containers.append("Orchestration: Kubernetes configs found") + try: + if list(path.glob("*.yml")) or list(path.glob("*.yaml")): + containers.append(f"Container/Orchestration: {config}/ directory found") + except Exception: + pass + + return containers + + +def detect_security_configs() -> List[str]: + """Detect security and compliance configurations.""" + security = [] + + for config in SECURITY_CONFIGS: + if Path(config).exists(): + config_name = config.replace(".yml", "").replace(".yaml", "").lstrip(".") + security.append(f"Security: {config_name}") + + return security + + +def detect_performance_markers() -> List[str]: + """Detect performance testing and profiling markers.""" + performance = [] + + for marker in PERFORMANCE_MARKERS: + if Path(marker).exists(): + performance.append(f"Performance: {marker} found") + else: + # Check for directories + try: + if Path(marker).is_dir(): + performance.append(f"Performance: {marker}/ directory found") + except Exception: + pass + + return performance + + +def collect_code_metrics() -> dict: + """Collect code metrics: file counts by extension, total LOC.""" + metrics = { + "total_files": 0, + "by_extension": {}, + "by_language": {}, + "total_lines": 0, + "largest_files": [] + } + + # Language mapping + lang_map = { + "ts": "TypeScript", 
"tsx": "TypeScript/React", "js": "JavaScript", + "jsx": "JavaScript/React", "py": "Python", "go": "Go", + "java": "Java", "kt": "Kotlin", "rs": "Rust", + "cs": "C#", "rb": "Ruby", "php": "PHP", + "swift": "Swift", "scala": "Scala", "ex": "Elixir", + "cpp": "C++", "c": "C", "h": "C Header", + "clj": "Clojure", "lua": "Lua", "hs": "Haskell" + } + + file_sizes = [] + + try: + for root, dirs, files in os.walk(Path.cwd()): + dirs[:] = [d for d in dirs if d not in EXCLUDE_DIRS] + + for file in files: + filepath = Path(root) / file + ext = filepath.suffix.lstrip('.') + + if not ext or ext in {"pyc", "o", "a", "so"}: + continue + + try: + size = filepath.stat().st_size + file_sizes.append((filepath.relative_to(Path.cwd()), size)) + + metrics["total_files"] += 1 + metrics["by_extension"][ext] = metrics["by_extension"].get(ext, 0) + 1 + + lang = lang_map.get(ext, "Other") + metrics["by_language"][lang] = metrics["by_language"].get(lang, 0) + 1 + + # Count lines for text files + if ext in SOURCE_EXTS and size < 1_000_000: # Skip huge files + try: + with open(filepath, 'r', encoding='utf-8', errors='ignore') as f: + metrics["total_lines"] += len(f.readlines()) + except Exception: + pass + except Exception: + pass + + # Top 10 largest files + file_sizes.sort(key=lambda x: x[1], reverse=True) + metrics["largest_files"] = [ + f"{str(f)}: {s/1024:.1f}KB" for f, s in file_sizes[:10] + ] + + except Exception: + pass + + return metrics + + +def print_section(title: str, content: List[str], output_file=None) -> None: + """Print a section with title and content.""" + lines = [f"\n=== {title} ==="] + + if isinstance(content, list): + lines.extend(content if content else ["None found."]) + elif isinstance(content, str): + lines.append(content) + + text = '\n'.join(lines) + '\n' + + if output_file: + output_file.write(text) + else: + print(text, end='') + + +def main(): + """Main entry point.""" + args = parse_args() + + output_file = None + if args.output: + output_dir = 
Path(args.output).parent
+        output_dir.mkdir(parents=True, exist_ok=True)
+        output_file = open(args.output, 'w', encoding='utf-8')
+        print(f"Writing output to: {args.output}", file=sys.stderr)
+
+    try:
+        # Directory tree
+        print_section(
+            f"DIRECTORY TREE (max depth {TREE_MAX_DEPTH}, source files only)",
+            get_directory_tree(),
+            output_file
+        )
+
+        # Stack detection
+        manifests = find_manifest_files()
+        if manifests:
+            manifest_content = [""]
+            for manifest in manifests:
+                manifest_path = Path(manifest)
+                manifest_content.append(f"--- {manifest} ---")
+                if manifest == "bun.lockb":
+                    manifest_content.append("[Binary lockfile — see package.json for dependency details.]")
+                else:
+                    manifest_content.append(read_file_preview(manifest_path))
+            print_section("STACK DETECTION (manifest files)", manifest_content, output_file)
+        else:
+            print_section("STACK DETECTION (manifest files)", ["No recognized manifest files found in project root."], output_file)
+
+        # Entry points
+        entries = find_entry_points()
+        if entries:
+            entry_content = [f"Found: {e}" for e in entries]
+            print_section("ENTRY POINTS", entry_content, output_file)
+        else:
+            print_section("ENTRY POINTS", ["No common entry points found. Check 'main' or 'scripts.start' in manifest files above."], output_file)
+
+        # Linting config
+        lint = find_lint_config()
+        if lint:
+            lint_content = [f"Found: {l}" for l in lint]
+            print_section("LINTING AND FORMATTING CONFIG", lint_content, output_file)
+        else:
+            print_section("LINTING AND FORMATTING CONFIG", ["No linting or formatting config files found in project root."], output_file)
+
+        # Environment templates
+        envs = find_env_templates()
+        if envs:
+            env_content = []
+            for filename, filepath in envs:
+                env_content.append(f"--- {filename} ---")
+                env_content.append(read_file_preview(filepath))
+            print_section("ENVIRONMENT VARIABLE TEMPLATES", env_content, output_file)
+        else:
+            print_section("ENVIRONMENT VARIABLE TEMPLATES", ["No .env.example or .env.template found. 
Identify required environment variables by searching the code and config for environment variable reads."], output_file) + + # TODOs + todos = search_todos() + if todos: + print_section("TODO / FIXME / HACK (production code only, test dirs excluded)", todos, output_file) + else: + print_section("TODO / FIXME / HACK (production code only, test dirs excluded)", ["None found."], output_file) + + # Git info + if is_git_repo(): + commits = get_git_commits() + if commits: + print_section("GIT RECENT COMMITS (last 20)", commits, output_file) + else: + print_section("GIT RECENT COMMITS (last 20)", ["No commits found."], output_file) + + churn = get_git_churn() + if churn: + print_section("HIGH-CHURN FILES (last 90 days, top 20)", churn, output_file) + else: + print_section("HIGH-CHURN FILES (last 90 days, top 20)", ["None found."], output_file) + else: + print_section("GIT RECENT COMMITS (last 20)", ["Not a git repository or no commits yet."], output_file) + print_section("HIGH-CHURN FILES (last 90 days, top 20)", ["Not a git repository."], output_file) + + # Monorepo detection + monorepo = detect_monorepo() + if monorepo: + print_section("MONOREPO SIGNALS", monorepo, output_file) + else: + print_section("MONOREPO SIGNALS", ["No monorepo signals detected."], output_file) + + # Code metrics + metrics = collect_code_metrics() + metrics_output = [ + f"Total files scanned: {metrics['total_files']}", + f"Total lines of code: {metrics['total_lines']}", + "" + ] + if metrics["by_language"]: + metrics_output.append("Files by language:") + for lang, count in sorted(metrics["by_language"].items(), key=lambda x: x[1], reverse=True): + metrics_output.append(f" {lang}: {count}") + if metrics["largest_files"]: + metrics_output.append("") + metrics_output.append("Top 10 largest files:") + metrics_output.extend(metrics["largest_files"]) + print_section("CODE METRICS", metrics_output, output_file) + + # CI/CD Detection + ci_cd = detect_ci_cd_pipelines() + if ci_cd: + print_section("CI/CD 
PIPELINES", ci_cd, output_file) + else: + print_section("CI/CD PIPELINES", ["No CI/CD pipelines detected."], output_file) + + # Container Detection + containers = detect_containers() + if containers: + print_section("CONTAINERS & ORCHESTRATION", containers, output_file) + else: + print_section("CONTAINERS & ORCHESTRATION", ["No containerization configs detected."], output_file) + + # Security Configs + security = detect_security_configs() + if security: + print_section("SECURITY & COMPLIANCE", security, output_file) + else: + print_section("SECURITY & COMPLIANCE", ["No security configs detected."], output_file) + + # Performance Markers + performance = detect_performance_markers() + if performance: + print_section("PERFORMANCE & TESTING", performance, output_file) + else: + print_section("PERFORMANCE & TESTING", ["No performance testing configs detected."], output_file) + + # Final message + final_msg = "\n=== SCAN COMPLETE ===\n" + if output_file: + output_file.write(final_msg) + else: + print(final_msg, end='') + + return 0 + + except Exception as e: + print(f"Error: {e}", file=sys.stderr) + return 1 + + finally: + if output_file: + output_file.close() + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/skills/acquire-codebase-knowledge/scripts/scan.sh b/skills/acquire-codebase-knowledge/scripts/scan.sh deleted file mode 100644 index 15d26eb3f..000000000 --- a/skills/acquire-codebase-knowledge/scripts/scan.sh +++ /dev/null @@ -1,327 +0,0 @@ -#!/usr/bin/env bash -# scan.sh — Collect project discovery information for the acquire-codebase-knowledge skill. -# Run from the project root directory. 
-#
-# Usage: bash scripts/scan.sh [OPTIONS]
-#
-# Options:
-#   --output FILE   Write output to FILE instead of stdout
-#   --help          Show this message and exit
-#
-# Exit codes:
-#   0  Success
-#   1  Usage error
-
-set -euo pipefail
-
-SCRIPT_NAME="$(basename "$0")"
-OUTPUT_FILE=""
-TREE_LIMIT=200
-TREE_MAX_DEPTH=3
-TODO_LIMIT=60
-MANIFEST_PREVIEW_LINES=80
-RECENT_COMMITS_LIMIT=20
-CHURN_LIMIT=20
-
-# --- Argument parsing ---
-while [[ $# -gt 0 ]]; do
-    case "$1" in
-        --help)
-            cat <<EOF
-Usage: bash $SCRIPT_NAME [OPTIONS]
-
-Options:
-  --output FILE   Write output to FILE instead of stdout
-  --help          Show this message and exit
-EOF
-            exit 0
-            ;;
-        --output)
-            shift
-            if [[ $# -eq 0 ]]; then
-                echo "Error: --output requires a FILE argument" >&2
-                echo "Usage: bash $SCRIPT_NAME --output FILE" >&2
-                exit 1
-            fi
-            OUTPUT_FILE="$1"
-            ;;
-        *)
-            echo "Error: Unknown option: $1" >&2
-            echo "Usage: bash $SCRIPT_NAME [--output FILE] [--help]" >&2
-            exit 1
-            ;;
-    esac
-    shift
-done
-
-# --- Redirect stdout to file if requested ---
-OUTPUT_FILE_ABS=""
-if [[ -n "$OUTPUT_FILE" ]]; then
-    output_dir="$(dirname "$OUTPUT_FILE")"
-    if [[ "$output_dir" != "." ]]; then
-        mkdir -p "$output_dir"
-    fi
-    # Resolve to absolute path before exec replaces stdout, so downstream
-    # find/grep calls can explicitly exclude the output file from results.
-    OUTPUT_FILE_ABS="$(cd "$(dirname "$OUTPUT_FILE")" && pwd)/$(basename "$OUTPUT_FILE")"
-    exec > "$OUTPUT_FILE"
-    echo "Writing output to: $OUTPUT_FILE" >&2
-fi
-
-# --- Directories to exclude from all searches ---
-EXCLUDE_DIRS=(
-    "node_modules" ".git" "dist" "build" "out" ".next" ".nuxt"
-    "__pycache__" ".venv" "venv" ".tox" "target" "vendor"
-    "coverage" ".nyc_output" "generated" ".cache" ".turbo"
-    ".yarn" ".pnp" "bin" "obj"
-)
-
-build_find_command() {
-    local depth="$1"
-    local out_var="$2"
-    local dir quoted assignment
-    local -a cmd
-
-    # Validate the output variable name to prevent code injection via eval.
-    if [[ ! "$out_var" =~ ^[a-zA-Z_][a-zA-Z0-9_]*$ ]]; then
-        echo "Error: invalid output variable name: $out_var" >&2
-        exit 1
-    fi
-
-    cmd=(find . 
-maxdepth "$depth" "(")
-    for dir in "${EXCLUDE_DIRS[@]}"; do
-        cmd+=(-name "$dir" -o)
-    done
-    unset 'cmd[${#cmd[@]}-1]'
-    cmd+=(")" -prune -o -type f)
-
-    # Exclude the output file from results if one was requested.
-    # The -not -path test must come before -print, or it has no effect.
-    if [[ -n "$OUTPUT_FILE_ABS" ]]; then
-        cmd+=(-not -path "$OUTPUT_FILE_ABS")
-    fi
-    cmd+=(-print)
-
-    assignment="$out_var=("
-    for quoted in "${cmd[@]}"; do
-        printf -v quoted '%q' "$quoted"
-        assignment+=" $quoted"
-    done
-    assignment+=" )"
-    eval "$assignment"
-}
-
-print_limited_file() {
-    local file_path="$1"
-    local limit="$2"
-    local total
-    total=$(wc -l < "$file_path" | tr -d ' ')
-
-    if [[ "$total" -eq 0 ]]; then
-        echo "None found."
-        return
-    fi
-
-    head -n "$limit" "$file_path"
-    if [[ "$total" -gt "$limit" ]]; then
-        echo "[TRUNCATED] Showing first $limit of $total lines."
-    fi
-}
-
-tmp_files=()
-cleanup() {
-    if [[ ${#tmp_files[@]} -gt 0 ]]; then
-        rm -f "${tmp_files[@]}"
-    fi
-}
-trap cleanup EXIT
-
-# ============================================================
-echo "=== DIRECTORY TREE (max depth $TREE_MAX_DEPTH, source files only) ==="
-tree_tmp="$(mktemp)"
-tmp_files+=("$tree_tmp")
-find_cmd=()
-build_find_command "$TREE_MAX_DEPTH" find_cmd
-"${find_cmd[@]}" 2>/dev/null | sed 's|^\./||' | sort > "$tree_tmp" || true
-print_limited_file "$tree_tmp" "$TREE_LIMIT"
-
-echo ""
-echo "=== STACK DETECTION (manifest files) ==="
-MANIFESTS=(
-    "package.json" "package-lock.json" "yarn.lock" "pnpm-lock.yaml"
-    "go.mod" "go.sum"
-    "requirements.txt" "Pipfile" "Pipfile.lock" "pyproject.toml" "setup.py" "setup.cfg"
-    "Cargo.toml" "Cargo.lock"
-    "pom.xml" "build.gradle" "build.gradle.kts" "settings.gradle" "settings.gradle.kts"
-    "composer.json" "composer.lock"
-    "Gemfile" "Gemfile.lock"
-    "mix.exs" "mix.lock"
-    "pubspec.yaml"
-    "*.csproj" "*.sln" "global.json"
-    "deno.json" "deno.jsonc"
-    "bun.lockb"
-)
-found_any_manifest=0
-shopt -s nullglob
-for pattern in "${MANIFESTS[@]}"; do
-    for f in $pattern; do
-        if [[ -f "$f" ]]; then
-            echo ""
-            echo "--- 
$f ---" - # bun.lockb is a binary lockfile — printing it produces garbage characters. - if [[ "$f" == "bun.lockb" ]]; then - echo "[Binary lockfile — see package.json for dependency details.]" - else - head -n "$MANIFEST_PREVIEW_LINES" "$f" - line_count=$(wc -l < "$f" | tr -d ' ') - if [[ "$line_count" -gt "$MANIFEST_PREVIEW_LINES" ]]; then - echo "[TRUNCATED] Showing first $MANIFEST_PREVIEW_LINES of $line_count lines." - fi - fi - found_any_manifest=1 - fi - done -done -shopt -u nullglob -if [[ $found_any_manifest -eq 0 ]]; then - echo "No recognized manifest files found in project root." -fi - -echo "" -echo "=== ENTRY POINTS ===" -ENTRY_CANDIDATES=( - "src/index.ts" "src/index.js" "src/index.mjs" - "src/main.ts" "src/main.js" "src/main.py" - "src/app.ts" "src/app.js" - "src/server.ts" "src/server.js" - "main.go" "cmd/main.go" - "main.py" "app.py" "server.py" "run.py" - "index.ts" "index.js" "app.ts" "app.js" - "lib/index.ts" "lib/index.js" -) -found_any_entry=0 -for f in "${ENTRY_CANDIDATES[@]}"; do - if [[ -f "$f" ]]; then - echo "Found: $f" - found_any_entry=1 - fi -done -if [[ $found_any_entry -eq 0 ]]; then - echo "No common entry points found. Check 'main' or 'scripts.start' in manifest files above." 
-fi - -echo "" -echo "=== LINTING AND FORMATTING CONFIG ===" -LINT_FILES=( - ".eslintrc" ".eslintrc.json" ".eslintrc.js" ".eslintrc.cjs" ".eslintrc.yml" ".eslintrc.yaml" - "eslint.config.js" "eslint.config.mjs" "eslint.config.cjs" - ".prettierrc" ".prettierrc.json" ".prettierrc.js" ".prettierrc.yml" - "prettier.config.js" "prettier.config.mjs" - ".editorconfig" - "tsconfig.json" "tsconfig.base.json" "tsconfig.build.json" - ".golangci.yml" ".golangci.yaml" - "setup.cfg" ".flake8" ".pylintrc" "mypy.ini" - ".rubocop.yml" "phpcs.xml" "phpstan.neon" - "biome.json" "biome.jsonc" -) -found_any_lint=0 -for f in "${LINT_FILES[@]}"; do - if [[ -f "$f" ]]; then - echo "Found: $f" - found_any_lint=1 - fi -done -if [[ $found_any_lint -eq 0 ]]; then - echo "No linting or formatting config files found in project root." -fi - -echo "" -echo "=== ENVIRONMENT VARIABLE TEMPLATES ===" -ENV_TEMPLATES=(".env.example" ".env.template" ".env.sample" ".env.defaults" ".env.local.example") -found_any_env=0 -for f in "${ENV_TEMPLATES[@]}"; do - if [[ -f "$f" ]]; then - echo "--- $f ---" - cat "$f" - found_any_env=1 - fi -done -if [[ $found_any_env -eq 0 ]]; then - echo "No .env.example or .env.template found. Identify required environment variables by searching the code and config for environment variable reads." -fi - -echo "" -echo "=== TODO / FIXME / HACK (production code only, test dirs excluded) ===" -SOURCE_EXTS=( - "*.ts" "*.tsx" "*.js" "*.jsx" "*.mjs" "*.cjs" - "*.py" "*.go" "*.java" "*.kt" "*.rb" "*.php" - "*.rs" "*.cs" "*.cpp" "*.c" "*.h" "*.ex" "*.exs" -) -ext_args=() -for ext in "${SOURCE_EXTS[@]}"; do ext_args+=("--include=$ext"); done -grep_excludes=() -for dir in "${EXCLUDE_DIRS[@]}" "test" "tests" "__tests__" "spec" "__mocks__" "fixtures"; do - grep_excludes+=("--exclude-dir=$dir") -done - -todo_tmp="$(mktemp)" -tmp_files+=("$todo_tmp") -grep -rn "${grep_excludes[@]}" "${ext_args[@]}" \ - -e 'TODO' -e 'FIXME' -e 'HACK' \ - . 
2>/dev/null > "$todo_tmp" || true -print_limited_file "$todo_tmp" "$TODO_LIMIT" - -echo "" -echo "=== GIT RECENT COMMITS (last 20) ===" -if git rev-parse --git-dir > /dev/null 2>&1; then - git log --oneline -n "$RECENT_COMMITS_LIMIT" -else - echo "Not a git repository or no commits yet." -fi - -echo "" -echo "=== HIGH-CHURN FILES (last 90 days, top 20) ===" -if git rev-parse --git-dir > /dev/null 2>&1; then - churn_tmp="$(mktemp)" - tmp_files+=("$churn_tmp") - git log --since="90 days ago" --name-only --pretty=format: 2>/dev/null \ - | grep -v "^$" | sort | uniq -c | sort -rn > "$churn_tmp" || true - print_limited_file "$churn_tmp" "$CHURN_LIMIT" -else - echo "Not a git repository." -fi - -echo "" -echo "=== MONOREPO SIGNALS ===" -MONOREPO_FILES=("pnpm-workspace.yaml" "lerna.json" "nx.json" "rush.json" "turbo.json" "moon.yml") -found_monorepo=0 -for f in "${MONOREPO_FILES[@]}"; do - if [[ -f "$f" ]]; then - echo "Monorepo tool detected: $f" - found_monorepo=1 - fi -done -for d in "packages" "apps" "libs" "services" "modules"; do - if [[ -d "$d" ]]; then - echo "Sub-package directory found: $d/" - found_monorepo=1 - fi -done -# Also check package.json workspaces field -if [[ -f "package.json" ]] && grep -q '"workspaces"' package.json 2>/dev/null; then - echo "package.json has 'workspaces' field (npm/yarn workspaces monorepo)" - found_monorepo=1 -fi -if [[ $found_monorepo -eq 0 ]]; then - echo "No monorepo signals detected." 
-fi - -echo "" -echo "=== SCAN COMPLETE ===" From 113ba9fbec8dd0300a34cb1a20c3c1e7f06f56c8 Mon Sep 17 00:00:00 2001 From: Satya K Date: Mon, 13 Apr 2026 14:37:46 +0530 Subject: [PATCH 8/8] feat(skills): update acquire-codebase-knowledge skill to replace scan.sh with scan.py --- docs/README.skills.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/README.skills.md b/docs/README.skills.md index 3d4a03df9..b3500d68a 100644 --- a/docs/README.skills.md +++ b/docs/README.skills.md @@ -26,7 +26,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to | Name | Description | Bundled Assets | | ---- | ----------- | -------------- | -| [acquire-codebase-knowledge](../skills/acquire-codebase-knowledge/SKILL.md) | Use this skill when the user explicitly asks to map, document, or onboard into an existing codebase. Trigger for prompts like "map this codebase", "document this architecture", "onboard me to this repo", or "create codebase docs". Do not trigger for routine feature implementation, bug fixes, or narrow code edits unless the user asks for repository-level discovery. | `assets/templates`
`references/inquiry-checkpoints.md`
`references/stack-detection.md`
`scripts/scan.sh` | +| [acquire-codebase-knowledge](../skills/acquire-codebase-knowledge/SKILL.md) | Use this skill when the user explicitly asks to map, document, or onboard into an existing codebase. Trigger for prompts like "map this codebase", "document this architecture", "onboard me to this repo", or "create codebase docs". Do not trigger for routine feature implementation, bug fixes, or narrow code edits unless the user asks for repository-level discovery. | `assets/templates`
`references/inquiry-checkpoints.md`
`references/stack-detection.md`
`scripts/scan.py` | | [add-educational-comments](../skills/add-educational-comments/SKILL.md) | Add educational comments to the file specified, or prompt asking for file to comment if one is not provided. | None | | [agent-governance](../skills/agent-governance/SKILL.md) | Patterns and techniques for adding governance, safety, and trust controls to AI agent systems. Use this skill when:
- Building AI agents that call external tools (APIs, databases, file systems)
- Implementing policy-based access controls for agent tool usage
- Adding semantic intent classification to detect dangerous prompts
- Creating trust scoring systems for multi-agent workflows
- Building audit trails for agent actions and decisions
- Enforcing rate limits, content filters, or tool restrictions on agents
- Working with any agent framework (PydanticAI, CrewAI, OpenAI Agents, LangChain, AutoGen) | None | | [agent-owasp-compliance](../skills/agent-owasp-compliance/SKILL.md) | Check any AI agent codebase against the OWASP Agentic Security Initiative (ASI) Top 10 risks.
Use this skill when:
- Evaluating an agent system's security posture before production deployment
- Running a compliance check against OWASP ASI 2026 standards
- Mapping existing security controls to the 10 agentic risks
- Generating a compliance report for security review or audit
- Comparing agent framework security features against the standard
- Any request like "is my agent OWASP compliant?", "check ASI compliance", or "agentic security audit" | None |