Merged
1 change: 1 addition & 0 deletions docs/README.skills.md
@@ -26,6 +26,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to

| Name | Description | Bundled Assets |
| ---- | ----------- | -------------- |
| [acquire-codebase-knowledge](../skills/acquire-codebase-knowledge/SKILL.md) | Use this skill when the user explicitly asks to map, document, or onboard into an existing codebase. Trigger for prompts like "map this codebase", "document this architecture", "onboard me to this repo", or "create codebase docs". Do not trigger for routine feature implementation, bug fixes, or narrow code edits unless the user asks for repository-level discovery. | `assets/templates`<br />`references/inquiry-checkpoints.md`<br />`references/stack-detection.md`<br />`scripts/scan.py` |
| [add-educational-comments](../skills/add-educational-comments/SKILL.md) | Add educational comments to the file specified, or prompt asking for file to comment if one is not provided. | None |
| [agent-governance](../skills/agent-governance/SKILL.md) | Patterns and techniques for adding governance, safety, and trust controls to AI agent systems. Use this skill when:<br />- Building AI agents that call external tools (APIs, databases, file systems)<br />- Implementing policy-based access controls for agent tool usage<br />- Adding semantic intent classification to detect dangerous prompts<br />- Creating trust scoring systems for multi-agent workflows<br />- Building audit trails for agent actions and decisions<br />- Enforcing rate limits, content filters, or tool restrictions on agents<br />- Working with any agent framework (PydanticAI, CrewAI, OpenAI Agents, LangChain, AutoGen) | None |
| [agent-owasp-compliance](../skills/agent-owasp-compliance/SKILL.md) | Check any AI agent codebase against the OWASP Agentic Security Initiative (ASI) Top 10 risks.<br />Use this skill when:<br />- Evaluating an agent system's security posture before production deployment<br />- Running a compliance check against OWASP ASI 2026 standards<br />- Mapping existing security controls to the 10 agentic risks<br />- Generating a compliance report for security review or audit<br />- Comparing agent framework security features against the standard<br />- Any request like "is my agent OWASP compliant?", "check ASI compliance", or "agentic security audit" | None |
174 changes: 174 additions & 0 deletions skills/acquire-codebase-knowledge/SKILL.md
@@ -0,0 +1,174 @@
---
name: acquire-codebase-knowledge
description: 'Use this skill when the user explicitly asks to map, document, or onboard into an existing codebase. Trigger for prompts like "map this codebase", "document this architecture", "onboard me to this repo", or "create codebase docs". Do not trigger for routine feature implementation, bug fixes, or narrow code edits unless the user asks for repository-level discovery.'
license: MIT
compatibility: 'Cross-platform. Requires Python 3.8+ and git. Run scripts/scan.py from the target project root.'
metadata:
  version: "1.3"
  enhancements:
    - Multi-language manifest detection (25+ languages supported)
    - CI/CD pipeline detection (10+ platforms)
    - Container & orchestration detection
    - Code metrics by language
    - Security & compliance config detection
    - Performance testing markers
argument-hint: 'Optional: specific area to focus on, e.g. "architecture only", "testing and concerns"'
---

# Acquire Codebase Knowledge

Produces seven populated documents in `docs/codebase/` covering everything needed to work effectively on the project. Only document what is verifiable from files or terminal output — never infer or assume.

## Output Contract (Required)

Before finishing, all of the following must be true:

1. Exactly these files exist in `docs/codebase/`: `STACK.md`, `STRUCTURE.md`, `ARCHITECTURE.md`, `CONVENTIONS.md`, `INTEGRATIONS.md`, `TESTING.md`, `CONCERNS.md`.
2. Every claim is traceable to source files, config, or terminal output.
3. Unknowns are marked as `[TODO]`; intent-dependent decisions are marked `[ASK USER]`.
4. Every document includes a short "evidence" list with concrete file paths.
5. Final response includes numbered `[ASK USER]` questions and intent-vs-reality divergences.
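
A minimal sketch of how items 1 and 4 of this contract could be checked mechanically. The function name `check_output_contract` and the rule that an evidence list sits under a heading containing the word "Evidence" are assumptions for illustration; items 2, 3, and 5 still need judgment. Only `*.md` files are compared, so the scan output file in the same directory is not flagged.

```python
from pathlib import Path

REQUIRED_DOCS = {
    "STACK.md", "STRUCTURE.md", "ARCHITECTURE.md", "CONVENTIONS.md",
    "INTEGRATIONS.md", "TESTING.md", "CONCERNS.md",
}


def check_output_contract(docs_dir):
    """Return a list of problems; an empty list means items 1 and 4 hold."""
    docs_dir = Path(docs_dir)
    present = {p.name for p in docs_dir.glob("*.md")} if docs_dir.is_dir() else set()
    problems = [f"missing: {name}" for name in sorted(REQUIRED_DOCS - present)]
    problems += [f"unexpected: {name}" for name in sorted(present - REQUIRED_DOCS)]
    for name in sorted(REQUIRED_DOCS & present):
        text = (docs_dir / name).read_text(encoding="utf-8")
        # Assumption: the evidence list lives under a heading mentioning "Evidence".
        has_evidence = any(
            line.lstrip().startswith("#") and "evidence" in line.lower()
            for line in text.splitlines()
        )
        if not has_evidence:
            problems.append(f"no evidence section: {name}")
    return problems
```

Calling `check_output_contract("docs/codebase")` before the final response gives a quick list of what still needs repair.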

## Workflow

Copy and track this checklist:

```
- [ ] Phase 1: Run scan, read intent documents
- [ ] Phase 2: Investigate each documentation area
- [ ] Phase 3: Populate all seven docs in docs/codebase/
- [ ] Phase 4: Validate docs, present findings, resolve all [ASK USER] items
```

## Focus Area Mode

If the user supplies a focus area (for example: "architecture only" or "testing and concerns"):

1. Always run Phase 1 in full.
2. Fully complete focus-area documents first.
3. For non-focus documents not yet analyzed, keep required sections present and mark unknowns as `[TODO]`.
4. Still run the Phase 4 validation loop on all seven documents before final output.

### Phase 1: Scan and Read Intent

1. Run the scan script from the target project root:
   ```bash
   python3 "$SKILL_ROOT/scripts/scan.py" --output docs/codebase/.codebase-scan.txt
   ```
   `$SKILL_ROOT` is the absolute path to the skill folder. The script works on Windows, macOS, and Linux.

   **Quick start:** If you already know the absolute path, pass it directly:
   ```bash
   python3 /absolute/path/to/skills/acquire-codebase-knowledge/scripts/scan.py --output docs/codebase/.codebase-scan.txt
   ```

2. Search for `PRD`, `TRD`, `README`, `ROADMAP`, `SPEC`, `DESIGN` files and read them.
3. Summarise the stated project intent before reading any source code.
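
Step 2 can be approximated with a small file search. This is a sketch, not part of `scan.py`: the function name `find_intent_docs`, the skipped directory names, and the restriction to documentation-style extensions (which keeps test files such as `foo.spec.ts` out of the results) are all assumptions.

```python
import os

INTENT_KEYWORDS = ("PRD", "TRD", "README", "ROADMAP", "SPEC", "DESIGN")
# Assumptions: directories to skip and extensions that count as documents.
SKIP_DIRS = {".git", "node_modules", "dist", "build", "__pycache__"}
DOC_EXTENSIONS = {"", ".md", ".txt", ".rst", ".adoc"}


def find_intent_docs(root):
    """Return sorted relative paths of files whose name contains an intent keyword."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for filename in filenames:
            stem, ext = os.path.splitext(filename)
            if ext.lower() not in DOC_EXTENSIONS:
                continue
            if any(keyword in stem.upper() for keyword in INTENT_KEYWORDS):
                rel = os.path.relpath(os.path.join(dirpath, filename), root)
                matches.append(rel.replace(os.sep, "/"))
    return sorted(matches)
```

The match is a plain substring test, so it can over-report; treat the result as a reading list to review, not as a definitive inventory.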

### Phase 2: Investigate

Use the scan output to answer questions for each of the seven templates. Load [`references/inquiry-checkpoints.md`](references/inquiry-checkpoints.md) for the full per-template question list.

If the stack is ambiguous (multiple manifest files, unfamiliar file types, no `package.json`), load [`references/stack-detection.md`](references/stack-detection.md).

### Phase 3: Populate Templates

Copy each template from `assets/templates/` into `docs/codebase/`. Fill in this order:

1. [STACK.md](assets/templates/STACK.md) — language, runtime, frameworks, all dependencies
2. [STRUCTURE.md](assets/templates/STRUCTURE.md) — directory layout, entry points, key files
3. [ARCHITECTURE.md](assets/templates/ARCHITECTURE.md) — layers, patterns, data flow
4. [CONVENTIONS.md](assets/templates/CONVENTIONS.md) — naming, formatting, error handling, imports
5. [INTEGRATIONS.md](assets/templates/INTEGRATIONS.md) — external APIs, databases, auth, monitoring
6. [TESTING.md](assets/templates/TESTING.md) — frameworks, file organization, mocking strategy
7. [CONCERNS.md](assets/templates/CONCERNS.md) — tech debt, bugs, security risks, perf bottlenecks

Use `[TODO]` for anything that cannot be determined from code. Use `[ASK USER]` where the right answer requires team intent.

### Phase 4: Validate, Repair, Verify

Run this mandatory validation loop before finalizing:

1. Validate each doc against `references/inquiry-checkpoints.md`.
2. For each non-trivial claim, confirm at least one evidence reference exists.
3. If any required section is missing or unsupported:
   - Fix the document.
   - Re-run validation.
4. Repeat until all seven docs pass.

Then present a summary of all seven documents, list every `[ASK USER]` item as a numbered question, and highlight any Intent vs. Reality divergences from Phase 1.

Validation pass criteria:

- No unsupported claims.
- No empty required sections.
- Unknowns use `[TODO]` rather than assumptions.
- Team-intent gaps are explicitly marked `[ASK USER]`.
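
Two of these checks lend themselves to automation. The sketch below flags headings with no body text and collects `[ASK USER]` items into a numbered list. The helper names are hypothetical, and the parser is deliberately naive: it treats any line starting with `#` as a heading, including comment lines inside fenced code blocks.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
ASK_USER = re.compile(r"\[ASK USER\]\s*(.*)")


def empty_sections(markdown):
    """Return titles of headings that have no body before the next peer heading."""
    empty = []
    current = None  # (level, title) of the heading being scanned
    has_body = False
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2).strip()
            # A heading followed directly by a deeper heading is a container, not empty.
            if current is not None and not has_body and level <= current[0]:
                empty.append(current[1])
            current, has_body = (level, title), False
        elif line.strip():
            has_body = True
    if current is not None and not has_body:
        empty.append(current[1])
    return empty


def ask_user_questions(docs):
    """Number every [ASK USER] item across docs, given as {filename: text}."""
    questions = []
    for name in sorted(docs):
        for line in docs[name].splitlines():
            if HEADING.match(line):
                continue  # skip section titles that merely mention the marker
            match = ASK_USER.search(line)
            if match and match.group(1).strip():
                questions.append(f"{len(questions) + 1}. ({name}) {match.group(1).strip()}")
    return questions
```

Whether a claim is actually supported by its evidence cannot be checked this way and still requires reading the referenced files.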

---

## Gotchas

**Monorepos:** Root `package.json` may have no source — check for `workspaces`, `packages/`, or `apps/` directories. Each workspace may have independent dependencies and conventions. Map each sub-package separately.
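
A hedged sketch of workspace discovery for JavaScript monorepos. It reads the `workspaces` field of the root `package.json` (a list, or an object with a `packages` list) and falls back to the conventional `packages/*` and `apps/*` directories. The function name `workspace_dirs` is an assumption, and other workspace mechanisms such as `pnpm-workspace.yaml` are not handled.

```python
import json
from pathlib import Path


def workspace_dirs(root):
    """Return sorted workspace directories (relative, POSIX-style) that contain a package.json."""
    root = Path(root)
    manifest = root / "package.json"
    patterns = []
    if manifest.is_file():
        workspaces = json.loads(manifest.read_text(encoding="utf-8")).get("workspaces", [])
        # The field is usually a list; the object form nests the list under "packages".
        if isinstance(workspaces, dict):
            workspaces = workspaces.get("packages", [])
        patterns = [p for p in workspaces if p]
    if not patterns:
        # Fallback assumption: the conventional directory names.
        patterns = ["packages/*", "apps/*"]
    found = set()
    for pattern in patterns:
        for candidate in root.glob(pattern):
            if (candidate / "package.json").is_file():
                found.add(candidate.relative_to(root).as_posix())
    return sorted(found)
```

Each returned directory can then be treated as its own sub-package when filling in `STACK.md` and `CONVENTIONS.md`.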

**Outdated README:** The README often describes the intended architecture, not the current one. Cross-reference it with the actual file structure before treating any README claim as fact.

**TypeScript path aliases:** `tsconfig.json` `paths` config means imports like `@/foo` don't map directly to the filesystem. Map aliases to real paths before documenting structure.
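
The alias mapping can be sketched as follows. The function takes an already-parsed `paths` mapping because `tsconfig.json` may contain comments that a strict JSON parser rejects. The names `resolve_alias` and `_join` are hypothetical, and the matching rule (exact pattern first, then the wildcard pattern with the longest prefix) is a simplified approximation of the compiler's behaviour.

```python
import posixpath


def _join(base_url, target):
    """Join a paths target onto the base directory and normalise it."""
    return posixpath.normpath(posixpath.join(base_url, target))


def resolve_alias(import_path, paths, base_url="."):
    """Map an aliased import to candidate source paths; return [] if no alias applies."""
    best_prefix_len, best = -1, None
    for pattern, targets in paths.items():
        if "*" not in pattern:
            if import_path == pattern:
                # An exact pattern wins outright.
                return [_join(base_url, t) for t in targets]
            continue
        prefix, suffix = pattern.split("*", 1)
        fits = len(import_path) >= len(prefix) + len(suffix)
        if fits and import_path.startswith(prefix) and import_path.endswith(suffix):
            if len(prefix) > best_prefix_len:
                # The text matched by "*" is substituted into each target.
                captured = import_path[len(prefix):len(import_path) - len(suffix)]
                best_prefix_len = len(prefix)
                best = [_join(base_url, t.replace("*", captured, 1)) for t in targets]
    return best or []
```

For example, with `{"@/*": ["src/*"]}` the import `@/foo` maps to `src/foo`, which is the path to record in `STRUCTURE.md`.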

**Generated/compiled output:** Never document patterns from `dist/`, `build/`, `generated/`, `.next/`, `out/`, or `__pycache__/`. These are artefacts — document source conventions only.

**`.env.example` reveals required config:** Secrets should never be committed, so real values are normally absent from the repository. Read `.env.example`, `.env.template`, or `.env.sample` to discover required environment variables.
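
A small sketch for listing variable names from such a file. It assumes the common `KEY=value` layout with optional `export` prefixes and `#` comments; the function name `required_env_vars` is hypothetical, and values are deliberately ignored.

```python
import re

ENV_LINE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=")


def required_env_vars(env_example_text):
    """Return variable names in file order, without duplicates or commented-out entries."""
    names = []
    for line in env_example_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # comments, including commented-out variables
        match = ENV_LINE.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names
```

The resulting names belong in `INTEGRATIONS.md` with the example file cited as evidence.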

**`devDependencies` ≠ production stack:** Only `dependencies` (or equivalent, e.g. `[tool.poetry.dependencies]`) runs in production. Document linters, formatters, and test frameworks separately as dev tooling.

**Test TODOs ≠ production debt:** TODOs inside `test/`, `tests/`, `__tests__/`, or `spec/` are coverage gaps, not production technical debt. Separate them in `CONCERNS.md`.
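
One way to apply this split is to classify each TODO by the directory it lives in. A minimal sketch, with the hypothetical name `classify_todo` and the four directory names listed above:

```python
from pathlib import PurePosixPath

TEST_DIRS = {"test", "tests", "__tests__", "spec"}


def classify_todo(path):
    """Return "coverage-gap" for files under a test directory, else "production-debt"."""
    parts = PurePosixPath(path.replace("\\", "/")).parts
    # Only directory components count; a file merely named "spec.py" is not a test dir.
    return "coverage-gap" if TEST_DIRS.intersection(parts[:-1]) else "production-debt"
```

Projects that keep tests next to source files (for example `foo.test.ts`) would need an extra filename rule, which this sketch does not include.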

**High-churn files = fragile areas:** Files that appear most often in recent git history are modified most frequently and are likely to hide complexity. Always note them in `CONCERNS.md`.
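
Churn can be estimated by counting how often each path appears in `git log --name-only` output. This is a sketch and not necessarily how `scan.py` measures it; the 90-day window, the top-10 cut-off, and both function names are assumptions.

```python
import subprocess
from collections import Counter


def count_churn(log_output, top=10):
    """Count path occurrences in `git log --name-only` output, most frequent first."""
    counts = Counter(line.strip() for line in log_output.splitlines() if line.strip())
    return counts.most_common(top)


def recent_churn(repo=".", since="90 days ago", top=10):
    """Run git in `repo` and return the most frequently changed files."""
    # --pretty=format: suppresses commit headers so only file paths remain.
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    )
    return count_churn(result.stdout, top)
```

Splitting the parsing from the `git` call keeps the counting logic testable without a repository.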

---

## Anti-Patterns

| ❌ Don't | ✅ Do instead |
|---------|--------------|
| "Uses Clean Architecture with Domain/Data layers." (when no such directories exist) | State only what directory structure actually shows. |
| "This is a Next.js project." (without checking `package.json`) | Check `dependencies` first. State what's actually there. |
| Guess the database from a variable name like `dbUrl` | Check manifest for `pg`, `mysql2`, `mongoose`, `prisma`, etc. |
| Document `dist/` or `build/` naming patterns as conventions | Source files only. |
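
The manifest check in the third row can be sketched like this. The lookup table covers only the four packages named above and is not exhaustive; the function name `database_evidence` and the label strings are assumptions. Reporting the manifest section alongside each hit also keeps production dependencies apart from dev tooling.

```python
import json

# Only the packages named in the table above; extend as needed.
DB_PACKAGES = {
    "pg": "PostgreSQL client",
    "mysql2": "MySQL client",
    "mongoose": "MongoDB ODM",
    "prisma": "Prisma ORM tooling",
}


def database_evidence(package_json_text):
    """Return (package, label, manifest section) for each known database package."""
    manifest = json.loads(package_json_text)
    evidence = []
    for section in ("dependencies", "devDependencies"):
        for name in sorted(manifest.get(section, {})):
            if name in DB_PACKAGES:
                evidence.append((name, DB_PACKAGES[name], section))
    return evidence
```

An empty result means the manifest gives no database evidence, in which case the entry should stay `[TODO]` instead of being guessed.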

---

## Enhanced Scan Output Sections

The `scan.py` script now produces the following sections in addition to the original output:

- **CODE METRICS** — Total files, lines of code by language, largest files (complexity signals)
- **CI/CD PIPELINES** — Detected GitHub Actions, GitLab CI, Jenkins, CircleCI, etc.
- **CONTAINERS & ORCHESTRATION** — Docker, Docker Compose, Kubernetes, Vagrant configs
- **SECURITY & COMPLIANCE** — Snyk, Dependabot, SECURITY.md, SBOM, security policies
- **PERFORMANCE & TESTING** — Benchmark configs, profiling markers, load testing tools

Use these sections during Phase 2 to inform investigation questions and identify tool-specific patterns.

---

## Bundled Assets

| Asset | When to load |
|-------|-------------|
| [`scripts/scan.py`](scripts/scan.py) | Phase 1 — run first, before reading any code (Python 3.8+ required) |
| [`references/inquiry-checkpoints.md`](references/inquiry-checkpoints.md) | Phase 2 — load for per-template investigation questions |
| [`references/stack-detection.md`](references/stack-detection.md) | Phase 2 — only if stack is ambiguous |
| [`assets/templates/STACK.md`](assets/templates/STACK.md) | Phase 3 step 1 |
| [`assets/templates/STRUCTURE.md`](assets/templates/STRUCTURE.md) | Phase 3 step 2 |
| [`assets/templates/ARCHITECTURE.md`](assets/templates/ARCHITECTURE.md) | Phase 3 step 3 |
| [`assets/templates/CONVENTIONS.md`](assets/templates/CONVENTIONS.md) | Phase 3 step 4 |
| [`assets/templates/INTEGRATIONS.md`](assets/templates/INTEGRATIONS.md) | Phase 3 step 5 |
| [`assets/templates/TESTING.md`](assets/templates/TESTING.md) | Phase 3 step 6 |
| [`assets/templates/CONCERNS.md`](assets/templates/CONCERNS.md) | Phase 3 step 7 |

Template usage mode:

- Default mode: complete only the "Core Sections (Required)" in each template.
- Extended mode: add optional sections only when the repo complexity justifies them.
@@ -0,0 +1,49 @@
# Architecture

## Core Sections (Required)

### 1) Architectural Style

- Primary style: [layered/feature/event-driven/other]
- Why this classification: [short evidence-backed rationale]
- Primary constraints: [2-3 constraints that shape design]

### 2) System Flow

```text
[entry] -> [processing] -> [domain logic] -> [data/integration] -> [response/output]
```

Describe the flow in 4-6 steps using file-backed evidence.

### 3) Layer/Module Responsibilities

| Layer or module | Owns | Must not own | Evidence |
|-----------------|------|--------------|----------|
| [name] | [responsibility] | [non-responsibility] | [file] |

### 4) Reused Patterns

| Pattern | Where found | Why it exists |
|---------|-------------|---------------|
| [singleton/repository/adapter/etc] | [path] | [reason] |

### 5) Known Architectural Risks

- [Risk 1 + impact]
- [Risk 2 + impact]

### 6) Evidence

- [path/to/entrypoint]
- [path/to/main-layer-files]
- [path/to/data-or-integration-layer]

## Extended Sections (Optional)

Add only when needed:

- Startup or initialization order details
- Async/event topology diagrams
- Anti-pattern catalog with refactoring paths
- Failure-mode analysis and resilience posture
56 changes: 56 additions & 0 deletions skills/acquire-codebase-knowledge/assets/templates/CONCERNS.md
@@ -0,0 +1,56 @@
# Codebase Concerns

## Core Sections (Required)

### 1) Top Risks (Prioritized)

| Severity | Concern | Evidence | Impact | Suggested action |
|----------|---------|----------|--------|------------------|
| [high/med/low] | [issue] | [file or scan output] | [impact] | [next action] |

### 2) Technical Debt

List the most important debt items only.

| Debt item | Why it exists | Where | Risk if ignored | Suggested fix |
|-----------|---------------|-------|-----------------|---------------|
| [item] | [reason] | [path] | [risk] | [fix] |

### 3) Security Concerns

| Risk | OWASP category (if applicable) | Evidence | Current mitigation | Gap |
|------|--------------------------------|----------|--------------------|-----|
| [risk] | [A01/A03/etc or N/A] | [path] | [what exists] | [what is missing] |

### 4) Performance and Scaling Concerns

| Concern | Evidence | Current symptom | Scaling risk | Suggested improvement |
|---------|----------|-----------------|-------------|-----------------------|
| [issue] | [path/metric] | [symptom] | [risk] | [action] |

### 5) Fragile/High-Churn Areas

| Area | Why fragile | Churn signal | Safe change strategy |
|------|-------------|-------------|----------------------|
| [path] | [reason] | [recent churn evidence] | [approach] |

### 6) `[ASK USER]` Questions

Add unresolved intent-dependent questions as a numbered list.

1. [ASK USER] [question]

### 7) Evidence

- [scan output section reference]
- [path/to/code-file]
- [path/to/config-or-history-evidence]

## Extended Sections (Optional)

Add only when needed:

- Full bug inventory
- Component-level remediation roadmap
- Cost/effort estimates by concern
- Dependency-risk and ownership mapping
@@ -0,0 +1,52 @@
# Coding Conventions

## Core Sections (Required)

### 1) Naming Rules

| Item | Rule | Example | Evidence |
|------|------|---------|----------|
| Files | [RULE] | [EXAMPLE] | [FILE] |
| Functions/methods | [RULE] | [EXAMPLE] | [FILE] |
| Types/interfaces | [RULE] | [EXAMPLE] | [FILE] |
| Constants/env vars | [RULE] | [EXAMPLE] | [FILE] |

### 2) Formatting and Linting

- Formatter: [TOOL + CONFIG FILE]
- Linter: [TOOL + CONFIG FILE]
- Most relevant enforced rules: [RULE_1], [RULE_2], [RULE_3]
- Run commands: [COMMANDS]

### 3) Import and Module Conventions

- Import grouping/order: [RULE]
- Alias vs relative import policy: [RULE]
- Public exports/barrel policy: [RULE]

### 4) Error and Logging Conventions

- Error strategy by layer: [SHORT SUMMARY]
- Logging style and required context fields: [SUMMARY]
- Sensitive-data redaction rules: [SUMMARY]

### 5) Testing Conventions

- Test file naming/location rule: [RULE]
- Mocking strategy norm: [RULE]
- Coverage expectation: [RULE or TODO]

### 6) Evidence

- [path/to/lint-config]
- [path/to/format-config]
- [path/to/representative-source-file]

## Extended Sections (Optional)

Add only for large or inconsistent codebases:

- Layer-specific error handling matrix
- Language-specific strictness options
- Repo-specific commit/branching conventions
- Known convention violations to clean up
@@ -0,0 +1,48 @@
# External Integrations

## Core Sections (Required)

### 1) Integration Inventory

| System | Type (API/DB/Queue/etc) | Purpose | Auth model | Criticality | Evidence |
|--------|---------------------------|---------|------------|-------------|----------|
| [name] | [type] | [purpose] | [auth] | [high/med/low] | [file] |

### 2) Data Stores

| Store | Role | Access layer | Key risk | Evidence |
|-------|------|--------------|----------|----------|
| [db/cache/etc] | [role] | [module] | [risk] | [file] |

### 3) Secrets and Credentials Handling

- Credential sources: [env/secrets manager/config]
- Hardcoding checks: [result]
- Rotation or lifecycle notes: [known/unknown]

### 4) Reliability and Failure Behavior

- Retry/backoff behavior: [implemented/none/partial]
- Timeout policy: [where configured]
- Circuit-breaker or fallback behavior: [if any]

### 5) Observability for Integrations

- Logging around external calls: [yes/no + where]
- Metrics/tracing coverage: [yes/no + where]
- Missing visibility gaps: [list]

### 6) Evidence

- [path/to/integration-wrapper]
- [path/to/config-or-env-template]
- [path/to/monitoring-or-logging-config]

## Extended Sections (Optional)

Add only when needed:

- Endpoint-by-endpoint catalog
- Auth flow sequence diagrams
- SLA/SLO per integration
- Region/failover topology notes