12 changes: 12 additions & 0 deletions .github/workflows/generate-toolkit-docs.yml
@@ -21,6 +21,10 @@ permissions:
jobs:
  generate:
    runs-on: ubuntu-latest
    # Opt in to Node 24 for JavaScript actions before GitHub forces the
    # switch on 2026-06-02. Harmless today; unblocks the cutover.
    env:
      FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"

    steps:
      - name: Checkout repository
@@ -57,6 +61,9 @@ jobs:
            --llm-provider openai \
            --llm-model "$OPENAI_MODEL" \
            --llm-api-key "$OPENAI_API_KEY" \
            --llm-editor-provider anthropic \
            --llm-editor-model "$ANTHROPIC_EDITOR_MODEL" \
            --llm-editor-api-key "$ANTHROPIC_API_KEY" \
            --toolkit-concurrency 8 \
            --llm-concurrency 15 \
            --exclude-file ./excluded-toolkits.txt \
@@ -68,6 +75,11 @@ jobs:
          ENGINE_API_KEY: ${{ secrets.ENGINE_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          OPENAI_MODEL: ${{ secrets.OPENAI_MODEL || 'gpt-4o-mini' }}
          # Stronger model for the secret-coherence editor. Keeps
          # stale-secret cleanup precise instead of re-summarizing the whole
          # artifact (which gpt-4o-mini tends to do).
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          ANTHROPIC_EDITOR_MODEL: ${{ secrets.ANTHROPIC_EDITOR_MODEL || 'claude-sonnet-4-6' }}

      - name: Sync toolkit sidebar navigation
        run: pnpm dlx tsx toolkit-docs-generator/scripts/sync-toolkit-sidebar.ts --remove-empty-sections=false --verbose
55 changes: 55 additions & 0 deletions toolkit-docs-generator/README.md
@@ -42,6 +42,8 @@ Required secrets:
Optional secrets:

- `OPENAI_MODEL` (defaults in the workflow)
- `ANTHROPIC_API_KEY` enables the secret-coherence editor (see below). Without it the workflow still runs; the scanners emit warnings but no LLM edits are applied.
- `ANTHROPIC_EDITOR_MODEL` (defaults to `claude-sonnet-4-6` in the workflow)

## Rendering pipeline (docs site)

@@ -66,6 +68,57 @@ The docs site consumes the generated JSON directly:

This step does not change JSON output. It only updates navigation files.

## Secret coherence (stale-reference cleanup + coverage check)

When a toolkit loses a secret upstream (typically because the tool that required it was removed), the rendered docs can keep mentioning it in the summary and in hand-authored documentation chunks. The reverse also happens: the summary can omit a secret the toolkit currently requires, or name its secrets without linking to the Arcade secret config docs.

The generator runs two checks after summary generation, in [`src/merger/secret-coherence.ts`](src/merger/secret-coherence.ts) and [`src/llm/secret-edit-generator.ts`](src/llm/secret-edit-generator.ts):

1. **Stale-reference scan** (deterministic): diffs the current and previous toolkit secret sets, then searches the summary, every toolkit-level `documentationChunks` entry, and every per-tool chunk for any removed secret name. Matching is by exact substring, which works because secret names are distinctive `ALL_CAPS_WITH_UNDERSCORES` identifiers.
2. **Coverage-gap scan** (deterministic): flags any current secret that is not mentioned in the summary and any summary that lacks a link to the Arcade secret config docs.
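
The two scans can be sketched roughly as follows. This is an illustration only, not the code in `src/merger/secret-coherence.ts`: the `ToolkitDocs` shape, the function names, and the `ARCADE_SECRET_DOCS_URL` placeholder are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the real types live in the generator.
interface ToolkitDocs {
  summary: string;
  documentationChunks: string[]; // toolkit-level chunks
  toolChunks: Record<string, string[]>; // per-tool chunks, keyed by tool name
}

interface StaleReference {
  secret: string;
  location: string;
}

// Placeholder only, not the real Arcade docs URL.
const ARCADE_SECRET_DOCS_URL = "https://docs.example.com/secrets";

// Stale-reference scan: secrets that existed before but not now,
// located by exact substring match in every text field.
function findStaleReferences(
  previousSecrets: string[],
  currentSecrets: string[],
  docs: ToolkitDocs,
): StaleReference[] {
  const current = new Set(currentSecrets);
  const removed = previousSecrets.filter((name) => !current.has(name));

  const texts: Array<[string, string]> = [["summary", docs.summary]];
  docs.documentationChunks.forEach((chunk, i) => {
    texts.push([`documentationChunks[${i}]`, chunk]);
  });
  for (const [tool, chunks] of Object.entries(docs.toolChunks)) {
    chunks.forEach((chunk, i) => {
      texts.push([`tools.${tool}[${i}]`, chunk]);
    });
  }

  const findings: StaleReference[] = [];
  for (const secret of removed) {
    for (const [location, text] of texts) {
      if (text.includes(secret)) findings.push({ secret, location });
    }
  }
  return findings;
}

// Coverage-gap scan: current secrets the summary never names,
// plus a flag for a summary that lacks the config docs link.
function findCoverageGaps(currentSecrets: string[], summary: string) {
  return {
    missingSecrets: currentSecrets.filter((name) => !summary.includes(name)),
    missingConfigLink: !summary.includes(ARCADE_SECRET_DOCS_URL),
  };
}
```

One consequence of plain substring matching is that a removed name which is a substring of a surviving name (for example `API_KEY` inside `OTHER_API_KEY`) would also be reported, so a sketch like this relies on secret names being distinctive.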

If an LLM editor is configured (`--llm-editor-provider` / `--llm-editor-model` / `--llm-editor-api-key`), both classes of issue are auto-fixed:

- Stale references are removed with a **minimum-necessary edit** prompt: whole sentences, bullets, or table rows that exist only to describe the removed secret are deleted; sentences that mention the removed secret alongside other content are minimally rewritten; nothing else is touched. This is intentionally different from the summary generator, which rewrites from scratch and tends to oversimplify.
- Missing secrets are appended to the summary's `**Secrets**` section with as much detail as each secret needs. A short URL override may take one line; a scoped API key typically needs several sentences covering the provider dashboard page, required scopes or permissions, and account-tier constraints, plus an inline link to the provider's own documentation on creating the key. The prompt explicitly forbids inventing docs URLs.
- Missing Arcade-config links are added at the end of the `**Secrets**` section.
- The editor is instructed to preserve surrounding content verbatim (no re-summarization, no reorder).

When the editor is not configured, the scanners still run and their findings land as non-fatal warnings in the run log. Editor exceptions are caught individually so a single LLM failure does not break the run.
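
The per-edit error isolation described above might look something like the sketch below. The `SecretIssue` and `Editor` types and the `applyEditsSafely` name are hypothetical; the real implementation in `src/llm/secret-edit-generator.ts` may be structured differently.

```typescript
// Hypothetical types: an "issue" is one finding from either scanner.
interface SecretIssue {
  kind: "stale-reference" | "missing-secret" | "missing-config-link";
  secret?: string;
}

// Assumed editor signature: takes the current text and one issue, returns edited text.
type Editor = (text: string, issue: SecretIssue) => Promise<string>;

interface EditOutcome {
  text: string;
  warnings: string[];
}

// Applies edits one at a time. A failed edit becomes a warning and is skipped,
// so one LLM failure does not abort the remaining edits or the run.
async function applyEditsSafely(
  text: string,
  issues: SecretIssue[],
  editor: Editor | undefined,
): Promise<EditOutcome> {
  const warnings: string[] = [];
  let current = text;
  for (const issue of issues) {
    const label = `${issue.kind}${issue.secret ? ` (${issue.secret})` : ""}`;
    if (!editor) {
      // No editor configured: report the finding and leave the text untouched.
      warnings.push(`unfixed: ${label}`);
      continue;
    }
    try {
      current = await editor(current, issue);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      warnings.push(`editor failed for ${label}: ${reason}`);
    }
  }
  return { text: current, warnings };
}
```

Wrapping each call in its own `try`/`catch`, rather than the whole loop, is what keeps later edits running after an earlier one throws.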

The default editor model is **Claude Sonnet 4.6**, chosen to avoid the oversimplification observed when bulk summaries were regenerated by `gpt-4o-mini`. Override it with `--llm-editor-model` or the `LLM_EDITOR_MODEL` / `ANTHROPIC_EDITOR_MODEL` environment variable.

### OAuth section in summaries

The summary generator is configured to **never list OAuth scopes** in the generated overview. Each per-provider Arcade auth docs page (under `/en/references/auth-providers/<providerId>`) is the source of truth for scopes and configuration; the summary links to that page instead of duplicating its contents. This keeps the overview scannable and prevents drift when provider pages update their scope lists.

### CLI flags

- `--llm-editor-provider <openai|anthropic>` — editor provider. Falls back to `LLM_EDITOR_PROVIDER`.
- `--llm-editor-model <model>` — editor model. Falls back to `LLM_EDITOR_MODEL` / `ANTHROPIC_EDITOR_MODEL`.
- `--llm-editor-api-key <key>` — editor API key. Falls back to `LLM_EDITOR_API_KEY`, then `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` per provider.
- `--llm-editor-base-url <url>` — override editor base URL.
- `--llm-editor-temperature <number>` — editor temperature.
- `--llm-editor-max-tokens <number>` — editor max output tokens (default `8192`).
- `--llm-editor-max-retries <number>` — retry attempts on transient errors (default `3`).
- `--skip-secret-coherence` — disable both the scan and the edit step entirely.
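
The fallback chain for these flags can be read as the resolution sketch below. The `resolveEditorConfig` function and its types are hypothetical. Three further assumptions are not stated in the flag descriptions: `LLM_EDITOR_MODEL` is checked before `ANTHROPIC_EDITOR_MODEL`, `ANTHROPIC_EDITOR_MODEL` applies only to the `anthropic` provider, and a missing provider means the editor is off.

```typescript
type EditorProvider = "openai" | "anthropic";

// Hypothetical parsed CLI flags; undefined means the flag was not passed.
interface EditorFlags {
  provider?: string;
  model?: string;
  apiKey?: string;
  maxTokens?: number;
  maxRetries?: number;
}

interface EditorConfig {
  provider: EditorProvider;
  model?: string;
  apiKey?: string;
  maxTokens: number;
  maxRetries: number;
}

type Env = Record<string, string | undefined>;

function isEditorProvider(value: string | undefined): value is EditorProvider {
  return value === "openai" || value === "anthropic";
}

// Flags win over environment variables; generic LLM_EDITOR_* variables win
// over provider-specific ones. Returns undefined when no valid provider is set.
function resolveEditorConfig(flags: EditorFlags, env: Env): EditorConfig | undefined {
  const provider = flags.provider ?? env.LLM_EDITOR_PROVIDER;
  if (!isEditorProvider(provider)) {
    return undefined; // assumption: editor disabled, scanners only warn
  }
  const providerKey = provider === "anthropic" ? env.ANTHROPIC_API_KEY : env.OPENAI_API_KEY;
  const providerModel = provider === "anthropic" ? env.ANTHROPIC_EDITOR_MODEL : undefined;
  return {
    provider,
    model: flags.model ?? env.LLM_EDITOR_MODEL ?? providerModel,
    apiKey: flags.apiKey ?? env.LLM_EDITOR_API_KEY ?? providerKey,
    maxTokens: flags.maxTokens ?? 8192, // documented default
    maxRetries: flags.maxRetries ?? 3, // documented default
  };
}
```

In a real CLI the second argument would typically be `process.env`; passing it in explicitly keeps the sketch easy to exercise with fixed values.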

### Local example (editor on)

```bash
pnpm dlx tsx src/cli/index.ts generate \
  --providers "Github" \
  --tool-metadata-url "$ENGINE_API_URL" \
  --tool-metadata-key "$ENGINE_API_KEY" \
  --llm-provider openai \
  --llm-model gpt-4.1-mini \
  --llm-api-key "$OPENAI_API_KEY" \
  --llm-editor-provider anthropic \
  --llm-editor-model claude-sonnet-4-6 \
  --llm-editor-api-key "$ANTHROPIC_API_KEY" \
  --output data/toolkits
```

## Architecture at a glance

- **CLI**: `toolkit-docs-generator/src/cli/index.ts`
@@ -182,6 +235,8 @@ deletes it and rebuilds `index.json`.
- `--previous-output` compare against a previous output directory
- `--custom-sections` load curated docs sections
- `--skip-examples`, `--skip-summary` disable LLM steps
- `--skip-secret-coherence` disable the stale-reference scan + coverage fill (see the Secret coherence section)
- `--llm-editor-provider`, `--llm-editor-model`, `--llm-editor-api-key` configure the secret-coherence editor (Sonnet 4.6 by default)
- `--no-verify-output` skip output verification

## Troubleshooting