feat(toolkit-docs-generator): secret-coherence scan + minimal LLM edits #932

Draft
jottakka wants to merge 8 commits into main from feat/toolkit-docs-secret-coherence
@jottakka

Summary

  • Adds a deterministic scan that detects (a) stale references to secrets that were removed upstream and (b) gaps in the summary's coverage of current secrets — including a missing link to the Arcade secret config docs.
  • Adds a targeted LLM editor (Claude Sonnet 4.6 by default) that repairs the scan's findings with minimum-necessary edits — no re-summarization.
  • Wires both into DataMerger so every auto run self-heals.
  • Loosens the summary-generation prompt so full secret coverage (each current secret + purpose + Arcade config link) is now part of what the generator writes.

Why

Today gpt-4o-mini writes every toolkit summary from scratch and tends to oversimplify on rerun. When a tool gets dropped upstream, the removed secret stays in hand-authored doc chunks (and sometimes the summary) forever. A real example on main: github.json still references GITHUB_CLASSIC_PERSONAL_ACCESS_TOKEN in its setup table even though the secret was removed when notification tools were dropped in #922.

How

1. Scanners — secret-coherence.ts

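No LLM calls and very little false-positive risk: secret names are distinctive ALLCAPS_WITH_UNDER tokens, so an exact substring match is sufficient in practice. The one edge case is a removed name that is itself a substring of a surviving name.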

  • detectStaleSecretReferences(toolkit, previousToolkit) — diffs the two secret sets, scans summary, toolkit-level documentationChunks, and per-tool chunks.
  • detectSecretCoverageGaps(toolkit) — flags current secrets missing from the summary and a missing Arcade-config-docs link.
  • groupStaleRefsByTarget — collapses multiple removed-secret hits in one artifact into one LLM call.
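A minimal sketch of the stale-reference scan described above, assuming simplified toolkit shapes. The `Toolkit`, `Tool`, and `StaleRef` interfaces and the target labels are hypothetical stand-ins, not the types in secret-coherence.ts.

```typescript
// Hypothetical, simplified shapes; the real toolkit types are richer.
interface Tool {
  name: string;
  secrets: string[];
  documentationChunks: string[];
}

interface Toolkit {
  summary: string;
  documentationChunks: string[];
  tools: Tool[];
}

interface StaleRef {
  secret: string;
  target: string; // "summary", "toolkit_chunk#<i>", or "tool_chunk:<tool>#<i>"
}

// Union of every secret declared by any tool in the toolkit.
function collectSecrets(toolkit: Toolkit): Set<string> {
  const secrets = new Set<string>();
  for (const tool of toolkit.tools) {
    for (const secret of tool.secrets) secrets.add(secret);
  }
  return secrets;
}

function detectStaleSecretReferences(toolkit: Toolkit, previousToolkit: Toolkit): StaleRef[] {
  // Secrets present before but absent now count as removed.
  const current = collectSecrets(toolkit);
  const removed = Array.from(collectSecrets(previousToolkit)).filter((s) => !current.has(s));

  // Every text artifact that could still mention a removed secret.
  const artifacts: Array<{ target: string; text: string }> = [
    { target: "summary", text: toolkit.summary },
  ];
  toolkit.documentationChunks.forEach((text, i) => {
    artifacts.push({ target: `toolkit_chunk#${i}`, text });
  });
  for (const tool of toolkit.tools) {
    tool.documentationChunks.forEach((text, i) => {
      artifacts.push({ target: `tool_chunk:${tool.name}#${i}`, text });
    });
  }

  // Exact substring match, as the description states.
  const refs: StaleRef[] = [];
  for (const secret of removed) {
    for (const { target, text } of artifacts) {
      if (text.includes(secret)) refs.push({ secret, target });
    }
  }
  return refs;
}
```

The scan only reads the current toolkit's text; the previous toolkit is used solely to compute which secret names disappeared.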

2. LLM editor — secret-edit-generator.ts

Unlike toolkit-summary-generator.ts (which rewrites from scratch), this editor is told to preserve every sentence, bullet, and table row that does not reference the removed secret, delete whole rows/sentences whose only topic is the removed secret, and minimally rewrite mixed sentences. No re-summarization, no reordering, no new information.

  • cleanupStaleReferences(...) handles stale refs for both summary and doc chunks.
  • fillCoverageGaps(...) adds missing secret mentions (one short factual line each) and the Arcade config link without touching unrelated content.

3. Wiring — data-merger.ts

DataMerger.enforceSecretCoherence runs after maybeGenerateSummary. The editor is optional — if unconfigured, scanners still run and emit warnings. Editor exceptions are caught and recorded as warnings so a single LLM failure doesn't break the whole run.

4. CLI + workflow

  • New flags: `--llm-editor-provider / --llm-editor-model / --llm-editor-api-key / --llm-editor-base-url / --llm-editor-temperature / --llm-editor-max-tokens / --llm-editor-max-retries / --skip-secret-coherence`
  • Env equivalents: `LLM_EDITOR_PROVIDER / LLM_EDITOR_MODEL / LLM_EDITOR_API_KEY`. API key falls back to `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` per provider.
  • Resolver fails open: a missing API key degrades to scanner-only warnings rather than crashing the run.
  • generate-toolkit-docs.yml passes `--llm-editor-provider anthropic --llm-editor-model $ANTHROPIC_EDITOR_MODEL --llm-editor-api-key $ANTHROPIC_API_KEY`, defaulting to `claude-sonnet-4-6`. CI secret `ANTHROPIC_API_KEY` needs to be provisioned before the editor actually runs.
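The fail-open resolution described in the bullets above can be sketched as follows. This is an illustration under assumptions: `resolveEditorConfig`, `EditorOptions`, and the flag-before-env precedence are hypothetical; only the env var names and the per-provider key fallback come from the description.

```typescript
type Provider = "anthropic" | "openai";
type Env = Record<string, string | undefined>;

// Values parsed from --llm-editor-provider / --llm-editor-model / --llm-editor-api-key.
interface EditorOptions {
  provider?: string;
  model?: string;
  apiKey?: string;
}

interface EditorConfig {
  provider: Provider;
  model: string;
  apiKey: string;
}

function isProvider(value: string | undefined): value is Provider {
  return value === "anthropic" || value === "openai";
}

// Returns undefined instead of throwing, so callers fall back to scanner-only warnings.
function resolveEditorConfig(flags: EditorOptions, env: Env): EditorConfig | undefined {
  // Assumption: a CLI flag wins over its env equivalent.
  const provider = flags.provider ?? env["LLM_EDITOR_PROVIDER"];
  const model = flags.model ?? env["LLM_EDITOR_MODEL"];
  if (!isProvider(provider) || !model) return undefined;

  // Per-provider fallback for the API key.
  const providerKey =
    provider === "anthropic" ? env["ANTHROPIC_API_KEY"] : env["OPENAI_API_KEY"];
  const apiKey = flags.apiKey ?? env["LLM_EDITOR_API_KEY"] ?? providerKey;
  if (!apiKey) return undefined; // fail open: no key means no editor, not a crash

  return { provider, model, apiKey };
}
```

Returning `undefined` rather than throwing keeps the decision with the caller, which can log a warning and continue with the scanners alone.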

5. Summary prompt

Drops the hard 60–140 word cap in favor of "compact but complete, use as many words as needed to cover every capability and secret". Requires each current secret to be named in backticks with one factual line on how to obtain it, and requires the `Secrets` section to end with the Arcade config docs link.

Testing

  • tests/merger/secret-coherence.test.ts (13 tests) — scanner behavior across summary, toolkit chunks, tool chunks, coverage gaps, and target grouping.
  • tests/llm/secret-edit-generator.test.ts (6 tests) — cleanup and coverage flows, fence-stripping, empty-response guard.
  • Two new integration tests in tests/merger/data-merger.test.ts — documentation-chunk edit is applied, warning is emitted when editor is absent.
  • tests/workflows/generate-toolkit-docs.test.ts asserts the new editor flags are present.
  • All 543 tests pass locally (`pnpm vitest run toolkit-docs-generator/tests/`). Full repo type-check clean.

Test plan

  • Provision `ANTHROPIC_API_KEY` (and optionally `ANTHROPIC_EDITOR_MODEL`) as repo secrets.
  • Manually trigger `generate-toolkit-docs.yml` via `workflow_dispatch`.
  • Verify the resulting `[AUTO]` PR:
    • `github.json` no longer contains `GITHUB_CLASSIC_PERSONAL_ACCESS_TOKEN` in the setup table.
    • Other affected toolkits (if any) show targeted edits — not full rewrites.
    • Summaries mention all current secrets and link to the Arcade config docs.
  • Run without `ANTHROPIC_API_KEY` to confirm the run degrades to scanner-only warnings and completes successfully.

Refs #926

🤖 Generated with Claude Code

When a toolkit loses a secret upstream (typically because the tool that
required it was dropped), the rendered docs can continue to mention
that secret in the summary and in hand-authored documentation chunks.
One concrete example on main: github.json still references
GITHUB_CLASSIC_PERSONAL_ACCESS_TOKEN after the notification tools were
removed in PR #922.

Symmetrically, toolkits can end up with current secrets that the
summary never mentions, or mention secrets without any link to the
Arcade config docs — leaving readers without the information needed
to actually configure them.

This adds a two-stage pipeline that runs after summary generation:

1. Deterministic scanners (src/merger/secret-coherence.ts)
   - detectStaleSecretReferences: diffs current vs previous toolkit
     secret sets and scans summary, toolkit chunks, and per-tool chunks
     by exact substring for each removed secret.
   - detectSecretCoverageGaps: flags current secrets missing from the
     summary and a missing link to the Arcade secret config docs.
   - groupStaleRefsByTarget: collapses multiple removed-secret hits in
     the same artifact into a single edit target so the LLM is called
     at most once per (summary | chunk).

2. Targeted LLM editor (src/llm/secret-edit-generator.ts)
   - Unlike toolkit-summary-generator (which rewrites from scratch and
     tends to oversimplify), this editor is prompted to make the
     smallest possible change: delete sentences/rows that are only
     about the removed secret, minimally rewrite any sentence that
     mentions the removed secret alongside other content, and never
     re-summarize or reorder sections.
   - A separate fillCoverageGaps method adds missing secret mentions
     and, when required, the Arcade config docs link — also without
     rewriting unrelated text.
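The grouping step from item 1 could look roughly like this. The `StaleRef` and `EditTarget` shapes are hypothetical; the point is only that several removed-secret hits on one artifact collapse into one edit target.

```typescript
// Hypothetical shapes for illustration.
interface StaleRef {
  secret: string;
  target: string; // e.g. "summary" or "toolkit_chunk#4"
}

interface EditTarget {
  target: string;
  removedSecrets: string[];
}

function groupStaleRefsByTarget(refs: StaleRef[]): EditTarget[] {
  // Map keeps insertion order, so targets come out in first-seen order.
  const byTarget = new Map<string, string[]>();
  for (const ref of refs) {
    const secrets = byTarget.get(ref.target) ?? [];
    if (secrets.indexOf(ref.secret) === -1) secrets.push(ref.secret);
    byTarget.set(ref.target, secrets);
  }

  const targets: EditTarget[] = [];
  byTarget.forEach((removedSecrets, target) => {
    targets.push({ target, removedSecrets });
  });
  return targets;
}
```

With one entry per artifact, the editor can be given all removed secrets for that artifact in a single prompt.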

Both steps are wired into DataMerger.enforceSecretCoherence, called
after maybeGenerateSummary. The editor is optional: if unconfigured,
the scanners still run and emit warnings, but no content is rewritten.
Failures in the editor are caught and surfaced as warnings so a single
LLM error does not break the run.
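A sketch of the optional-editor and catch-and-warn behavior described above, reduced to a single text artifact. `enforceStaleCleanup`, `SecretEditor`, and the warning strings are hypothetical; in particular, keeping the scan warning after a successful edit is an assumption of this sketch.

```typescript
// Hypothetical editor interface; the real generator takes richer arguments.
interface SecretEditor {
  cleanupStaleReferences(text: string, removedSecrets: string[]): Promise<string>;
}

interface CoherenceResult {
  text: string;
  warnings: string[];
}

async function enforceStaleCleanup(
  text: string,
  removedSecrets: string[],
  editor?: SecretEditor,
): Promise<CoherenceResult> {
  // The deterministic scan always runs.
  const found = removedSecrets.filter((secret) => text.includes(secret));
  if (found.length === 0) return { text, warnings: [] };

  const warnings = found.map((secret) => `Stale secret reference: ${secret}`);

  // No editor configured: report the findings, leave the content untouched.
  if (!editor) return { text, warnings };

  try {
    const edited = await editor.cleanupStaleReferences(text, found);
    return { text: edited, warnings };
  } catch (error) {
    // A single LLM failure becomes a warning instead of aborting the run.
    const reason = error instanceof Error ? error.message : String(error);
    warnings.push(`Secret cleanup edit failed: ${reason}`);
    return { text, warnings };
  }
}
```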

Wiring changes:

- DataMergerConfig gains an optional secretEditGenerator.
- CLI gains --llm-editor-provider / --llm-editor-model /
  --llm-editor-api-key / --llm-editor-base-url / etc., mirrored by
  LLM_EDITOR_* env vars, with --skip-secret-coherence for the
  scan-and-edit step. Resolver fails open: a missing API key degrades
  to scanner-only warnings instead of crashing the run.
- Workflow generate-toolkit-docs.yml now passes editor flags pointing
  at Anthropic + claude-sonnet-4-6 (overridable via secrets) so the
  editor stays on a stronger model than the gpt-4o-mini used for bulk
  summary and example generation.

Summary prompt updates (src/llm/toolkit-summary-generator.ts):
- Drop the hard 60–140 word cap; ask for "compact but complete".
- Require each current secret be named in backticks with a one-line
  factual description of how to obtain it from the provider.
- Require the Arcade secret config docs link at the end of the
  **Secrets** section.

Tests:
- tests/merger/secret-coherence.test.ts (13 tests) covers scanner
  behavior across summary, toolkit chunks, tool chunks, coverage gaps,
  and target grouping.
- tests/llm/secret-edit-generator.test.ts (6 tests) exercises the
  cleanup/coverage flows and the fence-stripping / empty-response
  guards with a mocked LLM client.
- Two new DataMerger integration tests verify that a removed secret
  surfacing in a doc chunk drives exactly one cleanup call and that
  the editor-disabled path still emits the warning.
- tests/workflows/generate-toolkit-docs.test.ts asserts the new
  editor flags are present in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two issues surfaced by `/acr-run`:

1. FENCE_PATTERN (secret-edit-generator.ts) was non-greedy and unanchored,
   so stripOptionalFence stopped at the FIRST inner ``` when the LLM
   wrapped its edit in a markdown fence and the edit itself contained
   a fenced code block. Result: the rest of the edit was silently
   dropped with no error — corrupted doc chunks written to disk.
   Fix: anchor the pattern to ^…$ and use a greedy capture so the
   match extends to the outer closing fence.

2. enforceSecretCoherence (data-merger.ts) computed coverage gaps once,
   before stale cleanup ran. If cleanup modified the summary and
   incidentally dropped a passage that mentioned a current secret, the
   pre-cleanup gap snapshot would miss it. Fix: re-run
   detectSecretCoherenceIssues after applyStaleRefCleanup so the
   coverage fill sees post-cleanup state.
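To illustrate the first issue, here is a sketch contrasting a non-greedy, unanchored pattern with an anchored, greedy one. Both regexes are reconstructions from the description above, not the actual FENCE_PATTERN, and this stripOptionalFence takes the pattern as a parameter only so the two can be compared.

```typescript
// Reconstruction of the buggy shape: unanchored, lazy capture.
const LAZY_FENCE = /```[a-z]*\n([\s\S]*?)\n```/;

// Reconstruction of the fix: anchored to the whole response, greedy capture.
const ANCHORED_FENCE = /^```[a-z]*\n([\s\S]*)\n```\s*$/;

// Unwraps an outer markdown fence if the pattern matches; otherwise returns the input trimmed.
function stripOptionalFence(response: string, pattern: RegExp): string {
  const trimmed = response.trim();
  const match = pattern.exec(trimmed);
  return match?.[1] ?? trimmed;
}
```

For a response whose outer fence wraps text that itself contains a fenced block, the lazy pattern stops at the first inner fence line and returns only the text before it. The anchored pattern has to reach the end of the string, so its capture extends to the outer closing fence and the inner block survives.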

Tests:
- Two new fence tests cover (a) preserving inner code blocks when
  unwrapping the outer fence, and (b) leaving unwrapped responses
  with inner blocks untouched.
- One new DataMerger test proves the coverage editor receives post-
  cleanup summary content (not a stale snapshot).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 4096 default for --llm-editor-max-tokens was tight. Largest single artifact in current data is a ~6K-char
doc chunk (googlenews) ≈ 1.5K output tokens for a minimal-edit rewrite;
a summary with no word cap for a 40+ tool toolkit with several secrets
can land in the 2–3K output-token range. 8K gives comfortable margin
without meaningful cost or latency impact on Sonnet 4.6. Help text
updated to match. Callers can still override via --llm-editor-max-tokens.
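The sizing argument above can be checked with back-of-the-envelope arithmetic. The 4-characters-per-token ratio is a rough heuristic assumed here (not a tokenizer), and reading "8K" as 8192 is also an assumption.

```typescript
// Assumption: roughly 4 characters per token for English prose.
const CHARS_PER_TOKEN = 4;

function estimateTokens(charCount: number): number {
  return Math.ceil(charCount / CHARS_PER_TOKEN);
}

// How many times the expected output fits into the token cap.
function headroom(maxTokens: number, expectedOutputTokens: number): number {
  return maxTokens / expectedOutputTokens;
}

const largestChunkTokens = estimateTokens(6000); // 1500
const worstCaseSummaryTokens = 3000; // upper end of the 2-3K range quoted above

console.log(largestChunkTokens);
console.log(headroom(4096, worstCaseSummaryTokens).toFixed(2)); // about 1.37x
console.log(headroom(8192, worstCaseSummaryTokens).toFixed(2)); // about 2.73x
```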

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…secret prose cap

Two changes:

1. README: new Secret coherence section covering the scan/edit pipeline,
   the editor CLI flags, the claude-sonnet-4-6 default, fail-open
   behavior when no API key is set, and a local invocation example.
   Required/optional CI secrets updated with ANTHROPIC_API_KEY and
   ANTHROPIC_EDITOR_MODEL. Key CLI options list updated with the new
   flags.

2. Prompts (summary generator + coverage-fill editor) no longer cap
   each secret at one line. Instead they ask for as much detail as the
   secret actually needs — a short URL override may be one line; a
   scoped API key typically needs several sentences naming the
   provider dashboard page, required scopes or permissions, and any
   account tier. Both prompts also request an inline markdown link to
   the provider's own docs page for how to create/retrieve the secret
   when the model knows it, and explicitly forbid inventing URLs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jottakka added a commit that referenced this pull request Apr 18, 2026
…to Arcade auth provider docs

Per follow-up feedback, summaries should not repeat OAuth scopes — they
belong in the per-toolkit auth provider documentation, not the overview
prose. Each of the six restored summaries now:

- States whether OAuth is required and names the provider.
- Links to the matching Arcade auth provider page under
  /en/references/auth-providers/<provider> instead of listing scopes.
- Where applicable, adds explicit explanations for each secret plus a
  link to the Arcade Dashboard secret setup flow.

The deterministic secret-coverage check in #932 is not changed here —
that PR already accepts either the Arcade secret docs link or the
Dashboard URL, both of which remain present in these summaries.
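The either-link rule mentioned above could be sketched like this. The function signature and `CoverageGaps` shape are hypothetical, the accepted URLs are passed in rather than hard-coded because the real constants are not shown here, and requiring a link only when the toolkit has secrets is an assumption.

```typescript
interface CoverageGaps {
  missingSecrets: string[];
  missingConfigLink: boolean;
}

function detectSecretCoverageGaps(
  summary: string,
  currentSecrets: string[],
  acceptedLinks: string[],
): CoverageGaps {
  // A secret counts as covered if its exact name appears in the summary.
  const missingSecrets = currentSecrets.filter((secret) => !summary.includes(secret));

  // Any one of the accepted links satisfies the check.
  const hasLink = acceptedLinks.some((link) => summary.includes(link));
  const missingConfigLink = currentSecrets.length > 0 && !hasLink;

  return { missingSecrets, missingConfigLink };
}

// Placeholder URLs for illustration only; not the real Arcade links.
const ACCEPTED_LINKS = [
  "https://example.invalid/secrets-docs",
  "https://example.invalid/dashboard/secrets",
];

const gaps = detectSecretCoverageGaps(
  "Set `API_KEY` in the dashboard: https://example.invalid/dashboard/secrets",
  ["API_KEY", "API_SECRET"],
  ACCEPTED_LINKS,
);
console.log(gaps.missingSecrets, gaps.missingConfigLink);
```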

Refs #926

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…scopes

Per follow-up on PRs #928 and #929, the OAuth section of each summary
should name the provider and link to the Arcade per-provider auth docs
rather than enumerate scopes. Scopes already live on the provider
reference page and repeating them in toolkit summaries creates drift
every time a provider page updates.

Changes:
- Add ARCADE_AUTH_PROVIDERS_BASE_URL constant alongside the existing
  Arcade secret URLs in secret-coherence.ts.
- Rewrite the OAuth bullet in toolkit-summary-generator.ts's prompt to
  require a link to {base}/<providerId> and explicitly forbid listing
  scopes.
- Drop scopes from formatAuth's prompt payload so the model has no
  stray scope list to fall back on.
- README: note the no-scopes-in-summary rule and point to the provider
  reference pages as the source of truth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (round 2)

Four findings from /acr-run:

1. HIGH (5/5) — ANTHROPIC_EDITOR_MODEL was documented as a fallback env
   var in the README but never read by resolveSecretEditGenerator. A
   local dev setting only ANTHROPIC_EDITOR_MODEL would get
   `model = undefined`, the (provider && model) guard would fire, and
   the editor would silently stay inactive. Extract resolveEditorModel
   helper that walks `--llm-editor-model` → LLM_EDITOR_MODEL →
   ANTHROPIC_EDITOR_MODEL in documented order, and use it from both
   the resolver and the verbose-log blocks.

2. MEDIUM — --skip-secret-coherence was documented to "disable both
   the scan and the edit step entirely" but DataMerger never received
   the flag; enforceSecretCoherence always ran, so coherence warnings
   still appeared when the user explicitly opted out. Add
   `skipSecretCoherence` to DataMergerConfig, gate enforcement on it,
   and pass it through from all three merger construction sites in
   the CLI.

3. MEDIUM — FENCE_PATTERN matched non-markdown language fences
   (```python, ```bash, ```json). A documentation chunk whose content
   was a code block would have its fences stripped, corrupting the
   edited output. Tightened the pattern to require either an empty,
   markdown, md, or text tag followed by a newline between the opening
   fence and the captured content, so language-tagged code blocks fall
   through stripOptionalFence unchanged.

4. LOW — verbose log showed "model: undefined" when only
   ANTHROPIC_EDITOR_MODEL was set. Fixed by finding 1 above.
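The precedence chain from finding 1 could be sketched as below. The signature is hypothetical, and treating empty or whitespace-only values as unset is an assumption of this sketch.

```typescript
type Env = Record<string, string | undefined>;

// Walks --llm-editor-model, then LLM_EDITOR_MODEL, then ANTHROPIC_EDITOR_MODEL.
function resolveEditorModel(flagValue: string | undefined, env: Env): string | undefined {
  const candidates = [flagValue, env["LLM_EDITOR_MODEL"], env["ANTHROPIC_EDITOR_MODEL"]];
  // Assumption: blank values fall through to the next candidate.
  return candidates.find((value) => value !== undefined && value.trim() !== "");
}
```

Sharing one helper between the resolver and the verbose log keeps the logged model identical to the one actually used.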

Tests added:
- fence strip preserves ```python and ```bash code blocks verbatim
- skipSecretCoherence suppresses both edits and warnings

549 tests pass, type-check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jottakka added a commit that referenced this pull request Apr 18, 2026
…erence

Seeds a phantom secret (GITHUB_CLASSIC_PERSONAL_ACCESS_TOKEN) into
Github.AssignPullRequestUser's secrets and secretsInfo so that a fresh
workflow run against the live Engine API identifies the secret as
"removed" when it compares the committed toolkit against the generator
output.

The existing documentation chunk in github.json still references this
same secret name (real-world residue from when the notification tools
were dropped in #922), so the secret-coherence step from #932 should:

1. Detect `GITHUB_CLASSIC_PERSONAL_ACCESS_TOKEN` as removed on the
   `Github.AssignPullRequestUser` tool.
2. Scan chunks and find the matching text in the `custom_section` chunk.
3. Call the Claude Sonnet 4.6 editor to minimally edit that chunk —
   deleting the table row and note lines that reference the removed
   secret while preserving the rest of the setup documentation.

To test: trigger `Generate toolkit docs` workflow via workflow_dispatch
on this branch. Confirm the resulting AUTO PR's diff on github.json:
- Removes the phantom secret from the tool's .secrets / .secretsInfo.
- Rewrites the Secrets Setup chunk to drop the stale rows.
- Leaves unrelated chunk content intact (no re-summarization).

Do not merge this commit. The phantom secret must not land on main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… runs

Two workflow additions driven by PR #936 feedback:

1. Job-level `FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"` opts all
   JavaScript actions into Node 24 ahead of the 2026-06-02
   deprecation. actions/checkout@v4, actions/setup-node@v4,
   peter-evans/create-pull-request@v7, and pnpm/action-setup@v4 all
   trigger the "Node.js 20 actions are deprecated" annotation today;
   the opt-in silences it and matches the runtime we'll be forced
   onto anyway.

2. New `workflow_dispatch` input `providers`. When set to a
   comma-separated provider list (e.g. "Github"), the run uses
   `--providers "$providers"` AND drops `--skip-unchanged` so the
   secret-coherence scan actually re-evaluates those toolkits — even
   when the Engine API reports no version change. Scheduled and
   porter_deploy_succeeded runs keep the previous `--all
   --skip-unchanged` behavior. This is what lets the #935 demo PR
   actually exercise the pipeline end-to-end: trigger the workflow
   with `providers=Github` and the phantom secret gets surfaced +
   cleaned.
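The flag selection in item 2 lives in the workflow itself; the TypeScript function below is only an illustration of the branching, and `buildGeneratorArgs` is a hypothetical name.

```typescript
// providersInput mirrors the workflow_dispatch input; undefined or blank on scheduled runs.
function buildGeneratorArgs(providersInput: string | undefined): string[] {
  const providers = (providersInput ?? "").trim();
  if (providers !== "") {
    // Targeted run: no --skip-unchanged, so the coherence scan re-evaluates these toolkits.
    return ["--providers", providers];
  }
  // Scheduled and deploy-triggered runs keep the previous behavior.
  return ["--all", "--skip-unchanged"];
}

console.log(buildGeneratorArgs("Github").join(" "));
console.log(buildGeneratorArgs(undefined).join(" "));
```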

Tests added: workflow assertions for the new env var and the
providers input fallback structure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The stale-secret scanner, coverage-gap detector, and summary-generation
failures all push warnings onto `result.warnings`. Per-provider mode
already echoes those to stdout (line 848 of cli/index.ts). The --all
and regenerate-all paths did not — they only appended to the run log
file on disk, which GitHub Actions runs don't expose.

Result: on the #935 demo, the workflow ran, the phantom secret was
removed from the tool's .secrets array, but no cleanup was applied to
the stale doc chunk that still referenced it AND there was no signal
in the CI log explaining why. The warnings that would have explained
"stale secret detected but edit failed" or "stale secret detected but
no editor configured" were present in memory but discarded.

This commit prints every non-empty `mergeResult.warnings` to stdout
right after `mergeAllToolkits()` returns, in both the `generate --all`
and `regenerate --all` paths. Format matches existing spinner output:

    ⚠ Github: 2 warning(s)
      - Stale secret reference in toolkit_chunk #4: GITHUB_CLASSIC_...
      - Secret cleanup edit failed for Github (documentation_chunk): ...
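A sketch of how the output format above might be produced. The `ToolkitWarnings` shape is a hypothetical stand-in for whatever `mergeAllToolkits()` returns per toolkit.

```typescript
// Hypothetical per-toolkit result shape.
interface ToolkitWarnings {
  toolkitId: string;
  warnings: string[];
}

// Builds one header line per toolkit with warnings, followed by one line per warning.
function formatWarnings(results: ToolkitWarnings[]): string[] {
  const lines: string[] = [];
  for (const { toolkitId, warnings } of results) {
    if (warnings.length === 0) continue; // only non-empty warning lists are printed
    lines.push(`⚠ ${toolkitId}: ${warnings.length} warning(s)`);
    for (const warning of warnings) lines.push(`  - ${warning}`);
  }
  return lines;
}

for (const line of formatWarnings([
  { toolkitId: "Github", warnings: ["first warning", "second warning"] },
  { toolkitId: "Slack", warnings: [] },
])) {
  console.log(line);
}
```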

551 tests pass, type-check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>