Merged
17 commits
12144be
Refactor documentation links to use colons for descriptions, update c…
probablyangg Jun 16, 2026
595aefb
fix: replace em dashes with colons in connect.mdx next-steps list
probablyangg Jun 16, 2026
1961920
ci: post styleguide check results as a sticky PR comment
probablyangg Jun 16, 2026
8147fab
ci: link styleguide findings to files at the PR head commit
probablyangg Jun 16, 2026
4a56293
ci: post applyable inline suggestions for em-dash fixes
probablyangg Jun 16, 2026
7d6af01
docs: document the styleguide CI checks and PR suggestion behavior
probablyangg Jun 16, 2026
2ae4bff
fix(i18n): skip in-sync pages when translating explicit paths
probablyangg Jun 16, 2026
df4ece2
trigger ci
probablyangg Jun 16, 2026
06fd22b
i18n: translate via a swappable LLM seam (OpenRouter + cheap model)
probablyangg Jun 16, 2026
c32d386
i18n: default to google/gemini-2.5-flash (valid OpenRouter id) + 32k …
probablyangg Jun 16, 2026
b2acbc9
i18n: auto-translate cn/ko for changed en content
github-actions[bot] Jun 16, 2026
f855c54
i18n: normalize translated link hrefs to the locale slug; repair cn/ko
probablyangg Jun 16, 2026
01d21cd
chore: drop build-generated woff2/styles.css from the i18n link-fix c…
probablyangg Jun 16, 2026
1ca9c5f
Merge remote-tracking branch 'origin/main' into ag/add-styleguide-ci
probablyangg Jun 16, 2026
753fada
i18n: auto-translate cn/ko for changed en content
github-actions[bot] Jun 16, 2026
0df9b28
style: single source of truth for mechanical rules; de-duplicate STYL…
probablyangg Jun 16, 2026
3dc103f
docs(adr): add DR004 (translation LLM provider) and DR005 (styleguide…
probablyangg Jun 16, 2026
124 changes: 124 additions & 0 deletions .github/workflows/docs-style.yml
@@ -0,0 +1,124 @@
name: docs style

# Enforces the mechanical rules in STYLEGUIDE.md on the en source pages
# (frontmatter, diataxis/folder match, description length, code-fence language
# tags, no em dashes). en is the source of truth; generated cn/ko are not checked.

on:
pull_request:
paths:
- 'docs/pages/en/**'
- 'docs/lib/verify-style.mjs'
workflow_dispatch:

permissions:
contents: read
pull-requests: write

jobs:
style:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version-file: '.nvmrc'
cache: npm
- run: npm ci
- name: Check styleguide compliance
id: style
# Capture the exit code so we can still post the comment on failure;
# the job is failed explicitly in the final step.
run: |
set +e
npm run style:check
echo "exit_code=$?" >> "$GITHUB_OUTPUT"
env:
STYLE_REPORT: ${{ runner.temp }}/style-report.md
STYLE_SUGGESTIONS: ${{ runner.temp }}/style-suggestions.json
# Blob base for the PR head commit; makes report paths clickable
# links pinned to the exact reviewed revision.
STYLE_REPO_URL: ${{ github.server_url }}/${{ github.repository }}/blob/${{ github.event.pull_request.head.sha }}
- name: Comment styleguide report
if: always() && github.event_name == 'pull_request'
uses: actions/github-script@v7
with:
script: |
const fs = require('fs');
const path = `${process.env.RUNNER_TEMP}/style-report.md`;
if (!fs.existsSync(path)) return;
const marker = '<!-- styleguide-check -->';
const body = `${marker}\n${fs.readFileSync(path, 'utf8')}`;
const { owner, repo } = context.repo;
const issue_number = context.issue.number;
const { data: comments } = await github.rest.issues.listComments({ owner, repo, issue_number });
const existing = comments.find(c => c.body && c.body.includes(marker));
if (existing) {
await github.rest.issues.updateComment({ owner, repo, comment_id: existing.id, body });
} else {
await github.rest.issues.createComment({ owner, repo, issue_number, body });
}
- name: Suggest fixes inline
if: always() && github.event_name == 'pull_request'
uses: actions/github-script@v7
with:
script: |
const fs = require('fs');
const path = `${process.env.RUNNER_TEMP}/style-suggestions.json`;
if (!fs.existsSync(path)) return;
const suggestions = JSON.parse(fs.readFileSync(path, 'utf8'));
if (!suggestions.length) return;

const { owner, repo } = context.repo;
const pull_number = context.issue.number;
const commit_id = context.payload.pull_request.head.sha;
const MARK = '<!-- styleguide-suggestion -->';

// Map each changed file to the set of new-side line numbers the PR
// adds. GitHub only accepts suggestions on lines in the diff.
const files = await github.paginate(github.rest.pulls.listFiles, { owner, repo, pull_number });
const addedLines = new Map();
for (const f of files) {
if (!f.patch) continue;
const set = new Set();
let newLine = 0;
for (const ln of f.patch.split('\n')) {
const h = ln.match(/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
if (h) { newLine = parseInt(h[1], 10); continue; }
if (ln.startsWith('+') && !ln.startsWith('+++')) { set.add(newLine); newLine++; }
else if (ln.startsWith('-') && !ln.startsWith('---')) { /* old side only */ }
else { newLine++; }
}
addedLines.set(f.filename, set);
}

// Clear our previous suggestion comments so reruns don't pile up.
const prior = await github.paginate(github.rest.pulls.listReviewComments, { owner, repo, pull_number });
for (const c of prior) {
if (c.body && c.body.includes(MARK)) {
await github.rest.pulls.deleteReviewComment({ owner, repo, comment_id: c.id });
}
}

let posted = 0, skipped = 0;
for (const s of suggestions) {
const filename = `docs/pages/en/${s.rel}`;
if (!addedLines.get(filename)?.has(s.line)) { skipped++; continue; }
const body = `${MARK}\n${s.reason}\n\n\`\`\`suggestion\n${s.fixed}\n\`\`\``;
try {
await github.rest.pulls.createReviewComment({
owner, repo, pull_number, commit_id, path: filename, line: s.line, side: 'RIGHT', body,
});
posted++;
} catch (e) {
core.warning(`Could not suggest on ${filename}:${s.line} — ${e.message}`);
skipped++;
}
}
core.info(`Inline suggestions posted: ${posted}, skipped (not in diff): ${skipped}`);

- name: Fail on blocking issues
if: steps.style.outputs.exit_code != '0'
run: |
echo "Styleguide check found blocking issues. See the PR comment for details."
exit 1
10 changes: 8 additions & 2 deletions .github/workflows/i18n-translate.yml
@@ -87,7 +87,11 @@ jobs:
- name: Translate changed pages into cn + ko
if: steps.guard.outputs.skip != 'true' && steps.changed.outputs.pages != ''
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
# Provider/model are swappable here without touching the scripts.
LLM_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
TRANSLATE_MODEL: google/gemini-2.5-flash
MAX_OUTPUT_TOKENS: 32000 # 2.5-flash supports a large output window
# REVIEW_MODEL: google/gemini-2.5-flash # set to enable the QA pass
run: |
pages=$(echo "${{ steps.changed.outputs.pages }}" | tr '\n' ' ')
node docs/lib/i18n-translate.mjs cn $pages
@@ -98,7 +102,9 @@ jobs:
- name: Regenerate localized sidebars
if: steps.guard.outputs.skip != 'true' && steps.sidebar.outputs.changed == 'true'
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
LLM_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
TRANSLATE_MODEL: google/gemini-2.5-flash
MAX_OUTPUT_TOKENS: 32000
run: |
node docs/lib/i18n-sidebar.mjs cn
node docs/lib/i18n-sidebar.mjs ko
3 changes: 3 additions & 0 deletions .gitignore
@@ -4,3 +4,6 @@ docs/dist/
*.log
.DS_Store
.env

# Generated by `vocs build` from the tracked .ttf sources
docs/public/fonts/*.woff2
18 changes: 18 additions & 0 deletions ADRs/DR002_i18n_Sync_Pipeline.md
@@ -105,6 +105,15 @@ stale.**

### 5. Translation engine (`docs/lib/i18n-translate.mjs`)

> **Superseded by [DR004](./DR004_Translation_LLM_Provider.md).** The provider/
> model below (Anthropic SDK, `claude-opus-4-8`, `ANTHROPIC_API_KEY`) was
> replaced by a swappable OpenAI-compatible seam (`docs/lib/llm.mjs`) defaulting
> to OpenRouter + a cheap model, with an optional review pass and structural
> validation. Use `LLM_API_KEY` and see DR004 for current usage. The discovery
> behavior also changed: explicit paths now skip in-sync pages (drift is always
> checked), with `--force` to override. The rest of this section is retained as
> the original record.

```
ANTHROPIC_API_KEY=... node docs/lib/i18n-translate.mjs <cn|ko> [--stale] [--limit N] [page ...]
```
@@ -120,6 +129,11 @@ itself, or takes explicit paths.

### 6. Sidebars (`docs/lib/i18n-sidebar.mjs`)

> **Engine superseded by [DR004](./DR004_Translation_LLM_Provider.md):** the
> "batched Opus call" is now the provider-agnostic `complete()` seam (cheap model
> by default, optional review). The structure/regeneration logic below is
> unchanged.

`en` is the source of truth for sidebar *structure* too: only the `/en` section
of `docs/sidebar.json` is hand-maintained. `i18n-sidebar.mjs <cn|ko>` regenerates
the `/cn` and `/ko` sections from `/en` — it deep-clones the tree, swaps each
@@ -165,6 +179,10 @@ report `0 missing`.

### Step 2 — Translate the missing pages

> The `ANTHROPIC_API_KEY` invocations below are superseded by
> [DR004](./DR004_Translation_LLM_Provider.md) — use `LLM_API_KEY` (the engine
> now calls an OpenAI-compatible provider, OpenRouter by default).

```bash
ANTHROPIC_API_KEY=... node docs/lib/i18n-translate.mjs cn
ANTHROPIC_API_KEY=... node docs/lib/i18n-translate.mjs ko
3 changes: 2 additions & 1 deletion ADRs/DR003_Page_Filename_Index_Constraint.md
@@ -70,7 +70,8 @@ end in `index` (e.g. `-guides`, `-overview`). Do not reach for `-index` as a
this record plus `docs:build` link-checking: a `sidebar.json` link to the
intended `/foo-index` fails the build, surfacing the mistake. (A lint rule
rejecting non-`index.mdx` files ending in `index` would make this active
rather than incidental — an open follow-up if it recurs.)
rather than incidental — an open follow-up if it recurs. The natural home is
now the styleguide checker, [DR005](./DR005_Styleguide_Enforcement.md).)

## Related

136 changes: 136 additions & 0 deletions ADRs/DR004_Translation_LLM_Provider.md
@@ -0,0 +1,136 @@
# DR004: Translation LLM provider — swappable seam, cheap model + review

> Supersedes [DR002](./DR002_i18n_Sync_Pipeline.md) §5 (translation engine) and
> §6 (sidebar) **internals only**. DR002's contract — `en` as source of truth,
> the parity checker, the CI gates, block-on-missing/warn-on-stale, the page
> mapping and re-homing — is unchanged and still authoritative. This record
> covers *which model answers and how the call is made*.

## Context

DR002 built the translation engine on `@anthropic-ai/sdk` + **`claude-opus-4-8`**
with adaptive thinking. On a working PR this surfaced three problems:

- **Cost / overkill.** Opus is `$5/$25` per MTok and adaptive thinking bills
thinking tokens — for a *translation* task that needs neither frontier
reasoning nor deep thinking. The work is structure-preserving rewriting, which
mid-tier and cheap models do well.
- **Reliability.** The Opus-backed job hit rate-limit / "credit balance too low"
failures that blocked the whole `i18n-translate` workflow (the entire batch
failed once credits ran out mid-run).
- **No swappability.** Provider, model, and key were hard-coded across two
scripts; trying a cheaper model meant editing code in several places.

A separate, compounding bug was also found and fixed (see Decision §5): the
engine re-translated **every** changed page unconditionally when given explicit
paths, ignoring the `source_sha` freshness check — so each push re-ran the full
changeset through Opus.

The user's decision: move to **OpenRouter** with a cheap model, keep quality
with a **review pass**, and make the LLM **replaceable by configuration, not
code**.

## Decision

### 1. One provider-agnostic seam — `docs/lib/llm.mjs`

All LLM access goes through a single `complete({ model, system, user, maxTokens })`
helper (~30 lines). The two scripts import only this; they never construct a
client or know the wire format.

- **Transport: native `fetch` against the OpenAI Chat Completions format.** No
SDK. The OpenAI `/chat/completions` shape is the lingua franca every major
gateway speaks (OpenRouter, OpenAI, Together, Groq, local vLLM/Ollama), so one
function is portable across all of them. **`@anthropic-ai/sdk` was removed**
from `package.json` (no remaining importers).
- A small retry (2×, backoff) on `429`/`5xx`/network replaces the SDK's
auto-retry. Non-streaming for simplicity (cheap models return in seconds; Node
`fetch` has no client timeout).
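
For illustration only, here is a minimal sketch of what such a seam could look like, assuming an OpenAI-compatible `/chat/completions` endpoint. The `complete({ model, system, user, maxTokens })` signature comes from this record; the second `opts` argument, `buildChatRequest`, `isRetryable`, and the backoff timings are hypothetical and may differ from the real `docs/lib/llm.mjs`.

```javascript
// Hypothetical sketch, not the actual docs/lib/llm.mjs.

// Build a request body in the OpenAI Chat Completions shape.
function buildChatRequest({ model, system, user, maxTokens }) {
  return {
    model,
    max_tokens: maxTokens,
    messages: [
      { role: 'system', content: system },
      { role: 'user', content: user },
    ],
  };
}

// Retry only on rate limits (429) and server errors (5xx).
function isRetryable(status) {
  return status === 429 || (status >= 500 && status <= 599);
}

async function complete({ model, system, user, maxTokens }, opts = {}) {
  const {
    baseUrl = 'https://openrouter.ai/api/v1',
    apiKey = process.env.LLM_API_KEY,
    fetchImpl = globalThis.fetch, // injectable so the helper can be tested offline
    retries = 2,
    backoffMs = 1000, // assumed value; doubled on each retry
  } = opts;

  for (let attempt = 0; ; attempt++) {
    try {
      const res = await fetchImpl(`${baseUrl}/chat/completions`, {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${apiKey}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(buildChatRequest({ model, system, user, maxTokens })),
      });
      if (res.ok) {
        const data = await res.json();
        return data.choices[0].message.content;
      }
      if (!isRetryable(res.status) || attempt >= retries) {
        // Mark the error so the catch below does not retry it.
        throw Object.assign(new Error(`LLM request failed: HTTP ${res.status}`), { final: true });
      }
    } catch (err) {
      // Network errors land here; give up once the retry budget is spent.
      if (err.final || attempt >= retries) throw err;
    }
    await new Promise((resolve) => setTimeout(resolve, backoffMs * 2 ** attempt));
  }
}
```

Injecting `fetchImpl` is a design choice of this sketch: it keeps the retry logic testable without network access or an API key.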

### 2. Everything is env-configurable

| Env var | Default | Purpose |
| --- | --- | --- |
| `LLM_API_KEY` | — (falls back to `OPENROUTER_API_KEY`) | Bearer key (required) |
| `LLM_BASE_URL` | `https://openrouter.ai/api/v1` | Any OpenAI-compatible endpoint |
| `TRANSLATE_MODEL` | `google/gemini-2.5-flash` | Translation-pass model |
| `REVIEW_MODEL` | *(unset → no review)* | Optional QA-pass model |
| `MAX_OUTPUT_TOKENS` | `8000` | Per-call output cap (must fit the model) |

Swapping provider or model is a workflow `env:` change or a shell var — **zero
code edits**. `.github/workflows/i18n-translate.yml` wires the repo secret
`OPENROUTER_API_KEY` into `LLM_API_KEY` and pins `TRANSLATE_MODEL` +
`MAX_OUTPUT_TOKENS: 32000` (gemini-2.5-flash has a large output window).
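
As a rough sketch of how the table above could be resolved in code: the variable names and defaults are taken from the table, while `loadLlmConfig` and the shape of the returned object are assumptions for illustration, not the scripts' actual code.

```javascript
// Hypothetical sketch: resolve the LLM settings from environment variables.
function loadLlmConfig(env = process.env) {
  // LLM_API_KEY wins; OPENROUTER_API_KEY is the documented fallback.
  const apiKey = env.LLM_API_KEY || env.OPENROUTER_API_KEY;
  if (!apiKey) throw new Error('Set LLM_API_KEY (or OPENROUTER_API_KEY)');
  return {
    apiKey,
    baseUrl: env.LLM_BASE_URL || 'https://openrouter.ai/api/v1',
    translateModel: env.TRANSLATE_MODEL || 'google/gemini-2.5-flash',
    reviewModel: env.REVIEW_MODEL || null, // unset means no review pass
    maxOutputTokens: Number.parseInt(env.MAX_OUTPUT_TOKENS || '8000', 10),
  };
}
```

With only a key set, this yields the documented defaults; the workflow's `env:` block would override `translateModel` and `maxOutputTokens` without any code edit.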

### 3. Translate → optional review pipeline

`translateOne()` calls `complete()` with `TRANSLATE_MODEL`; if `REVIEW_MODEL` is
set, it makes a second `complete()` call that hands the model the English source
+ the draft and asks it to correct mistranslations and structural drift. Same
helper, called twice — no pipeline framework. The review pass is the cost/quality
dial: off by default, on by setting one env var.
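
A minimal sketch of the two-call flow, assuming `complete()` is passed in as a dependency. The prompts, the argument shapes, and the `<source>`/`<draft>` wrapping are hypothetical; only the "translate, then optionally review with the same helper" structure comes from this record.

```javascript
// Hypothetical sketch of the translate -> optional review flow.
async function translateOne({ source, locale }, { complete, translateModel, reviewModel, maxTokens }) {
  // Pass 1: draft the translation.
  const draft = await complete({
    model: translateModel,
    system: `Translate the MDX page into locale "${locale}". Keep frontmatter, code, and links intact.`,
    user: source,
    maxTokens,
  });

  // Review is off unless a review model is configured.
  if (!reviewModel) return draft;

  // Pass 2: same helper, given both the English source and the draft.
  return complete({
    model: reviewModel,
    system: 'Compare the draft with the English source. Fix mistranslations and structural drift. Return only the corrected page.',
    user: `<source>\n${source}\n</source>\n<draft>\n${draft}\n</draft>`,
    maxTokens,
  });
}
```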

### 4. Deterministic guards (the real safety net)

Cheap models will occasionally damage structure, so output is validated **before
it is written**:

- **Structural validation** — frontmatter block present, fenced-code-block count
matches the source, no stray ```` ``` ```` wrapping the file. On failure the
page throws and is skipped (surfaced in the run) rather than written.
- **Robust link normalization** — `localizeLinks` no longer only rewrites
`](/en/`. Models "localize" hrefs themselves (emitting `/zh-CN/`, `/ko-KR/`, or
a duplicated section), so it now rewrites **any** locale/lang prefix
(`en|cn|ko|zh|zh-cn|ko-kr`) to the target slug and collapses a duplicated
section segment — touching only the locale prefix, leaving `/images/...` and
external links alone. This fixed 65 deadlinks that failed the build on the
first cheap-model run. The no-API `--relink` pass applies the same fix to
already-generated files.
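
The two guards above might be sketched as follows. This is an illustration under assumptions, not the code in `i18n-translate.mjs`: `validateStructure` and `countFences` are hypothetical names, the frontmatter test is deliberately simple, and this `localizeLinks` covers only the locale-prefix rewrite (the duplicated-section collapse is omitted).

```javascript
// Hypothetical sketch of the deterministic guards.

// A fence line is a line that starts with three backticks.
const FENCE = /^\s*`{3}/;

function countFences(text) {
  return text.split('\n').filter((line) => FENCE.test(line)).length;
}

// Returns a list of problems; an empty list means the output may be written.
function validateStructure(source, output) {
  const problems = [];
  if (FENCE.test(output.split('\n')[0])) {
    problems.push('output is wrapped in a code fence');
  }
  if (!/^---\n[\s\S]*?\n---(\n|$)/.test(output)) {
    problems.push('missing frontmatter block');
  }
  if (countFences(source) !== countFences(output)) {
    problems.push('code fence count differs from source');
  }
  return problems;
}

// Rewrite any known locale prefix at the start of a root-relative href.
// The lookahead keeps paths such as /images/... and /english/... untouched.
const LOCALE_PREFIX = /\]\(\/(?:zh-cn|ko-kr|en|cn|ko|zh)(?=[\/#)])/gi;

function localizeLinks(markdown, locale) {
  return markdown.replace(LOCALE_PREFIX, `](/${locale}`);
}
```

Because the pattern requires `](/` directly before the prefix, external URLs that happen to contain `/en/` are not matched.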

### 5. Mismatch-only translation (bug fix)

`discover()` previously returned explicit paths verbatim, bypassing the
`source_sha` freshness gate — so CI (which passes every changed en page
explicitly) re-translated the whole PR changeset on every push. Now explicit
paths flow through the same missing-or-drifted check used by a full sweep
(drift is always checked for explicit paths; gated behind `--stale` for a sweep),
with `--force` to re-translate regardless. An in-sync page is now a no-op.
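
The selection rule can be sketched as a pure function. The record fields (`path`, `sourceSha`, `translatedSha`) and the name `selectPages` are assumptions for illustration; the real `discover()` reads this state from the files on disk.

```javascript
// Hypothetical sketch of mismatch-only page selection.
// translatedSha is null when the localized page does not exist yet.
function selectPages(pages, { explicit = [], stale = false, force = false } = {}) {
  const isExplicit = explicit.length > 0;
  const candidates = isExplicit
    ? pages.filter((p) => explicit.includes(p.path))
    : pages;

  // --force re-translates every candidate regardless of freshness.
  if (force) return candidates;

  return candidates.filter((p) => {
    const missing = p.translatedSha == null;
    const drifted = !missing && p.translatedSha !== p.sourceSha;
    // Drift always counts for explicit paths; a sweep needs --stale.
    return missing || (drifted && (isExplicit || stale));
  });
}
```

An explicit path that is already in sync falls through every condition and is dropped, which is the no-op behavior described above.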

## Consequences

- **~50× cheaper per token** (gemini-2.5-flash ≈ `$0.10/$0.40` vs Opus `$5/$25`)
and no billed thinking tokens, even with the review pass on.
- **Provider/model are config.** Proven in practice: the first model id 404'd on
OpenRouter and the fix was a one-line env change, no code.
- **Quality is guarded by deterministic checks + an optional second model**, not
by the translator's raw fidelity — necessary, because the cheap model did
mangle link hrefs on the first run (caught and fixed).
- **Lost the SDK's auto-retry**; replaced by the small retry in `complete()` plus
the existing per-page try/catch and mismatch-only reruns.
- A first full run still costs real (small) money and time (88 pages × 2 locales,
~13 min on gemini-2.5-flash). After that, runs are CI-triggered on en changes
and draft only the deltas, exactly as in DR002.

## Verification

```bash
# Single page (provider/model from env; --force ignores the freshness skip)
LLM_API_KEY=… TRANSLATE_MODEL=google/gemini-2.5-flash \
node docs/lib/i18n-translate.mjs cn how-to/use-faucet --force
```

- Inspect the output: frontmatter + `source_sha` intact, code untouched, links
re-prefixed to `/cn/`, prose translated.
- Swappability: re-run with a different `TRANSLATE_MODEL` / `LLM_BASE_URL` — works
with no code change. In-sync page → `translating 0 page(s)`.
- Enable review: set `REVIEW_MODEL` and confirm the corrected output.
- `npm run i18n:check` → `0 missing`; `npm run docs:build` → 0 deadlinks.

### What "passing" looks like

| Check | Pass condition |
| --- | --- |
| single-page run | valid MDX, structure preserved, links `/<locale>/`, `source_sha` stamped |
| in-sync page | `translating 0 page(s)` (no API call) |
| `docs:build` | 0 deadlinks across en/cn/ko |
| model/provider swap | env-only, no code change |