
Commit 1b6333e

jahooma and claude committed
Add --smoke-tree-sitter flag and fail builds with empty embed
Freebuff 0.0.64 still crashed for users with the same wasm error even though it was built from a commit that contained the base64 embed. The runtime stack trace pointed at the path-resolution fallback in init-node.ts:76, meaning the embed didn't reach the SDK bundle's globalThis check at runtime; the binary fell through to fs.existsSync, which never works on Windows bunfs paths.

Two hardening passes so this can't ship silently again:

- cli/src/pre-init/tree-sitter-wasm.ts: hidden `--smoke-tree-sitter` flag, handled in the very first import. Calls Parser.init({ wasmBinary }) directly with the embedded base64 and exits 0/1. Lives here (not commander) on purpose: it tests *the embed*, not the broader init path that has a path-resolution fallback that would mask a broken embed by passing in dev mode.

- cli/scripts/build-binary.ts: post-bun-compile, scan the output binary for the wasm's base64 prefix. Build fails if the bytes didn't actually make it through bundling (e.g. bun dropping a huge string literal, bundle cache reading a stale empty stub). Always-on log of which path the wasm was resolved from so CI logs make the embed step diagnosable. More resilient resolve: search workspace root, cli/node_modules, and sdk/node_modules before falling back to createRequire, because Windows CI's `bun install --cwd cli` lays out web-tree-sitter differently than a hoisted root install.

- packages/code-map/src/init-node.ts: accept bunfs paths (`/~BUN/root/...`) without an fs.existsSync check. fs.existsSync inconsistently returns false for bun --compile asset paths on Windows even though the runtime can read them, so the existing path-resolution fallback was permanently broken on Windows. Belt-and-braces: this makes the fallback work even if the embed step regresses.

- cli/scripts/smoke-binary.ts: run --smoke-tree-sitter as a deterministic pre-check before the long-window boot smoke. A broken embed fails fast with a clear "exit code 1, no boot ok marker" error instead of a 10s timeout that depends on render-loop timing.

Verified locally: build embeds 205KB wasm as 274KB base64, post-build verification finds the prefix in the compiled binary, --smoke-tree-sitter exits 0 with "tree-sitter smoke ok", full smoke passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 6b3dcd1 commit 1b6333e

4 files changed

Lines changed: 169 additions & 19 deletions


cli/scripts/build-binary.ts

Lines changed: 83 additions & 11 deletions
@@ -145,10 +145,10 @@ async function main() {
   patchOpenTuiAssetPaths()
   await ensureOpenTuiNativeBundle(targetInfo)

-  const restoreTreeSitterWasmStub = embedTreeSitterWasmAsBase64()
+  const treeSitterEmbed = embedTreeSitterWasmAsBase64()
   // Restore the stub even on build failure so a developer's git working
   // tree doesn't end up with a multi-megabyte modified file.
-  process.on('exit', restoreTreeSitterWasmStub)
+  process.on('exit', treeSitterEmbed.restore)

   const outputFilename =
     targetInfo.platform === 'win32' ? `${binaryName}.exe` : binaryName
@@ -194,7 +194,17 @@ async function main() {
   // Build done — restore the stub so a developer's working tree doesn't show
   // a multi-megabyte diff. (The exit handler above is a backstop for crashes;
   // the eager call here keeps a successful build clean.)
-  restoreTreeSitterWasmStub()
+  treeSitterEmbed.restore()
+
+  // Fail the build if the wasm bytes didn't actually make it into the
+  // compiled binary. Catches silent regressions (e.g. bun dropping a huge
+  // string literal, or some future bundler optimization) before we ship a
+  // broken artifact to users.
+  verifyTreeSitterWasmEmbedded(
+    outputFile,
+    treeSitterEmbed.wasmBase64Prefix,
+    treeSitterEmbed.wasmByteLength,
+  )

   if (targetInfo.platform !== 'win32') {
     chmodSync(outputFile, 0o755)
@@ -225,7 +235,11 @@ main().catch((error: unknown) => {
  * Returns a function that restores the stub. Always invoke it (success or
  * failure) so a developer's working tree doesn't show a multi-MB diff.
  */
-function embedTreeSitterWasmAsBase64(): () => void {
+function embedTreeSitterWasmAsBase64(): {
+  restore: () => void
+  wasmBase64Prefix: string
+  wasmByteLength: number
+} {
   const stubPath = join(cliRoot, 'src', 'pre-init', 'tree-sitter-wasm-bytes.ts')
   const originalStub = readFileSync(stubPath, 'utf8')
   let restored = false
@@ -239,11 +253,30 @@ function embedTreeSitterWasmAsBase64(): () => void {
     }
   }

-  // Resolve from the CLI workspace so monorepo hoisting differences don't
-  // matter — `web-tree-sitter` is an SDK dep, but the CLI imports it
-  // transitively and the bundler walks it from here.
-  const cliRequire = createRequire(join(cliRoot, 'package.json'))
-  const wasmPath = cliRequire.resolve('web-tree-sitter/tree-sitter.wasm')
+  // Try multiple candidate locations because bun's hoisting differs by
+  // platform and install command — Windows CI does `bun install --cwd cli`
+  // which can leave web-tree-sitter in cli/node_modules, while monorepo
+  // root installs hoist it to ../node_modules. Fall back to createRequire
+  // last so any failure surfaces with the full search trail.
+  const candidates = [
+    join(cliRoot, 'node_modules', 'web-tree-sitter', 'tree-sitter.wasm'),
+    join(cliRoot, '..', 'node_modules', 'web-tree-sitter', 'tree-sitter.wasm'),
+    join(cliRoot, '..', 'sdk', 'node_modules', 'web-tree-sitter', 'tree-sitter.wasm'),
+  ]
+  let wasmPath = candidates.find((p) => existsSync(p))
+  if (!wasmPath) {
+    try {
+      const cliRequire = createRequire(join(cliRoot, 'package.json'))
+      wasmPath = cliRequire.resolve('web-tree-sitter/tree-sitter.wasm')
+    } catch (err) {
+      throw new Error(
+        `Could not locate web-tree-sitter/tree-sitter.wasm. Searched:\n - ` +
+          candidates.join('\n - ') +
+          `\nAnd createRequire failed: ${err instanceof Error ? err.message : String(err)}`,
+      )
+    }
+  }
+
   const wasmBytes = readFileSync(wasmPath)
   const base64 = wasmBytes.toString('base64')

@@ -254,8 +287,47 @@ function embedTreeSitterWasmAsBase64(): () => void {
     `export const TREE_SITTER_WASM_BASE64 = ${JSON.stringify(base64)}\n`

   writeFileSync(stubPath, generated)
-  log(`Embedded tree-sitter.wasm (${wasmBytes.length} bytes → ${base64.length} chars base64)`)
-  return restore
+  // Always-on log (not behind VERBOSE) so CI shows which path was used and
+  // whether the embed succeeded — this is the single most useful breadcrumb
+  // when the runtime check fails on a user machine.
+  logAlways(
+    `Embedded tree-sitter.wasm from ${wasmPath} (${wasmBytes.length} bytes → ${base64.length} chars base64)`,
+  )
+  return {
+    restore,
+    wasmBase64Prefix: base64.slice(0, 40),
+    wasmByteLength: wasmBytes.length,
+  }
+}
+
+/**
+ * Sanity-check the compiled binary actually contains the embedded base64.
+ * If bun --compile ever silently drops a large string literal, or our embed
+ * step's file write didn't take effect before the bundle ran, we want the
+ * build to fail here instead of producing a binary that crashes for users.
+ */
+function verifyTreeSitterWasmEmbedded(
+  outputFile: string,
+  wasmBase64Prefix: string,
+  wasmByteLength: number,
+): void {
+  const binary = readFileSync(outputFile)
+  // Search as a Buffer so we don't have to load the whole binary as a UTF-8
+  // string (binaries are not valid UTF-8 and toString would corrupt bytes).
+  const needle = Buffer.from(wasmBase64Prefix, 'utf8')
+  const idx = binary.indexOf(needle)
+  if (idx === -1) {
+    throw new Error(
+      `Embedded tree-sitter wasm prefix not found in ${outputFile}.\n` +
+        `Expected base64 prefix (first 40 chars): ${wasmBase64Prefix}\n` +
+        `Original wasm size: ${wasmByteLength} bytes.\n` +
+        `This means the build-binary.ts embed step ran but bun --compile\n` +
+        `did not include the bytes in the output. The runtime smoke test\n` +
+        `would fall back to path-based wasm resolution, which is broken on\n` +
+        `Windows.`,
+    )
+  }
+  logAlways(`Verified embedded wasm prefix at offset ${idx} of compiled binary.`)
 }

 function patchOpenTuiAssetPaths() {
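
The post-build check in verifyTreeSitterWasmEmbedded comes down to a byte-level substring search for the first 40 base64 characters. A minimal sketch of that idea follows, under stated assumptions: `findEmbeddedPrefix`, the fake header bytes, and the stand-in payload are hypothetical and exist only for illustration, whereas the real build reads the compiled binary from disk.

```typescript
import { Buffer } from 'node:buffer'

// Hypothetical helper: returns the byte offset of the payload's base64 prefix
// inside `binary`, or -1 when the prefix is absent.
function findEmbeddedPrefix(
  binary: Buffer,
  payload: Buffer,
  prefixLength = 40,
): number {
  const prefix = payload.toString('base64').slice(0, prefixLength)
  // Compare bytes against bytes; the binary is never decoded as text.
  return binary.indexOf(Buffer.from(prefix, 'utf8'))
}

// Stand-in data (assumption): a short fake payload and a fake 6-byte header.
const payload = Buffer.from(
  '\0asm stand-in payload that is long enough to yield a 40-char prefix',
)
const header = Buffer.from([0x7f, 0x45, 0x4c, 0x46, 0xff, 0xfe])

// A "binary" that contains the base64 text, and one that only has the empty stub.
const goodBinary = Buffer.concat([
  header,
  Buffer.from(payload.toString('base64'), 'utf8'),
])
const badBinary = Buffer.concat([
  header,
  Buffer.from('export const TREE_SITTER_WASM_BASE64 = ""', 'utf8'),
])

const goodOffset = findEmbeddedPrefix(goodBinary, payload)
const badOffset = findEmbeddedPrefix(badBinary, payload)
console.log({ goodOffset, badOffset })
```

A result of -1 is the case in which the build would be failed; any non-negative offset means the prefix bytes survived bundling.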

cli/scripts/smoke-binary.ts

Lines changed: 36 additions & 0 deletions
@@ -81,6 +81,39 @@ const FATAL_PATTERNS = [
 // the renderer is up).
 const DEFAULT_RUN_SECONDS = 10

+function runTreeSitterSmoke(binary: string): Promise<void> {
+  return new Promise((resolve, reject) => {
+    const proc = spawn(binary, ['--smoke-tree-sitter'], {
+      stdio: ['ignore', 'pipe', 'pipe'],
+      env: { ...process.env, NO_COLOR: '1', TERM: 'dumb' },
+    })
+
+    let captured = ''
+    const append = (chunk: Buffer): void => {
+      captured += chunk.toString('utf8')
+    }
+    proc.stdout?.on('data', append)
+    proc.stderr?.on('data', append)
+
+    proc.once('error', reject)
+    proc.once('exit', (code) => {
+      if (code === 0 && /tree-sitter smoke ok/.test(captured)) {
+        resolve()
+        return
+      }
+
+      reject(
+        new Error(
+          `tree-sitter smoke failed with exit code ${code}\n${captured.slice(
+            0,
+            8 * 1024,
+          )}`,
+        ),
+      )
+    })
+  })
+}
+
 async function main(): Promise<void> {
   const binary = process.argv[2]
   const runSeconds = Number(process.argv[3] ?? DEFAULT_RUN_SECONDS)
@@ -100,6 +133,9 @@ async function main(): Promise<void> {

   console.log(`smoke-binary: spawning ${binary} for ${runSeconds}s…`)

+  await runTreeSitterSmoke(binary)
+  console.log('smoke-binary: tree-sitter init OK.')
+
   const proc = spawn(binary, [], {
     stdio: ['ignore', 'pipe', 'pipe'],
     env: { ...process.env, NO_COLOR: '1', TERM: 'dumb' },
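
The pass condition in runTreeSitterSmoke needs both a zero exit code and the marker text in the captured output. The sketch below isolates that decision as a pure function so the individual cases are easy to see. `SmokeResult` and `smokePassed` are hypothetical names introduced for illustration and are not part of the commit.

```typescript
// Hypothetical shape for what the child process produced.
interface SmokeResult {
  exitCode: number | null // null when the process was killed by a signal
  output: string // stdout and stderr concatenated
}

// Same marker text the smoke script greps for.
const SMOKE_MARKER = /tree-sitter smoke ok/

// Passes only when the exit code is 0 AND the marker was printed.
function smokePassed(result: SmokeResult): boolean {
  return result.exitCode === 0 && SMOKE_MARKER.test(result.output)
}

const cases: Array<[string, SmokeResult]> = [
  ['ok', { exitCode: 0, output: 'tree-sitter smoke ok (123 bytes wasm initialized)' }],
  ['exit 0 but no marker', { exitCode: 0, output: '' }],
  ['embed empty', { exitCode: 1, output: 'tree-sitter smoke FAIL: stub is empty' }],
  ['killed by signal', { exitCode: null, output: 'tree-sitter smoke ok' }],
]

const verdicts = cases.map(([name, result]) => `${name}: ${smokePassed(result)}`)
console.log(verdicts.join('\n'))
```

Requiring the marker in addition to the exit code guards against a binary that exits 0 without ever reaching the smoke branch, for example if the flag were ignored.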

cli/src/pre-init/tree-sitter-wasm.ts

Lines changed: 41 additions & 5 deletions
@@ -18,18 +18,54 @@

 import { TREE_SITTER_WASM_BASE64 } from './tree-sitter-wasm-bytes'

+let embeddedWasm: Uint8Array | undefined
 if (TREE_SITTER_WASM_BASE64.length > 0) {
   const buf = Buffer.from(TREE_SITTER_WASM_BASE64, 'base64')
+  embeddedWasm = new Uint8Array(buf.buffer, buf.byteOffset, buf.byteLength)
   // globalThis is the only cross-bundle channel: the SDK pre-built bundle
   // inlines its own copy of `init-node.ts`, so a module-level variable in
   // the source package isn't visible to the singleton initialized via the
   // SDK. Slice into a fresh Uint8Array view instead of handing over the
   // Buffer's shared underlying ArrayBuffer.
   ;(
     globalThis as { __CODEBUFF_TREE_SITTER_WASM_BINARY__?: Uint8Array }
-  ).__CODEBUFF_TREE_SITTER_WASM_BINARY__ = new Uint8Array(
-    buf.buffer,
-    buf.byteOffset,
-    buf.byteLength,
-  )
+  ).__CODEBUFF_TREE_SITTER_WASM_BINARY__ = embeddedWasm
+}
+
+// Deterministic CI gate: `<binary> --smoke-tree-sitter` proves the embed
+// shipped end-to-end. Lives here, in the very first import, on purpose:
+//
+// - We're testing whether the *embed* works. Going through commander +
+//   initTreeSitterForNode would also pass via the path-resolution
+//   fallback when the embed is empty (e.g. dev mode), giving false
+//   positives that mask a broken production build.
+// - Failing here, before any other module loads, gives a sharp signal:
+//   the embed either worked or it didn't. No render-loop timing, no
+//   commander wiring, no SDK init order to debug.
+//
+// Async IIFE because Parser.init returns a promise; process.exit tears
+// the process down before parallel top-level imports can fire side
+// effects we'd have to clean up.
+if (process.argv.includes('--smoke-tree-sitter')) {
+  void (async () => {
+    try {
+      if (!embeddedWasm) {
+        console.error(
+          'tree-sitter smoke FAIL: TREE_SITTER_WASM_BASE64 stub is empty — ' +
+            'the build-binary.ts embed step did not run or did not write the file.',
+        )
+        process.exit(1)
+      }
+      const { Parser } = await import('web-tree-sitter')
+      await Parser.init({ wasmBinary: embeddedWasm })
+      // Marker grepped by cli/scripts/smoke-binary.ts — keep this exact text.
+      console.log(
+        `tree-sitter smoke ok (${embeddedWasm.byteLength} bytes wasm initialized)`,
+      )
+      process.exit(0)
+    } catch (err) {
+      console.error('tree-sitter smoke FAIL:', err)
+      process.exit(1)
+    }
+  })()
 }
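
The embed path relies on two small facts: base64 grows the payload by roughly 4/3 (consistent with the 205KB to 274KB figures in the commit message), and a decoded Buffer should be exposed through a view limited to its own byteOffset and byteLength. The sketch below illustrates both with an 8-byte stand-in payload. `base64Length` and `decodeToView` are hypothetical names, not functions from the commit.

```typescript
import { Buffer } from 'node:buffer'

// Length of padded base64 text for a given number of input bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3)
}

// Decode base64 and expose only the decoded bytes. Small Buffers may be
// carved out of a larger shared pool, so the offset and length matter.
function decodeToView(base64: string): Uint8Array {
  const buf = Buffer.from(base64, 'base64')
  return new Uint8Array(buf.buffer, buf.byteOffset, buf.byteLength)
}

// Stand-in payload (assumption): the 8-byte wasm magic number and version.
const original = Uint8Array.from([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00])
const encoded = Buffer.from(original).toString('base64')
const view = decodeToView(encoded)

const roundTrips =
  view.byteLength === original.byteLength &&
  view.every((byte, i) => byte === original[i])

console.log({
  encodedLength: encoded.length,
  predictedLength: base64Length(original.byteLength),
  roundTrips,
})
```

Passing the offset and length explicitly keeps the view from covering unrelated bytes when the underlying ArrayBuffer is larger than the decoded data.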

packages/code-map/src/init-node.ts

Lines changed: 9 additions & 3 deletions
@@ -30,14 +30,20 @@ function getEmbeddedWasmBinary(): Uint8Array | undefined {
   )[WASM_BINARY_GLOBAL_KEY]
 }

+function isBunEmbeddedPath(filePath: string): boolean {
+  return filePath.replace(/\\/g, '/').includes('/~BUN/root/')
+}
+
 function resolveTreeSitterWasm(scriptDir: string): string {
   const override = process.env[TREE_SITTER_WASM_ENV_VAR]
-  if (override && fs.existsSync(override)) {
-    return override
+  if (override) {
+    if (fs.existsSync(override) || isBunEmbeddedPath(override)) {
+      return override
+    }
   }

   const fallback = path.join(scriptDir, 'tree-sitter.wasm')
-  if (fs.existsSync(fallback)) {
+  if (fs.existsSync(fallback) || isBunEmbeddedPath(fallback)) {
     return fallback
   }
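
The effect of isBunEmbeddedPath on the fallback can be sketched in isolation. The predicate below is copied from the diff; `pickWasmPath`, the injected `exists` callback, and the sample paths are hypothetical and only simulate an environment where the existence check always reports false.

```typescript
// Copied from the diff: normalize backslashes, then look for the bunfs root.
function isBunEmbeddedPath(filePath: string): boolean {
  return filePath.replace(/\\/g, '/').includes('/~BUN/root/')
}

// Hypothetical resolver: accept a candidate if it exists on disk OR if it
// looks like a bun --compile asset path that the existence check may miss.
function pickWasmPath(
  candidates: string[],
  exists: (p: string) => boolean,
): string | undefined {
  return candidates.find((p) => exists(p) || isBunEmbeddedPath(p))
}

// Simulate the failure mode described in the commit: the check says "no".
const existsAlwaysFalse = (_p: string): boolean => false

// Illustrative paths (assumptions, not taken from a real stack trace).
const windowsAsset = 'B:\\~BUN\\root\\tree-sitter.wasm'
const ordinaryPath = 'C:\\tools\\tree-sitter.wasm'

const picked = pickWasmPath([ordinaryPath, windowsAsset], existsAlwaysFalse)
const pickedNone = pickWasmPath([ordinaryPath], existsAlwaysFalse)
console.log({ picked, pickedNone })
```

The trade-off is that a bunfs-looking path is trusted without verification, so a genuinely missing asset would surface later, when the file is actually read.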
