fix(extraction): convert remaining fixed-size stacks in extract_channels.c to growable TSNodeStack #339
jjoos wants to merge 2 commits into
…els.c to growable TSNodeStack

PR DeusData#217 converted most extractors to growable arena-allocated TSNodeStack, but extract_channels.c still had 8 functions using fixed-size `TSNode stack[CHAN_STACK_CAP]` arrays. On large codebases with 4096+ AST nodes on a single DFS path, these arrays overflow silently, corrupting adjacent stack memory and causing segfaults in later passes.

Apply the same mechanical conversion used in PR DeusData#217:

- `TSNode stack[CHAN_STACK_CAP]; int top = 0;` -> `TSNodeStack stack; ts_nstack_init(...);`
- `stack[top++] = root` -> `ts_nstack_push(&stack, ctx->arena, root)`
- `while (top > 0)` -> `while (stack.count > 0)`
- `TSNode node = stack[--top]` -> `TSNode node = ts_nstack_pop(&stack)`
- Remove the `&& top < CHAN_STACK_CAP` capacity guard from push loops

Affected functions: scan_string_consts_python, extract_channels_python, extract_channels_go, extract_channels_java, extract_channels_csharp, extract_channels_ruby, extract_channels_elixir, extract_channels_rust.

Fixes segmentation faults when indexing large repositories.
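The conversion pattern above can be illustrated with a minimal, self-contained sketch. It is not the project's code: `Node`, `node_child_count`, `node_child`, `NodeStack`, `nstack_init`, `nstack_push`, `nstack_pop` and `walk_growable` are all hypothetical names. An integer id in a synthetic "comb" tree stands in for tree-sitter's `TSNode`, and plain `malloc`/`realloc` stand in for the arena that the real `TSNodeStack` grows into. The walker keeps the old loop shape (the comments show the fixed-array line each statement replaces) but never drops a child.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for TSNode: an integer id in an implicit "comb" tree
 * of `depth` spine nodes. Spine nodes have even ids 0, 2, ..., 2*(depth-1);
 * every spine node except the last has two children: the next spine node
 * (id + 2) and a leaf (id + 1). The whole tree has 2*depth - 1 nodes. */
typedef int Node;

static int node_child_count(int depth, Node n) {
    return (n % 2 == 0 && n / 2 < depth - 1) ? 2 : 0;
}

/* Child 0 is the next spine node, child 1 is the leaf. */
static Node node_child(Node n, int i) {
    return i == 0 ? n + 2 : n + 1;
}

/* Hypothetical growable stack with the same shape as TSNodeStack
 * (init/push/pop plus a public `count`). The project's version grows
 * inside the extraction arena; this sketch uses malloc/realloc instead. */
typedef struct {
    Node *items;
    int count;
    int cap;
} NodeStack;

static void nstack_init(NodeStack *s, int initial_cap) {
    s->cap = initial_cap > 0 ? initial_cap : 1;
    s->count = 0;
    s->items = malloc(sizeof(Node) * (size_t)s->cap);
    if (s->items == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
}

static void nstack_push(NodeStack *s, Node n) {
    if (s->count == s->cap) {
        /* Full: double the capacity instead of dropping n or writing past the end. */
        int new_cap = s->cap * 2;
        Node *grown = realloc(s->items, sizeof(Node) * (size_t)new_cap);
        if (grown == NULL) { perror("realloc"); exit(EXIT_FAILURE); }
        s->items = grown;
        s->cap = new_cap;
    }
    s->items[s->count++] = n;
}

static Node nstack_pop(NodeStack *s) {
    assert(s->count > 0);
    return s->items[--s->count];
}

/* Iterative DFS from the root; returns the number of nodes visited. */
static int walk_growable(int depth, int initial_cap) {
    NodeStack stack;                     /* was: Node stack[CAP]; int top = 0; */
    nstack_init(&stack, initial_cap);
    nstack_push(&stack, 0);              /* was: stack[top++] = root; */
    int visited = 0;
    while (stack.count > 0) {            /* was: while (top > 0) */
        Node node = nstack_pop(&stack);  /* was: Node node = stack[--top]; */
        visited++;
        /* Push children last-to-first so child 0 is popped first; no capacity guard. */
        for (int i = node_child_count(depth, node) - 1; i >= 0; i--)
            nstack_push(&stack, node_child(node, i));
    }
    free(stack.items);
    return visited;
}
```

For example, `walk_growable(5000, 16)` visits all 9,999 nodes. On this comb the stack peaks at 5,000 entries, so it doubles well past its initial 16 slots. Doubling keeps pushes amortized O(1), which is why an initial capacity such as `CHAN_STACK_CAP` can remain a sizing hint rather than a hard limit.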
…nsts_python

The while condition in scan_string_consts_python had an extra `&& tbl->count < CHAN_CONST_CAP` clause that wasn't caught by the previous conversion pass. Convert it properly.
Thank you, @jjoos! 🙏 This is a great catch and a clean follow-through on the PR #217 work. You're exactly right about the failure mode: the …

I landed it as 4d84406, crediting you as the author. The only change I made on top of your patch was trimming a stray trailing blank line at EOF so the file stays clang-format-clean; the conversion itself is entirely yours. Verified locally: build clean, all 3,617 tests pass. This also feeds into the stability cluster in #390. Thanks for tightening this up! 🙏
…rowable TSNodeStack

The 8 channel-extraction DFS walkers (Python consts, Python, Go, Java, C#, Ruby, Elixir, Rust) used fixed-size `TSNode stack[CHAN_STACK_CAP]` arrays whose push loops were guarded with `&& top < CHAN_STACK_CAP`. On deeply nested ASTs that guard silently dropped children, and an unguarded overflow could corrupt adjacent stack memory, surfacing as a segfault in a later pass (lsp_cross / semantic_edges) on large repos.

Convert all 8 to the arena-backed growable TSNodeStack (ts_nstack_init/push/pop), matching the pattern already used in extract_calls.c / extract_imports.c (PR #217). No silent drops, no overflow.

Distilled from #339 (trailing-newline cleanup only; conversion is the author's).
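The silent-drop behavior described above can be reproduced in a small sketch under explicit assumptions. A synthetic comb-shaped tree stands in for a tree-sitter AST, and `DEMO_STACK_CAP` (64) is a hypothetical stand-in for `CHAN_STACK_CAP`, whose real value is not given here. `walk_fixed_guarded`, `node_child_count` and `node_child` are likewise hypothetical names. Each level of the comb leaves one pending sibling on the stack, so a deep enough path fills the array. The guarded push loop then quietly discards the rest of the tree.

```c
#include <assert.h>

/* Hypothetical capacity standing in for CHAN_STACK_CAP. */
#define DEMO_STACK_CAP 64

/* Hypothetical comb-shaped tree standing in for a tree-sitter AST:
 * spine nodes have even ids; each spine node except the last has the next
 * spine node (id + 2) as child 0 and a leaf (id + 1) as child 1. */
typedef int Node;

static int node_child_count(int depth, Node n) {
    return (n % 2 == 0 && n / 2 < depth - 1) ? 2 : 0;
}

static Node node_child(Node n, int i) {
    return i == 0 ? n + 2 : n + 1;
}

/* The old pattern: a fixed-size array whose push loop stops at the cap.
 * Returns the number of nodes actually visited. */
static int walk_fixed_guarded(int depth) {
    Node stack[DEMO_STACK_CAP];
    int top = 0;
    stack[top++] = 0; /* root */
    int visited = 0;
    while (top > 0) {
        Node node = stack[--top];
        visited++;
        /* The `top < DEMO_STACK_CAP` guard keeps writes in bounds, but any
         * child that does not fit is skipped, together with its whole subtree. */
        for (int i = node_child_count(depth, node) - 1; i >= 0 && top < DEMO_STACK_CAP; i--)
            stack[top++] = node_child(node, i);
        assert(top <= DEMO_STACK_CAP);
    }
    return visited;
}
```

`walk_fixed_guarded(10)` visits all 19 nodes, but `walk_fixed_guarded(5000)` returns 128 (2 × `DEMO_STACK_CAP`) out of 9,999 nodes, with no error or warning. Removing the guard instead would write past the end of `stack`, which is undefined behavior and the kind of memory corruption the commit message describes.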
Problem
PR #217 converted most AST traversal functions to use the new growable `TSNodeStack` (arena-allocated, no hard cap), but `extract_channels.c` still had 8 functions using fixed-size `TSNode stack[CHAN_STACK_CAP]` arrays:

- `scan_string_consts_python`
- `extract_channels_python`
- `extract_channels_go`
- `extract_channels_java`
- `extract_channels_csharp`
- `extract_channels_ruby`
- `extract_channels_elixir`
- `extract_channels_rust`

When indexing large repositories with deep AST nesting (e.g. 4096+ nodes on a single DFS path), these arrays fail silently: the capacity guard `&& top < CHAN_STACK_CAP` drops children without warning, and where the stack itself overflows it corrupts adjacent stack memory. The corruption surfaces as a segmentation fault in a later pass (observed in the `lsp_cross` and `semantic_edges` passes on large repos).
Fix
Apply the same mechanical conversion used in PR #217 to the 8 remaining functions:
- `TSNode stack[CHAN_STACK_CAP]; int top = 0;` -> `TSNodeStack stack; ts_nstack_init(&stack, ctx->arena, CHAN_STACK_CAP);`
- `stack[top++] = ctx->root` -> `ts_nstack_push(&stack, ctx->arena, ctx->root)`
- `while (top > 0)` -> `while (stack.count > 0)`
- `TSNode node = stack[--top]` -> `TSNode node = ts_nstack_pop(&stack)`
- `&& top < CHAN_STACK_CAP` capacity guard -> removed
- `stack[top++] = ts_node_child(...)` -> `ts_nstack_push(&stack, ctx->arena, ts_node_child(...))`

Testing
Verified by indexing large repositories (1,290 files, 13,735 nodes; 3,482+ functions). These repositories previously caused segfaults; with this fix, all of them index cleanly.