fix(extraction): convert remaining fixed-size stacks in extract_channels.c to growable TSNodeStack#339

Closed
jjoos wants to merge 2 commits into
DeusData:mainfrom
jjoos:fix/remaining-fixed-stacks

Conversation

@jjoos
Contributor

@jjoos jjoos commented May 11, 2026

Problem

PR #217 converted most AST traversal functions to use the new growable TSNodeStack (arena-allocated, no hard cap), but extract_channels.c still had 8 functions using fixed-size TSNode stack[CHAN_STACK_CAP] arrays:

  • scan_string_consts_python
  • extract_channels_python
  • extract_channels_go
  • extract_channels_java
  • extract_channels_csharp
  • extract_channels_ruby
  • extract_channels_elixir
  • extract_channels_rust

When indexing large repositories with deep AST nesting (e.g. 4096+ nodes on a single DFS path), these arrays fail silently: the capacity guard `&& top < CHAN_STACK_CAP` drops children without warning, and any push not covered by the guard writes past the end of the array and corrupts adjacent stack memory. The corruption surfaces as a segmentation fault in a later pass (observed in the lsp_cross and semantic_edges passes on large repos).
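To make the silent-drop half of this failure concrete, here is a minimal sketch that does not use tree-sitter. `Node`, `CAP`, `visit_capped`, and `expected_nodes` are hypothetical names invented for illustration, and the synthetic tree (each internal node has `branch - 1` leaf children plus one child that continues the chain) is an assumption chosen so that pending siblings pile up on the stack. It shows only the dropped-children behavior of a guarded fixed-size stack, not the memory corruption.

```c
#include <assert.h>

/* Hypothetical stand-in for a syntax node: only what the walk needs. */
typedef struct {
    int depth;
    int is_leaf;
} Node;

enum { CAP = 8 }; /* deliberately tiny; plays the role of CHAN_STACK_CAP */

/* Number of nodes in the synthetic tree: the root plus `branch` nodes
 * at each depth from 1 to max_depth. */
static int expected_nodes(int max_depth, int branch) {
    return 1 + max_depth * branch;
}

/* DFS with a fixed-size stack and the same style of guard as the
 * pre-fix code: once the stack is full, remaining children are skipped
 * without any error. Returns how many nodes were actually visited. */
static int visit_capped(int max_depth, int branch) {
    assert(max_depth >= 0 && branch >= 1);

    Node stack[CAP];
    int top = 0;
    int visited = 0;

    stack[top++] = (Node){0, 0};
    while (top > 0) {
        Node n = stack[--top];
        visited++;
        if (n.is_leaf || n.depth >= max_depth) {
            continue;
        }
        /* The guard `top < CAP` is where children get dropped. */
        for (int i = 0; i < branch && top < CAP; i++) {
            int is_last = (i == branch - 1);
            /* Only the last child continues the chain; the rest are leaves. */
            stack[top++] = (Node){n.depth + 1, !is_last};
        }
    }
    return visited;
}
```

Under these assumptions, a shallow tree such as `visit_capped(2, 3)` is fully visited, while `visit_capped(10, 3)` returns fewer nodes than `expected_nodes(10, 3)` and reports nothing, which is the same class of missed extraction described above.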

Fix

Apply the same mechanical conversion used in PR #217 to the 8 remaining functions:

| Before | After |
|---|---|
| `TSNode stack[CHAN_STACK_CAP]; int top = 0;` | `TSNodeStack stack; ts_nstack_init(&stack, ctx->arena, CHAN_STACK_CAP);` |
| `stack[top++] = ctx->root` | `ts_nstack_push(&stack, ctx->arena, ctx->root)` |
| `while (top > 0)` | `while (stack.count > 0)` |
| `TSNode node = stack[--top]` | `TSNode node = ts_nstack_pop(&stack)` |
| `&& top < CHAN_STACK_CAP` | capacity guard removed (arena grows automatically) |
| `stack[top++] = ts_node_child(...)` | `ts_nstack_push(&stack, ctx->arena, ts_node_child(...))` |
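The shape of this conversion can be sketched in isolation. The code below is not the project's `TSNodeStack`: `GrowStack`, `gs_init`, `gs_push`, `gs_pop`, `gs_free`, and `visit_growable` are hypothetical names, the stack grows with `realloc` rather than an arena, and `Node` is a stand-in for `TSNode` over a synthetic tree in which each internal node has `branch - 1` leaf children plus one child that continues the chain. It is meant only to illustrate why dropping the capacity guard is safe once push can grow the storage.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for TSNode. */
typedef struct {
    int depth;
    int is_leaf;
} Node;

/* Hypothetical growable stack; heap-backed here, not arena-backed. */
typedef struct {
    Node *items;
    size_t count;
    size_t cap;
} GrowStack;

/* Returns 1 on success, 0 on allocation failure. */
static int gs_init(GrowStack *s, size_t initial_cap) {
    if (initial_cap == 0) {
        initial_cap = 1;
    }
    s->items = malloc(initial_cap * sizeof *s->items);
    s->count = 0;
    s->cap = s->items ? initial_cap : 0;
    return s->items != NULL;
}

/* Doubles the capacity when full instead of dropping the node. */
static int gs_push(GrowStack *s, Node n) {
    if (s->count == s->cap) {
        size_t new_cap = s->cap ? s->cap * 2 : 1;
        Node *p = realloc(s->items, new_cap * sizeof *s->items);
        if (!p) {
            return 0;
        }
        s->items = p;
        s->cap = new_cap;
    }
    s->items[s->count++] = n;
    return 1;
}

static Node gs_pop(GrowStack *s) {
    assert(s->count > 0);
    return s->items[--s->count];
}

static void gs_free(GrowStack *s) {
    free(s->items);
    s->items = NULL;
    s->count = 0;
    s->cap = 0;
}

/* Same DFS loop shape as the "After" column: no capacity guard in the
 * child loop. Returns the number of nodes visited, or -1 on failure. */
static int visit_growable(int max_depth, int branch) {
    GrowStack s;
    int visited = 0;

    if (!gs_init(&s, 8)) {
        return -1;
    }
    if (!gs_push(&s, (Node){0, 0})) {
        gs_free(&s);
        return -1;
    }
    while (s.count > 0) {
        Node n = gs_pop(&s);
        visited++;
        if (n.is_leaf || n.depth >= max_depth) {
            continue;
        }
        for (int i = 0; i < branch; i++) {
            Node child = {n.depth + 1, i != branch - 1};
            if (!gs_push(&s, child)) {
                gs_free(&s);
                return -1;
            }
        }
    }
    gs_free(&s);
    return visited;
}
```

With this sketch, `visit_growable(10, 3)` visits all 31 nodes of the synthetic tree even though the initial capacity is only 8. An arena-backed version would differ mainly in where the memory comes from and in not needing a per-stack free.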

Testing

Verified by indexing large repositories (1290 files, 13,735 nodes; 3482+ functions). These repos previously caused segfaults; with this fix they all index cleanly.

jjoos added 2 commits May 11, 2026 22:52
…els.c to growable TSNodeStack

PR DeusData#217 converted most extractors to growable arena-allocated TSNodeStack,
but extract_channels.c still had 8 functions using fixed-size
`TSNode stack[CHAN_STACK_CAP]` arrays. On large codebases with 4096+ AST
nodes on a single DFS path, these arrays overflow silently, corrupting
adjacent heap memory and causing segfaults in later passes.

Apply the same mechanical conversion used in PR DeusData#217:
  - `TSNode stack[CHAN_STACK_CAP]; int top = 0;` -> `TSNodeStack stack; ts_nstack_init(...);`
  - `stack[top++] = root` -> `ts_nstack_push(&stack, ctx->arena, root)`
  - `while (top > 0)` -> `while (stack.count > 0)`
  - `TSNode node = stack[--top]` -> `TSNode node = ts_nstack_pop(&stack)`
  - Remove `&& top < CHAN_STACK_CAP` capacity guard from push loops

Affected functions: scan_string_consts_python, extract_channels_python,
extract_channels_go, extract_channels_java, extract_channels_csharp,
extract_channels_ruby, extract_channels_elixir, extract_channels_rust.

Fixes segmentation faults when indexing large repositories.
…nsts_python

The while condition in scan_string_consts_python had an extra
`&& tbl->count < CHAN_CONST_CAP` clause that wasn't caught by the
previous conversion pass. Convert it properly.
@DeusData
Owner

Thank you, @jjoos! 🙏 This is a great catch and a clean follow-through on the PR #217 work. You're exactly right about the failure mode: the && top < CHAN_STACK_CAP guard silently dropped children on deep ASTs (so channels would go undetected), and an overflow could corrupt adjacent stack memory and surface as a segfault in a later pass — which is maddening to debug. Converting all 8 walkers to the arena-backed growable TSNodeStack is the right fix and keeps them consistent with extract_calls.c / extract_imports.c.

I landed it as 4d84406, crediting you as the author. The only change I made on top of your patch was trimming a stray trailing blank line at EOF so the file stays clang-format-clean — the conversion itself is entirely yours. Verified locally: build clean, all 3,617 tests pass. This also feeds into the stability cluster in #390. Thanks for tightening this up! 🙏

DeusData pushed a commit that referenced this pull request May 30, 2026
…rowable TSNodeStack

The 8 channel-extraction DFS walkers (Python consts, Python, Go, Java,
C#, Ruby, Elixir, Rust) used fixed-size `TSNode stack[CHAN_STACK_CAP]`
arrays whose push loops guarded with `&& top < CHAN_STACK_CAP`. On
deeply-nested ASTs that guard silently dropped children, and an
unguarded overflow could corrupt adjacent stack memory — surfacing as a
segfault in a later pass (lsp_cross / semantic_edges) on large repos.

Convert all 8 to the arena-backed growable TSNodeStack
(ts_nstack_init/push/pop), matching the pattern already used in
extract_calls.c / extract_imports.c (PR #217). No silent drops, no
overflow.

Distilled from #339 (trailing-newline cleanup only; conversion is the
author's).
@DeusData DeusData closed this May 30, 2026