fix(extraction): replace fixed traversal stacks with growable arena-allocated stacks #217

Draft
AhmedHamadto wants to merge 1 commit into DeusData:main from AhmedHamadto:fix/growable-traversal-stack-199

@AhmedHamadto

Problem

AST traversal functions in the extraction layer use fixed-size TSNode stack[N] arrays. When the DFS stack fills up, the child-push loop exits silently via && top < CAP, dropping entire subtrees without warning.

This affects files with many top-level siblings (e.g., 600 ES imports, 130 Express routes). Reported in #199 — a Node.js routes file with ~130 routes lost data past the 512 cap.

Confirmed: extracting 600 TypeScript imports returns 511 (capped at ES_IMPORT_STACK_CAP = 512).
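
The failure mode can be reproduced in miniature. The sketch below is not the project's code: it uses a hypothetical `toy_node_t` in place of `TSNode` and a deliberately tiny cap of 8, and only mirrors the shape of the `&& top < CAP` push loop described above.

```c
#include <stdio.h>

#define STACK_CAP 8 /* tiny cap for illustration; the caps in this PR range from 64 to 4096 */

/* Hypothetical stand-in for a syntax node: a root with `nchildren` leaf children. */
typedef struct {
    int id;
    int nchildren;
} toy_node_t;

/* Walk a root and its leaf children with a fixed-size stack.
 * Returns the number of nodes actually visited. */
static int walk_fixed(int nchildren) {
    toy_node_t stack[STACK_CAP];
    int top = 0;
    int visited = 0;

    stack[top++] = (toy_node_t){0, nchildren};
    while (top > 0) {
        toy_node_t n = stack[--top];
        visited++;
        /* Same shape as the bug: once the stack is full the loop simply
         * stops pushing, and the remaining siblings are never visited. */
        for (int i = 0; i < n.nchildren && top < STACK_CAP; i++) {
            stack[top++] = (toy_node_t){i + 1, 0};
        }
    }
    return visited;
}
```

With 20 children (21 nodes in total), `walk_fixed(20)` returns 9: the root plus the 8 children that fit. No error is raised, which is why the loss goes unnoticed.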

Fix

Added ts_node_stack.h — a 53-line growable stack backed by the existing arena allocator. Initial capacity matches the previous fixed caps (no extra memory for small files), but doubles on overflow instead of truncating.
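
For readers who have not opened the diff, here is a minimal sketch of the idea, not the contents of ts_node_stack.h. Everything in it is an assumption made for illustration: the `toy_arena_*` functions stand in for the project's arena allocator, `node_stack_*` are hypothetical names, and the elements are plain `int` rather than `TSNode`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical arena: allocations are only released all at once. */
typedef struct arena_block { struct arena_block *next; } arena_block_t;
typedef struct { arena_block_t *head; } toy_arena_t;

static void *toy_arena_alloc(toy_arena_t *a, size_t size) {
    arena_block_t *b = malloc(sizeof *b + size);
    if (!b) return NULL;
    b->next = a->head;
    a->head = b;
    return b + 1; /* payload starts right after the block header */
}

static void toy_arena_destroy(toy_arena_t *a) {
    while (a->head) {
        arena_block_t *next = a->head->next;
        free(a->head);
        a->head = next;
    }
}

/* Growable stack; the real one would hold TSNode values. */
typedef struct {
    toy_arena_t *arena;
    int *items;
    size_t len, cap;
} node_stack_t;

static int node_stack_init(node_stack_t *s, toy_arena_t *a, size_t initial_cap) {
    if (initial_cap == 0) initial_cap = 1; /* doubling needs a non-zero start */
    s->arena = a;
    s->len = 0;
    s->cap = initial_cap;
    s->items = toy_arena_alloc(a, initial_cap * sizeof *s->items);
    return s->items != NULL;
}

static int node_stack_push(node_stack_t *s, int v) {
    if (s->len == s->cap) {
        /* Full: allocate a buffer twice as large and copy. The old buffer
         * is not freed here; it stays in the arena until destroy. */
        size_t new_cap = s->cap * 2;
        int *grown = toy_arena_alloc(s->arena, new_cap * sizeof *grown);
        if (!grown) return 0;
        memcpy(grown, s->items, s->len * sizeof *grown);
        s->items = grown;
        s->cap = new_cap;
    }
    s->items[s->len++] = v;
    return 1;
}

static int node_stack_pop(node_stack_t *s, int *out) {
    if (s->len == 0) return 0;
    *out = s->items[--s->len];
    return 1;
}

/* Push n values, pop them all, and return how many came back. */
static size_t demo_round_trip(size_t n, size_t initial_cap) {
    toy_arena_t arena = {NULL};
    node_stack_t s;
    size_t popped = 0;
    int v;

    if (!node_stack_init(&s, &arena, initial_cap)) return 0;
    for (size_t i = 0; i < n; i++) {
        if (!node_stack_push(&s, (int)i)) break;
    }
    while (node_stack_pop(&s, &v)) popped++;
    toy_arena_destroy(&arena);
    return popped;
}
```

In this sketch `demo_round_trip(600, 512)` returns 600: the 513th push triggers one doubling to 1024 instead of dropping the value.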

Applied to all 14 TSNode fixed stacks across 9 extraction files:

| File | Functions fixed | Old cap |
| --- | --- | --- |
| extract_calls.c | walk_calls | 512 |
| extract_imports.c | walk_es_imports, walk_wolfram_imports | 512 |
| extract_semantic.c | walk_throws, walk_readwrites | 512 |
| extract_defs.c | extract_elixir_call, walk_variables_iter, push_nested_class_nodes | 64–256 |
| extract_usages.c | walk_usages | 4096 |
| extract_env_accesses.c | walk_env_accesses | 4096 |
| extract_type_refs.c | walk_body_type_refs, walk_type_refs | 4096 |
| extract_type_assigns.c | walk_type_assigns | 4096 |
| extract_channels.c | scan_string_consts, cbm_extract_channels | 4096 |

Not changed: walk_defs uses walk_defs_frame_t (different struct type), left as-is with its 4096 cap.

Tests

Added test_stack_overflow.c with 7 regression tests:

| Test | Before fix | After fix |
| --- | --- | --- |
| ts_imports_exceed_512 | 511/600 FAIL | 600/600 PASS |
| js_calls_exceed_512 | 600/600 | 600/600 |
| python_calls_exceed_512 | 600/600 | 600/600 |
| go_calls_exceed_1024 | 1024/1024 | 1024/1024 |
| express_routes_exceed_512 | 150/150 | 150/150 |
| js_deeply_nested_calls | 200/200 | 200/200 |
| yaml_vars_exceed_256 | 300/300 | 300/300 |

Full suite: 2,727 passed, 0 failed.

Design notes

  • Arena-backed: old arrays are abandoned in the arena (freed on arena_destroy at end of file extraction). No realloc, no mixed lifetimes.
  • Same initial capacity as before — zero extra allocation for files within old caps.
  • Net -2 lines (removed 15 #define caps, added growable stack header).
  • clang-tidy/cppcheck not available on build machine — CI will validate.
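
The cost of abandoning old arrays in the arena is bounded, which a little arithmetic shows. The helper below is a hypothetical illustration, not code from this PR. It counts element slots rather than bytes and assumes plain doubling from the initial capacity.

```c
#include <stddef.h>

/* Result of growing a stack by doubling until it can hold `needed` items. */
typedef struct {
    size_t final_cap; /* capacity of the live array */
    size_t abandoned; /* slots left behind in the arena by earlier arrays */
} growth_cost_t;

static growth_cost_t growth_cost(size_t initial_cap, size_t needed) {
    growth_cost_t g = { initial_cap ? initial_cap : 1, 0 };
    while (g.final_cap < needed) {
        g.abandoned += g.final_cap; /* the old array stays until arena destroy */
        g.final_cap *= 2;
    }
    return g;
}
```

For a 512-slot stack that must hold 600 nodes, this gives a final capacity of 1024 with 512 slots abandoned. Growing from 512 to 4096 abandons 512 + 1024 + 2048 = 3584 slots. The abandoned total is always the final capacity minus the initial one, so it stays below the size of the live array, and a file that fits within the old cap abandons nothing.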

Fixes #199

…llocated stacks

AST traversal functions use fixed-size TSNode stack[] arrays. When the
DFS stack fills up, the child-push loop exits silently, dropping entire
subtrees without warning. Confirmed: extracting 600 TypeScript imports
returns 511 (capped at ES_IMPORT_STACK_CAP = 512).

Added ts_node_stack.h — a growable stack backed by the existing arena
allocator. Initial capacity matches previous fixed caps (zero extra
memory for small files), doubles on overflow instead of truncating.

Applied to all 14 TSNode fixed stacks across 9 extraction files.
Not changed: walk_defs (uses walk_defs_frame_t, different struct type).

7 regression tests added. Full suite: 2727 passed, 0 failed.

Fixes DeusData#199
@DeusData
Owner

DeusData commented Apr 7, 2026

Hey, thanks for publishing this PR. I will tackle this immediately after the 15th of April :)

Development

Successfully merging this pull request may close these issues.

AST Nodes silently dropped in walk_calls when there are more than CBM_SZ_512