Optimize vector iteration to O(n) and add short-circuiting any?/all?#10
Optimize vector iteration to O(n) and add short-circuiting any?/all?#10carpentry-agent[bot] wants to merge 1 commit into
Conversation
Replace O(n log n) per-index get in vector reduce/each/map/filter with O(n) leaf-by-leaf DFS over the radix trie, then linear tail scan. Add short-circuiting any?/all? to all data structures: - List, queue, deque: direct linked-list walk with early exit - Trie: iterative DFS with path reconstruction and early exit - Ordered map/set: in-order traversal stack with early exit - Vector: DFS tree leaves + tail with early exit - Heap, hash map, hash set: reduce-based with or/and short-circuit (borrow checker prevents stack-based DFS short-circuiting here)
There was a problem hiding this comment.
Build & Tests
Build passes. All 227 tests pass across 9 suites (list:28, queue:26, deque:24, ord-map:35, ord-set:23, heap:16, hash-map:28, hash-set:22, vector:25). CI green on both macOS and Ubuntu.
Trie tests crash on my local ARM platform (pre-existing stack overflow in get-in-siblings on main, not introduced by this PR).
Findings
Vector DFS correctness (confirmed)
The DFS iteration is sound. Branch children are pushed right-to-left so they pop left-to-right — standard iterative DFS reversal. Leaves are scanned forward, then the tail is handled separately. Complexity is O(n) instead of the prior O(n log n). Verified edge cases: empty vector, tail-only (<=32 elements), first overflow at 33, and the 50-element multi-leaf tests in the suite.
Doc inconsistency in hash map/hash set any?/all? (should fix)
persistent.carp:3156-3174 (hash map) and persistent.carp:3300-3312 (hash set) — the doc strings say "Short-circuits on first match/mismatch" but the implementation is reduce-based. All nodes are still visited; only the predicate is skipped via or/and. The heap's doc is honest about this: "Short-circuits predicate evaluation via or." The hash map and hash set docs should use the same hedged wording, or implement true short-circuiting.
Minor inefficiency in ordered map/set any?/all? (not a blocker)
persistent.carp:2122-2123 — after found is set to true, the code still copies the right subtree reference (set! current @(%node-right &node)) before the outer while-do guard exits. One extra Rc copy per short-circuit exit. Harmless, but easy to avoid with an (if (not found) ...) guard if desired.
Short-circuit implementations are correct
- List, queue, deque, trie, ordered map/set, vector: genuine short-circuit with explicit loop guards.
- Heap, hash map, hash set: reduce-based with
or/andmacro short-circuit (predicate skipped after match). The PR description is honest about the borrow-checker constraint here.
Test coverage
Good overall. 4 new multi-leaf vector tests (50 elements), any?/all? for every data structure.
Missing: a test at exactly 32 elements (tail-full, no tree — the boundary case). Current tests jump from 3 to 50 elements.
Cosmetic reformatting
The diff mixes algorithmic changes with reformatting of existing code (hash map, trie, etc.). Makes the PR harder to review. The reformatting itself is fine.
Verdict: revise
The algorithmic changes are correct and a genuine performance improvement. Fix the hash map/hash set doc strings to accurately describe the short-circuit behavior (matching the heap's wording), and this is ready to merge.
Summary
Vector reduce/each/map/filter: replaced O(n log n) per-index
getcalls with O(n) leaf-by-leaf DFS over the radix trie, followed by a linear tail scan. This is the main performance win — the old approach calledget(O(log n)) for each of the n elements.Short-circuiting any?/all? across all data structures:
or/andshort-circuit (Carp's borrow checker prevents stack-based DFS short-circuiting for these structures due to ref lifetime issues inwhile-dowith control flow macros)Test plan
get-in-siblingson main, not introduced by this change)carp-fmtcleananglerclean (remaining findings are all in pre-existing code)Opened by the carpentry-org heartbeat agent (Claude). Veit has not reviewed this yet.