Sort_by providing zero-copy views (and more) #666

Merged
FrancescAlted merged 15 commits into main from sort-by-views
Jun 21, 2026

@FrancescAlted

Member

In this PR:

  1. CTable sort_by with zero-copy views (the core feature)
  • c02b489 Preliminary support for views during sort_by: sorting returns a lightweight view streamed from the index instead of materializing a reordered table.
  • 77fa652 Optimized sorting for slices and numeric + timestamp + bool columns.
  • e83d263 FULL-index sort/window support extended to string columns.
  2. Index-backed fast min/max envelopes
  • a74f089 / 719827a Fast min/max envelope plots from a FULL index, then from any index kind.
  • f64bd2a Accelerate general min/max in Columns when indexes are available.
  • c1e36eb Release cached index handles when a table closes (resource cleanup).
  3. b2view: sort mode + plotting UX
  • 2289a9d Add sort-by-indexed-column mode (S) to b2view; e8dd1ad improves navigation within it.
  • 94fe07d Plot +/- zoom now anchors on the left edge (keeps the start point) instead of the centre.
  • 144e33f Acceleration path for envelopes of sorted columns: when plotting the column the view is sorted by, reads only bucket boundaries (~2000 points) instead of gathering all rows. ~50× faster, bit-exact (2.5s → 0.05s on the 24M-row chicago-taxi column); also covers zoom/pan sub-ranges.
  • 2f7d141 Escape is the only "leave" key for every modal; q always quits b2view (removed the esc/q, h, p, enter, ? close dualities).
  • d0eb041 New --max CLI option to maximize the focused panel at startup (respects --panel).
  4. Misc
  • 724d3dc Skip subprocess use under Emscripten (Pyodide/WASM has none).
  • Tutorial 13.ctable-basics.ipynb updated for the new sort behavior.
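
The sorted-column envelope shortcut from 144e33f can be illustrated with a small, library-free sketch. It is not the b2view code: `bucket_edges`, `envelope_full` and `envelope_sorted` are hypothetical names, and plain Python lists stand in for a column. The idea is that when the data is already in ascending order, each bucket's minimum is its first row and its maximum is its last row, so only two reads per bucket are needed and the result is identical to a full scan.

```python
import random


def bucket_edges(n_rows, n_buckets):
    """Row offsets that split n_rows into n_buckets contiguous buckets.

    Assumes n_rows >= n_buckets so that no bucket is empty.
    """
    return [i * n_rows // n_buckets for i in range(n_buckets + 1)]


def envelope_full(values, n_buckets):
    """Reference path: scan every row of every bucket."""
    edges = bucket_edges(len(values), n_buckets)
    lo = [min(values[a:b]) for a, b in zip(edges, edges[1:])]
    hi = [max(values[a:b]) for a, b in zip(edges, edges[1:])]
    return lo, hi


def envelope_sorted(values, n_buckets):
    """Shortcut for ascending data: read only the bucket boundaries."""
    edges = bucket_edges(len(values), n_buckets)
    lo = [values[a] for a in edges[:-1]]      # first row of each bucket
    hi = [values[b - 1] for b in edges[1:]]   # last row of each bucket
    return lo, hi


random.seed(0)
column = sorted(random.gauss(0.0, 1.0) for _ in range(100_000))
same = envelope_full(column, 2000) == envelope_sorted(column, 2000)
print("boundary-only envelope matches full scan:", same)
```

The shortcut touches 2 × n_buckets values regardless of the column length, which is consistent with the "~2000 points" figure quoted above, although the real implementation reads through the index rather than from an in-memory list.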

sort_by and the zero-permutation sorted_slice window now work for
dictionary[str] and fixed blosc2.string columns:

- dictionary[str]: index by alphabetical rank (int32), reusing the
  numeric window path. _DictRankWrapper exposes only live rows so the
  index length matches n_rows (otherwise the padding rows caused the
  window read to be rejected).
- fixed string: build the FULL index by computing segment min/max
  with a manual loop (numpy lacks the <U/<S ufunc loop), accept S/U
  in _supported_index_dtype, and add a numpy OOC merge fallback.
- staleness: rank index goes stale when the dictionary changes;
  detect via a stable SHA-1 hash of entries (hash() is seed-salted
  and would spuriously mismatch across processes), fall back to
  lexsort until rebuild_index.
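
As a rough illustration of the rank index and the staleness check described in the list above, here is a minimal standard-library sketch. The helper names (`dictionary_ranks`, `entries_digest`) and the length-prefixed encoding are assumptions made for the example, not the functions or byte layout used in the PR. The point it shows is that a digest from `hashlib.sha1` is the same in every process, whereas Python's built-in `hash()` on strings is randomized per process unless `PYTHONHASHSEED` is fixed.

```python
import hashlib


def dictionary_ranks(entries):
    """Map each dictionary code to the alphabetical rank of its entry."""
    order = sorted(range(len(entries)), key=lambda code: entries[code])
    ranks = [0] * len(entries)
    for rank, code in enumerate(order):
        ranks[code] = rank
    return ranks


def entries_digest(entries):
    """Process-independent SHA-1 digest of the dictionary entries.

    Each entry is length-prefixed so that ["ab", "c"] and ["a", "bc"]
    produce different digests.
    """
    h = hashlib.sha1()
    for entry in entries:
        data = entry.encode("utf-8")
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()


entries = ["pear", "apple", "fig"]      # dictionary: code -> string
codes = [0, 2, 1, 1, 0]                 # per-row dictionary codes
ranks = dictionary_ranks(entries)       # rank of each code
row_ranks = [ranks[c] for c in codes]   # integer keys a numeric index can sort
saved_digest = entries_digest(entries)  # stored next to the rank index

entries.append("banana")                # the dictionary changes...
is_stale = entries_digest(entries) != saved_digest  # ...so the ranks are stale
```

When `is_stale` is true, the stored ranks no longer reflect alphabetical order, which is the situation in which the commit falls back to lexsort until the index is rebuilt.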

Add tests/ctable/test_sort_by_strings.py (dict/string sort + window,
staleness, cross-process hash stability).

Press 'S' on a CTable data grid to sort by a FULL-indexed column via a
dropdown (R toggles reverse). The result is a zero-copy sort_by(view=True)
that streams from the index, so the table is never materialised; navigate
it normally, Esc restores original order. A SORTED chip shows in the status
bar (R reverses an active sort in place).
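
To make the "never materialised" point concrete, the following sketch shows one way a sorted view can work in principle. It is a toy model, not the CTable implementation: `SortedView` and `sort_by_view` are hypothetical names, columns are plain lists, and the call to `sorted()` merely stands in for the ordering that a FULL index would already provide.

```python
class SortedView:
    """Read-only view that serves rows in sorted order without copying columns."""

    def __init__(self, columns, order):
        self._columns = columns  # dict: column name -> list of values (shared, not copied)
        self._order = order      # permutation: sorted position -> original row id

    def __len__(self):
        return len(self._order)

    def rows(self, start, stop):
        """Gather only the requested window of rows, in sorted order."""
        row_ids = self._order[start:stop]
        return [
            {name: col[i] for name, col in self._columns.items()}
            for i in row_ids
        ]


def sort_by_view(columns, key, reverse=False):
    """Build a view ordered by one column.

    Assumption: sorted() here replaces the precomputed index ordering.
    """
    values = columns[key]
    order = sorted(range(len(values)), key=values.__getitem__, reverse=reverse)
    return SortedView(columns, order)


table = {"fare": [7.5, 3.0, 12.0], "trip_id": [10, 11, 12]}
view = sort_by_view(table, "fare")
first_two = view.rows(0, 2)  # only two rows are gathered
```

Only the rows inside the requested window are ever gathered, so scrolling a grid over the view costs work proportional to the visible rows rather than to the table size.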

Model: set_sort/clear_sort/get_sort + full_index_columns, and a single
_ordered_object() read-precedence helper (window > filter > sort > base)
replacing five duplicated inline blocks. Sort and filter are mutually
exclusive; a row window composes over a sort.
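
The read-precedence rule (window > filter > sort > base) can be sketched as a single lookup method. The class below is a hypothetical stand-in for the b2view model: the field names and the `set_filter` method are assumptions, and only `set_sort`, `clear_sort` and the precedence order come from the commit message.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GridModel:
    base: Any                          # the unmodified table
    sort_view: Optional[Any] = None    # active sort, if any
    filter_view: Optional[Any] = None  # active filter, if any
    window_view: Optional[Any] = None  # active row window, if any

    def ordered_object(self):
        """Return the object reads should go through: window > filter > sort > base."""
        for candidate in (self.window_view, self.filter_view, self.sort_view):
            if candidate is not None:
                return candidate
        return self.base

    def set_sort(self, view):
        self.filter_view = None  # sort and filter are mutually exclusive
        self.sort_view = view

    def set_filter(self, view):
        self.sort_view = None    # hypothetical counterpart of set_sort
        self.filter_view = view

    def clear_sort(self):
        self.sort_view = None


model = GridModel(base="base table")
model.set_sort("sorted view")
current = model.ordered_object()  # the sorted view wins over the base table
```

Keeping the precedence in one method means every reader (grid, plots, status bar) resolves the active object the same way, which is the motivation given for replacing the duplicated inline blocks.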

Index reads (where() pruning, summary/min-max lookups) cache file-backed
sidecar handles in process-global dicts for query reuse.  Previously these
were only dropped once the underlying files were deleted, so closing a
table that stays on disk kept its descriptors open: one file descriptor
leaked per table, exhausting the FD limit over long sessions (and large
test runs).

Add evict_cached_index_handles(root), which pops (and thereby releases)
every _SIDECAR_HANDLE_CACHE / _DATA_CACHE / _HOT_CACHE / query / gather
entry whose scope path is at or under a table's resolved root.  Call it
from FileTableStorage and TreeStoreTableStorage close()/discard(); the
caches simply repopulate on the next query.

Fixes FD exhaustion when opening/closing many indexed tables; the full
test suite now passes at the default macOS ulimit -n 256.
@FrancescAlted FrancescAlted merged commit 4708e82 into main Jun 21, 2026
21 checks passed
@FrancescAlted FrancescAlted deleted the sort-by-views branch June 21, 2026 10:20