Store IndexName tags as a sorted symbol dictionary #195

Merged
mtfishman merged 4 commits into main from mf/sorted-dict-tags on Jun 29, 2026

@mtfishman (Member)

Summary

Stores IndexName tags in a sorted, symbol-keyed dictionary instead of a Dict{String, String}, which makes name comparison and hashing substantially faster while keeping the tag API string-based.

The tag store is now an internal SortedDict{Symbol, Symbol}: a small AbstractDict backed by two parallel vectors kept sorted by key, with linear-scan lookup, suited to the handful of tags an index carries. Symbol keys and values compare and hash faster than strings, and because the backing vectors are sorted, equality, hashing, and ordering reduce to plain vector operations with no per-call allocation.

IndexName comparison also now short-circuits in the order id, then plev, then tags. In the common case of distinct ids the tags are never touched, and comparing an index against a primed or retagged copy of itself rejects on the prime level instead of walking the full tag set.
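
For illustration, here is a minimal Python sketch of the storage layout described above. The package itself is Julia and this is not its code: `SortedDictSketch` is a hypothetical name, and because Python has no `Symbol` type, plain strings stand in for symbols. Keys and values live in two parallel lists kept sorted by key, lookup is a linear scan, and equality and hashing compare the lists directly, so the order in which tags were added cannot matter.

```python
from collections.abc import Mapping


class SortedDictSketch(Mapping):
    """Immutable mapping stored as two parallel lists kept sorted by key."""

    def __init__(self, pairs=()):
        self._keys = []
        self._vals = []
        for key, value in pairs:
            self._set(key, value)

    def _find(self, key):
        # Linear scan over the keys; cheap for the handful of tags an index has.
        for i, k in enumerate(self._keys):
            if k == key:
                return i
        return None

    def _set(self, key, value):
        # Only called during construction, so instances stay immutable.
        i = self._find(key)
        if i is not None:
            self._vals[i] = value  # a repeated key overwrites, like a dict
            return
        # Insert before the first larger key so the keys stay sorted.
        pos = len(self._keys)
        for j, k in enumerate(self._keys):
            if key < k:
                pos = j
                break
        self._keys.insert(pos, key)
        self._vals.insert(pos, value)

    def __getitem__(self, key):
        i = self._find(key)
        if i is None:
            raise KeyError(key)
        return self._vals[i]

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)

    def __eq__(self, other):
        if not isinstance(other, SortedDictSketch):
            return NotImplemented
        # The sorted layout is canonical, so comparing the two lists is enough
        # and insertion order cannot affect the result.
        return self._keys == other._keys and self._vals == other._vals

    def __hash__(self):
        # Hash the same canonical layout that __eq__ compares.
        return hash((tuple(self._keys), tuple(self._vals)))


a = SortedDictSketch([("type", "S=1/2"), ("site", "3")])
b = SortedDictSketch([("site", "3"), ("type", "S=1/2")])
print(a == b, hash(a) == hash(b), list(a))
```

The sketch exposes no mutating methods, so hashing an instance is safe; changing a tag means building a new instance.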

The public tag API stays string-valued: gettag returns a String, tags returns an AbstractDict from strings to strings, and settag/hastag and the IndexName constructor accept either strings or symbols. An internal tags_stored exposes the raw symbol dictionary for the comparison, hashing, and display paths.
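
A similarly hedged Python sketch of the comparison order and of the split between stored and public tags: `IndexNameSketch` is hypothetical, and its `tags_stored` field is a tuple of key-sorted `(key, value)` string pairs standing in for the internal symbol dictionary. Equality checks `id`, then `plev`, then the tags, returning as soon as one differs; ordering is lexicographic in the same field order; `tags()` and `gettag` hand back plain string data built from the stored pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class IndexNameSketch:
    id: int
    plev: int = 0
    # Stand-in for the internal symbol store: (key, value) pairs sorted by key.
    tags_stored: tuple = ()

    def __eq__(self, other):
        if not isinstance(other, IndexNameSketch):
            return NotImplemented
        # Short-circuit: compare the cheap, usually-distinct fields first.
        if self.id != other.id:
            return False
        if self.plev != other.plev:
            return False
        return self.tags_stored == other.tags_stored

    def __hash__(self):
        # Hash the same fields that __eq__ compares.
        return hash((self.id, self.plev, self.tags_stored))

    def __lt__(self, other):
        # Lexicographic order in the same field order as equality.
        return (self.id, self.plev, self.tags_stored) < (
            other.id, other.plev, other.tags_stored)

    def tags(self):
        # Public view: a fresh str -> str dict, detached from the stored tuple.
        return dict(self.tags_stored)

    def gettag(self, key, default=None):
        return self.tags().get(key, default)


i = IndexNameSketch(7, 0, (("site", "3"), ("type", "S=1/2")))
j = IndexNameSketch(7, 1, i.tags_stored)  # primed copy of i
print(i == j, i.gettag("site"))
```

With distinct ids or prime levels the stored tags are never compared, which is the short-circuit the summary describes.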

mtfishman and others added 2 commits June 29, 2026 15:19
A small `AbstractDict` backed by two parallel vectors kept sorted by key,
with linear-scan lookup, suited to the small key counts of index-name
tags. Equality and hashing are structural over the sorted vectors, so
they are cheap and independent of insertion order by construction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Back IndexName tags with the internal SortedDict{Symbol,Symbol} instead
of a Dict{String,String}. Symbol keys and values compare and hash faster
than strings, and the sorted backing makes equality, hashing, and
ordering plain vector operations with no per-call allocation.

IndexName comparison now short-circuits in id, then plev, then tags, so
distinct ids never touch the tags and comparing an index against a primed
or retagged copy of itself rejects on the prime level instead of walking
the full tag set.

The public tag API stays string-valued: gettag returns a String, tags
returns an AbstractDict from strings to strings, and settag/hastag and the
IndexName constructor accept either strings or symbols. An internal
tags_stored exposes the raw symbol dictionary for the comparison, hashing,
and display paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
codecov bot commented Jun 29, 2026

Codecov Report

❌ Patch coverage is 97.64706% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.65%. Comparing base (5444f64) to head (06454a5).
⚠️ Report is 1 commits behind head on main.

Files with missing lines   Patch %   Lines
src/index.jl               97.29%    1 Missing ⚠️
src/sorteddict.jl          97.91%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #195      +/-   ##
==========================================
+ Coverage   72.67%   73.65%   +0.97%     
==========================================
  Files          28       29       +1     
  Lines        1486     1545      +59     
==========================================
+ Hits         1080     1138      +58     
- Misses        406      407       +1     
Flag   Coverage Δ
docs   24.26% <30.37%> (+0.33%) ⬆️

mtfishman force-pushed the mf/sorted-dict-tags branch 2 times, most recently from 33802aa to aa8bead, on June 29, 2026 19:39
settags no longer normalizes its argument: the IndexName tags field is
typed, so `@set` enforces the type on construction, and its only callers
(settag/unsettag) already build a SortedDict{Symbol,Symbol}. Split the
conversion into to_symbol_pair (convert one pair's key and value to
symbols) and to_tags, which accepts either a single bare Pair or a
collection of pairs by dispatch, mirroring how Dict's constructor handles
the same two shapes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
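
The to_tags dispatch described in this commit can be sketched in Python with `functools.singledispatch`, under two labeled assumptions: a hypothetical `Pair` named tuple stands in for Julia's `Pair`, and interned strings (`sys.intern`) stand in for symbols. The return value is a key-sorted tuple of pairs rather than a real SortedDict, and none of these names are the package's API.

```python
import sys
from collections.abc import Mapping
from functools import singledispatch
from typing import NamedTuple


class Pair(NamedTuple):
    """Hypothetical stand-in for Julia's `key => value` Pair."""
    first: object
    second: object


def to_symbol_pair(p):
    # Convert one pair's key and value; interned strings play the role of Symbols.
    return Pair(sys.intern(str(p[0])), sys.intern(str(p[1])))


@singledispatch
def to_tags(pairs):
    # Fallback: any iterable of pairs. Later pairs win, as in a dict constructor.
    result = {}
    for p in pairs:
        key, value = to_symbol_pair(p)
        result[key] = value
    return tuple(sorted(result.items()))


@to_tags.register
def _(pair: Pair):
    # A single bare Pair is treated as a one-element collection.
    return to_tags([pair])


@to_tags.register
def _(mapping: Mapping):
    # A mapping contributes its (key, value) items.
    return to_tags(mapping.items())


print(to_tags(Pair("site", 3)))
print(to_tags([Pair("type", "S=1/2"), Pair("site", 3)]))
```

A distinct `Pair` type matters here: a bare 2-tuple would fall through to the collection branch and be misread element by element, which is the ambiguity that dispatching on Julia's `Pair` avoids.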
mtfishman force-pushed the mf/sorted-dict-tags branch from aa8bead to 0f7c7ce on June 29, 2026 19:42
Replace the custom _searchsortedfirst, a misleading name for what was a
linear scan, with findfirst, and drop the dead key/value converts. The
keys are still kept sorted on insert, so lookups could later switch to
searchsortedfirst for larger collections.

Credit TensorKit.jl's SortedVectorDict as the model and note DataStructures.jl's
SortedDict and Dictionaries.jl's ArrayDictionary as comparable designs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
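
The trade-off in this commit, a linear scan now with binary search still available because the keys stay sorted, can be sketched in Python. `find_linear` and `find_sorted` are hypothetical helpers loosely analogous to Julia's `findfirst` and `searchsortedfirst`; `bisect.bisect_left` returns an insertion point, so the match has to be checked explicitly, and indices here are 0-based unlike Julia's.

```python
from bisect import bisect_left


def find_linear(keys, key):
    """Linear scan, analogous to findfirst(==(key), keys)."""
    for i, k in enumerate(keys):
        if k == key:
            return i
    return None


def find_sorted(keys, key):
    """Binary search on sorted keys, analogous to searchsortedfirst plus a check."""
    i = bisect_left(keys, key)  # first position whose key is >= `key`
    if i < len(keys) and keys[i] == key:
        return i
    return None


keys = ["n", "site", "tags", "type"]  # must already be sorted for bisection
print(find_linear(keys, "tags"), find_sorted(keys, "tags"))
```

Both helpers agree on any sorted key list; for a few keys the linear scan is simple and fast, while bisection typically pays off only as the key count grows.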
mtfishman force-pushed the mf/sorted-dict-tags branch from 45502b5 to 06454a5 on June 29, 2026 19:59
mtfishman enabled auto-merge (squash) June 29, 2026 20:03
mtfishman merged commit 49caa2b into main Jun 29, 2026
18 checks passed
mtfishman deleted the mf/sorted-dict-tags branch June 29, 2026 20:15