Tracking Issue: random access Lance parity #7915

@myrrc

Lance is currently around 3-15x faster on random access over feature vectors. Known causes and fixes:

  1. Selection is not used in repeated reads. Solved by "Use selection in repeated scans" #8137 (2x improvement).
  2. Most requests produce small splits, where BTree overhead dominates. See "Use Vec<u64> instead of BTreeSet for splits" #8194 (1.3x improvement).
  3. Re-parsing flatbuffers and reinitializing chunk offsets adds overhead. See "ViewedLayoutChildren child layout cache" #8234 (likely 2x improvement).
  4. Chunked layout children are not cached. See "Chunk reader children cache" #8209.
  5. Deallocation of intermediate materialized structs.
  6. io_uring et al.
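
To illustrate item 2, here is a minimal sketch of keeping split boundaries in a sorted, deduplicated `Vec<u64>` rather than a `BTreeSet<u64>`. It is not the code from #8194. The function names (`splits_as_vec`, `splits_as_btreeset`, `split_ranges`) are hypothetical and only show the idea: for a handful of boundaries, one contiguous allocation plus a sort avoids per-node tree overhead while yielding the same ordered set.

```rust
use std::collections::BTreeSet;
use std::ops::Range;

/// Collects split boundaries into a sorted, deduplicated Vec<u64>.
/// Hypothetical helper: for small inputs this is one allocation and one sort.
pub fn splits_as_vec(boundaries: impl IntoIterator<Item = u64>) -> Vec<u64> {
    let mut splits: Vec<u64> = boundaries.into_iter().collect();
    splits.sort_unstable();
    splits.dedup();
    splits
}

/// The BTreeSet equivalent, kept only for comparison.
/// It produces the same ordered, unique boundaries but allocates tree nodes.
pub fn splits_as_btreeset(boundaries: impl IntoIterator<Item = u64>) -> BTreeSet<u64> {
    boundaries.into_iter().collect()
}

/// Turns sorted boundaries into half-open row ranges [start, end).
pub fn split_ranges(splits: &[u64]) -> Vec<Range<u64>> {
    splits.windows(2).map(|w| w[0]..w[1]).collect()
}
```

Both containers iterate in the same order, so code that only walks the boundaries front to back should not need to change when the container does.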

Marginal improvements:

  • For small split read tasks, the read itself takes marginal time compared to tokio task scheduling and LazyScanStream initialization. This may be solved by a heuristic (only for the main thread).
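
One possible shape for such a heuristic, as a sketch under stated assumptions: run small reads inline on the calling thread and hand off only larger ones. The threshold `INLINE_ROW_THRESHOLD` and the names `should_run_inline` and `run_read` are hypothetical. `std::thread::spawn` stands in for tokio task spawning to keep the example dependency-free, and LazyScanStream is not modelled at all.

```rust
use std::thread;

/// Hypothetical cutoff: splits at or below this many rows are read inline.
/// The real value would need to come from benchmarks.
pub const INLINE_ROW_THRESHOLD: u64 = 1024;

/// Decides whether a split is small enough that scheduling overhead
/// would likely dominate the read itself.
pub fn should_run_inline(split_rows: u64) -> bool {
    split_rows <= INLINE_ROW_THRESHOLD
}

/// Runs `read` on the calling thread for small splits, otherwise on a
/// separate thread (a stand-in for spawning a task on the async runtime).
pub fn run_read<T, F>(split_rows: u64, read: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    if should_run_inline(split_rows) {
        // Small split: skip scheduling entirely.
        read()
    } else {
        // Large split: pay the hand-off cost and wait for the result.
        thread::spawn(read).join().expect("read task panicked")
    }
}
```

The decision is a single comparison, so it adds essentially nothing to the small-read path it is meant to speed up.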

Labels

tracking-issue: Shared implementation context for work likely to span multiple PRs.