perf(scanner): answer unfiltered count_rows from fragment metadata by LuciferYang · Pull Request #7076 · lance-format/lance

LuciferYang · 2026-06-03T12:55:37Z

Summary

Scanner::count_rows always built and executed a count plan, even when the count could be satisfied from fragment metadata alone. For a plain count with no row-level filter or search this scanned row data unnecessarily — especially wasteful when the scanner is restricted to a subset of fragments via with_fragments (#6970).

This adds a metadata-only fast path to Scanner::count_rows: when nothing in the scan needs to inspect row data, it sums each fragment's live row count (physical rows − deletions, both tracked in fragment metadata) instead of building and executing a plan. Dataset::count_rows(None) already had such a fast path via count_all_rows; this brings the same benefit to the Scanner path (and, crucially, to fragment-restricted counts).
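The arithmetic behind the fast path can be sketched as follows. This is a minimal illustration, not the code in this PR: `FragmentMeta`, its fields, and `count_live_rows` are hypothetical names standing in for whatever the fragment metadata actually exposes.

```rust
/// Hypothetical stand-in for the per-fragment metadata described above.
#[derive(Debug, Clone, Copy)]
pub struct FragmentMeta {
    /// Rows physically written to the fragment.
    pub physical_rows: u64,
    /// Rows recorded as deleted for the fragment.
    pub deleted_rows: u64,
}

impl FragmentMeta {
    /// Live rows = physical rows - deletions.
    /// `saturating_sub` is a choice made for this sketch so that
    /// inconsistent metadata yields 0 instead of a panic.
    pub fn live_rows(&self) -> u64 {
        self.physical_rows.saturating_sub(self.deleted_rows)
    }
}

/// Sum the live row counts of the given fragments without touching row data.
pub fn count_live_rows(fragments: &[FragmentMeta]) -> u64 {
    fragments.iter().map(FragmentMeta::live_rows).sum()
}
```

Passing only a subset of fragments to `count_live_rows` models a count restricted via `with_fragments`; an empty slice yields 0.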

The fast path falls back to the existing count plan — preserving its results and its errors — whenever any of these is set:

- a row-level filter, vector / full-text search, index_segments, fast_search, or include_deleted_rows;
- order_by or limit / offset (the plan rejects these when combined with the count aggregate);
- a dynamic-only projection such as SELECT 1 (also rejected by the plan).
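The fallback conditions above amount to a single eligibility check. The sketch below is an assumption about how such a check could look; `ScanOptions` and every field name in it are hypothetical and do not mirror the real `Scanner` fields.

```rust
/// Hypothetical summary of the scanner settings that matter for counting.
#[derive(Debug, Default, Clone)]
pub struct ScanOptions {
    pub has_filter: bool,
    pub has_vector_search: bool,
    pub has_full_text_search: bool,
    pub has_index_segments: bool,
    pub fast_search: bool,
    pub include_deleted_rows: bool,
    pub has_order_by: bool,
    pub limit: Option<u64>,
    pub offset: Option<u64>,
    /// True for projections such as `SELECT 1` that read no stored column.
    pub dynamic_only_projection: bool,
}

/// Returns true only when nothing in the scan needs to inspect row data
/// and nothing is set that the count plan would reject.
pub fn can_count_from_metadata(opts: &ScanOptions) -> bool {
    // Settings whose result depends on row contents or deletion handling.
    let inspects_rows = opts.has_filter
        || opts.has_vector_search
        || opts.has_full_text_search
        || opts.has_index_segments
        || opts.fast_search
        || opts.include_deleted_rows;
    // Settings the plan rejects; falling back preserves those errors.
    let plan_rejects = opts.has_order_by
        || opts.limit.is_some()
        || opts.offset.is_some()
        || opts.dynamic_only_projection;
    !(inspects_rows || plan_rejects)
}
```

Routing the rejected combinations through the plan, instead of answering them from metadata, is what keeps the existing error behavior intact.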

The shared per-fragment summing is factored into Dataset::count_fragment_rows, which count_all_rows now also uses. Its fan-out uses the module-standard io_parallelism() bound, matching the sibling count_deleted_rows. In the common (new-format) case the count is answered entirely from cached metadata with zero I/O; only legacy/uncached fragments fall back to per-fragment metadata reads.
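A rough model of the bounded fan-out and the cached/uncached split is shown below. It deliberately swaps in scoped OS threads for whatever async machinery the real code uses, and `Fragment`, `count_fragment_rows_sketch`, the `parallelism` parameter (standing in for the `io_parallelism()` bound), and the `reads` counter are all hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical fragment: the live row count is either already cached in
/// metadata or has to be fetched with a (simulated) metadata read.
#[derive(Debug, Clone, Copy)]
pub struct Fragment {
    pub cached_live_rows: Option<u64>,
    /// Value a metadata read would return for an uncached fragment.
    pub stored_live_rows: u64,
}

/// Sum live rows using at most `parallelism` workers.
/// `reads` counts the simulated metadata reads that were needed.
pub fn count_fragment_rows_sketch(
    fragments: &[Fragment],
    parallelism: usize,
    reads: &AtomicU64,
) -> u64 {
    if fragments.is_empty() {
        return 0;
    }
    // Never spawn more workers than fragments, and always at least one.
    let workers = parallelism.clamp(1, fragments.len());
    let chunk_len = (fragments.len() + workers - 1) / workers;
    thread::scope(|scope| {
        let handles: Vec<_> = fragments
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|f| match f.cached_live_rows {
                            // Cached metadata: answered with no read.
                            Some(n) => n,
                            // Uncached fragment: one simulated read.
                            None => {
                                reads.fetch_add(1, Ordering::Relaxed);
                                f.stored_live_rows
                            }
                        })
                        .sum::<u64>()
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .sum::<u64>()
    })
}
```

When every fragment carries a cached count, the `reads` counter stays at zero, which is the property the new test asserts for the metadata fast path.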

Closes #6970.

Test plan

- New test_count_rows_metadata_only covering: whole dataset, fragment subset, empty fragment list, deletions (whole-dataset and subset), include_deleted_rows fallback, limit / offset / order_by error preservation, filtered-count fallback, and all-rows-deleted. It asserts zero read/write IOPS on the metadata fast path and > 0 reads on the plan fallback.
- cargo test -p lance --lib: the count, scanner (142), and dataset_io (52) suites pass.
- cargo fmt --all --check is clean.
- cargo clippy -p lance --all-targets -- -D warnings is clean.

`Scanner::count_rows` always built and executed a count plan, even when the count could be satisfied from fragment metadata alone. For a plain count with no row-level filter or search this scanned row data unnecessarily, which is especially wasteful when the scanner is restricted to a subset of fragments via `with_fragments`. Add a fast path that sums each fragment's live row count (physical rows minus deletions, both tracked in metadata) when nothing in the scan needs to inspect row data. The path falls back to the existing plan — preserving its results and errors — whenever a filter, vector/full-text search, index_segments, fast_search, include_deleted_rows, ordering, limit/offset, or a dynamic-only projection (e.g. `SELECT 1`) is set. The shared per-fragment summing is factored into `Dataset::count_fragment_rows`, which `count_all_rows` now also uses; its fan-out matches the module's standard `io_parallelism()` bound. Closes lance-format#6970.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

codecov · 2026-06-03T13:32:27Z

Codecov Report

❌ Patch coverage is 97.80220% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance/src/dataset/scanner.rs | 97.61% | 0 Missing and 2 partials ⚠️ |


claude Bot reviewed Jun 3, 2026


github-actions Bot added the performance label Jun 3, 2026

LuciferYang wants to merge 1 commit into lance-format:main from LuciferYang:perf/6970-count-rows-metadata-only
