[fix](ann-index) Fix ANN IVF/PQ recall, avoid init-time large ANN build-buffer reservation, and skip ANN index build for segments with insufficient training rows. #64082
Open
kaka11chen wants to merge 7 commits into
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Contributor
Author
|
run buildall |
1 similar comment
Contributor
Author
|
run buildall |
a66b5d7 to
582071f
Compare
Contributor
Author
|
run buildall |
Contributor
TPC-H: Total hot run time: 29269 ms |
Contributor
TPC-DS: Total hot run time: 169468 ms |
Contributor
BE UT Coverage Report
Increment line coverage
Increment coverage report
|
Contributor
BE UT Coverage Report
Increment line coverage
Increment coverage report
|
Contributor
BE Regression && UT Coverage Report
Increment line coverage
Increment coverage report
|
Contributor
Author
|
run buildall |
Contributor
BE UT Coverage Report
Increment line coverage
Increment coverage report
|
### What problem does this PR solve?

Issue Number: None

Related PR: apache#64082

Problem Summary: Clarify why ANN index writer swaps the buffered vectors with an empty PODArray instead of using clear(). The swap intentionally releases the full-segment training buffer before saving the index, while clear() would keep the allocated capacity.

### Release note

None

### Check List (For Author)

- Test: No need to test (comment-only change)
- Behavior changed: No
- Does this need documentation: No
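The difference between the two ways of emptying the buffer can be sketched with the standard library. This is only an illustration: `std::vector<float>` stands in for the Doris `PODArray` (an assumption), and the helper names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// std::vector<float> is a stand-in for the Doris PODArray (assumption).

// clear() destroys the elements but normally keeps the allocation,
// so the capacity stays at least as large as before.
std::size_t capacity_after_clear(std::size_t n) {
    std::vector<float> buffer(n, 0.0f);
    buffer.clear();
    return buffer.capacity();
}

// Swapping with an empty temporary hands the allocation to the temporary,
// which frees it when it goes out of scope at the end of the statement.
std::size_t capacity_after_swap(std::size_t n) {
    std::vector<float> buffer(n, 0.0f);
    std::vector<float>().swap(buffer);
    return buffer.capacity();
}
```

With a full-segment buffer, the swap form is what returns the memory before the index is saved; `clear()` would leave the buffer empty but still allocated.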
Contributor
BE Regression && UT Coverage Report
Increment line coverage
Increment coverage report
|
### What problem does this PR solve?

Issue Number: None

Related PR: apache#64082

Problem Summary: Remove the redundant ANN writer `_skip_build` state. The flag was only set from `close_on_error()`, while normal index skip behavior is already driven by zero rows or by the segment row count being smaller than the index training requirement. Keeping the writer state explicit avoids carrying an abort flag into regular add and finish paths.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - `ENABLE_PCH=OFF ./run-be-ut.sh --run --filter=AnnIndexWriterTest.*`
- Behavior changed: No
- Does this need documentation: No
Contributor
TPC-H: Total hot run time: 29312 ms |
Contributor
TPC-DS: Total hot run time: 169349 ms |
…added no-train indexes during segment writing. This made the build strategy harder to reason about and could still spend CPU/memory building small HNSW/FLAT segments that should be skipped by a Doris-side row threshold.

This change removes the chunk add configs, buffers ANN vectors for the whole segment, applies `effective_min_rows = max(vector_index->get_min_train_rows(), config::ann_index_build_min_segment_rows)` in `finish()`, and then trains when needed, adds once, releases the build buffer, and saves the index. Empty segments or segments below the effective threshold delete only the current index entry instead of persisting an ANN index.

Add BE config `ann_index_build_min_segment_rows` to skip persisting ANN indexes for small segments. Remove `ann_index_build_add_chunk_size` and `ann_index_build_add_chunk_bytes`.
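The threshold decision described in this commit message can be sketched as a small pure function. This is a simplified illustration, not the actual writer code: `AnnBuildAction` and `decide_ann_build` are hypothetical names, and the two inputs are assumed to come from `vector_index->get_min_train_rows()` and `config::ann_index_build_min_segment_rows`.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical outcome type for this sketch; not a Doris type.
enum class AnnBuildAction { kSkipAndDeleteEntry, kTrainAddSave };

// min_train_rows: the training minimum reported by the index.
// config_min_segment_rows: the Doris-side row threshold from BE config.
AnnBuildAction decide_ann_build(std::int64_t segment_rows,
                                std::int64_t min_train_rows,
                                std::int64_t config_min_segment_rows) {
    // The stricter of the two requirements wins.
    const std::int64_t effective_min_rows =
            std::max(min_train_rows, config_min_segment_rows);
    // Empty segments and segments below the threshold persist no ANN index.
    if (segment_rows == 0 || segment_rows < effective_min_rows) {
        return AnnBuildAction::kSkipAndDeleteEntry;
    }
    // Otherwise: train when needed, add once, release the buffer, save.
    return AnnBuildAction::kTrainAddSave;
}
```

For example, a segment of 300 rows with a training minimum of 256 would still be skipped if the configured threshold were 1000, because the effective minimum is the larger of the two.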
Contributor
Author
|
run buildall |
What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary:
This PR fixes several ANN index build issues:
- … `ann_index_build_chunk_size * dim` floats during init, which could allocate excessive memory immediately for high-dimensional vectors.
- … `nlist` as its minimum FAISS training row requirement.

This PR changes the build behavior as follows:

- … `ann_index_build_min_segment_rows` so small ANN indexes can be skipped by a Doris-side row threshold.

Release note
Fix ANN IVF/PQ recall, avoid init-time large ANN build-buffer reservation, and skip ANN index build for segments with insufficient training rows.
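To give a sense of scale for the init-time reservation mentioned in the problem summary, the size of a `chunk_size * dim` float buffer can be worked out directly. The numbers below are hypothetical and are not Doris defaults; `reserved_bytes` is an illustrative helper, not a function from the codebase.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed to hold chunk_rows vectors of `dim` float components.
constexpr std::uint64_t reserved_bytes(std::uint64_t chunk_rows, std::uint64_t dim) {
    return chunk_rows * dim * sizeof(float);
}

// Hypothetical values: 1,000,000 rows per chunk and 1024-dimensional vectors,
// assuming a 4-byte float. That is roughly 4.1 GB reserved before any row arrives.
static_assert(reserved_bytes(1'000'000, 1024) == 4'096'000'000ULL,
              "1e6 rows * 1024 dims * 4 bytes");
```

Because the cost grows linearly with the vector dimension, a reservation that is harmless for low-dimensional vectors can become very large for high-dimensional ones.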
Check List (For Author)
Test
./run-regression-test.sh --run -d ann_index_p0 -s ivf_pq_full_buffer_train_recall
run buildall
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)