Bloom filter overhaul #808
Draft
sleeepyjack wants to merge 76 commits into NVIDIA:dev from
Conversation
…bloom_filter_impl.
…t interface to the new APIs.
…mpilation of example is choking on #pragma unroll.
…try with early exit next.
sleeepyjack (Collaborator, Author) commented on May 8, 2026:

/ok to test 20be4e3
```cpp
          nvbench::int32_t PatternBits,
          nvbench::int32_t HorizontalLayout,
          nvbench::int32_t VerticalLayout>
void pfp_bloom_filter_add_impl(nvbench::state& state,
```
sleeepyjack (Author): We don't need the `pfp_` prefix. This applies to all benchmarks.
```cpp
/**
 * @brief A benchmark evaluating `cuco::bloom_filter::contains_async` performance
 * @brief Implementation of `cuco::bloom_filter::contains_async` with
 * `parametric_filter_policy`
```
sleeepyjack (Author): No need to mention pfp here or in any other benchmark.
```cpp
{
  if constexpr (SaltIndex < SaltEndIndex) {
    // Select top bit_index_width bits from salted hash to determine the bit index.
    // if (threadIdx.x == 0) { printf("Salt Idx: %u\n", SaltIndex); }
```
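The `if constexpr` guard in this hunk is the usual compile-time recursion over a salt sequence. A minimal standalone sketch of the pattern, where `apply_salts` and the salt constants are invented for illustration, not the PR's actual code:

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative only: fold one salt into the hash per step, terminating the
// recursion with `if constexpr` so the base case emits no runtime branch.
template <std::uint32_t SaltIndex, std::uint32_t SaltEndIndex>
constexpr std::uint64_t apply_salts(std::uint64_t hash)
{
  if constexpr (SaltIndex < SaltEndIndex) {
    constexpr std::uint64_t salts[] = {
      0x9e3779b97f4a7c15ull, 0xbf58476d1ce4e5b9ull, 0x94d049bb133111ebull};
    hash ^= salts[SaltIndex];       // mix in this step's salt
    hash *= 0xd6e8feb86659fd93ull;  // cheap multiplicative mix
    return apply_salts<SaltIndex + 1, SaltEndIndex>(hash);  // next salt
  } else {
    return hash;  // all salts applied
  }
}

int main()
{
  std::printf("%llu\n", static_cast<unsigned long long>(apply_salts<0, 3>(42)));
  return 0;
}
```

Because the recursion resolves at compile time, the chain unrolls fully and the terminating branch disappears from the generated code.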
Comment on lines +55 to +57
```cpp
// uint64_t may be unsigned long, but atomicOr requires unsigned long long
using atomic_word_type = typename cuda::std::
  conditional_t<cuda::std::is_same_v<word_type, unsigned long>, unsigned long long, word_type>;
```
sleeepyjack (Author): I don't think this check is correct. The filter policy also allows specifying different word types, which would break here.
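A width-based mapping would sidestep the exact-type check. A sketch of the idea in plain C++, where `atomic_word_type_for` is an invented name and `std::` stands in for `cuda::std::`:

```cpp
#include <cstdint>
#include <type_traits>

// Illustrative only: select the atomic word type by width rather than by
// exact type, so whatever unsigned word_type a policy picks still maps onto
// a type that device atomics accept (unsigned int for 32 bits,
// unsigned long long for 64 bits).
template <typename WordType>
using atomic_word_type_for =
  std::conditional_t<sizeof(WordType) == 8,
                     unsigned long long,
                     std::conditional_t<sizeof(WordType) == 4, unsigned int, WordType>>;

static_assert(std::is_same_v<atomic_word_type_for<std::uint64_t>, unsigned long long>);
static_assert(std::is_same_v<atomic_word_type_for<std::uint32_t>, unsigned int>);
static_assert(std::is_same_v<atomic_word_type_for<unsigned long long>, unsigned long long>);

int main() { return 0; }
```

This handles `unsigned long` regardless of whether the platform defines it as 32 or 64 bits, which is the case the exact-type check was aimed at.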
Comment on lines +62 to +71
```cpp
struct tuning {
  static constexpr bool use_invoke_one                       = true;
  static constexpr bool use_early_exit                       = false;
  static constexpr bool use_cub_kernels                      = true;
  static constexpr bool use_warp_cooperative_add_kernel      = true;
  static constexpr bool use_warp_cooperative_contains_kernel = true;
  static constexpr bool use_work_stealing_add_kernel         = false;
  static constexpr bool use_work_stealing_contains_kernel    = false;
  static constexpr bool use_cuda_atomic_ref                  = false;
};
```
sleeepyjack (Author): It would be great not to have to touch the source, but we can defer this change for now.
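One way to avoid source edits would be to make the tuning a template parameter with a shipped default, so callers override knobs with their own type. A minimal sketch under that assumption (`default_tuning`, `early_exit_tuning`, and `filter_like` are invented names, not the PR's API):

```cpp
// Illustrative only: defaults live in a struct; a caller that wants
// different knobs derives from it and shadows the members it changes.
struct default_tuning {
  static constexpr bool use_invoke_one = true;
  static constexpr bool use_early_exit = false;
};

struct early_exit_tuning : default_tuning {
  static constexpr bool use_early_exit = true;  // override a single knob
};

template <typename Tuning = default_tuning>
struct filter_like {
  static constexpr bool early_exit = Tuning::use_early_exit;
};

static_assert(!filter_like<>::early_exit);                 // defaults apply
static_assert(filter_like<early_exit_tuning>::early_exit); // caller override

int main() { return 0; }
```

The derived-struct trick keeps overrides to one line per knob while leaving every untouched setting at its shipped default.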
Comment on lines +366 to +369
```cpp
// Device-side range add (cooperative). When the tile size matches `add_horizontal_layout`,
// each batch loads/hashes one key per lane in parallel, then the tile cooperatively processes
// them via `add_coop`. Otherwise the tile parallelizes across the key range with each lane
// scalar-inserting its own keys.
```
sleeepyjack (Author): I don't think the doxygen check will like this, but we can fix it later.
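Separately from the doxygen concern, the dispatch the quoted comment describes has roughly this shape. A hedged CUDA sketch where `add_coop`, `add_scalar`, and the layout constant are placeholders rather than the PR's real bit-setting code:

```cpp
#include <cstdio>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

constexpr unsigned int add_horizontal_layout = 4;  // assumed layout width

__device__ void add_coop(int /*key*/) {}    // tile cooperatively adds one key
__device__ void add_scalar(int /*key*/) {}  // one lane adds its own key

template <unsigned int TileSize, typename Tile>
__device__ void add_range(Tile tile, const int* first, int n)
{
  if constexpr (TileSize == add_horizontal_layout) {
    // Batched path: each lane loads one key in parallel, then the tile
    // processes the batch cooperatively, broadcasting one key at a time.
    for (int base = 0; base < n; base += static_cast<int>(TileSize)) {
      int const idx   = base + static_cast<int>(tile.thread_rank());
      int const key   = idx < n ? first[idx] : 0;  // lane-parallel load
      int const batch = n - base < static_cast<int>(TileSize)
                          ? n - base
                          : static_cast<int>(TileSize);
      for (int k = 0; k < batch; ++k) {
        add_coop(tile.shfl(key, k));  // whole tile works on lane k's key
      }
    }
  } else {
    // Fallback path: lanes stride the range independently, scalar-inserting.
    for (int idx = static_cast<int>(tile.thread_rank()); idx < n;
         idx += static_cast<int>(TileSize)) {
      add_scalar(first[idx]);
    }
  }
}

__global__ void add_kernel(const int* keys, int n)
{
  auto tile = cg::tiled_partition<add_horizontal_layout>(cg::this_thread_block());
  add_range<add_horizontal_layout>(tile, keys, n);
}

int main()
{
  int* keys = nullptr;
  cudaMalloc(&keys, 8 * sizeof(int));
  cudaMemset(keys, 0, 8 * sizeof(int));
  add_kernel<<<1, 32>>>(keys, 8);
  cudaDeviceSynchronize();
  cudaFree(keys);
  std::printf("done\n");
  return 0;
}
```

The `if constexpr` split means only the branch matching the instantiated tile size is compiled into each kernel, so the fallback path costs nothing when the layouts line up.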