Bloom filter overhaul #808

Draft
sleeepyjack wants to merge 76 commits into NVIDIA:dev from sleeepyjack:bloom-filter-release

@sleeepyjack
Collaborator

TBD

sleeepyjack and others added 30 commits September 9, 2025 08:53
…mpilation of example is choking on #pragma unroll.
@sleeepyjack sleeepyjack added the labels helps: rapids (Helps or needed by RAPIDS), In Progress (Currently a work in progress), type: improvement (Improvement / enhancement to an existing function), and topic: bloom_filter (Issues related to bloom_filter) on Apr 30, 2026
@copy-pr-bot

copy-pr-bot Bot commented Apr 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@NVIDIA NVIDIA deleted a comment from copy-pr-bot Bot Apr 30, 2026
@sleeepyjack
Collaborator Author

/ok to test 20be4e3

Collaborator Author

@sleeepyjack sleeepyjack left a comment

Self review

nvbench::int32_t PatternBits,
nvbench::int32_t HorizontalLayout,
nvbench::int32_t VerticalLayout>
void pfp_bloom_filter_add_impl(nvbench::state& state,

We don't need the `pfp_` prefix. This applies to all benchmarks.

/**
* @brief A benchmark evaluating `cuco::bloom_filter::contains_async` performance
* @brief Implementation of `cuco::bloom_filter::contains_async` with
* `parametric_filter_policy`

No need to mention pfp here or in any other benchmark.

{
if constexpr (SaltIndex < SaltEndIndex) {
// Select top bit_index_width bits from salted hash to determine the bit index.
// if (threadIdx.x == 0) { printf("Salt Idx: %u\n", SaltIndex); }

Remove this commented-out debug `printf`.
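For context, the compile-time salt loop in this hunk can be pictured with a host-only sketch like the one below. This is not the PR's code: the salt constants, the name `make_pattern`, the fixed 64-bit word, and the multiply-then-shift salting are all assumptions made for illustration. Only the `SaltIndex < SaltEndIndex` recursion and the "top `bit_index_width` bits" selection are taken from the snippet above.

```cpp
#include <cstdint>

// Arbitrary odd 64-bit constants standing in for the real salts (assumption).
constexpr std::uint64_t example_salts[] = {0x9e3779b97f4a7c15ULL,
                                           0xbf58476d1ce4e5b9ULL,
                                           0x94d049bb133111ebULL,
                                           0xff51afd7ed558ccdULL};

// Hypothetical helper: sets one bit per salt in a 64-bit word.
// The recursion ends once SaltIndex reaches SaltEndIndex.
template <unsigned SaltIndex, unsigned SaltEndIndex>
constexpr std::uint64_t make_pattern(std::uint64_t hash, std::uint64_t word = 0)
{
  if constexpr (SaltIndex < SaltEndIndex) {
    // A 64-bit word has 64 bit positions, so the bit index needs 6 bits.
    constexpr unsigned bit_index_width = 6;
    // Select the top bit_index_width bits of the salted hash as the bit index.
    std::uint64_t const bit_index = (hash * example_salts[SaltIndex]) >> (64 - bit_index_width);
    return make_pattern<SaltIndex + 1, SaltEndIndex>(hash, word | (std::uint64_t{1} << bit_index));
  } else {
    return word;
  }
}
```

With four salts, `make_pattern<0, 4>(hash)` sets between one and four bits, since different salts may select the same position.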

Comment on lines +55 to +57
// uint64_t may be unsigned long, but atomicOr requires unsigned long long
using atomic_word_type = typename cuda::std::
conditional_t<cuda::std::is_same_v<word_type, unsigned long>, unsigned long long, word_type>;

I don't think this check is correct. The filter policy also allows specifying different word types, which would break here.
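One possible direction, as a host-only sketch rather than a proposed fix: select the atomic type by word width instead of comparing against `unsigned long` specifically. The names `atomic_word` and `atomic_word_t` are hypothetical, and `std::` traits stand in for `cuda::std::` so the snippet compiles without CUDA. It relies on `atomicOr` providing `unsigned int` and `unsigned long long` overloads, and it deliberately rejects anything other than 32- and 64-bit unsigned words.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait: maps an unsigned word type to a same-width type
// that atomicOr has an overload for.
template <class Word>
struct atomic_word {
  static_assert(std::is_unsigned_v<Word>, "this sketch only handles unsigned word types");
  static_assert(sizeof(Word) == 4 || sizeof(Word) == 8,
                "this sketch only handles 32- and 64-bit words");

  // 64-bit words map to unsigned long long, 32-bit words to unsigned int.
  using type = std::conditional_t<sizeof(Word) == 8, unsigned long long, unsigned int>;

  static_assert(sizeof(type) == sizeof(Word), "atomic type must have the same width as the word");
};

template <class Word>
using atomic_word_t = typename atomic_word<Word>::type;
```

Selecting by size covers `uint64_t` whether it is an alias for `unsigned long` or `unsigned long long`, and turns unsupported word types into a compile-time error instead of a missing-overload error at the `atomicOr` call site.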

Comment on lines +62 to +71
struct tuning {
static constexpr bool use_invoke_one = true;
static constexpr bool use_early_exit = false;
static constexpr bool use_cub_kernels = true;
static constexpr bool use_warp_cooperative_add_kernel = true;
static constexpr bool use_warp_cooperative_contains_kernel = true;
static constexpr bool use_work_stealing_add_kernel = false;
static constexpr bool use_work_stealing_contains_kernel = false;
static constexpr bool use_cuda_atomic_ref = false;
};

It would be great not to have to touch the source to change these flags, but we can defer this change for now.
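A minimal sketch of one way this could look later, assuming the flags move into a policy type passed as a template parameter. `default_tuning`, `early_exit_tuning`, and `filter_impl_sketch` are hypothetical names, and only a subset of the flags is shown.

```cpp
// Defaults mirroring a few of the flags in the struct above.
struct default_tuning {
  static constexpr bool use_invoke_one  = true;
  static constexpr bool use_early_exit  = false;
  static constexpr bool use_cub_kernels = true;
};

// User-side override: inherit the defaults and shadow a single flag.
struct early_exit_tuning : default_tuning {
  static constexpr bool use_early_exit = true;
};

// Hypothetical implementation class that reads its flags from the policy
// instead of from a struct hard-coded in the source.
template <class Tuning = default_tuning>
struct filter_impl_sketch {
  static constexpr bool early_exit  = Tuning::use_early_exit;
  static constexpr bool cub_kernels = Tuning::use_cub_kernels;
};
```

A benchmark could then instantiate `filter_impl_sketch<early_exit_tuning>` to flip one flag while leaving the library source untouched.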

Comment on lines +366 to +369
// Device-side range add (cooperative). When the tile size matches `add_horizontal_layout`,
// each batch loads/hashes one key per lane in parallel, then the tile cooperatively processes
// them via `add_coop`. Otherwise the tile parallelizes across the key range with each lane
// scalar-inserting its own keys.

I don't think the Doxygen check will like this, but we can fix it later.

2 participants