Bloom filter overhaul by sleeepyjack · Pull Request #808 · NVIDIA/cuCollections

sleeepyjack · 2026-04-30T01:25:33Z

TBD


copy-pr-bot · 2026-04-30T01:25:37Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


sleeepyjack · 2026-04-30T01:41:54Z

/ok to test 20be4e3

sleeepyjack

Self review

sleeepyjack · 2026-05-08T22:02:03Z

+          nvbench::int32_t PatternBits,
+          nvbench::int32_t HorizontalLayout,
+          nvbench::int32_t VerticalLayout>
+void pfp_bloom_filter_add_impl(nvbench::state& state,


We don't need the `pfp_` prefix. This applies to all benchmarks.

sleeepyjack · 2026-05-08T22:04:02Z

 /**
- * @brief A benchmark evaluating `cuco::bloom_filter::contains_async` performance
+ * @brief Implementation of `cuco::bloom_filter::contains_async` with
+ * `parametric_filter_policy`


No need to mention PFP here or in any other benchmark.

sleeepyjack · 2026-05-08T22:08:51Z

+  {
+    if constexpr (SaltIndex < SaltEndIndex) {
+      // Select top bit_index_width bits from salted hash to determine the bit index.
+      // if (threadIdx.x == 0) { printf("Salt Idx: %u\n", SaltIndex); }


Remove this
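
For context, the comment kept in this hunk describes taking the top `bit_index_width` bits of a salted hash. A minimal host-side sketch of that pattern follows, assuming a 32-bit hash and a multiplicative salt. `top_bits_index` and its parameters are hypothetical names for illustration, not the PR's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from the PR: salt the hash by multiplication
// (truncated to 32 bits), then keep the top `bit_index_width` bits as the index.
// Precondition: 1 <= bit_index_width <= 32 (a shift by 32 would be undefined).
constexpr std::uint32_t top_bits_index(std::uint32_t hash,
                                       std::uint32_t salt,
                                       unsigned bit_index_width)
{
  // Widen before multiplying so the product is well defined, then truncate.
  const auto salted = static_cast<std::uint32_t>(static_cast<std::uint64_t>(hash) * salt);
  return salted >> (32u - bit_index_width);
}

// With 5 index bits the result lies in [0, 32), i.e. a bit position in a 32-bit word.
static_assert(top_bits_index(0x80000000u, 1u, 5u) == 16u, "top 5 bits of 0x80000000");
```

Taking the high bits rather than the low bits matters for multiplicative hashing, because the upper bits of the product depend on more of the input bits.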

sleeepyjack · 2026-05-08T22:14:42Z

+  // uint64_t may be unsigned long, but atomicOr requires unsigned long long
+  using atomic_word_type = typename cuda::std::
+    conditional_t<cuda::std::is_same_v<word_type, unsigned long>, unsigned long long, word_type>;


I don't think this check is correct. The filter policy also allows specifying other word types, which would break here.
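
One way to make the mapping independent of how `uint64_t` is spelled on a given platform is to select the atomic type by size rather than by exact type. The sketch below is a host-side illustration that uses `std::` traits in place of `cuda::std::`. `atomic_word_t` is a hypothetical name, and the sketch assumes only 32-bit and 64-bit unsigned word types need support, matching the `unsigned int` and `unsigned long long` overloads of `atomicOr`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical helper, not part of cuco: choose the atomicOr-compatible type
// from the word size, so any unsigned 32- or 64-bit word type maps correctly.
template <class Word>
struct atomic_word {
  static_assert(std::is_integral_v<Word> && std::is_unsigned_v<Word>,
                "word type must be an unsigned integer");
  static_assert(sizeof(Word) == 4 || sizeof(Word) == 8,
                "only 32-bit and 64-bit words are handled in this sketch");
  static_assert(sizeof(unsigned int) == 4 && sizeof(unsigned long long) == 8,
                "assumed sizes of the atomicOr operand types");
  using type = std::conditional_t<sizeof(Word) == 8, unsigned long long, unsigned int>;
};

template <class Word>
using atomic_word_t = typename atomic_word<Word>::type;

// The mapped type always has the same size as the word type, so a
// reinterpret_cast of the word pointer keeps the same object representation.
static_assert(sizeof(atomic_word_t<std::uint64_t>) == sizeof(std::uint64_t), "size preserved");
static_assert(sizeof(atomic_word_t<std::uint32_t>) == sizeof(std::uint32_t), "size preserved");
```

Unsupported word types then fail at compile time with a clear message instead of falling through to a missing `atomicOr` overload.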

sleeepyjack · 2026-05-08T22:15:33Z

+  struct tuning {
+    static constexpr bool use_invoke_one                       = true;
+    static constexpr bool use_early_exit                       = false;
+    static constexpr bool use_cub_kernels                      = true;
+    static constexpr bool use_warp_cooperative_add_kernel      = true;
+    static constexpr bool use_warp_cooperative_contains_kernel = true;
+    static constexpr bool use_work_stealing_add_kernel         = false;
+    static constexpr bool use_work_stealing_contains_kernel    = false;
+    static constexpr bool use_cuda_atomic_ref                  = false;
+  };


It would be great not to have to touch the source to change these settings, but we can defer this change for now.
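
A possible direction for the deferred change: pass the tuning knobs as a template parameter with a default, so callers can override them without editing the header. This is only a sketch. `default_tuning`, `early_exit_tuning`, and `words_inspected` are hypothetical names, and only the `use_early_exit` knob from the struct above is modeled.

```cpp
#include <cassert>

// Hypothetical default knobs, mirroring one field of the `tuning` struct in the hunk.
struct default_tuning {
  static constexpr bool use_early_exit = false;
};

// A caller overrides a knob by deriving from the defaults and shadowing the member.
struct early_exit_tuning : default_tuning {
  static constexpr bool use_early_exit = true;
};

// Toy stand-in for a lookup: counts how many per-word results are inspected
// before the query is decided. With early exit, the loop stops at the first miss.
template <class Tuning = default_tuning>
int words_inspected(const bool* word_hits, int num_words)
{
  int inspected = 0;
  bool all_hit  = true;
  for (int i = 0; i < num_words; ++i) {
    ++inspected;
    all_hit = all_hit && word_hits[i];
    if constexpr (Tuning::use_early_exit) {
      if (!all_hit) { break; }
    }
  }
  return inspected;
}
```

Because the knobs are compile-time constants read through `if constexpr`, the disabled branch is discarded at instantiation, just as with the in-source struct.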

sleeepyjack · 2026-05-08T22:18:27Z

+  // Device-side range add (cooperative). When the tile size matches `add_horizontal_layout`,
+  // each batch loads/hashes one key per lane in parallel, then the tile cooperatively processes
+  // them via `add_coop`. Otherwise the tile parallelizes across the key range with each lane
+  // scalar-inserting its own keys.


I don't think the Doxygen check will like this, but we can fix it later.

sleeepyjack and others added 30 commits September 9, 2025 08:53

Add support for horizontal/vertical vectorization parameter

d67ae07

Restructure policies

cb7a78d

Fix indexing bug

21ff88c

Coalesced output write

41e217a

Add unit test for adaptive contains kernel

2b8ecde

Add parametric filter policy (dummy)

c322825

Merge remote-tracking branch 'upstream' into exp-filter-policy

17f9c19

Multiplicative hashing implemented in policy. Some changes needed to …

97693a3

…bloom_filter_impl.

Finalized proposed policy interface.

4285b39

Fixed a mistake in thread_dispatch(). Removed some dead static variab…

0934cba

…les.

Multiplicative hashing calling code infrastructure.

cf43c8f

New example script for sanity checking. Still need to connect the hos…

52d7f17

…t interface to the new APIs.

host and device APIs are connected for multiplicative hashing, but co…

91714b0

…mpilation of example is choking on #pragma unroll.

Debugging done. End-to-end filters working properly.

fa7d9a7

Tests updated.

13918c2

Good performance against arrow FP when early exit is turned off. Will …

7008773

…try with early exit next.

Updated bloom filter nvbench script.

9252f69

Changing exp kernels from if to while for grid-striding.

12b4847

Bug fix in filter size in PFP_EVALUATION_EXAMPLE

d3fcce2

Bug fix in while loop in exp kernels.

a30d1b1

Small PR review fixes.

18e9c34

group-cooperative parametric filter policy code paths implemented.

a018896

Benchmark scripts updated.

8020b72

Notebook with theoretical FPR calculators.

b655183

Remove static checks on hash result type that are blocking NVBench.

e558f1c

Enum type lists for the add benchmark added.

c83912c

Added salt generation script. Updated the total number of salts to 64.

46cf45f

Updated block index selection in PFP to match Arrow policy.

892e4a9

Merge remote-tracking branch 'upstream' into exp-filter-policy

e9f8ac9

Enable magic modulo

1a4b5e0

sleeepyjack and others added 17 commits December 1, 2025 09:16

Merge remote-tracking branch 'upstream' into consolidate

c676524

cache_sectorized implemented in parametric_filter_policy.

65c727b

Started on bloom_filter_imp

555113f

Add implemented for cache-sectorized.

25bf1cc

Fix bug in set_bits routine for cache-sectorized.

aa5b117

contains implemented for cache-sectorized.

2cce8c2

contains has bug when horizontal_layout > 1.

00d5bde

Cache-sectorized working.

5643887

Turned off use_cub_kernels and work stealing for clearer evaluation.

4d39069

Add CSBF benchmarks

355b646

Merge remote-tracking branch 'upstream/dev' into consolidate

94bce41

Merge remote-tracking branch 'upstream/dev' into bloom-filter-release

617cb3f

Drop baggage

b805999

WIP but working

519970f

Tuning struct

6d299e0

Docs

076a5cf

Address review comments

96e2a5a

sleeepyjack added the labels helps: rapids (Helps or needed by RAPIDS), In Progress (Currently a work in progress), type: improvement (Improvement / enhancement to an existing function), and topic: bloom_filter (Issues related to bloom_filter) on Apr 30, 2026

[pre-commit.ci] auto code formatting

a999f46

NVIDIA deleted a comment from copy-pr-bot Bot Apr 30, 2026

sleeepyjack added 4 commits April 30, 2026 03:27

Merge branch 'dev' into bloom-filter-release

c2dc7ab

Update copyright year

b4c4aa5

Merge branch 'bloom-filter-release' of github.com:sleeepyjack/cuColle…

2a8de62

…ctions into bloom-filter-release

Address Doxygen

20be4e3


sleeepyjack wants to merge 76 commits into NVIDIA:dev from sleeepyjack:bloom-filter-release
