Skip to content

zvi-code/iterators-rs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

iterators_rs

Multi-dimensional iterators over numeric ranges with configurable access patterns.

Features

  • Distribution-based iteration: Sequential, uniform random, Zipfian, and more
  • Distribution trait: Standardized sampling interface with struct-based implementations
  • Multi-dimensional: Iterate over Cartesian products with per-dimension distributions
  • Group splitting: Split ranges into overlapping groups for concurrent workloads
  • Bitmap filtering: Filter iteration based on atomic bitmap state
  • Constant memory: O(dimensions) memory regardless of range sizes
  • Thread-safe: Lock-free atomic bitmap for concurrent access
  • Zero-allocation iteration: SmallVec-based coordinates for ≤4 dimensions
  • Optimized strides: Const-generic stride computation for common dimension counts

Dependencies

This crate uses the following external dependencies for optimized performance:

Dependency Version Purpose
fastrand 2.0 High-performance PRNG for stochastic distributions
smallvec 1.11 Stack-allocated vectors for coordinate storage (≤4 dims)
num-integer 0.1 Optimized GCD computation for coprime multipliers

Quick Start

use iterators_rs::{MultiDimIterator, Dist, Coords};

// Create a 2D iterator with sequential access on both dimensions
let iter = MultiDimIterator::new(vec![
    (0..10, Dist::Sequential),
    (0..5, Dist::Sequential),
]);

// Iterate over all 50 combinations
// Coords is SmallVec<[u64; 4]> - stack-allocated for ≤4 dimensions
for coords in iter {
    println!("({}, {})", coords[0], coords[1]);
}

Distribution Types

Using the Dist Enum

use iterators_rs::{Dist, DistState};

// Check distribution ordering
assert!(Dist::Sequential.is_ordered());
assert!(!Dist::Uniform.is_ordered());

// Use DistState for sampling
let mut state = DistState::new(Dist::Sequential, 10, 12345);
assert_eq!(state.sample(), 0);
assert_eq!(state.sample(), 1);

// Use DistState for random sampling
let mut uniform = DistState::new(Dist::Uniform, 1000, 12345);
let value = uniform.sample();
assert!(value < 1000);

Using the Distribution Trait

The Distribution trait provides a standardized interface for sampling:

use iterators_rs::{Distribution, SequentialDist, PermuteDist};
use fastrand::Rng;

// Use struct-based distributions with the Distribution trait
let mut seq = SequentialDist::new(10);
let mut rng = Rng::with_seed(42);

assert_eq!(seq.sample(&mut rng), 0);
assert_eq!(seq.sample(&mut rng), 1);
assert!(seq.is_ordered());

// Compose distributions using wrappers
let mut permuted = PermuteDist::with_coprime(SequentialDist::new(100));
let sample = permuted.sample(&mut rng);
assert!(sample < 100);

Available distribution structs:

  • SequentialDist - Sequential: 0, 1, 2, ..., N-1 (ordered)
  • SemiSequentialDist - Permuted sequential (ordered)
  • UniformDist - Uniform random (unordered)
  • ZipfianDist - Zipfian distribution (unordered)
  • LatestDist - Latest-N bias (unordered)
  • ExponentialDist - Exponential decay (unordered)
  • NormalDist - Normal/Gaussian (unordered)
  • HotspotDist - Hotspot distribution (unordered)
  • PermuteDist<D> - Permutation wrapper
  • ReverseDist<D> - Reverse wrapper

Coords Type

The Coords type is a SmallVec<[u64; 4]> that provides stack allocation for common 1-4 dimension cases:

use iterators_rs::{MultiDimIterator, Dist, Coords};

let iter = MultiDimIterator::new(vec![
    (0..10, Dist::Sequential),
    (0..5, Dist::Sequential),
]);

// Coords is stack-allocated for ≤4 dimensions
for coords in iter.take(3) {
    // Access coordinates like a slice
    println!("({}, {})", coords[0], coords[1]);
}

Group Splitting

Split iteration ranges into overlapping groups for concurrent workloads:

use iterators_rs::{MultiDimIterator, Dist};

let iter = MultiDimIterator::new(vec![
    (0..100, Dist::Sequential),
    (0..50, Dist::Sequential),
]);

// Split into 4 groups with 25% overlap on dim0, no overlap on dim1
let groups = iter.split_into_groups(4, vec![0.25, 0.0]);
assert_eq!(groups.len(), 4);

// Each group is an independent iterator
for (i, group) in groups.into_iter().enumerate() {
    println!("Group {} has {} dimensions", i, group.num_dimensions());
}

When splitting an iterator with a bitmap filter attached, the filter configuration is preserved for each group. All groups share the same underlying bitmap via Arc, enabling concurrent filtered iteration.

Bitmap Filtering

Filter iteration based on atomic bitmap state:

use iterators_rs::{MultiDimIterator, Dist, FilterCondition, AtomicBitmap};
use std::sync::Arc;

let bitmap = Arc::new(AtomicBitmap::with_capacity(100));
bitmap.set(42);
bitmap.set(67);

// Create iterator with bitmap filter - only return coords where bit is set
let iter = MultiDimIterator::new(vec![(0..100, Dist::Sequential)])
    .with_bitmap_filter(bitmap, FilterCondition::Set, 1);

// Will only yield coordinates 42 and 67
let results: Vec<_> = iter.collect();
assert_eq!(results.len(), 2);

Atomic Claiming

Atomically claim unique entries across multiple threads:

use iterators_rs::{MultiDimIterator, Dist, FilterCondition, UpdateAction, AtomicBitmap};
use std::sync::Arc;
use std::thread;

let bitmap = Arc::new(AtomicBitmap::with_capacity(1000));

// Create iterator with bitmap filter attached BEFORE splitting
// The split_into_groups method preserves the bitmap filter for each group
let iter = MultiDimIterator::new(vec![(0..1000, Dist::Uniform)])
    .with_limit(100)
    .with_bitmap_filter(bitmap.clone(), FilterCondition::Unset, 1);

// Split into 4 groups - each group inherits the bitmap filter configuration
let groups = iter.split_into_groups(4, vec![0.0]);

// Each thread receives a pre-configured iterator with bitmap filter
let handles: Vec<_> = groups.into_iter().map(|mut group| {
    thread::spawn(move || {
        let mut claimed = 0u64;
        while let Some(_coords) = group.next_and_update(UpdateAction::Set) {
            claimed += 1;
        }
        claimed
    })
}).collect();

let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
// Each claimed entry is unique across all threads

The split_into_groups method preserves bitmap filters, so you can configure the iterator once and split it for concurrent use. All groups share the same underlying bitmap via Arc, enabling thread-safe atomic operations.

Multi-Bitmap Filtering

Filter iteration based on multiple bitmap conditions using AND logic. This is useful for scenarios like "delete entries that exist AND are NOT protected":

use iterators_rs::{MultiDimIterator, Dist, FilterCondition, UpdateAction, AtomicBitmap};
use std::sync::Arc;

let exists_bitmap = Arc::new(AtomicBitmap::with_capacity(1000));
let protected_bitmap = Arc::new(AtomicBitmap::with_capacity(1000));

// Initialize: some entries exist, some are protected
for i in 0..500 {
    exists_bitmap.set(i);
}
for i in [10, 20, 30, 100, 200] {
    protected_bitmap.set(i);
}

// Create iterator with composite filter:
// - Primary: exists_bitmap is SET (entry exists) - can be updated
// - Secondary: protected_bitmap is UNSET (entry is NOT protected) - read-only
let mut iter = MultiDimIterator::new(vec![(0..1000, Dist::Sequential)])
    .with_bitmap_filter(exists_bitmap.clone(), FilterCondition::Set, 1)
    .with_secondary_filter(protected_bitmap.clone(), FilterCondition::Unset);

// Atomically delete non-protected entries
let mut deleted = 0u64;
while let Some(coords) = iter.next_and_update(UpdateAction::Clear) {
    deleted += 1;
    // Entry is now cleared from exists_bitmap
}

// Protected entries (10, 20, 30, 100, 200) remain in exists_bitmap
assert!(exists_bitmap.test(10));
assert!(exists_bitmap.test(20));

Key points:

  • The first bitmap (primary) can be updated via next_and_update()
  • Secondary filters are read-only and provide additional filtering
  • All conditions must match (AND logic)
  • Chain multiple .with_secondary_filter() calls for complex conditions

Iterator Traits

The MultiDimIterator implements standard Rust iterator traits:

  • Iterator - Core iteration
  • FusedIterator - Guarantees None after exhaustion
  • ExactSizeIterator - For fully-ordered iterators without bitmap filter
use iterators_rs::{MultiDimIterator, Dist};

let iter = MultiDimIterator::new(vec![
    (0..10, Dist::Sequential),
    (0..5, Dist::Sequential),
]);

// ExactSizeIterator: know exact remaining count
assert_eq!(iter.len(), 50);

// Standard adapters work as expected
let first_10: Vec<_> = iter.take(10).collect();
assert_eq!(first_10.len(), 10);

Distribution Selection Guide

Choose distributions based on your workload characteristics:

Distribution Use Case Parameters
Sequential Bulk load, scans, ordered iteration -
SemiSequential Deterministic scatter (reproducible) multiplier
Uniform Random access, uniform load -
Zipfian Web cache (~80/20 rule), hot keys skew: 0.99 typical
Hotspot Explicit hot/cold split hot_pct, hot_prob
Latest Append-heavy (logs, streams) recent_pct, recent_prob
Exponential Time-decay (sessions, TTL) lambda
Normal Range queries, centered access mean_pct, std_pct

Overlap Calculation

When splitting ranges into groups, overlap controls how much adjacent groups share:

No overlap (O=0.0) - Disjoint partitions:

Range 0..100, 4 groups, overlap=0.0
  Group 0: 0..25
  Group 1: 25..50
  Group 2: 50..75
  Group 3: 75..100

Partial overlap (O=0.25) - 25% shared between adjacent groups:

Range 0..100, 4 groups, overlap=0.25
  Group 0: 0..31   (31% of range)
  Group 1: 23..54  (25% overlap with group 0)
  Group 2: 46..77  (25% overlap with group 1)
  Group 3: 69..100 (25% overlap with group 2, wraps to group 0)

Full overlap (O=1.0) - All groups see entire range (for contention testing).

Data Model

The iterator supports hierarchical key spaces common in databases:

group_i = {prefix_i:k | k ∈ range_i}           # Keys by prefix
group_ik = {prefix_i:k:j | j ∈ range_ik}      # Fields within key k

Example:
  user:0 → {field:0, field:1, field:2}   # user:0 has 3 fields
  user:1 → {field:0, field:1}            # user:1 has 2 fields
  hash:0 → {name:0, name:1, name:2}

Model this with multi-dimensional iteration:

// 1000 keys × 100 fields each
let iter = MultiDimIterator::new(vec![
    (0..1000, Dist::Zipfian { skew: 0.99 }),  // key dimension (hot keys)
    (0..100, Dist::Uniform),                   // field dimension
]);

Examples

Run the example workloads:

# Database workload patterns (KV, hash, vector, time-series, graph)
cargo run --example workload_demo --release

# Advanced simulations (hot/cold tiers, GT protection, multi-threaded)
cargo run --example benchmark_simulations --release

workload_demo.rs

Demonstrates common database benchmark patterns:

Scenario Description Key Features
1. Simple KV Prefill → overwrite → get Sequential + Zipfian
2. Hash Keys 3M keys × 100 fields 2D iteration
3. Multi-threaded Disjoint prefill, overlapping queries split_into_groups
4. Delete Cycle Prefill → delete → rewrite → query Bitmap filtering
5. Vector Search 1M vectors × 128 dims Batch insert, hotspot
6. Time-Series Append-heavy with range queries Latest + Exponential
7. YCSB Workloads Standard A-F patterns Various distributions
8. MT Vector Search Concurrent insert/search/delete Atomic claiming
9. Graph DB Vertices + edges, traversals Power-law (Zipfian 1.2)
10. Benchmark Suite Bulk → warmup → steady → spike → cleanup Mixed patterns

benchmark_simulations.rs

Advanced scenarios with bitmap state management:

Simulation Description Key Features
1. Hot/Cold Tiers 1%/9%/90% tier distribution Hotspot distribution
2. Vector GT Protection 50K ground-truth vectors never deleted with_secondary_filter
3. MT GT Protection Multi-threaded with protected entries Split preserves filters
4. Continuous Cycles Load/delete/query steady-state Atomic next_and_update
5. Tiered Storage L0/L1/L2 with promotion tracking Multiple bitmaps

Benchmarks

Run benchmarks with:

cargo bench

Benchmarks cover:

  • PRNG performance (fastrand)
  • SmallVec vs Vec allocation
  • Distribution sampling
  • Nested distribution performance
  • Bitmap-filtered iteration

License

See LICENSE file for details.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages