Skip to content

chore: add count distinct group benchmarks#21575

Open
coderfender wants to merge 6 commits intoapache:mainfrom
coderfender:add_group_benchmarks_count_distinct
Open

chore: add count distinct group benchmarks#21575
coderfender wants to merge 6 commits intoapache:mainfrom
coderfender:add_group_benchmarks_count_distinct

Conversation

@coderfender
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Add benchmarks for group accumulators to test : #21561
The implementation forks out based on is_groups_accumulator_supported function call. Once this is merged , we should be able to evaluate group accumulators on count distinct expr

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the functions Changes to functions implementation label Apr 12, 2026
@coderfender coderfender force-pushed the add_group_benchmarks_count_distinct branch from 9413a74 to bacee66 Compare April 12, 2026 18:13
@coderfender coderfender force-pushed the add_group_benchmarks_count_distinct branch from 3abdd0d to 45a19b0 Compare April 12, 2026 18:42
@coderfender coderfender marked this pull request as ready for review April 12, 2026 18:51
@coderfender
Copy link
Copy Markdown
Contributor Author

coderfender commented Apr 12, 2026

@Dandandan , I plan to add benches to help better evaluate group accumulators along with direct clickbench / TPCH queries for implementing group accumulators . Please take a look whenever you get a chance

Comment on lines +250 to +256
if let Some(val) = arr.value(idx).into() {
let single_val =
Arc::new(Int64Array::from(vec![Some(val)])) as ArrayRef;
accumulators[*group_idx]
.update_batch(std::slice::from_ref(&single_val))
.unwrap();
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if let Some(val) = arr.value(idx).into() {
let single_val =
Arc::new(Int64Array::from(vec![Some(val)])) as ArrayRef;
accumulators[*group_idx]
.update_batch(std::slice::from_ref(&single_val))
.unwrap();
}
let single_val = values.slice(idx, 1);
accumulators[*group_idx]
.update_batch(std::slice::from_ref(&single_val))
.unwrap();

slightly simpler and would avoid the allocation of the single valued array for each row

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way would be to collect per-group indices first and then build the array:

let mut group_rows: Vec<Vec<i64>> = vec![Vec::new(); num_groups];
for (idx, &group_idx) in group_indices.iter().enumerate() {
    if arr.is_valid(idx) {
        group_rows[group_idx].push(arr.value(idx));
    }
}
for (group_idx, rows) in group_rows.iter().enumerate() {
    if !rows.is_empty() {
        let batch = Arc::new(Int64Array::from(rows.clone())) as ArrayRef;
        accumulators[group_idx].update_batch(std::slice::from_ref(&batch)).unwrap();
    }
}


let arr = values.as_any().downcast_ref::<Int64Array>().unwrap();
for (idx, group_idx) in group_indices.iter().enumerate() {
if let Some(val) = arr.value(idx).into() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need an Option here ?
arr.value(idx) returns i64 and calling .into() always return Some.
If you want to filter out the nulls then you need to use arr.is_null(idx)

for (name, num_groups, distinct_pct, group_type) in scenarios {
let n_distinct = BATCH_SIZE * distinct_pct / 100;
let values = Arc::new(create_i64_array(n_distinct)) as ArrayRef;
let group_indices = if group_type == "uniform" {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Introduce an enum instead of using strings:

enum GroupDist {
    Uniform, 
    Skewed 
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants