Problem
HistogramMeta::number_of_distinct_value is named as if it held the number of distinct values (NDV), but analyze assigns it from values_len. In practice it serves as a minimum selectivity denominator in histogram range estimation, not as true number-of-distinct-values metadata.
The name is misleading and may lead future optimizer rules to treat the field as real NDV.
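To make the risk concrete, here is a minimal sketch of what could go wrong. HistogramMetaSketch, analyze_sketch, and naive_equality_rows are hypothetical stand-ins written for this issue, not the project's actual code; the only behavior taken from the description above is that the field is filled from the number of values rather than from a distinct count.

```rust
/// Hypothetical stand-in for the histogram metadata; only the field
/// discussed in this issue is modeled.
pub struct HistogramMetaSketch {
    pub number_of_distinct_value: usize,
}

/// Mimics the behavior described above: the field is filled from the
/// number of values, not from a count of distinct values.
pub fn analyze_sketch(values: &[i32]) -> HistogramMetaSketch {
    HistogramMetaSketch {
        number_of_distinct_value: values.len(),
    }
}

/// A naive (hypothetical) optimizer rule that trusts the field as NDV
/// and estimates equality matches as rows / ndv.
pub fn naive_equality_rows(rows: usize, meta: &HistogramMetaSketch) -> f64 {
    // max(1) only guards against division by zero on empty input.
    rows as f64 / meta.number_of_distinct_value.max(1) as f64
}
```

For a column holding [7, 7, 7, 7, 1, 1, 2, 3], this sketch stores 8 in the field and the naive rule estimates one matching row per equality predicate, although the column has only 4 distinct values and an average of 2 rows per value.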
Possible Fix
- Rename the current field to reflect its actual meaning, such as values_len, non_null_count, or selectivity_denominator.
- Add separate real NDV metadata later, likely from analyze using HLL or an exact-small/fuzzy-large strategy.
- Use real NDV for optimizer estimates where appropriate:
  - equality selectivity fallback: rows / ndv
  - group by / distinct cardinality
  - join cardinality
  - distinct/group cost
  - index scan equality-prefix fallback cost
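
The first two NDV uses in the list above could be sketched roughly as follows. ColumnStats and both function names are assumptions made for illustration, not existing types in the codebase, and the NDV is modeled as optional because the metadata does not exist yet.

```rust
/// Hypothetical per-column statistics; names are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct ColumnStats {
    /// Row count observed by analyze.
    pub rows: u64,
    /// Real number of distinct values, if analyze collected it.
    pub ndv: Option<u64>,
}

/// Equality selectivity fallback: estimated rows matching `col = <const>`
/// as rows / ndv. Returns None when no usable NDV is available, so the
/// caller can fall back to another estimate.
pub fn equality_rows_fallback(stats: &ColumnStats) -> Option<f64> {
    let ndv = stats.ndv?;
    if ndv == 0 {
        return None;
    }
    Some(stats.rows as f64 / ndv as f64)
}

/// GROUP BY / DISTINCT cardinality on a single column: the NDV, clamped
/// so the estimate never exceeds the row count.
pub fn distinct_cardinality(stats: &ColumnStats) -> Option<u64> {
    stats.ndv.map(|ndv| ndv.min(stats.rows))
}
```

With rows = 1000 and ndv = 50, the equality fallback yields 20 rows and the distinct cardinality is 50. Returning Option rather than a default keeps the decision about what to do without NDV in the caller.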
Notes
The count-min sketch (CMS) should still be preferred for the frequency of a concrete value when it is available. The histogram should still drive range estimates. NDV should fill the remaining cardinality and selectivity gaps.
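
One way to read that priority order is as a small selection function. This is a sketch under assumptions: the Predicate and Source enums and pick_source are hypothetical names, and the real optimizer may structure the decision differently.

```rust
/// Hypothetical predicate kinds relevant to the priority order.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Predicate {
    Eq,
    Range,
}

/// Hypothetical statistics sources an estimate can be drawn from.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Source {
    Cms,
    Histogram,
    Ndv,
    Default,
}

/// Picks the statistics source for a predicate: CMS first for concrete
/// value frequency, histogram for ranges, NDV only as a gap filler.
pub fn pick_source(pred: Predicate, has_cms: bool, has_histogram: bool, has_ndv: bool) -> Source {
    match pred {
        Predicate::Eq if has_cms => Source::Cms,
        Predicate::Eq if has_ndv => Source::Ndv,
        Predicate::Range if has_histogram => Source::Histogram,
        // Nothing suitable collected: use whatever default the planner has.
        _ => Source::Default,
    }
}
```

In this sketch NDV never overrides CMS for equality predicates and is not consulted for ranges at all, which matches the division of responsibilities described in the notes.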