Fix pandas3 compat: use pd.Series(...).value_counts() instead of pd.value_counts(...) by cakedev0 · Pull Request #213 · IntelPython/scikit-learn_bench

cakedev0 · 2026-05-26T08:26:40Z

Description

pandas 3 no longer exposes the top-level pd.value_counts function, which broke a couple of call sites. I switched them to pd.Series(...).value_counts() instead.
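As a rough illustration of the replacement described above (a minimal sketch, not the actual diff from this PR; the helper name `count_values` and the sample data are hypothetical):

```python
import pandas as pd


def count_values(values):
    """Count occurrences of each distinct value (hypothetical helper)."""
    # Old spelling, unavailable on pandas 3 according to this PR:
    #     pd.value_counts(values)
    # Wrapping the data in a Series first works on pandas 2 and pandas 3.
    return pd.Series(values).value_counts()


labels = ["a", "b", "a", "c", "a"]  # made-up sample data
counts = count_values(labels)
print(counts)
```

The result is a Series indexed by the distinct values, sorted by descending count by default, so existing code that consumed the old return value should mostly work unchanged.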

Checklist:

Completeness and readability

Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.

Signed-off-by: Arthur Lacote <arthur.lacote@probabl.ai>

cakedev0 · 2026-05-26T16:19:44Z

I'm not sure why the CI is red but I think it's unrelated to my changes, so I'm marking this PR as ready for review.

david-cortes-intel

LGTM. CI errors are unrelated to this change.

But note that there are still other compatibility issues between the latest openml package and pandas>=3 which lead to errors:

  File "/localdisk2/mkl/dcortes/repos/scikit-learn_bench/sklbench/datasets/downloaders.py", line 99, in fetch_and_correct_openml
    x, y, _, _ = dataset.get_data(
                 ^^^^^^^^^^^^^^^^^
  File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 820, in get_data
    data, categorical, attribute_names = self._load_data()
                                         ^^^^^^^^^^^^^^^^^
  File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 611, in _load_data
    return self._cache_compressed_file_from_file(Path(file_to_load))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 556, in _cache_compressed_file_from_file
    attribute_names, categorical, data = self._parse_data_from_file(data_file)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 584, in _parse_data_from_file
    data, categorical, attribute_names = self._parse_data_from_arff(data_file)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 472, in _parse_data_from_arff
    pd.factorize(type_)[0]
    ^^^^^^^^^^^^^^^^^^^
  File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/pandas/core/algorithms.py", line 791, in factorize
    values = _ensure_arraylike(values, func_name="factorize")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/pandas/core/algorithms.py", line 239, in _ensure_arraylike
    raise TypeError(
TypeError: factorize requires a Series, Index, ExtensionArray, np.ndarray or NumpyExtensionArray got list.

(CC @avolkov-intel )
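The TypeError at the end of the traceback above says that pd.factorize was handed a plain Python list. One way to avoid that class of error, sketched here under the assumption that the input is a list of labels, is to convert it to a NumPy array first. This is not the openml fix; the helper name `factorize_list` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def factorize_list(values):
    """Factorize a plain list by converting it to an ndarray first.

    Hypothetical helper: per the traceback, pandas>=3 rejects a bare list,
    while an np.ndarray is one of the accepted input types.
    """
    return pd.factorize(np.asarray(values, dtype=object))


# Made-up sample values standing in for whatever the caller passes.
codes, uniques = factorize_list(["numeric", "nominal", "numeric"])
print(codes)    # integer code for each input element
print(uniques)  # distinct values in order of first appearance
```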

cakedev0 · 2026-05-27T14:03:32Z

> there are still other compatibility issues between the latest openml package and pandas>=3 which lead to errors

I don't see those. Do you have a config that reproduces it?

Note: I'm using pixi and installing openml from PyPI.

david-cortes-intel · 2026-05-27T14:05:57Z

> > there are still other compatibility issues between the latest openml package and pandas>=3 which lead to errors
>
> I don't see those. Do you have a config that reproduces it?
>
> Note: I'm using pixi and installing openml from PyPI.

It could be triggered like this, if you are interested in contributing to OpenML:

python -m sklbench --config configs/regular/ensemble.json --result-file try.json --filters algorithm:library=sklearn algorithm:device=cpu --prefetch

The openml package in conda-forge requires numpy<2, so right now PyPI is the only option.

cakedev0 · 2026-05-27T14:27:32Z

Thanks, I will use pandas 2 then.

Fix: use pd.Series(...).value_counts instead of pd.value_counts(...)

5d4ced5

Signed-off-by: Arthur Lacote <arthur.lacote@probabl.ai>

cakedev0 marked this pull request as ready for review May 26, 2026 16:20

cakedev0 requested a review from david-cortes-intel as a code owner May 26, 2026 16:20

david-cortes-intel approved these changes May 27, 2026

david-cortes-intel merged commit eb5333a into IntelPython:main May 27, 2026
4 of 11 checks passed

david-cortes-intel merged 1 commit into IntelPython:main from cakedev0:fix/compat_with_pandas3
