Fix pandas3 compat: use pd.Series(...).value_counts() instead of pd.value_counts(...) #213
Conversation
Signed-off-by: Arthur Lacote <arthur.lacote@probabl.ai>
I'm not sure why the CI is red, but I think it's unrelated to my changes, so I'm marking this PR as ready for review.
david-cortes-intel
left a comment
LGTM. CI errors are unrelated to this change.
But note that there are still other compatibility issues between the latest openml package and pandas>=3, which lead to errors such as:
File "/localdisk2/mkl/dcortes/repos/scikit-learn_bench/sklbench/datasets/downloaders.py", line 99, in fetch_and_correct_openml
x, y, _, _ = dataset.get_data(
^^^^^^^^^^^^^^^^^
File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 820, in get_data
data, categorical, attribute_names = self._load_data()
^^^^^^^^^^^^^^^^^
File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 611, in _load_data
return self._cache_compressed_file_from_file(Path(file_to_load))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 556, in _cache_compressed_file_from_file
attribute_names, categorical, data = self._parse_data_from_file(data_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 584, in _parse_data_from_file
data, categorical, attribute_names = self._parse_data_from_arff(data_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/openml/datasets/dataset.py", line 472, in _parse_data_from_arff
pd.factorize(type_)[0]
^^^^^^^^^^^^^^^^^^^
File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/pandas/core/algorithms.py", line 791, in factorize
values = _ensure_arraylike(values, func_name="factorize")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/localdisk2/mkl/dcortes/miniforge3/envs/sklbench12/lib/python3.12/site-packages/pandas/core/algorithms.py", line 239, in _ensure_arraylike
raise TypeError(
TypeError: factorize requires a Series, Index, ExtensionArray, np.ndarray or NumpyExtensionArray got list.
(CC @avolkov-intel )
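For reference, the TypeError above can be reproduced outside openml by passing a plain list to pd.factorize on pandas>=3. A minimal sketch of one way to avoid it, assuming the input is a list of labels, is to convert the list to a NumPy array first. The helper name factorize_list and the sample values are hypothetical, and this is not the actual openml code or its fix:

```python
import numpy as np
import pandas as pd


def factorize_list(values):
    """Factorize a plain Python list by converting it to an ndarray first.

    Hypothetical helper for illustration only; not part of openml or sklbench.
    """
    # np.ndarray is one of the input types named in the error message,
    # so the conversion sidesteps the "got list" TypeError.
    array = np.asarray(values, dtype=object)
    codes, uniques = pd.factorize(array)
    return codes, uniques


# Made-up sample labels; each distinct value gets an integer code
# in order of first appearance.
codes, uniques = factorize_list(["int", "str", "int", "float"])
print(codes)
print(uniques)
```

Wrapping the list in pd.Series(...) before calling pd.factorize would be another option, since Series is also listed among the accepted types.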
I don't see those. Do you have a config that reproduces it? Note: I'm using pixi and I'm installing openml from pypi.
It could be triggered like this, if you are interested in contributing to OpenML:
python -m sklbench --config configs/regular/ensemble.json --result-file try.json --filters algorithm:library=sklearn algorithm:device=cpu --prefetch
Thanks, I will use pandas 2 then.
Description
pandas 3 stopped exposing pd.value_counts, so a couple of places were breaking. I changed to using pd.Series(...).value_counts() instead.
Checklist:
Completeness and readability
Testing
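To illustrate the replacement named in the description, here is a minimal sketch of the pattern, assuming the counted values arrive as a list or array. The helper name count_values and the sample labels are hypothetical and do not come from the sklbench code:

```python
import pandas as pd


def count_values(values):
    """Count occurrences of each distinct value.

    Hypothetical helper for illustration; it uses the Series method
    rather than the top-level pd.value_counts function.
    """
    # Wrapping the input in a Series works on both pandas 2 and pandas 3.
    return pd.Series(values).value_counts()


# Made-up sample labels.
labels = ["a", "b", "a", "c", "a"]
counts = count_values(labels)
print(counts)
```

The result is a Series indexed by the distinct values, so existing code that indexed the output of pd.value_counts(...) by label should behave the same way.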