
Support for Dask, PySpark, and Ray via Fugue#328

Merged
AzulGarza merged 17 commits into main from feat/anydataframe
Apr 20, 2026

@spolisar spolisar commented Mar 19, 2026

Add support for working with distributed dataframes via Fugue. Supports Dask, PySpark, and Ray.

A repartitioning workaround is used for single-partition Dask dataframes. Without it, the output dataframe raises "ValueError: Cannot repartition on divisions with unknown divisions" when compute() or repartition() is called on it.

The CI run of the distributed tests is limited to a single worker to address an "out of memory" issue.

TODO:

  • add examples to mkdocs.yml

@spolisar spolisar marked this pull request as ready for review March 20, 2026 18:07
@spolisar spolisar requested a review from AzulGarza March 20, 2026 18:07

@AzulGarza AzulGarza left a comment

thanks @spolisar! lgtm:)

@AzulGarza AzulGarza merged commit 190d735 into main Apr 20, 2026
10 checks passed
@AzulGarza AzulGarza deleted the feat/anydataframe branch April 20, 2026 21:41