
Prevent threaded Dask workers in DistRDF backend #22393

Open
JAGANNATHANJP wants to merge 5 commits into root-project:master from JAGANNATHANJP:fix-dask-thread-validation



@JAGANNATHANJP

This PR prevents unsupported threaded Dask workers in DistRDF.

Running distributed RDataFrame with threaded Dask workers may lead to crashes and offers no advantage, because of Python GIL limitations. This change validates the worker configuration at backend initialization and raises a RuntimeError when threaded workers are detected.
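
A minimal sketch of the kind of check described above, not the PR's actual code. It assumes that the mapping found under `scheduler_info()["workers"]` carries an `nthreads` entry per worker; the function name `ensure_single_threaded_workers`, the worker addresses, and the error message are hypothetical.

```python
from typing import Any, Mapping


def ensure_single_threaded_workers(workers: Mapping[str, Mapping[str, Any]]) -> None:
    """Raise RuntimeError if any worker reports more than one thread.

    `workers` is assumed to look like client.scheduler_info()["workers"]:
    a mapping from worker address to a dict with an "nthreads" entry.
    """
    # Collect only the workers that report more than one thread.
    threaded = {
        address: info.get("nthreads", 1)
        for address, info in workers.items()
        if info.get("nthreads", 1) > 1
    }
    if threaded:
        raise RuntimeError(
            "DistRDF does not support threaded Dask workers; found "
            f"{threaded}. Use processes=True and threads_per_worker=1."
        )


# Hand-written stand-in for scheduler output, for illustration only.
ensure_single_threaded_workers({"tcp://127.0.0.1:40001": {"nthreads": 1}})  # passes silently
```

Reporting the offending workers in the message is a design choice of this sketch; it makes a misconfigured remote cluster easier to diagnose than a bare error would.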

Suggested configuration:

  • processes=True
  • threads_per_worker=1
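
As a hedged illustration of the suggested configuration, the sketch below builds a local cluster with one single-threaded process per CPU core. The names `SUGGESTED_CLUSTER_KWARGS` and `make_local_client` are hypothetical; `Client` and `LocalCluster` come from `dask.distributed`, which is assumed to be installed.

```python
import os

# Keyword arguments matching the suggested configuration:
# process-based workers with a single thread each.
SUGGESTED_CLUSTER_KWARGS = {
    "n_workers": os.cpu_count(),
    "threads_per_worker": 1,
    "processes": True,
}


def make_local_client():
    """Create a Dask client on a local cluster of single-threaded processes."""
    # Imported lazily so the settings above can be inspected without Dask installed.
    from dask.distributed import Client, LocalCluster

    return Client(LocalCluster(**SUGGESTED_CLUSTER_KWARGS))
```

Because the workers are separate processes, a script calling `make_local_client()` may need to do so under an `if __name__ == "__main__":` guard.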

@github-actions

Test Results

22 files, 22 suites, total duration 3d 12h 10m 36s ⏱️
3 862 tests: 3 862 ✅ passed, 0 💤 skipped, 0 ❌ failed
76 200 runs: 76 200 ✅ passed, 0 💤 skipped, 0 ❌ failed

Results for commit 5bb11cb.

@JAGANNATHANJP force-pushed the fix-dask-thread-validation branch from 5bb11cb to edb506a on May 25, 2026 06:37
    self.client = (daskclient if daskclient is not None else
                   Client(LocalCluster(n_workers=os.cpu_count(), threads_per_worker=1, processes=True)))

    workers = self.client.scheduler_info()["workers"]

Member

Suggested change
    - workers = self.client.scheduler_info()["workers"]
    + workers = self.client.scheduler_info().get("workers", None)
    + if workers is None:
    +     return

Author

Thanks for the suggestion. I updated the code to safely handle the case where scheduler_info() does not contain worker information yet.
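
The difference between the two lookups can be illustrated with a plain dictionary standing in for the scheduler's reply. This is only a sketch: `read_workers` is a hypothetical helper, and the empty dictionary is an assumed stand-in for a reply that carries no worker information yet.

```python
from typing import Any, Dict, Optional


def read_workers(scheduler_info: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the "workers" entry, or None when the reply does not contain one."""
    return scheduler_info.get("workers", None)


# An empty reply stands in for a scheduler that has no worker information yet.
empty_reply: Dict[str, Any] = {}

try:
    empty_reply["workers"]  # direct indexing raises KeyError on a missing key
    direct_lookup_failed = False
except KeyError:
    direct_lookup_failed = True

print(direct_lookup_failed)       # True
print(read_workers(empty_reply))  # None
```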

@JAGANNATHANJP force-pushed the fix-dask-thread-validation branch 2 times, most recently from e938571 to 2eda199 on May 27, 2026 08:50
@JAGANNATHANJP force-pushed the fix-dask-thread-validation branch from 2eda199 to 0c91ac5 on May 27, 2026 08:54
@JAGANNATHANJP requested a review from vepadulano on May 28, 2026 09:47

@vepadulano left a comment

Member

Thank you @JAGANNATHANJP! Before proceeding with the CI, I would like to ask you to add a test for this new behaviour, perhaps in roottest/python/distrdf/backends/check_backend.py.

@JAGANNATHANJP requested a review from dpiparo as a code owner on June 1, 2026 16:42
@JAGANNATHANJP

Author

Added a test covering the case where scheduler_info() does not provide worker information during Dask backend initialization.

@JAGANNATHANJP requested a review from vepadulano on June 1, 2026 16:46
Comment on lines +63 to +81
    """
    Check that DaskBackend initialization succeeds when scheduler_info
    does not provide worker information.
    """
    connection, backend = payload

    if backend != "dask":
        return

    from ROOT._distrdf.Backends.Dask import Backend

    original_scheduler_info = connection.scheduler_info

    try:
        connection.scheduler_info = lambda: {}
        backend = Backend.DaskBackend(daskclient=connection)
        assert backend.client is connection
    finally:
        connection.scheduler_info = original_scheduler_info

Member

This is an interesting test, but I would appreciate it if you also added some actual computation, so we can check that, on top of having a valid connection, RDataFrame still works with it.

Afterwards, there needs to be another test, namely one that checks the new RuntimeError introduced with this PR.
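
One possible shape for such a RuntimeError test, sketched with the standard library only. `FakeClient` and `validate_client` are hypothetical stand-ins for a real Dask client and for the backend's check, and the `nthreads` key is an assumption about the scheduler's reply; a real test would go through the DistRDF Dask backend and the test suite's own fixtures instead.

```python
import unittest


class FakeClient:
    """Hypothetical stand-in for a Dask client; only scheduler_info() is mimicked."""

    def __init__(self, nthreads_per_worker):
        self._nthreads = nthreads_per_worker

    def scheduler_info(self):
        # One made-up address per worker, each reporting its thread count.
        return {
            "workers": {
                f"tcp://127.0.0.1:{40000 + i}": {"nthreads": n}
                for i, n in enumerate(self._nthreads)
            }
        }


def validate_client(client):
    """Hypothetical check: reject any worker that reports more than one thread."""
    workers = client.scheduler_info().get("workers", None)
    if workers is None:
        return
    if any(info.get("nthreads", 1) > 1 for info in workers.values()):
        raise RuntimeError("Threaded Dask workers are not supported")


class ThreadedWorkersTest(unittest.TestCase):
    def test_threaded_workers_raise(self):
        # A single multi-threaded worker is enough to trigger the error.
        with self.assertRaises(RuntimeError):
            validate_client(FakeClient([1, 4]))

    def test_single_threaded_workers_pass(self):
        # Returns without raising when every worker has exactly one thread.
        validate_client(FakeClient([1, 1]))
```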

Author

Added a distributed RDataFrame computation to the missing-workers test to verify that execution still works when scheduler_info() does not provide worker information. Also added a new test that checks the expected RuntimeError is raised when using threaded Dask workers.

@JAGANNATHANJP requested a review from vepadulano on June 12, 2026 16:16