degree optimization #1641
Benchmark results (updated)

Methodology. Two engines: pandas (single-node) and cuDF (single GPU, RAPIDS 26.02). 5 iterations per shape after warmup, median reported. Memory via …

[Benchmark tables: Pandas, cuDF, Standalone helpers; first column: Pattern]
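The timing protocol above (warmup, then 5 timed iterations, median reported) can be sketched as a small helper. `bench_median` is hypothetical, not the harness behind these numbers, and it only covers wall-clock time because the memory method is not stated. For cuDF, GPU work may run asynchronously, so `fn` should block until its work finishes for the timing to mean anything.

```python
import statistics
import time
from typing import Callable


def bench_median(fn: Callable[[], object], iters: int = 5, warmup: int = 1) -> float:
    """Median wall-clock seconds of `fn` over `iters` timed runs (hypothetical helper)."""
    # Warmup runs are executed but not timed (caches, lazy init, GPU context setup).
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    print(f"{bench_median(lambda: sorted(range(100_000), reverse=True)):.6f} s")
```

With an odd `iters` such as 5, the median is the middle run, so a single slow outlier does not skew the reported number.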
```python
if _safe_len(g._edges) == 0:
    nodes_df = g_nodes._nodes
    for c in (degree_in, degree_out, col):
```
Can collapse the `assign`s by moving the loop inside a single `assign` call.
This in turn cuts the number of intermediate DataFrames pandas creates.
Thanks for the suggestion, done.
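A minimal pandas sketch of the suggested collapse. The hunk above is cut off, so the loop body (a zero-filled column per name) and the concrete column names are assumptions, not the PR's code.

```python
import pandas as pd

# Hypothetical column names standing in for get_degrees' arguments.
degree_in, degree_out, col = "degree_in", "degree_out", "degree"
nodes_df = pd.DataFrame({"id": ["a", "b", "c"], "label": ["x", "y", "z"]})

# Before (assumed loop body): one assign per column; each call returns a new DataFrame.
looped = nodes_df
for c in (degree_in, degree_out, col):
    looped = looped.assign(**{c: 0})

# After: the loop moves inside a single assign, so only one new DataFrame is built.
collapsed = nodes_df.assign(**{c: 0 for c in (degree_in, degree_out, col)})
```

The loop builds two throwaway intermediate frames on the way to its result; the single `assign` builds none.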
@mj3cheun for the typical case we care about, it would be cuDF to benchmark, not pandas, right?
Yup, cuDF.
Thanks for pushing back. Although the structural changes were good, there was a performance regression in cuDF (and also in pandas, though not noticeable) due to the choice of method used here. This has been fixed and the benchmark results above have been updated; both the pandas and cuDF results are greatly improved.
lmeyerov left a comment
Awesome, tx
Approved
If I were going to be paranoid, I might ask Claude to do test amplification around alias issues and indexes.
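One way to act on the test-amplification idea is a randomized cross-check against a brute-force count, covering null endpoints, a shuffled non-default nodes index, and a source or destination column that aliases the node-id column. `degrees_sketch` is a hypothetical pandas-only model of the single-merge approach, not the `ComputeMixin` code; graph sizes and the null rate are arbitrary choices.

```python
import random
from collections import Counter

import pandas as pd


def degrees_sketch(nodes, edges, src, dst, node):
    # Hypothetical compact model of the single-wide-merge approach.
    ins = edges.groupby(dst).size().rename("degree_in")     # null keys dropped
    outs = edges.groupby(src).size().rename("degree_out")
    deg = pd.concat([ins, outs], axis=1).fillna(0).astype("int64")
    deg = deg.rename_axis(node).reset_index()
    # Align the join key's dtype with nodes[node] (matters for empty aggregates).
    deg[node] = deg[node].astype(nodes[node].dtype)
    out = nodes.merge(deg, on=node, how="left")              # the single wide merge
    cols = ["degree_in", "degree_out"]
    out[cols] = out[cols].fillna(0).astype("int64")
    out["degree"] = out["degree_in"] + out["degree_out"]
    return out


def check_against_brute_force(seed, src="src", dst="dst", node="id"):
    rng = random.Random(seed)
    ids = [f"n{i}" for i in range(rng.randint(1, 8))]

    def pick():
        return rng.choice(ids + [None])  # None models a null endpoint

    n_edges = rng.randint(0, 12)
    edges = pd.DataFrame(
        {src: [pick() for _ in range(n_edges)], dst: [pick() for _ in range(n_edges)]},
        dtype=object,
    )
    # Non-default, shuffled index on the nodes frame.
    idx = list(range(len(ids)))
    rng.shuffle(idx)
    nodes = pd.DataFrame({node: ids}, index=idx)

    out = degrees_sketch(nodes, edges, src, dst, node)
    in_ref = Counter(v for v in edges[dst] if v is not None)
    out_ref = Counter(v for v in edges[src] if v is not None)
    assert out[node].tolist() == ids  # rows follow the nodes frame's order
    assert out["degree_in"].tolist() == [in_ref[i] for i in ids]
    assert out["degree_out"].tolist() == [out_ref[i] for i in ids]
```

The brute-force reference counts rows per endpoint value, which is the row-count (`size`) definition of degree the PR adopts.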
Summary
Refactor `get_degrees`/`get_indegrees`/`get_outdegrees` to reduce passes over the wide nodes frame and eliminate the `get_outdegrees` rename trick. Targeted at large workloads where peak memory and distributed shuffle counts dominate, but the improvements hold at all scales. Made using learnings from graphistry efforts.

Motivation
`get_degrees` previously composed `get_indegrees().get_outdegrees()`, each of which:

- ran a `groupby` on edges to produce a small per-node aggregate, and
- merged that aggregate back into the wide nodes frame.

That's two wide merges per call.
`get_outdegrees` additionally built a throwaway `Plottable` over a renamed-columns edges copy.

At billion-row scale, the wide merge is the dominant cost (memory and distributed shuffle). Cutting two wide merges to one is the structural win.
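The structural difference can be modeled in plain pandas. `degrees_old` mirrors the old composition (one wide merge per direction, plus the rename trick for out-degree); `degrees_new` mirrors the approach described next (two small aggregates, a small × small outer merge, then one wide merge). Both functions, and `degree_agg_sketch` (modeled on the `_degree_agg(edges, key_col, out_name, node_id)` signature), are hypothetical sketches, not `ComputeMixin` code.

```python
import pandas as pd


def _in_degree_old(nodes, edges, src, dst, node, col):
    # Small aggregate: non-null count of `src`, grouped by `dst`.
    agg = (edges.groupby(dst).agg({src: "count"})
           .reset_index()
           .rename(columns={dst: node, src: col}))
    # Wide merge: every node column rides along.
    out = nodes.merge(agg, on=node, how="left")
    out[col] = out[col].fillna(0).astype("int64")
    return out


def degrees_old(nodes, edges, src, dst, node):
    g = _in_degree_old(nodes, edges, src, dst, node, "degree_in")   # wide merge 1
    # Rename trick: swap src/dst on a copy, then reuse the in-degree path.
    swapped = edges.rename(columns={src: dst, dst: src})
    g = _in_degree_old(g, swapped, src, dst, node, "degree_out")    # wide merge 2
    g["degree"] = g["degree_in"] + g["degree_out"]
    return g


def degree_agg_sketch(edges, key_col, out_name, node_id):
    # Small (node_id, count) frame. .size() counts rows; null keys are dropped.
    return (edges.groupby(key_col).size()
            .rename(out_name)
            .reset_index()
            .rename(columns={key_col: node_id}))


def degrees_new(nodes, edges, src, dst, node):
    ins = degree_agg_sketch(edges, dst, "degree_in", node)
    outs = degree_agg_sketch(edges, src, "degree_out", node)
    # Small x small outer merge: at most |V| rows, narrow columns only.
    deg = ins.merge(outs, on=node, how="outer")
    cols = ["degree_in", "degree_out"]
    deg[cols] = deg[cols].fillna(0).astype("int64")
    deg["degree"] = deg["degree_in"] + deg["degree_out"]
    # The single wide merge into the nodes frame.
    out = nodes.merge(deg, on=node, how="left")
    cols = cols + ["degree"]
    out[cols] = out[cols].fillna(0).astype("int64")
    return out
```

On a graph without null endpoints the two shapes agree on every column; they differ in how many merges touch the wide `nodes` frame, and in null-endpoint counting, since `degrees_old` uses `count` while `degrees_new` uses `size`.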
Approach
Introduce `_degree_agg(edges, key_col, out_name, node_id)`, returning a small `(node_id, count)` frame. Then:

- `get_indegrees(col)` and `get_outdegrees(col)` route through a shared `_single_direction_degree(key_col, col)` helper.
- `get_outdegrees` no longer renames and rebuilds a `Plottable`; it groups by `_source` directly.
- `get_degrees` calls `_degree_agg` twice to produce |V|-row aggregates, outer-merges them (small × small), computes the `degree` column on the narrow frame, then performs a single merge into the nodes frame.

Net change per `get_degrees` call: two wide merges → one. On `dask_cudf`, distributed shuffles 2 → 1.

Behavior changes
Two intentional changes, both narrow:

- Null-endpoint counting. `get_indegrees` used `.agg({src: "count"})` (non-null count); it now uses `.size()` (row count). An edge `(null → b)` now contributes 1 to `b.degree_in` (was 0). Symmetric for `get_outdegrees`. This aligns with the graph-theoretic definition of degree and fixes a latent inconsistency: `b` could appear as a materialized node yet report `degree_in == 0` despite having a row in the edges table. Only affects edges where one of `src`/`dst` is null and the other is valid.
- `get_outdegrees` row order. It now returns nodes in natural `materialize_nodes` order (consistent with `get_indegrees` and `get_degrees`). Master returned them in a reversed order that was an artifact of the rename trick. No documented contract pinned the old order.

Files
- `graphistry/compute/ComputeMixin.py`: refactor (+~60 / -38 net)
- `graphistry/tests/test_compute.py`: update `test_degrees_out` to natural ordering; add `test_degrees_with_null_endpoint` regression test for the null-endpoint counting change
- `CHANGELOG.md`: entry under Development → Changed

Risk / caveats
- `get_indegrees`/`get_outdegrees` can regress modestly in a narrow regime: very sparse graphs (|V| ≫ |E|) with very few node columns. `get_degrees` itself stays faster in that regime; the standalone-helper regression is a bounded constant factor and disappears as soon as edges or node columns scale up.
- `dask_cudf` numbers are unmeasured. The local benchmark is pandas single-node. The shuffle reduction should be the largest gain in production but is not measured here.

Test plan
- `graphistry/tests/test_compute.py`: passes (including updated `test_degrees_out` and new `test_degrees_with_null_endpoint`)
- `graphistry/tests/compute/test_get_degrees_cudf.py`: passes (cuDF tests skipped locally; need `TEST_CUDF=1` in CI to verify the GPU path)
- `graphistry/tests/compute/test_id_column_restriction.py`: passes (custom column name coverage)
- `dask_cudf` integration: relies on existing `safe_merge` routing; no path-specific tests
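At the pandas level, the null-endpoint counting change that `test_degrees_with_null_endpoint` guards comes down to `count` versus `size` on a grouped frame. A minimal sketch in plain pandas, not the actual test; the `src`/`dst` column names are illustrative.

```python
import pandas as pd

# Two edges, each with one null endpoint: (None -> "b") and ("a" -> None).
edges = pd.DataFrame({"src": [None, "a"], "dst": ["b", None]}, dtype=object)

# Old behavior: non-null count of the *other* endpoint column.
old_in = edges.groupby("dst").agg({"src": "count"})["src"]
old_out = edges.groupby("src").agg({"dst": "count"})["dst"]

# New behavior: row count per group. Rows whose *key* is null are still
# dropped, because groupby uses dropna=True by default.
new_in = edges.groupby("dst").size()
new_out = edges.groupby("src").size()

# b: 0 under count, 1 under size; a: 0 under count, 1 under size.
```

Only the valid endpoint of each edge gains a count; the null endpoint never becomes a group, so no phantom null node appears.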