
degree optimization#1641

Open
mj3cheun wants to merge 6 commits into master from dev/degree-optimization


Contributor

@mj3cheun mj3cheun commented May 25, 2026

Summary

Refactor get_degrees / get_indegrees / get_outdegrees to reduce wide-nodes-frame passes and eliminate the get_outdegrees rename-trick. Targeted at large workloads where peak memory and distributed shuffle counts dominate, but improvements hold at all scales. Made using learnings from graphistry efforts.

Motivation

get_degrees previously composed get_indegrees().get_outdegrees(), each of which:

  • did a groupby on edges to produce a small per-node aggregate, and
  • merged that aggregate into the wide nodes frame.

That's two wide merges per call. get_outdegrees additionally built a throwaway Plottable over a renamed-columns edges copy.

At billion-row scale, the wide merge is the dominant cost (memory and distributed shuffle). Cutting two wide merges to one is the structural win.

Approach

Introduce _degree_agg(edges, key_col, out_name, node_id) returning a small (node_id, count) frame. Then:

  • get_indegrees(col) and get_outdegrees(col) route through a shared _single_direction_degree(key_col, col) helper. get_outdegrees no longer renames+rebuilds a Plottable; it groups by _source directly.
  • get_degrees calls _degree_agg twice to produce |V|-row aggregates, outer-merges them small × small, computes the degree column on the narrow frame, then performs a single merge into the nodes frame.

Net change per get_degrees call: two wide merges → one. On dask_cudf, distributed shuffles 2 → 1.
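
The single-merge flow described above can be sketched in pandas roughly as follows. This is a minimal illustration under stated assumptions, not the code in ComputeMixin.py: the names `degree_agg_sketch` and `get_degrees_sketch`, the default column names, and the use of `groupby().size()` for the aggregate are all hypothetical choices made for the example.

```python
import pandas as pd


def degree_agg_sketch(edges: pd.DataFrame, key_col: str, out_name: str, node_id: str) -> pd.DataFrame:
    """Hypothetical stand-in for _degree_agg: a narrow (node_id, count) frame."""
    counts = edges.groupby(key_col).size()  # one row per distinct endpoint
    return counts.rename(out_name).rename_axis(node_id).reset_index()


def get_degrees_sketch(nodes: pd.DataFrame, edges: pd.DataFrame,
                       node_id: str = "id", src: str = "src", dst: str = "dst") -> pd.DataFrame:
    deg_in = degree_agg_sketch(edges, dst, "degree_in", node_id)
    deg_out = degree_agg_sketch(edges, src, "degree_out", node_id)

    # Small x small outer merge: at most |V| rows and three columns.
    narrow = deg_in.merge(deg_out, on=node_id, how="outer")
    narrow = narrow.fillna({"degree_in": 0, "degree_out": 0}).astype(
        {"degree_in": "int64", "degree_out": "int64"}
    )
    # Compute the combined degree on the narrow frame, before touching nodes.
    narrow["degree"] = narrow["degree_in"] + narrow["degree_out"]

    # The only merge that touches the wide nodes frame.
    out = nodes.merge(narrow, on=node_id, how="left")
    fill = {"degree_in": 0, "degree_out": 0, "degree": 0}
    return out.fillna(fill).astype({c: "int64" for c in fill})


nodes = pd.DataFrame({"id": ["a", "b", "c", "d"], "label": ["w", "x", "y", "z"]})
edges = pd.DataFrame({"src": ["a", "a", "b"], "dst": ["b", "c", "c"]})
result = get_degrees_sketch(nodes, edges)
```

In this sketch the isolated node `d` ends up with zeros in all three degree columns, and the left merge keeps the row order of `nodes`.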

Behavior changes

Two intentional, both narrow:

  1. Null-endpoint counting (bug fix). Old get_indegrees used .agg({src: "count"}) (non-null count); now uses .size() (row count). An edge (null → b) now contributes 1 to b.degree_in (was 0). Symmetric for get_outdegrees. This aligns with the graph-theoretic definition of degree and fixes a latent inconsistency: b could appear as a materialized node yet report degree_in == 0 despite having a row in the edges table.
    • Only affects datasets with rows where exactly one of src/dst is null and the other is valid.
  2. get_outdegrees row order. Now returns nodes in natural materialize_nodes order (consistent with get_indegrees and get_degrees). Master returned them in a reversed order that was an artifact of the rename trick. No documented contract pinned the old order.
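
The null-endpoint difference in item 1 can be reproduced on a two-edge frame. This is a small pandas sketch; the column names `src` and `dst` are assumptions for illustration, and it mirrors only the `count` versus `size` semantics described above, not the library code.

```python
import pandas as pd

# One edge with a null source (null -> b) and one ordinary edge (a -> c).
edges = pd.DataFrame({"src": [None, "a"], "dst": ["b", "c"]})

# Old style: count non-null src values within each dst group.
old_in = edges.groupby("dst").agg({"src": "count"})["src"]

# New style: count rows within each dst group.
new_in = edges.groupby("dst").size()

print(old_in["b"], new_in["b"])  # 0 1
print(old_in["c"], new_in["c"])  # 1 1
```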

Files

  • graphistry/compute/ComputeMixin.py — refactor (+~60 / -38 net)
  • graphistry/tests/test_compute.py — update test_degrees_out to natural ordering; add test_degrees_with_null_endpoint regression test for the null-endpoint counting change
  • CHANGELOG.md — entry under Development → Changed

Risk / caveats

  • Standalone get_indegrees / get_outdegrees can regress modestly in a narrow regime: very sparse graphs (|V| ≫ |E|) with very few node columns. get_degrees itself stays faster in that regime; the standalone-helper regression is bounded constant-factor and disappears as soon as edges or node columns scale up.
  • dask_cudf numbers unmeasured. Local benchmark is pandas single-node. The shuffle-reduction win should be the largest gain in production but is not measured here.
  • Null-endpoint behavior change is a fix, not a configurable flag. Datasets with single-null-endpoint edges will see degree numbers shift upward. Likely surfaces a latent bug rather than breaking an intentional contract.

Test plan

  • graphistry/tests/test_compute.py — passes (including updated test_degrees_out and new test_degrees_with_null_endpoint)
  • graphistry/tests/compute/test_get_degrees_cudf.py — passes (cuDF tests skipped locally; need TEST_CUDF=1 in CI to verify GPU path)
  • graphistry/tests/compute/test_id_column_restriction.py — passes (custom column name coverage)
  • dask_cudf integration — relies on existing safe_merge routing; no path-specific tests
  • Manual: re-run benchmark on a cuDF / dask_cudf billion-row dataset before claiming production magnitudes

@mj3cheun mj3cheun changed the title from "Dev/degree optimization" to "degree optimization" May 25, 2026
Contributor Author

mj3cheun commented May 25, 2026

Benchmark results -- updated

Methodology. Two engines: pandas (single-node) and cuDF (single GPU, RAPIDS 26.02). 5 iterations per shape after warmup, median reported. Memory via tracemalloc (pandas) and rmm.statistics.peak_bytes (cuDF). Synthetic graphs with uniform random edges. Output equivalence verified (modulo row order) before timing every shape.
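
A harness along these lines would match the pandas side of the methodology (warmup, 5 timed iterations, median, `tracemalloc` peak). It is a sketch, not the script used for the numbers in this comment: `make_graph` and `bench` are hypothetical helpers, counting the id column toward the "cols" figure is an assumption, and measuring memory in a separate untimed pass is a choice made here so that `tracemalloc` overhead does not leak into the timings.

```python
import statistics
import time
import tracemalloc

import numpy as np
import pandas as pd


def make_graph(n_nodes: int, n_edges: int, n_cols: int, seed: int = 0):
    """Synthetic graph with uniform random edges (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    nodes = pd.DataFrame({"id": np.arange(n_nodes)})
    # Pad the nodes frame to n_cols total columns (assumes id counts as one).
    for i in range(n_cols - 1):
        nodes[f"attr_{i}"] = rng.random(n_nodes)
    edges = pd.DataFrame({
        "src": rng.integers(0, n_nodes, size=n_edges),
        "dst": rng.integers(0, n_nodes, size=n_edges),
    })
    return nodes, edges


def bench(fn, iterations: int = 5, warmup: int = 1):
    """Return (median seconds, peak traced bytes) for a zero-argument callable."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    # Separate pass for memory so tracing overhead stays out of the timings.
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return statistics.median(times), peak


nodes, edges = make_graph(1_000, 10_000, n_cols=4)
median_s, peak_bytes = bench(lambda: edges["dst"].value_counts())
```

On cuDF the memory side would instead come from RMM statistics, as noted in the methodology; that path is not sketched here.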

Pandas get_degrees

| Shape | Time (new vs. old) | Peak memory (new vs. old) |
| --- | --- | --- |
| 100K V × 1M E, 4 cols | 1.94x faster | 3.84x lower |
| 100K V × 1M E, 20 cols (wide nodes) | 1.76x faster | 1.70x lower |
| 500K V × 2M E, 4 cols | 1.12–1.23x faster | 1.20x lower |
| 1M V × 100K E, 4 cols (sparse) | 1.04x faster | 1.11x lower |
| 100K V × 5M E, 4 cols (edge-heavy) | 2.17x faster | 1.79x lower |

cuDF get_degrees (10M+ scale, RTX-class GPU)

| Shape | Time (new vs. old) | Peak memory (new vs. old) |
| --- | --- | --- |
| 1M V × 10M E, 4 cols | 0.96x (parity) | 1.78x lower |
| 1M V × 10M E, 20 cols | 1.01x | 2.07x lower |
| 5M V × 25M E, 20 cols | 1.03x | 1.78x lower |
| 10M V × 50M E, 4 cols | 1.01x | 1.77x lower |
| 10M V × 50M E, 20 cols | 1.03x | 1.78x lower |
| 10M V × 100M E, 4 cols | 1.02x | 1.78x lower |

Standalone helpers

get_outdegrees improves consistently on both engines (the rename-trick Plottable rebuild is gone): pandas 1.20–2.39x faster + 1.27–3.17x lower memory; cuDF parity time + 1.77x lower memory.

get_indegrees is essentially equivalent to master on cuDF (within ±5% time, memory at parity). On pandas it regresses ~25–30% in the sparse narrow regime (1M V × 100K E), bounded constant-factor — get_degrees itself stays faster on the same shape.

Pattern

  • Time gains correlate with edge count. Edge-heavy shapes (5M edges) hit 2x+ on pandas. Sparse shapes track at parity.
  • Memory gains correlate with node-frame width. Wide nodes (20 cols) reach 1.70x–2.07x lower across both engines.
  • cuDF time is at parity because GPU groupby/merge are bandwidth-bound, not allocation-bound; the structural fix shows up as memory savings rather than wall-clock.
  • value_counts() is more efficient than groupby().size() on both engines for this per-direction aggregation — using it widened the wins across the board and eliminated a cuDF-specific get_indegrees memory regression seen with .size().
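
For reference, the two aggregation spellings compared in the last bullet produce the same per-endpoint counts in pandas. The snippet below is a small sketch with assumed column names; it checks equivalence only and says nothing about relative speed or memory.

```python
import pandas as pd

edges = pd.DataFrame({
    "src": ["a", "a", "b", None],
    "dst": ["b", "c", "c", "a"],
})

# Out-degree aggregate, spelled two ways. Both drop the null key by default.
by_groupby = edges.groupby("src").size()
by_value_counts = edges["src"].value_counts()

# Compare as plain dicts: Series and index names differ across pandas versions.
same = by_groupby.to_dict() == by_value_counts.to_dict()

# The (null -> a) row still counts toward a's in-degree, because the
# aggregation only looks at the key column.
in_degree_a = edges["dst"].value_counts()["a"]
```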

@mj3cheun mj3cheun requested a review from lmeyerov May 25, 2026 20:47
Comment thread graphistry/compute/ComputeMixin.py Outdated

if _safe_len(g._edges) == 0:
    nodes_df = g_nodes._nodes
    for c in (degree_in, degree_out, col):
Contributor


Can collapse the assign, moving loop to inside

This helps in turn cut the layers of DFs that pandas makes

Contributor Author


thanks for suggestion, done
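
For readers following the thread, the suggestion above amounts to something like the following. The outdated diff is not fully shown here, so this is a guess at the pattern rather than the actual change; the frame and column names are assumptions.

```python
import pandas as pd

nodes_df = pd.DataFrame({"id": ["a", "b"]})
cols = ("degree_in", "degree_out", "degree")

# Loop outside assign: each iteration creates another intermediate DataFrame.
looped = nodes_df
for c in cols:
    looped = looped.assign(**{c: 0})

# Loop inside assign: a single call and a single new DataFrame.
collapsed = nodes_df.assign(**{c: 0 for c in cols})
```

Both forms yield the same columns and values; the second avoids the chain of throwaway frames.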

@lmeyerov
Contributor

@mj3cheun for typical case we care about, it would be cudf to benchmark, not pd, right?

@mj3cheun
Contributor Author

yup cudf

@mj3cheun
Contributor Author

Thanks for pushing back. Although the structural changes were good, there was a performance regression in cuDF (and also in pandas, though not noticeable) due to the choice of method used in _degree_agg.

This has been fixed and the benchmark results above have been updated. Both the pandas and cuDF results are greatly improved.

Contributor

@lmeyerov lmeyerov left a comment


Awesome, tx

Approved

If I was going to be paranoid, I might ask Claude to do test amplification around alias issues and indexes
