Skip to content

Finish and benchmark inline/mem2reg improvements.#552

Merged
eddyb merged 13 commits intomainfrom
eddyb/faster-inline-mem2reg
Apr 11, 2026
Merged

Finish and benchmark inline/mem2reg improvements.#552
eddyb merged 13 commits intomainfrom
eddyb/faster-inline-mem2reg

Conversation

@eddyb
Copy link
Copy Markdown
Member

@eddyb eddyb commented Apr 7, 2026

Background for these inliner and mem2reg changes:


In order to properly assess the impact of both already-landed changes, and newer ones,
this PR reverts commits from #21, for a "current main minus any improvements" baseline,
on top of which all changes are reapplied, allowing each step to be measured.
(this also results in a tiny effective diff, since most commits form revert+reapply pairs)

Sadly, @schell's renderling couldn't be included (due to indirect dependencies on specific glam versions), but the other old samples were simple enough to allow automating compatibility (with both Rust-GPU 0.9 and current main), and the measurement commands and raw output are available in a gist.


Commit Description rene@73e827b
inline/mem2reg
rust-gpu-shadertoys@81a56fd
inline/mem2reg
ab-proof-of-space-gpu@fc99af15
inline/mem2reg
c2f98b6 Last Rust-GPU release (v0.9) 1.4s/317.7s 15.7s/2747.7s
5556a28 main w/o inline/mem2reg changes
(i.e. after all 5 reverts)
1.6s/283.3s 15.6s/1949.2s 0.9s/287.4s
5ae5fc7 ↑+ inliner debuginfo fixes 8.7s/307.3s 97.6s/2220.4s 2.7s/298.8s
99d34af ↑+ bottom-up inliner 0.09s/49.9s 0.4s/622.5s 0.9s/213.9s
ccc08a2 ↑+ mem2reg label ID indexing 0.09s/42.4s 0.4s/312.5s 0.9s/64.7s
1e02928 ↑+ use phis to inline returns,
instead of var+store+load

[commit equivalent to main]
0.06s/3.9s 0.3s/3.9s 0.9s/4.8s
03d77ca ↑+ apply rewrite rules
less often in mem2reg
0.07s/1.8s 0.4s/2.4s 0.9s/2.8s
1f18857 ↑+ remove_duplicate_debuginfo during inlining 0.04s/1.2s 0.2s/1.7s 1s/2.7s
096fdcb ↑+ mem2reg during inlining 0.07s 0.7s 2.5s

While the changes with the largest impact have already landed in Rust-GPU, the last 3 commits still result in a combined 2-6x reduction in inlining+mem2reg times for the tested shaders (except for rene, where it's more like a 50x reduction).

(and that's without being able to reproduce #546 - until then, it's unclear how much of #547 is subsumed by the last 3 changes in this PR)

EDIT: see #552 (comment) and #547 (comment)

@LegNeato
Copy link
Copy Markdown
Collaborator

LegNeato commented Apr 7, 2026

I rebased this on current main. I also did a perf run with this and this PR + cherry-picked #547. It is generally neutral overall, so this PR ate most of the benefit of that.

Details

Setup

  • Baseline: c665c62c5d (this PR)
  • Comparison: b92b3359ce (this PR + cherry-picked #547)

Median End-to-End Summary

workload baseline total median (s) +#547 total median (s) delta
rene/rene-shader 2.937 2.790 -5.0%
rust-gpu-shadertoys/shaders 19.124 19.303 +0.9%
abundance/crates/farmer/ab-proof-of-space-gpu 26.206 26.122 -0.3%

Raw Results

rene/rene-shader

metric baseline median +#547 median median delta mean delta
link_inline 0.070 0.066 -5.7% -9.9%
link_block_ordering_pass_and_mem2reg-after-inlining 0.018 0.016 -11.1% -16.0%
link 2.754 2.495 -9.4% -3.9%
total 2.937 2.790 -5.0% -3.3%

rust-gpu-shadertoys/shaders

metric baseline median +#547 median median delta mean delta
link_inline 0.686 0.430 -37.3% -27.5%
link_block_ordering_pass_and_mem2reg-after-inlining 0.097 0.040 -58.8% -51.2%
link 18.506 18.680 +0.9% +4.0%
total 19.124 19.303 +0.9% +3.7%

abundance/crates/farmer/ab-proof-of-space-gpu

metric baseline median +#547 median median delta mean delta
link_inline 2.342 2.176 -7.1% -10.1%
link_block_ordering_pass_and_mem2reg-after-inlining 0.293 0.244 -16.7% -18.0%
link 25.991 25.905 -0.3% +8.9%
total 26.206 26.122 -0.3% +8.8%

Copy link
Copy Markdown
Member

@Firestar99 Firestar99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one question:

In the last commit f842afe mem2reg during inlining, you're measurements only have one value as if moved mem2reg to inlining, but you just added mem2reg to inlining without removing it whereever it was called previously.
So it should still have a (potentially small) runtime, or can it be removed from where it's called previously?

@eddyb
Copy link
Copy Markdown
Member Author

eddyb commented Apr 8, 2026

I rebased this on current main. I also did a perf run ... It is generally neutral overall, so this PR ate most of the benefit of that.

Thanks, that's good to know. When it comes to rebasing, I will need to force-push a version that locally has the right commit identity for jj, sorry in advance (but I won't do that before updating the numbers).


In the last commit f842afe mem2reg during inlining, you're measurements only have one value as if moved mem2reg to inlining, but you just added mem2reg to inlining without removing it whereever it was called previously. So it should still have a (potentially small) runtime, or can it be removed from where it's called previously?

In theory the separate invocations of mem2reg might have become redundant, but I didn't want to risk them being skipped in case e.g. the inliner only runs mem2reg if inlining actually happened.

I could modify the separate mem2reg invocation to assert that nothing changes (i.e. we're only paying for a redundant rescanning), and if that never gets hit in anything we have access to, we can remove it.
(I was mainly trying to get big wins from small diffs, with minimal/zero risk of breakage)


Also see #547 (comment) for an update from @nazar-pc on the #546 mystery.
(I might re-measure on top of the Rust-GPU commit they were on, now that I have the automation for it)

@LegNeato
Copy link
Copy Markdown
Collaborator

LegNeato commented Apr 8, 2026

Sure, force-push away. I guess I should probably switch to jj.

@eddyb eddyb marked this pull request as ready for review April 11, 2026 14:39
@eddyb eddyb requested a review from LegNeato as a code owner April 11, 2026 14:39
@eddyb eddyb force-pushed the eddyb/faster-inline-mem2reg branch from 0ae3cd0 to 096fdcb Compare April 11, 2026 14:41
@eddyb eddyb enabled auto-merge April 11, 2026 14:42
@eddyb eddyb added this pull request to the merge queue Apr 11, 2026
Merged via the queue into main with commit a874c1b Apr 11, 2026
23 checks passed
@eddyb eddyb deleted the eddyb/faster-inline-mem2reg branch April 11, 2026 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants