interp: avoid repeated object clones on partial stores #5368

Open: jakebailey wants to merge 3 commits into tinygo-org:dev from jakebailey:interp-memory-cow

jakebailey (Member) commented May 5, 2026

memoryView.store cloned the entire destination object on every partial store. tsgo imports golang.org/x/text/collate and hits this badly: the generated tables are initialized one element at a time, and each element store copied the whole global, making interp quadratic in the table size.

Deferring this copy until it is actually needed greatly improves performance.
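To make the quadratic cost concrete, here is a minimal sketch that only counts bytes copied. It assumes one-byte element stores into an n-byte object; `cloneEveryStore` and `cloneOnce` are hypothetical names for illustration, not functions in interp.

```go
package main

import "fmt"

// cloneEveryStore models the old behavior: every one-byte partial store
// first copies the whole n-byte object. It returns the total bytes copied.
func cloneEveryStore(n int) int {
	obj := make([]byte, n)
	copied := 0
	for i := 0; i < n; i++ {
		next := make([]byte, n)
		copied += copy(next, obj) // full clone before each store
		next[i] = 1
		obj = next
	}
	return copied
}

// cloneOnce models copy-on-first-write: one clone, then in-place stores.
// It returns the total bytes copied.
func cloneOnce(n int) int {
	obj := make([]byte, n)
	private := make([]byte, n)
	copied := copy(private, obj) // the only clone
	for i := 0; i < n; i++ {
		private[i] = 1
	}
	return copied
}

func main() {
	// Columns: n, bytes copied when cloning per store, bytes copied when cloning once.
	for _, n := range []int{1000, 2000, 4000} {
		fmt.Println(n, cloneEveryStore(n), cloneOnce(n))
	}
}
```

In this model, doubling n quadruples the bytes copied by the first function and only doubles them for the second.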

| Build | Wall | User CPU | Peak RSS |
|---|---|---|---|
| dev | 6:09.06 | 2280.14 s | 12.44 GiB |
| This PR | 3:52.46 | 430.69 s | 10.79 GiB |

https://jakebailey.dev/how-much-faster/#old=369&new=232 1.59x faster 😄

Fixes #2083

memoryView.store cloned the entire destination object on every partial
store. Compiling a program that pulls in golang.org/x/text/collate hit
this hard: its generated tables are initialized one element at a time,
and each element store copied the whole global, making interp quadratic
in the table size.

Switch to copy-on-first-write per memory view: clone the object the
first time this view writes to it, then mutate that private copy in
place on later partial stores. Rollback is unaffected because revert
still discards the view's objects wholesale.
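A rough sketch of the copy-on-first-write idea, assuming a simplified view that tracks private copies in a map. The `object` and `memoryView` types here, including the `parent`, `owned`, and `clones` fields, are hypothetical stand-ins and do not mirror the real interp data structures.

```go
package main

import "fmt"

// object and memoryView are hypothetical stand-ins, not TinyGo's real types.
type object struct{ buf []byte }

type memoryView struct {
	parent map[int]*object // committed state, never mutated by the view
	owned  map[int]*object // private copies made by this view
	clones int             // number of full-object clones performed
}

func newView(parent map[int]*object) *memoryView {
	return &memoryView{parent: parent, owned: map[int]*object{}}
}

// store writes data at offset, cloning the object only on the view's
// first write to it. Later stores mutate the private copy in place.
func (mv *memoryView) store(id, offset int, data []byte) {
	obj, ok := mv.owned[id]
	if !ok {
		obj = &object{buf: append([]byte(nil), mv.parent[id].buf...)}
		mv.owned[id] = obj
		mv.clones++
	}
	copy(obj.buf[offset:], data)
}

// revert discards every private copy wholesale, leaving the parent state.
func (mv *memoryView) revert() { mv.owned = map[int]*object{} }

func main() {
	parent := map[int]*object{0: {buf: make([]byte, 4)}}
	mv := newView(parent)
	for i := 0; i < 4; i++ {
		mv.store(0, i, []byte{byte(i + 1)})
	}
	fmt.Println(mv.owned[0].buf, mv.clones) // private copy and clone count
	mv.revert()
	fmt.Println(parent[0].buf, len(mv.owned)) // parent buffer is untouched
}
```

Because the parent buffer is never written, dropping the `owned` map is enough to roll back in this sketch, which is the property the commit message relies on.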

Two extra safety tweaks fall out of this:

  - Full overwrites now clone the incoming value, so the stored buffer
    can never alias state held by the caller.
  - Partial in-place stores snapshot the source buffer, so a store
    whose source is a load from the same object (overlapping or not)
    still sees the pre-store bytes.
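The two tweaks can be illustrated with plain byte slices. This is only a sketch: `storeFull` and `storePartial` are hypothetical helpers, and the byte-by-byte write loop is an assumption chosen to make the aliasing hazard visible (Go's built-in `copy` already handles overlapping slices).

```go
package main

import "fmt"

// storeFull models a full overwrite: the incoming value is cloned, so the
// stored buffer never shares memory with the caller's slice.
func storeFull(value []byte) []byte {
	return append([]byte(nil), value...)
}

// storePartial writes src into dst at offset one byte at a time.
// With snapshot set, src is copied first, so it may safely alias dst.
func storePartial(dst []byte, offset int, src []byte, snapshot bool) {
	if snapshot {
		src = append([]byte(nil), src...)
	}
	for i, b := range src {
		dst[offset+i] = b
	}
}

func main() {
	caller := []byte{9, 9}
	stored := storeFull(caller)
	caller[0] = 0 // the caller mutates its own slice afterwards
	fmt.Println(stored)

	a := []byte{1, 2, 3, 4, 0}
	storePartial(a, 1, a[0:4], false) // source aliases the destination
	b := []byte{1, 2, 3, 4, 0}
	storePartial(b, 1, b[0:4], true) // source is snapshotted first
	fmt.Println(a, b)
}
```

Without the snapshot, the loop reads bytes it has already overwritten; with it, the store sees the pre-store contents.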

Add an interp golden test exercising both the overlapping load/store
case and a cross-object aliased load/store case.

jakebailey requested a review from dgryski May 5, 2026 05:50

jakebailey (Member, Author) commented May 5, 2026

I mistakenly benchmarked on a non-wasip1 target, but the effect is about the same in absolute time. (Wasm just has extra steps that I have optimized elsewhere but not on this branch; other PRs are coming.)

dgryski (Member) commented May 5, 2026

Realizing I don't know enough about interp here.
/cc @niaow @aykevl

jakebailey (Member, Author) commented

This also fixes #2083!

| n | dev (s) | PR (s) |
|---|---|---|
| 5,000 | 5.35 | 5.21 |
| 15,000 | 5.53 | 5.23 |
| 30,000 | 6.60 | 5.35 |
| 50,000 | 9.52 | 5.30 |
| 100,000 | 26.86 | 5.36 |
| 200,000 | 87.94 | 5.98 |

dgryski (Member) commented May 11, 2026

This also fixes an internal build we were having trouble with \o/

dgryski (Member) commented May 11, 2026

Test corpus failure (passes on dev):

~/go/src/github.com/cloudflare/bn256 $ tinygo test -tags="generic"
--- FAIL: TestG1Marshal (0.00s)
    bn256: malformed point
    FailNow is incomplete, requires runtime.Goexit()
--- FAIL: TestG2Marshal (0.00s)
    bn256: malformed point
    FailNow is incomplete, requires runtime.Goexit()
--- FAIL: TestBilinearity (0.12s)
    bad pairing result: bn256.GT(((68dce7c99407483c76cb1bf3bb03f68e774d8b4fbc9f0331666568fdd3f05e81, 769fc3874b83423cb70eb8b3944b1347b492f69a33c8019fa134ccb07c644eac), (236eba54528e5465382d345b8b902e6e931237c39b6b41562087a010977392eb, 8f06e7f42668c20b39fa5bdb9bf67956a3e53c9fd47ad2a8864d85ca2a82f16d), (895b2804b0f7df8fac0df1c4a7c72dc0027975ad4e9a922644a0aa2138207bf4, 353776a509896a6fd67d25d1371237f467d2842da9560c3f9be0cb82decd143a)),((750b5fa25d469acf70a24b36703cd2a2aecea8dfcb90c84ef616a54b23e9a30e, 2836aad94952d1cc78d5ac382095027e9f898805271f91f20b664f7a70ceeba3), (4adf959be15490d6e9f25fe5a9d0f0e901a9df3a8c0a1f8113095cfeb734c1f4, 05095ed820a5295f216ac4a15ab6384abcb7f703d0840275be00e7d06a3f4e2f), (7b5911518af9d6c2988faea27092dcab9cffeed106a9b29b2bb3e3278a73c45b, 3d22ec67ce9b09d03257bd687a27d2b3941b8b74f186f78660c4d57e7d459685)))
    FailNow is incomplete, requires runtime.Goexit()
    bad pairing result: bn256.GT(((405746fb787b3c1245b8dc989961764b957ea34ea298591c38a508cbe78edb7b, 4034e40a4e606d214519f41b95d7fd53ce26e706e460a92460aca52233debc1b), (6bc09600d659d0aeeda8ddc939327119bd691c416f5bdf01b3b4ac188c910f3e, 2443e00eeae6d91e0e1e57acfaf1787b5f6548e41471c3ef8574a6b5a1fe93d7), (009c12f056dea0dd26fbebb8bbff14934a77299de9acc194030c1a6b2bc91835, 07089f59fbc8593e18601cfbf395a1772acb45adcc954524bc978d965f51a4b4)),((59ffe78f6ecee0193ab2a39b5e69edf00aba4971c03132cf34540dfa7d1f4acf, 000c3b313fcfb1297dfd5fbe7435b77e49dde9f32372ea68fa77ebd7662139a1), (33ad0d7d30bf1d8b5dcc8015acc0f1ad92d0e3d69be2802abfdf73ee08405eb6, 745e794d081d2ce32b306163f17b87959be0b5aed103b2f3fd44258dd795165d), (62feb2204769f84c845a5f8b4d6a98c81df4156c8b8d2663f91676252ca61d16, 5b477ea70609b7f27c026ebfb76c3e4b7a569dbcba19c5e3d37bb4073f878aef)))
    FailNow is incomplete, requires runtime.Goexit()
--- FAIL: TestTripartiteDiffieHellman (0.09s)
    keys didn't agree
--- FAIL: TestDirtyUnmarshal (0.00s)
    bn256: malformed point
    FailNow is incomplete, requires runtime.Goexit()
--- FAIL: TestKnownHashes (0.01s)
    hash doesn't match a known value
    FailNow is incomplete, requires runtime.Goexit()
    hash doesn't match a known value
    FailNow is incomplete, requires runtime.Goexit()
    hash doesn't match a known value
    FailNow is incomplete, requires runtime.Goexit()
    hash doesn't match a known value
    FailNow is incomplete, requires runtime.Goexit()
    hash doesn't match a known value
    FailNow is incomplete, requires runtime.Goexit()
    hash doesn't match a known value
    FailNow is incomplete, requires runtime.Goexit()
    hash doesn't match a known value
    FailNow is incomplete, requires runtime.Goexit()
    hash doesn't match a known value
    FailNow is incomplete, requires runtime.Goexit()
    hash doesn't match a known value
    FailNow is incomplete, requires runtime.Goexit()
    hash doesn't match a known value
    FailNow is incomplete, requires runtime.Goexit()
    hash doesn't match a known value
    FailNow is incomplete, requires runtime.Goexit()
FAIL
FAIL	github.com/cloudflare/bn256	1.002s

jakebailey marked this pull request as draft May 11, 2026 20:20

jakebailey (Member, Author) commented

Bummer, I will look into this when I have time

Adds a third init function to the store testdata: an initial partial
store puts the buffer into the current memory view, then a partial load
is followed by another partial store at the same offset, and finally the
loaded value is written to a separate global.

The expected output captures the current (incorrect) behavior of the
in-place partial store optimization: the loaded value gets corrupted by
the subsequent in-place store. The next commit fixes this.

The optimization that skips cloning the destination object on partial
stores, when the buffer is already owned by the current memory view,
mutates obj.buffer.buf in place. The previous v.buf clone only handled
the case where the store value itself aliased that buffer; it did not
cover the more common case where an earlier partial load had returned
a slice into the same buffer. The in-place store would then corrupt
the previously loaded value, breaking code such as cloudflare/bn256,
where the gfP arithmetic loads, computes, and stores within the same
global during package initialization.

Fix this in load instead: when the source object is owned by the
current memory view, copy the loaded slice so the returned value is
independent of the live buffer. The v.buf clone in store is no longer
needed and is removed.
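A minimal sketch of the load-side fix, assuming a simplified view with an ownership flag. The `view` type, its `owned` field, and the method names are hypothetical and are not the actual interp code.

```go
package main

import "fmt"

// view is a hypothetical stand-in for a memory view over one object.
type view struct {
	buf   []byte
	owned bool // true once this view holds a private, mutable copy
}

// load returns n bytes at off. If the view owns the buffer, later stores
// may mutate it in place, so the result is copied to detach it.
func (v *view) load(off, n int) []byte {
	s := v.buf[off : off+n]
	if v.owned {
		return append([]byte(nil), s...)
	}
	return s // buffer is not mutated in place by this view; sharing is fine
}

// store mutates the buffer in place and marks it as owned.
func (v *view) store(off int, data []byte) {
	v.owned = true
	copy(v.buf[off:], data)
}

// scenario runs: partial store, partial load, partial store at the same offset.
func scenario() []byte {
	v := &view{buf: make([]byte, 4)}
	v.store(0, []byte{1, 2})
	loaded := v.load(0, 2)
	v.store(0, []byte{7, 8})
	return loaded
}

func main() {
	fmt.Println(scenario()) // prints [1 2]: the loaded value is independent
}
```

Copying in load puts the cost on the reader of an owned buffer, which is why the store-side snapshot becomes unnecessary in this model.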

Updates the regression test added in the previous commit to reflect
the corrected behavior.

jakebailey marked this pull request as ready for review May 11, 2026 20:49

jakebailey (Member, Author) commented

"when I have time" is apparently now

Development

Successfully merging this pull request may close these issues.

interp rawValue clone gives quadratic behaviour compiling large arrays

2 participants