Define a new "wasm-multivalue" calling convention #268

Open
alexcrichton wants to merge 2 commits into WebAssembly:main from alexcrichton:wasm-multivalue-calling-convention

@alexcrichton
Collaborator

This commit is an attempt to define a new calling convention for WebAssembly which I've provisionally decided to call "wasm-multivalue". This new calling convention is inspired by discussions on #247 and #88, and the primary motivation of this calling convention is to be able to use multi-return signatures for intrinsics/functions in the component model.

This is a definition of a new, parallel, calling convention to the existing, now called "C", calling convention. The intention is that this avoids breakage to existing programs. The new calling convention is opt-in, requiring an annotation. The multivalue target feature enables usage of "wasm-multivalue", but enabling or disabling multivalue-the-target-feature has no effect on the "C" calling convention.

The technical definition of this new "wasm-multivalue" calling convention is to alter the previous calling convention in three ways:

  • Primarily, struct returns are now returned directly if all fields are (optionally recursively) scalars. This means that returning a struct-by-value gets translated to multiple return values.

  • To account for #88 ("Pass small structs in parameters instead of memory"), this new calling convention additionally passes `struct`s with exactly two scalar fields directly instead of indirectly, matching what native calling conventions do, for example.

  • Finally, __int128 and long double return values are now returned directly instead of indirectly.
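
As a rough illustration of the three rules above, here is a minimal sketch in Rust that models the lowering. It is not the PR's implementation: `ValType`, `CType`, `Passing`, `results`, and `param` are hypothetical names invented for this example. The sketch assumes `__int128` and `long double` each occupy two `i64` values, and that a struct with a single scalar field keeps whatever direct passing the existing "C" convention gives it.

```rust
// Hypothetical model of the proposed lowering. None of these names come from
// LLVM, Clang, or rustc.

/// WebAssembly core value types used by this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ValType { I32, I64, F32, F64 }

/// A tiny stand-in for C-level types.
#[derive(Clone, Debug, PartialEq)]
enum CType {
    Scalar(ValType),
    /// `__int128` or `long double`; assumed here to occupy two `i64` values.
    Wide,
    Struct(Vec<CType>),
}

/// How one parameter is passed.
#[derive(Debug, PartialEq)]
enum Passing {
    /// Passed as one or more wasm parameters.
    Direct(Vec<ValType>),
    /// Passed as a pointer into linear memory.
    Indirect,
}

/// Flatten `ty` into wasm value types, recursing through nested structs.
fn flatten(ty: &CType, out: &mut Vec<ValType>) {
    match ty {
        CType::Scalar(v) => out.push(*v),
        CType::Wide => out.extend([ValType::I64, ValType::I64]),
        CType::Struct(fields) => {
            for field in fields {
                flatten(field, out);
            }
        }
    }
}

/// Results of a function returning `ty`: always direct in this model.
fn results(ty: &CType) -> Vec<ValType> {
    let mut out = Vec::new();
    flatten(ty, &mut out);
    out
}

/// Parameter passing: a struct is expanded only when every field is a plain
/// scalar and there are at most two of them (the one-field case is assumed
/// to be unchanged from the "C" convention).
fn param(ty: &CType) -> Passing {
    match ty {
        CType::Struct(fields) => {
            let all_scalar = fields.iter().all(|f| matches!(f, CType::Scalar(_)));
            if all_scalar && (fields.len() == 1 || fields.len() == 2) {
                Passing::Direct(results(ty))
            } else {
                Passing::Indirect
            }
        }
        other => Passing::Direct(results(other)),
    }
}
```

In this model a `struct { i32; f64; }` parameter becomes two wasm parameters, a three-field struct parameter stays a pointer, and any all-scalar struct return becomes one wasm result per flattened field.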

On the implementation side of things, what I'd like to do is to gain consensus on this calling convention definition in the tool-conventions repository as a first step. Once the definition is settled I plan to go to LLVM/Clang to implement this calling convention, and after that I plan to go to the Rust compiler and implement the calling convention there. My plan for Rust is to implement extern "wasm-multivalue" fn(..) syntactically, and for C to use __attribute__((wasm_multivalue)) as the opt-in. Note that this is all bikesheddable, of course.
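
To make the opt-in concrete, here is a small sketch of the kind of function this is aimed at. The `DivRem` type and `div_rem` function are hypothetical. Because `extern "wasm-multivalue"` is only proposed and no released compiler accepts it, the sketch uses `extern "C"` so that it compiles, and describes the intended change in comments.

```rust
/// Quotient and remainder returned together by value.
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct DivRem {
    pub quot: i32,
    pub rem: i32,
}

// With the existing "C" convention on wasm32, this two-field struct is
// returned through a pointer to caller-provided memory.
//
// Under the proposal the ABI string would instead be spelled
// `extern "wasm-multivalue"`, and the expected wasm signature would be
// (param i32 i32) (result i32 i32). That spelling is an assumption taken
// from the plan above, not something that compiles today.
pub extern "C" fn div_rem(a: i32, b: i32) -> DivRem {
    DivRem { quot: a / b, rem: a % b }
}
```

The C-side equivalent would attach the proposed `__attribute__((wasm_multivalue))` to the function declaration.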

@tlively
Member

tlively commented May 24, 2026

SGTM! I like that making this opt-in lets us avoid the question of how large a struct should be allowed to be returned by value.

Comment thread BasicCABI.md Outdated
Comment thread BasicCABI.md Outdated
@alexcrichton
Collaborator Author

I've got a concrete implementation of this at llvm/llvm-project#200076, although I'm by no means an LLVM expert so that's unlikely to be its final form. It's at least the general thrust, however.

@dschuff
Member

dschuff commented May 28, 2026

This sounds good to me too. But I don't think we can/should avoid the question of picking a concrete size, right? Once this starts being used by component model intrinsics, we'll want it to be stable, no?

@dschuff
Member

dschuff commented May 28, 2026

Oh, I guess @tlively you're saying that the convention will be to just return everything byval, and just never make this the default, and then the opt-in would have to be done by the programmer or whoever defines an API?
It seems like it might be nice to have it as the default in the future though if it's really beneficial.

One thing that we could do: The GlobalOpt pass knows how to change functions to use the FastCC calling convention when they are private enough. Maybe we can make FastCC an alias for this, or make a TTI hook to choose which CC to upgrade to instead of just using FastCC. But that might only make sense if the calling convention was stable and known to be almost always beneficial.
(Or we could have some more sophisticated analysis in GlobalOpt to decide whether it's beneficial... or just have FastCC and wasm-multivalue CC be the same except for the number of return values used... anyway there are several options, but we wouldn't need to do anything fancy if wasm-multivalue were stable and mostly an improvement).

@alexcrichton
Collaborator Author

@dschuff to confirm you mean a limit to the number of return values, right? If so, I was wondering the same, but testing locally it looked like there was no limit to the number of parameters that would be generated so I figured I'd follow the same. I think it'd be reasonable to pick a limit, however, and start spilling to the stack afterwards. Implementing that scale of ABI change is probably going a bit beyond my LLVM abilities personally, but otherwise, yes, we'll want to consider this stable in relatively short order to be able to use it in the component model.

I do agree that it'd probably be worthwhile to make this the default ABI one day (or at least a variant thereof). I also agree that it'd be reasonable to switch internal functions to this ABI automatically, and this is something I plan on measuring for Rust at some point and probably applying (the precise ABI of fn() is perma-unstable unlike extern "C" fn() in Rust, exactly to be able to make changes like this).

For now though my plan is to start relatively conservative and mostly just unblock the ability to use multi-return in wasm modules (and subsequently component model intrinsics/lowerings/etc). I don't have immediate plans personally to investigate optimization passes or plan out a switch for this to become the default ABI.

@fitzgen
Contributor

fitzgen commented May 28, 2026

I think at minimum the limit would need to match the web spec’s implementation limits, no?

@alexcrichton
Collaborator Author

Testing locally, currently there's no limit in LLVM to match the web 1000 parameter limit. IIRC historically LLVM can also emit more locals than the web allows, so I'm not sure how many preexisting limits are adhered to within LLVM.

If a limit were to be added, though, it'd probably be something like ~400 ABI-level parameters and ~400-field structs. That's half of the 1000 limit, since some types become two ABI values (e.g. i128), and then a little less to account for things like return pointers being injected.
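
The arithmetic behind the ~400 figure can be sketched as follows. `field_budget` is a hypothetical helper, and the `reserved` value of 100 used in the usage note is only an illustrative reading of "a little less"; the thread does not pin down an exact number.

```rust
/// Implementation limit on parameters (and on results) cited in this thread.
const WEB_LIMIT: u32 = 1000;

/// Largest field count that stays within `limit` when every field may expand
/// to `widest` ABI values and `reserved` slots are held back for injected
/// values such as return pointers.
fn field_budget(limit: u32, widest: u32, reserved: u32) -> u32 {
    (limit / widest).saturating_sub(reserved)
}
```

With `field_budget(WEB_LIMIT, 2, 100)` this gives 400: halving 1000 for the worst case of two ABI values per field leaves 500, and the assumed reserve brings that down to 400.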

I'm happy to write this down, but I'm also a bit wary of diverging too much from the preexisting ABI. In practice I suspect we can retroactively apply limits at any time since 400 parameters is a lot and this isn't super widespread yet, too.

@tlively
Member

tlively commented May 28, 2026

> Oh, I guess @tlively you're saying that the convention will be to just return everything byval, and just never make this the default, and then the opt-in would have to be done by the programmer or whoever defines an API?
> It seems like it might be nice to have it as the default in the future though if it's really beneficial.

Right. Having it be opt-in completely avoids the question of how many return values to support before spilling. I agree that it would be nice to have a default-on or FastCC multivalue ABI that only uses a reasonable, finite number of return values (2? 4? 8?), but by starting with something opt-in we can make incremental progress while continuing to defer a decision on that to the future.

@alexcrichton, another consideration is that there are several known bugs with the multivalue implementation at the moment (e.g. llvm/llvm-project#98323, llvm/llvm-project#92995). The last status here was that @sunfishcode was looking into some of them.

@dschuff
Member

dschuff commented May 28, 2026

Yeah, previously we had discussed having an ABI that was meant to be similar to existing machine ABIs that can pass and return a limited number of elements in registers, for a performance boost compared to using the stack. The idea was that this could hopefully be implemented by also using registers for at least most of those parameters/return values, but that beyond a small number (probably less than 10, as Thomas mentioned), keeping parameters that would certainly need to be in memory as explicitly in memory made sense. The reason there is currently no limit (on the number of parameters used to pass struct elements, or the number of return values) is just that nobody has done the performance measurement work to figure out what the best number is.

It sounds like you are looking for something a little different though, i.e. that a potentially-huge number of params or returns is what you might actually want? As Thomas says, this has the nice property that nobody has to do that performance measurement (and that performance maybe isn't really the motivation for your use case anyway?).

I think I would really like us to actually just do some experiments though, and pick a small number to use as default. As I mentioned, I think it can unlock some easy performance wins for LTO and potentially as a default. We hadn't really considered the possibility of a large or unlimited number of params. It's certainly a simpler ABI, but I would want to look at the consequences (in terms of code size, generated code size, and performance) of that. I could imagine a code size blowup (pushing 100 values onto the stack at the callsites vs just passing a pointer), a performance cliff (2x copies for all those parameters? hitting some pathological case in the register allocator?); or potentially even a performance gain (having 100 values on the wasm stack instead of linear memory means no bounds checks in wasm64 mode). But we don't know.

@dschuff
Member

dschuff commented May 28, 2026

> I'm happy to write this down, but I'm also a bit wary of diverging too much from the preexisting ABI. In practice I suspect we can retroactively apply limits at any time since 400 parameters is a lot and this isn't super widespread yet, too.

Do you mean the preexisting multi-value ABI? Don't worry about divergence from that, it's not used anywhere and as Thomas mentioned, it doesn't even work yet.

@QuantumSegfault

QuantumSegfault commented May 28, 2026

What's the rationale for limiting the parameter aggregate split to only two scalars? I understand there should be a limit, but I imagine a higher limit, say 4, would allow for better utilization of the new ABI (the example in mind is 3D math & graphics; vec3, vec4, quat, color).

@dschuff
Copy link
Copy Markdown
Member

dschuff commented May 28, 2026

If that were the limit (it isn't yet), it would be because we found empirically that it performed better. It would be interesting to know for example, how many parameters (if any?) can be passed in registers in wasm implementations and whether the builtin spilling is any worse than passing in wasm memory would be.
But again, nobody has done this experimentation yet. I agree it seems likely that we'd want at least 4 though.

@dschuff
Member

dschuff commented May 28, 2026

Oh, sorry I missed that the current proposal and PR do actually use exactly 2 scalar arguments. I think what I said above about our old discussions still applies; I think it might make sense to have more than 2, based on performance and/or your usability argument. @alexcrichton may have just picked it based on the component model's needs, I would actually be curious if there was more to it than that.

@alexcrichton
Collaborator Author

@tlively oh thanks for pointing out those issues! Are you or others aware of more issues than those? I can try to dig into them and see if I can't figure them out.

Otherwise though, one point worth clarifying here is that my intention is to implement this ABI in both Rust and Clang at which point the definitions need to match, so that'll at least place a constraint on the degree of flexibility we have for tweaking the ABI over time. My personal intention is to unblock the ability to use multi-value in the component model where the only requirement is some way to write a function in C/Rust that maps to a specific wasm type signature, but it's all generated code so is relatively easy to adjust/modify compared to a bunch of handwritten code in the ether.

Also to your points @dschuff, sorry no I don't mean to give the impression that I have a use case with tons of params/results, that was mostly in response to @fitzgen's thought that LLVM may want to match the JS embedder API limits for wasm, which place a hard requirement that functions can't have more than 1000 parameters or 1000 results. Additionally by matching the previous ABI I mean the default one today (called "C" after this PR) which, from what I can tell, has no limit on the number of parameters.


w.r.t. speed/optimization, I think it's going to be a bit of a tricky issue. Number-of-parameters-in-registers varies across architectures/OSes which means that a benchmark on x64 linux will probably have different results than aarch64 macos will probably have different results than x64 Windows. My main goal here is to expose the ability to use multi-value returns, or effectively be able to map any wasm function type signature to some possible C/Rust function type signature, and in that sense I'd prefer to not get too bogged down in perf investigations. At the same time though I'd predict not a whole lot of ability to tweak this ABI after-the-fact, so I'd ideally like to make sure it's "right" and takes all concerns into account.

> The reason there is currently no limit (on the number of parameters used to pass struct elements, or the number of return values) is just that nobody has done the performance measurement work to figure out what the best number is.

To ask a question about this: this isn't something that can easily be retroactively modified though, could it? That would amount to changing the default ABI, which I imagine has all sorts of thorny issues because it's a "break the world" moment where nothing before can be mixed with anything after.

> I could imagine a code size blowup, a performance cliff; or potentially even a performance gain

For this point in particular, on one hand my goal with this new ABI is to be able to map any wasm function type signature to a C/Rust type signature. If a limit were applied to parameters or results then I'd argue it should be high enough that nothing would reasonably want to use or generate such a wasm type signature, but this limit would probably be well above the threshold of "useful for performance".

For example a semi-plausible situation is that in the component model the ABI for a particular imported function ends up using 16 return values. I don't think any native platform reasonably has support for 16 return registers, but from a code generation perspective it's still something that'd be desirable to be able to generate code for. This could of course inform the other way, though, by having stricter limits in the component model (e.g. at most 4 returns or something like that). So far we don't have a definition of what a multi-return ABI would be for the component model, but what we do have in the meantime is a set of functions that would be desirable to return up to 3 values (at least thinking about intrinsics off the top of my head).

Put another way, this ABI, while I think it's an improvement over the prior, may not be best served trying to be the end-all-be-all. To me there's probably 3 desirable ABIs for WebAssembly right now:

  1. The current "C" which is the default. Mostly around for historical compatibility at this point.
  2. Some ABI which is suitable for mapping any reasonable wasm function type signature into C/Rust, and that's what "wasm-multivalue" is being used for in this PR.
  3. Some ABI tuned for the best performance, placing more strict limits on params/results to solve some of the hypothetical problems you mention.

ABIs are always a tricky thing...

> Do you mean the preexisting multi-value ABI? Don't worry about divergence from that

Ah, yes, indeed! I meant following in the footsteps of the "C" ABI today (with no limit on params). I agree there's no need to strictly follow the preexisting multivalue abi.


@QuantumSegfault

> What's the rationale for limiting the parameter aggregate split to only two scalars?

Honestly not a ton of rationale. I tried some godbolt code and saw that on x86_64 the limit was 2 scalars being flattened to registers and decided to mirror that. I'm not opposed to upping the max. One caution though is that parameters to wasm functions are typically mapped to native ABIs/registers at the native-machine-code level, of which there really aren't all that many. Flattening a 4-element vector would spill all other arguments to the native machine stack on x86_64 for example in Wasmtime (I think at least), meaning that wasm-level parameters are still memory-level parameters when executed. In that sense maximizing splitting or more aggressively splitting won't necessarily lead to more performance. I'd guess it'd help clean up the wasm code slightly, but the complications would mostly just be shifted to native.


> It would be interesting to know for example, how many parameters (if any?) can be passed in registers in wasm implementations

I can speak to this from the Wasmtime side of things at least for integer arguments/results in registers:

| platform | argument registers | return registers |
|----------|--------------------|------------------|
| x86_64   | 4                  | 8                |
| aarch64  | 6                  | 6                |
| riscv64  | 8                  | 2                |
| s390x    | 4                  | 6                |

(wow we're kind of all over the place...)
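
To see what these numbers mean for flattening, here is a small sketch that counts how many integer values would not fit in registers. `WASMTIME_INT_REGS` and `spills` are hypothetical names, the counts are copied from the table above, and the sketch assumes each flattened integer value takes exactly one register with none reserved for the engine's own use.

```rust
/// (platform, integer argument registers, integer return registers),
/// copied from the table above.
const WASMTIME_INT_REGS: [(&str, u32, u32); 4] = [
    ("x86_64", 4, 8),
    ("aarch64", 6, 6),
    ("riscv64", 8, 2),
    ("s390x", 4, 6),
];

/// Returns `(spilled arguments, spilled results)` for a call with `args`
/// integer arguments and `results` integer results, or `None` if the
/// platform is not in the table.
fn spills(platform: &str, args: u32, results: u32) -> Option<(u32, u32)> {
    WASMTIME_INT_REGS
        .iter()
        .find(|(name, _, _)| *name == platform)
        .map(|&(_, arg_regs, ret_regs)| {
            (args.saturating_sub(arg_regs), results.saturating_sub(ret_regs))
        })
}
```

For example, a flattened 4-element vector plus one more integer argument on x86_64 gives `spills("x86_64", 5, 1) == Some((1, 0))`, which matches the earlier point that the extra argument ends up on the native stack.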

That being said, for Wasmtime we have tight control over the ABI (e.g. we're not trying to match system ABIs) so we can adjust these and tweak them.

One reason I'm hesitant to place too many limits in wasm is that if someone is trying to optimize for one particular engine or one particular architecture in one engine I'd want to avoid closing off possibilities by having a least-common-denominator of sorts or something like that. That's where I'd imagine that the limits for params/results (and maybe flattening?) would be relatively high to be beyond what any native architecture can reasonably implement. As I say that out loud though, on the performance part, it's probably bad to spill to both the wasm stack and then also spill to the native stack. It's probably best to spill to one or the other, and in that situation it might make the most sense to pretty aggressively use params/results in wasm to enable the most possible optimizations in engines...


apologies for the lots-of-words, but I definitely want to acknowledge that this is not an easy problem nor something I was planning on landing super fast. Thank you everyone for chiming in on this, I very much appreciate it!

alexcrichton added a commit to alexcrichton/llvm-project that referenced this pull request May 28, 2026
This is an implementation of WebAssembly/tool-conventions#268 here in
LLVM. This adds a new calling convention to Clang, named
`wasm_multivalue`, which is intended to be used on WebAssembly targets
to configure multiple return values and slightly tweak the ABI. Changes
here are:

* Parsing/validation of `__attribute__((wasm_multivalue))`. Note that
  validation here means that it's not only well-formed but on wasm
  targets the `multivalue` target feature is additionally enabled.
* Clang-level ABI adjustments for the `wasm_multivalue` calling
  convention. These are defined by WebAssembly/tool-conventions#268 and
  notably includes expanding structs with exactly 2 scalar fields in
  parameter-position and directly returning structs with any number of
  scalar fields in return-position.
* A new `wasm_multivaluecc` keyword/calling convention for LLVM IR. This
  is what Clang lowers to when using the `wasm_multivalue` calling
  convention.
* Adjustments at the LLVM ABI layer to support returning multiple values
  with the `wasm_multivaluecc` calling convention.

My goal after this would be to start integrating this into Rust next,
under an unstable feature, and then further continue
testing/vetting/etc for component model usage.
@dschuff
Member

dschuff commented May 29, 2026

> > The reason there is currently no limit (on the number of parameters used to pass struct elements, or the number of return values) is just that nobody has done the performance measurement work to figure out what the best number is.
>
> To ask a question about this: this isn't something that can easily be retroactively modified though, could it? That would amount to changing the default ABI, which I imagine has all sorts of thorny issues because it's a "break the world" moment where nothing before can be mixed with anything after.

Sorry I think we may be talking about different things. I meant that (I think) in the current experimental-mv there's no limit on the number of struct elements (whereas the default C ABI doesn't pass any). We can change it now, since there are no users. But yeah obviously anything that becomes a default now or in the future can't easily be changed once it has.

One thing I wasn't thinking about yet is that there currently is no limit on the number of wasm argument parameters even in the default ABI. But this is less of an issue because there is just one wasm param for each C param, and there just aren't very many C functions with 1000 parameters. So nobody has complained about hitting the implementation limit. If we start destructuring structs into wasm params (or arrays in structures?) then it becomes much more likely to be an issue.

I do think it can make sense to treat the implementation incrementally though. By having an explicit opt-in less-limited (unlimited?)-destructuring ABI, we can solve a subset of the problems (plumbing new calling conventions through the layers, mixing conventions, working out LTO bugs, etc) without needing to pin down a number. We will still have to figure out things like arrays-in-structures, counting float vs int separately, and probably other stuff I'm forgetting, some of which is probably already handled in experimental-mv.
If destructuring is limited (like your proposed limit of 2 for args), there's also the question of whether we want to skip such structs completely (passing them by ref) or destructure the first 2 and pass the rest by reference (writing this out, the former sounds better, actually...).
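
The two options for over-limit structs can be compared with a small sketch. `OverLimit` and `wasm_params` are hypothetical names, and the model only counts wasm parameters for a struct made of `fields` scalar fields; it is one reading of the two options, not a settled rule.

```rust
/// What to do with a struct that has more scalar fields than the limit.
#[derive(Clone, Copy)]
enum OverLimit {
    /// Skip destructuring entirely and pass one pointer.
    WholeByRef,
    /// Destructure the first `limit` fields, then pass one pointer to the rest.
    PrefixThenRef,
}

/// Number of wasm parameters used for a struct with `fields` scalar fields.
fn wasm_params(fields: u32, limit: u32, policy: OverLimit) -> u32 {
    if fields <= limit {
        // Small enough: every field becomes its own parameter.
        return fields;
    }
    match policy {
        OverLimit::WholeByRef => 1,
        OverLimit::PrefixThenRef => limit + 1,
    }
}
```

With a limit of 2, a five-field struct costs one parameter under `WholeByRef` and three under `PrefixThenRef`, while still requiring the struct to live in memory either way.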

Gemini tells me (haven't verified) that V8 uses 5 argument registers and 2 return registers on x86-64 and 6 arg / 2 return on ARM64. So an interestingly large difference in the number of return registers.
Raw intuition suggests to me that passing struct args might be more common or have a bigger impact on performance than returning structs, but that's based on no actual evidence.

@alexcrichton
Collaborator Author

That's a good point yeah, returning a giant (possibly generated) struct seems more likely than passing 1000 params. For incrementality one point of leverage there is that Rust has explicit stability for all ABIs, meaning that the new extern "wasm-multivalue" fn(), when added, will be unstable and nightly-only. That means that even after landing this and adding this to Rust there'll be at least a 6 week window to shake out bugs, adjust the definition, etc, without impacting users in the wild. Basically I think it's reasonable to expect that this isn't insta-stable upon landing in LLVM.

For argument destructuring my loose intention was to resolve #88 while additionally following the conventions of native ABIs. I think a big difference between wasm and native though is that wasm has infinite "registers" and native doesn't, meaning that the destructuring just keeps happening in wasm where it eventually is guaranteed to stop on native. I do think it would be useful to pass at least some arguments destructured (e.g. a Rust &str is a pointer/len combo), but I also agree that only destructuring N is a bit odd.

How about this maybe:

  • Params stay as-is in this PR -- unlimited amount and 2-field structs unconditionally destructured. Placing a cap on this later seems unlikely to break anyone in practice due to the general convention that things have <20 arguments.
  • Result destructuring is capped at structs of 100 fields. Should still be possible to write a C function for any reasonable wasm signature while also ensuring giant structs don't immediately become invalid wasms when returned.
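
The result-side cap could look roughly like this. `MAX_DIRECT_RESULTS`, `Return`, and `lower_return` are hypothetical names, and the behavior above the cap (falling back to an indirect return through a pointer, as the "C" convention does today) is an assumption, since the bullet only states the cap itself.

```rust
/// Proposed cap on how many struct fields are returned as wasm results.
const MAX_DIRECT_RESULTS: usize = 100;

/// How a struct return value is lowered in this sketch.
#[derive(Debug, PartialEq)]
enum Return {
    /// One wasm result per flattened scalar field.
    Direct(usize),
    /// Assumed fallback: written through a caller-provided pointer.
    Indirect,
}

/// Lower a struct return with `scalar_fields` flattened scalar fields.
fn lower_return(scalar_fields: usize) -> Return {
    if scalar_fields <= MAX_DIRECT_RESULTS {
        Return::Direct(scalar_fields)
    } else {
        Return::Indirect
    }
}
```

A 16-field result stays direct under this cap, while a generated 1001-field struct falls back to memory instead of producing a wasm signature over the 1000-result limit.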

@tlively
Member

tlively commented May 29, 2026

> @tlively oh thanks for pointing out those issues! Are you or others aware of more issues than those? I can try to dig into them and see if I can't figure them out

The issues from this search look relevant: https://github.com/llvm/llvm-project/issues?q=is%3Aissue%20state%3Aopen%20label%3Abackend%3AWebAssembly%20experimental-mv. Searching for "multivalue" instead of "experimental-mv" gives more results, but I don't think they're all relevant.

5 participants