Define a new "wasm-multivalue" calling convention #268
This commit is an attempt to define a new calling convention for WebAssembly which I've provisionally decided to call "wasm-multivalue". This new calling convention is inspired by discussions on WebAssembly#247 and WebAssembly#88, and the primary motivation of this calling convention is to be able to use multi-return signatures for intrinsics/functions in the component model.

This is a definition of a new, parallel, calling convention to the existing, now called "C", calling convention. The intention is that this avoids breakage to existing programs. The new calling convention is opt-in, requiring an annotation. The `multivalue` target feature enables usage of "wasm-multivalue", but enabling or disabling `multivalue`-the-target-feature has no effect on the "C" calling convention.

The technical definition of this new "wasm-multivalue" calling convention is to alter the previous calling convention in three ways:

* Primarily, `struct` returns are now returned directly if all fields are (optionally recursively) scalars. This means that returning a `struct`-by-value gets translated to multiple return values.
* To account for WebAssembly#88 this new calling convention additionally passes two-scalar-argument `struct` definitions directly instead of indirectly, matching what native calling conventions do for example.
* Finally, `__int128` and `long double` return values are now returned directly instead of indirectly.

On an implementation side of things, what I'd like to do is to gain consensus on this calling convention definition in the tool-conventions repository as a first step. Once the definition is settled I plan to go to LLVM/Clang to implement this calling convention, and after that I plan to go to the Rust compiler and implement the calling convention there. My plan for Rust is to implement `extern "wasm-multivalue" fn(..)` syntactically, and for C to use `__attribute__((wasm_multivalue))` as the opt-in. Note that this is all bikesheddable, of course.
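As an illustration of the first rule in the description (a `struct` is returned directly only if all of its fields are, recursively, scalars), here is a minimal sketch in C of how the number of wasm return values could be derived. The `type_desc` model and the `direct_result_count` function are hypothetical names made up for this example; they are not part of the proposal or of LLVM. Empty structs and the `__int128`/`long double` rule are deliberately left out, since the thread does not pin down how those would be counted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified type description used only for this sketch. */
enum type_kind { TY_SCALAR, TY_STRUCT, TY_OTHER };

struct type_desc {
    enum type_kind kind;
    const struct type_desc *const *fields; /* only used for TY_STRUCT */
    size_t field_count;                    /* only used for TY_STRUCT */
};

/* Returns how many wasm result values a by-value return would use, or 0
   when the value would still be returned indirectly (through memory).
   Assumption of this sketch: an empty struct is treated as indirect. */
size_t direct_result_count(const struct type_desc *ty) {
    assert(ty != NULL);
    if (ty->kind == TY_SCALAR)
        return 1;
    if (ty->kind != TY_STRUCT || ty->field_count == 0)
        return 0;

    size_t total = 0;
    for (size_t i = 0; i < ty->field_count; i++) {
        size_t n = direct_result_count(ty->fields[i]);
        if (n == 0)
            return 0; /* a single non-scalar leaf forces an indirect return */
        total += n;
    }
    return total;
}
```

Under this model, a struct holding a nested two-scalar struct plus one more scalar would map to three return values, while a struct containing anything that is not a scalar or an all-scalar struct would keep the indirect return of the "C" convention.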
|
SGTM! I like that making this opt-in lets us avoid the question of how large a struct should be allowed to be returned by value. |
|
I've got a concrete implementation of this at llvm/llvm-project#200076, although I'm by no means an LLVM expert so that's unlikely to be its final form. It's at least the general thrust, however. |
|
This sounds good to me too. But I don't think we can/should avoid the question of picking a concrete size though, right? Once this starts being used by component model intrinsics, we'll want it to be stable, no? |
|
Oh, I guess @tlively you're saying that the convention will be to just return everything byval, and just never make this the default, and then the opt-in would have to be done by the programmer or whoever defines an API? One thing that we could do: The GlobalOpt pass knows how to change functions to use the |
|
@dschuff to confirm you mean a limit to the number of return values, right? If so, I was wondering the same, but testing locally it looked like there was no limit to the number of parameters that would be generated so I figured I'd follow the same. I think it'd be reasonable to pick a limit, however, and start spilling to the stack afterwards. Implementing that scale of ABI change is probably going a bit beyond my LLVM abilities personally, but otherwise, yes, we'll want to consider this stable in relatively short order to be able to use it in the component model.

I do agree that it'd probably be worthwhile to make this the default ABI one day (or at least a variant thereof). I also agree that it'd be reasonable to switch internal functions to this ABI automatically, and this is something I plan on measuring for Rust at some point and probably applying (the precise ABI of

For now though my plan is to start relatively conservative and mostly just unblock the ability to use multi-return in wasm modules (and subsequently component model intrinsics/lowerings/etc). I don't have immediate plans personally to investigate optimization passes or plan out a switch for this to become the default ABI. |
|
I think at minimum the limit would need to match the web spec’s implementation limits, no? |
|
Testing locally, currently there's no limit in LLVM to match the web 1000 parameter limit. IIRC historically LLVM can also emit more locals than the web allows, so I'm not sure how many preexisting limits are adhered to within LLVM. If a limit were to be added, though, it'd probably be something like ~400 ABI-level parameters and ~400-field structs. Half of the 1000 limit due to some types becoming two ABI values (e.g. I'm happy to write this down, but I'm also a bit wary of diverging too much from the preexisting ABI. In practice I suspect we can retroactively apply limits at any time since 400 parameters is a lot and this isn't super widespread yet, too.
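To make the arithmetic behind the "roughly half of 1000" figure concrete, here is a small sketch in C. The function name `fits_web_limit` is hypothetical, and the only inputs taken from the thread are the 1000-value web limit and the worst case in which a field expands to two ABI-level values; nothing here reflects an actual check in LLVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The web implementation limit on parameters cited in this thread. */
enum { WEB_PARAM_LIMIT = 1000 };

/* Worst-case check: if every field may expand to `values_per_field`
   wasm values, does a struct with `fields` fields stay within the limit?
   Written as a division so the comparison cannot overflow. */
bool fits_web_limit(size_t fields, size_t values_per_field) {
    assert(values_per_field > 0);
    return fields <= WEB_PARAM_LIMIT / values_per_field;
}
```

With two values per field the hard ceiling is 500 fields, so a cap around 400 leaves some headroom for the other parameters in the same signature.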
Right. Having it be opt-in completely avoids the question of how many return values to support before spilling. I agree that it would be nice to have a default-on or FastCC multivalue ABI that only uses a reasonable, finite number of return values (2? 4? 8?), but by starting with something opt-in we can make incremental progress while continuing to defer a decision on that to the future. @alexcrichton, another consideration is that there are several known bugs with the multivalue implementation at the moment (e.g. llvm/llvm-project#98323, llvm/llvm-project#92995). The last status here was that @sunfishcode was looking into some of them. |
|
Yeah, previously we had discussed having an ABI that was meant to be similar to existing machine ABIs that can pass and return a limited number of elements in registers, for a performance boost compared to using the stack. The idea was that this could hopefully be implemented by also using registers for at least most of those parameters/return values, but that beyond a small number (probably less than 10, as Thomas mentioned), keeping parameters that would certainly need to be in memory as explicitly in memory made sense. The reason there is currently no limit (on the number of parameters used to pass struct elements, or the number of return values) is just that nobody has done the performance measurement work to figure out what the best number is. It sounds like you are looking for something a little different though, i.e. that a potentially-huge number of params or returns is what you might actually want? As Thomas says, this has the nice property that nobody has to do that performance measurement (and that performance maybe isn't really the motivation for your use case anyway?). I think I would really like us to actually just do some experiments though, and pick a small number to use as default. As I mentioned, I think it can unlock some easy performance wins for LTO and potentially as a default. We hadn't really considered the possibility of a large or unlimited number of params. It's certainly a simpler ABI, but I would want to look at the consequences (in terms of code size, generated code size, and performance) of that. I could imagine a code size blowup (pushing 100 values onto the stack at the callsites vs just passing a pointer), a performance cliff (2x copies for all those parameters? hitting some pathological case in the register allocator?); or potentially even a performance gain (having 100 values on the wasm stack instead of linear memory means no bounds checks in wasm64 mode). But we don't know. |
Do you mean the preexisting multi-value ABI? Don't worry about divergence from that, it's not used anywhere and as Thomas mentioned, it doesn't even work yet. |
|
What's the rationale for limiting the parameter aggregate split to only two scalars? I understand there should be a limit, but I imagine a higher limit, say 4, would allow for better utilization of the new ABI (the example in mind is 3D math & graphics; vec3, vec4, quat, color). |
|
If that were the limit (it isn't yet), it would be because we found empirically that it performed better. It would be interesting to know for example, how many parameters (if any?) can be passed in registers in wasm implementations and whether the builtin spilling is any worse than passing in wasm memory would be. |
|
Oh, sorry I missed that the current proposal and PR do actually use exactly 2 scalar arguments. I think what I said above about our old discussions still applies; I think it might make sense to have more than 2, based on performance and/or your usability argument. @alexcrichton may have just picked it based on the component model's needs, I would actually be curious if there was more to it than that. |
|
@tlively oh thanks for pointing out those issues! Are you or others aware of more issues than those? I can try to dig into them and see if I can't figure them out. Otherwise though, one point worth clarifying here is that my intention is to implement this ABI in both Rust and Clang at which point the definitions need to match, so that'll at least place a constraint on the degree of flexibility we have for tweaking the ABI over time. My personal intention is to unblock the ability to use multi-value in the component model where the only requirement is some way to write a function in C/Rust that maps to a specific wasm type signature, but it's all generated code so is relatively easy to adjust/modify compared to a bunch of handwritten code in the ether.

Also to your points @dschuff, sorry no I don't mean to give the impression that I have a use case with tons of params/results, that was mostly in response to @fitzgen's thought that LLVM may want to match the JS embedder API limits for wasm, which place a hard requirement that functions can't have more than 1000 parameters or 1000 results. Additionally by matching the previous ABI I mean the default one today (called "C" after this PR) which, from what I can tell, has no limit on the number of parameters.

w.r.t. speed/optimization, I think it's going to be a bit of a tricky issue. Number-of-parameters-in-registers varies across architectures/OSes, which means that a benchmark on x64 Linux will probably have different results than one on aarch64 macOS, which in turn will probably have different results than one on x64 Windows. My main goal here is to expose the ability to use multi-value returns, or effectively be able to map any wasm function type signature to some possible C/Rust function type signature, and in that sense I'd prefer to not get too bogged down in perf investigations.
At the same time though I'd predict not a whole lot of ability to tweak this ABI after-the-fact, so I'd ideally like to make sure it's "right" and takes all concerns into account.
To ask a question about this: this isn't something that can easily be retroactively modified though, can it? That would amount to changing the default ABI, which I imagine has all sorts of thorny issues because it's a "break the world" moment where nothing before can be mixed with anything after.
For this point in particular, on one hand my goal with this new ABI is to be able to map any wasm function type signature to a C/Rust type signature. If a limit were applied to parameters or results then I'd argue it should be high enough that nothing would reasonably want to use or generate such a wasm type signature, but this limit would probably be well above the threshold of "useful for performance". For example a semi-plausible situation is that in the component model the ABI for a particular imported function ends up using 16 return values. I don't think any native platform reasonably has support for 16 return registers, but from a code generation perspective it's still something that'd be desirable to be able to generate code for. This could of course inform the other way, though, by having stricter limits in the component model (e.g. at most 4 returns or something like that). So far we don't have a definition of what a multi-return ABI would be for the component model, but what we do have in the meantime is a set of functions that would be desirable to return up to 3 values (at least thinking about intrinsics off the top of my head). Put another way, this ABI, while I think it's an improvement over the prior, may not be best served trying to be the end-all-be-all. To me there's probably 3 desirable ABIs for WebAssembly right now:
ABIs are always a tricky thing...
Ah, yes, indeed! I meant following in the footsteps of the "C" ABI today (with no limit on params). I agree there's no need to strictly follow the preexisting multivalue ABI.
Honestly not a ton of rationale. I tried some godbolt code and saw that on x86_64 the limit was 2 scalars being flattened to registers and decided to mirror that. I'm not opposed to upping the max. One caution though is that parameters to wasm functions are typically mapped to native ABIs/registers at the native-machine-code level of which there really aren't all that many. Flattening a 4-element vector would spill all other arguments to the native machine stack on x86_64 for example in Wasmtime (I think at least), meaning that wasm-level parameters are still memory-level parameters when executed. In that sense maximizing splitting or more aggressively splitting won't necessarily lead to more performance. I'd guess it'd help clean up the wasm code slightly, but the complications would mostly just be shifted to native.
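To make the knob being discussed here concrete, the following is a rough sketch in C of the parameter rule with the flattening limit as an input. `classify_struct_param` and `pass_mode` are hypothetical names for this example only. It assumes the struct is already known to consist solely of scalars, and it treats every count from 1 up to the limit as passed directly, which is an assumption of this sketch rather than something the proposal text spells out.

```c
#include <assert.h>

enum pass_mode { PASS_DIRECT, PASS_INDIRECT };

/* scalar_leaves: number of scalar leaves in an all-scalar struct.
   max_flattened: the tunable limit under discussion (2 in the current
   proposal; 4 was suggested for vec3/vec4/quat/color style types). */
enum pass_mode classify_struct_param(unsigned scalar_leaves,
                                     unsigned max_flattened) {
    assert(max_flattened >= 1);
    if (scalar_leaves >= 1 && scalar_leaves <= max_flattened)
        return PASS_DIRECT;  /* each leaf becomes its own wasm parameter */
    return PASS_INDIRECT;    /* passed by pointer into linear memory */
}
```

With a limit of 2, a four-float vector stays in memory; with a limit of 4 it becomes four wasm parameters, which an engine may then still spill to the native stack as described above.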
I can speak to this from the Wasmtime side of things at least for integer arguments/results in registers:
(wow we're kind of all over the place...) That being said, for Wasmtime we have tight control over the ABI (e.g. we're not trying to match system ABIs) so we can adjust these and tweak them. One reason I'm hesitant to place too many limits in wasm is that if someone is trying to optimize for one particular engine or one particular architecture in one engine I'd want to avoid closing off possibilities by having a least-common-denominator of sorts or something like that. That's where I'd imagine that the limits for params/results (and maybe flattening?) would be relatively high to be beyond what any native architecture can reasonably implement. As I say that out loud though, on the performance part, it's probably bad to spill to both the wasm stack and then also spill to the native stack. It's probably best to spill to one or the other, and in that situation it might make the most sense to pretty aggressively use params/results in wasm to enable the most possible optimizations in engines... apologies for the lots-of-words, but I definitely want to acknowledge that this is not an easy problem nor something I was planning on landing super fast. Thank you everyone for chiming in on this, I very much appreciate it! |
This is an implementation of WebAssembly/tool-conventions#268 here in LLVM. This adds a new calling convention to Clang, named `wasm_multivalue`, which is intended to be used on WebAssembly targets to configure multiple return values and slightly tweak the ABI. Changes here are:

* Parsing/validation of `__attribute__((wasm_multivalue))`. Note that validation here means that it's not only well-formed but on wasm targets the `multivalue` target feature is additionally enabled.
* Clang-level ABI adjustments for the `wasm_multivalue` calling convention. These are defined by WebAssembly/tool-conventions#268 and notably include expanding structs with exactly 2 scalar fields in parameter-position and directly returning structs with any number of scalar fields in return-position.
* A new `wasm_multivaluecc` keyword/calling convention for LLVM IR. This is what Clang lowers to when using the `wasm_multivalue` calling convention.
* Adjustments at the LLVM ABI layer to support returning multiple values with the `wasm_multivaluecc` calling convention.

My goal after this would be to start integrating this into Rust next, under an unstable feature, and then further continue testing/vetting/etc for component model usage.
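For readers who want to see what the opt-in might look like from C, here is a hedged sketch. The attribute spelling is the one proposed in this thread; no released compiler is assumed to accept it, so the hypothetical `WASM_MULTIVALUE` macro falls back to nothing when the attribute is unavailable, and the code then simply uses the existing "C" convention. The guard on `__wasm_multivalue__` is an assumption that mirrors the validation described above (the `multivalue` target feature must be enabled); `divmod_i32` and `divmod_result` are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Only use the proposed attribute when targeting wasm with the multivalue
   feature enabled and the compiler reports that it knows the attribute. */
#if defined(__wasm__) && defined(__wasm_multivalue__) && defined(__has_attribute)
#  if __has_attribute(wasm_multivalue)
#    define WASM_MULTIVALUE __attribute__((wasm_multivalue))
#  endif
#endif
#ifndef WASM_MULTIVALUE
#  define WASM_MULTIVALUE /* expands to nothing: plain "C" convention */
#endif

/* A struct whose fields are all scalars, so it qualifies for a direct
   (multi-value) return under the proposed convention. */
struct divmod_result {
    int32_t quot;
    int32_t rem;
};

WASM_MULTIVALUE
struct divmod_result divmod_i32(int32_t a, int32_t b) {
    assert(b != 0);
    struct divmod_result r = { a / b, a % b };
    return r;
}
```

With the convention applied, the intent described in this thread is that `divmod_i32` ends up with a wasm type along the lines of `(i32, i32) -> (i32, i32)`, whereas the "C" convention returns the struct through memory. The C-level behavior is the same either way, so callers such as `divmod_i32(7, 2)` do not change.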
Sorry I think we may be talking about different things. I meant that (I think) in the current experimental-mv there's no limit on the number of struct elements (whereas the default C ABI doesn't pass any). We can change it now, since there are no users. But yeah obviously anything that becomes a default now or in the future can't easily be changed once it has. One thing I wasn't thinking about yet is that there currently is no limit on the number of wasm argument parameters even in the default ABI. But this is less of an issue because there is just one wasm param for each C param, and there just aren't very many C functions with 1000 parameters. So nobody has complained about hitting the implementation limit. If we start destructuring structs into wasm params (or arrays in structures?) then it becomes much more likely to be an issue. I do think it can make sense to treat the implementation incrementally though. By having an explicit opt-in less-limited (unlimited?)-destructuring ABI, we can solve a subset of the problems (plumbing new calling conventions through the layers, mixing conventions, working out LTO bugs, etc) without needing to pin down a number. We will still have to figure out things like arrays-in-structures, counting float vs int separately, and probably other stuff I'm forgetting, some of which is probably already handled in experimental-mv. Gemini tells me (haven't verified) that V8 uses 5 argument registers and 2 return registers on x86-64 and 6 arg / 2 return on ARM64. So an interestingly large difference in the number of return registers. |
|
That's a good point yeah, returning a giant (possibly generated) struct seems more likely than passing 1000 params. For incrementality one point of leverage there is that Rust has explicit stability for all ABIs, meaning that the new For argument destructuring my loose intention was to resolve #88 while additionally following the conventions of native ABIs. I think a big difference between wasm and native though is that wasm has infinite "registers" and native doesn't, meaning that the destructuring just keeps happening in wasm where it eventually is guaranteed to stop on native. I do think it would be useful to pass at least some arguments destructured (e.g. a Rust How about this maybe:
|
The issues from this search look relevant: https://github.com/llvm/llvm-project/issues?q=is%3Aissue%20state%3Aopen%20label%3Abackend%3AWebAssembly%20experimental-mv. Searching for "multivalue" instead of "experimental-mv" gives more results, but I don't think they're all relevant. |