Pre-RFC: WebAssembly Heap Types
(This would probably be a good candidate for an experimental feature gate[1], but I think it's probably a good idea to first discuss the design / intended direction a bit).
Background: WebAssembly Storage Locations
(Feel free to skip to the "Proposal", especially if you are somewhat familiar with Wasm, I'll try to restate the problems in terms of Rust semantics there).
Most Rust targets store all data in a single address space (with some using a separate instruction address space for function pointers). Wasm is more complicated, with several additional storage locations:
- Linear Memory: This is what we conventionally think of as "memory" or "RAM": a sequence of bytes that can be read from or written to. This is where data on the Rust-heap and the Rust-stack is stored. Rust pointers and references are essentially integers representing an offset into linear memory. This is represented as addrspace(0) in LLVM.
- Wasm locals (or Wasm-stack): Wasm values are either primitive numeric types, or opaque references. Wasm functions can declare (mutable) local variables that are "usable as virtual registers", and load/store Wasm values to/from them. Function parameters are considered a special case of local variable.
Wasm values can also be pushed/popped to/from an "operand stack", to be consumed by further instructions.
Primitive Wasm values may be loaded/stored to/from linear memory.
It is not possible to create (Rust) pointers (or references) to Wasm locals. It is not possible to store opaque Wasm references in linear memory.
- Wasm-heap: The Wasm-heap is entirely separate from linear memory. The Wasm runtime manages the lifetime of objects on the Wasm-heap. Objects on the Wasm-heap can only be accessed via opaque references.
LLVM currently has support for:
- funcref: (Untyped, nullable) references to functions, represented as pointers in addrspace(20). Note that rustc currently generates function pointers in addrspace(0) (because it needs to support storing them in linear memory). They are represented by LLVM as an integer (similar to non-function references).
- externref: (Untyped, nullable) references to objects (with an unknown layout), represented as pointers in addrspace(10).
The Wasm spec is currently evolving to support more heap types:
- All references to heap types will support being declared as either nullable or non-null.
- References may be typed, i.e. they can be restricted to functions with specific signatures or specific imported extern types.
- New heap types may be defined, by essentially describing their layout in terms of other Wasm types.
- Tables: The fact that references to Wasm heap types cannot be stored in linear memory is quite restrictive. To work around this, Wasm has tables. Wasm references can be written to tables at specific indices, and then later read back from those indices.
Tables are typed: they only store specific reference types. A Wasm module may define an arbitrary number of tables. For all instructions, the table to operate on is a compile-time constant.
In LLVM, a Wasm table is represented as a global variable in addrspace(1) whose type is a zero-length array of a Wasm reference type. A table may be "grown" at runtime.
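To make the table operations more concrete, here is a minimal sketch in plain Rust that models how a table of nullable references behaves (size, grow, set, get). It runs on any target and does not touch real Wasm tables. `ModelTable`, `Ref` and `OutOfBounds` are hypothetical names invented for this illustration, and `Ref` is only a stand-in, since a real Wasm reference has no observable bit pattern. Where Wasm would trap on an out-of-bounds index, the model returns an error instead.

```rust
/// Stand-in for an opaque Wasm reference (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Ref(pub u32);

/// Returned where a real Wasm table access would trap.
#[derive(Debug, PartialEq)]
pub struct OutOfBounds;

/// Model of a table of nullable references with an optional maximum size.
pub struct ModelTable {
    slots: Vec<Option<Ref>>, // `None` models a null reference
    max: Option<u32>,
}

impl ModelTable {
    pub fn new(initial: u32, max: Option<u32>) -> Self {
        ModelTable { slots: vec![None; initial as usize], max }
    }

    /// Models `table.size`.
    pub fn size(&self) -> u32 {
        self.slots.len() as u32
    }

    /// Models `table.grow`: returns the previous size, or `None` on failure
    /// (the Wasm instruction returns -1 in that case).
    pub fn grow(&mut self, delta: u32, init: Option<Ref>) -> Option<u32> {
        let old = self.size();
        let new = old.checked_add(delta)?;
        if self.max.map_or(false, |m| new > m) {
            return None;
        }
        self.slots.resize(new as usize, init);
        Some(old)
    }

    /// Models `table.set`.
    pub fn set(&mut self, i: u32, v: Option<Ref>) -> Result<(), OutOfBounds> {
        match self.slots.get_mut(i as usize) {
            Some(slot) => {
                *slot = v;
                Ok(())
            }
            None => Err(OutOfBounds),
        }
    }

    /// Models `table.get`.
    pub fn get(&self, i: u32) -> Result<Option<Ref>, OutOfBounds> {
        self.slots.get(i as usize).copied().ok_or(OutOfBounds)
    }
}
```

Note that the model takes the table as a runtime value (`&self`), whereas in Wasm the table is fixed in the instruction itself. That difference is exactly what makes tables awkward to express in Rust.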
Motivation
Use of Wasm reference types is desirable for higher-performance Wasm interop and smaller generated code.
Currently, tools like wasm-bindgen post-process the Wasm generated by rustc and LLVM, adding a shim around every exported function. The shim essentially writes every reference type parameter to a table, and then passes only the index in the table to the actual function. Every time the referenced object is actually accessed, the index is translated back into the actual object. It would be nice if this wasn't necessary or could be expressed directly in Rust.
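The indirection described above can be modelled in plain Rust. This is a sketch of the idea only, not wasm-bindgen's actual implementation: `JsObject`, `HandleTable`, `inner_len` and `shim_len` are hypothetical names, and a `Vec` stands in for the Wasm table.

```rust
/// Stand-in for an object that lives on the Wasm heap (hypothetical).
#[derive(Debug, PartialEq)]
pub struct JsObject(pub String);

/// A `Vec` standing in for a Wasm table; freed slots are reused.
#[derive(Default)]
pub struct HandleTable {
    slots: Vec<Option<JsObject>>,
    free: Vec<u32>,
}

impl HandleTable {
    /// Stores `obj` and returns an index that ordinary Rust code can hold.
    pub fn insert(&mut self, obj: JsObject) -> u32 {
        if let Some(i) = self.free.pop() {
            self.slots[i as usize] = Some(obj);
            i
        } else {
            self.slots.push(Some(obj));
            (self.slots.len() - 1) as u32
        }
    }

    /// Translates an index back into the object; needed on every access.
    pub fn get(&self, i: u32) -> Option<&JsObject> {
        self.slots.get(i as usize)?.as_ref()
    }

    /// Releases a slot so that its index can be reused.
    pub fn remove(&mut self, i: u32) -> Option<JsObject> {
        let obj = self.slots.get_mut(i as usize)?.take()?;
        self.free.push(i);
        Some(obj)
    }
}

/// The "actual function": it only ever sees an integer index.
pub fn inner_len(table: &HandleTable, handle: u32) -> usize {
    table.get(handle).map_or(0, |o| o.0.len())
}

/// The shim: writes the reference to the table, calls the inner function
/// with the index, and frees the slot afterwards.
pub fn shim_len(table: &mut HandleTable, obj: JsObject) -> usize {
    let handle = table.insert(obj);
    let result = inner_len(table, handle);
    table.remove(handle);
    result
}
```

Every call through `shim_len` pays for one table write, one lookup and one removal. With direct language support for references, that overhead could in principle be avoided.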
Being able to use Wasm reference types directly in Rust would also reduce the necessity of tools like wasm-bindgen, making it possible to more easily use Rust+Wasm without them.
Goals
Define a minimal API and language support to make Wasm references and Wasm tables usable in Rust, to support experimentation by the ecosystem.
Focus on supporting (untyped) externref for now.
Proposal
New std Additions
// In `mod core::ffi::wasm` (or somewhere else):

extern {
    #[lang = "wasm_extern_ty"]
    type Extern;
}

// The methods on `Clone` would never be callable, so should probably be
// implemented as `unreachable!()`.
#[derive(Copy)]
#[repr(transparent)]
pub struct ExternRef(*const Extern);

mod private {
    #[lang = "wasm_table_ty"]
    pub struct Table<T>(/* magic */);
}

pub type ExternRefTable = private::Table<ExternRef>;

impl ExternRefTable {
    pub const fn empty() -> Self { /* magic */ }
}
Usage Example
use core::ffi::wasm::*;

static MY_TABLE: ExternRefTable = ExternRefTable::empty();

extern "C" {
    #[link_name = "llvm.wasm.table.set.externref"]
    fn wasm_store_externref(t: &ExternRefTable, i: u32, v: ExternRef);
}

#[repr(transparent)]
pub struct MyRef(ExternRef);

pub fn my_store(i: u32, v: MyRef) {
    unsafe { wasm_store_externref(&MY_TABLE, i, v.0) };
}
Wasm Reference Semantics
References to objects on the Wasm heap (in short: Wasm references):
- Cannot be stored in linear memory.
- Implication: It is impossible to create pointers or references to a Wasm reference.
- Do not have a size or alignment.
- Cannot be part of aggregates.
- Do not have a bit pattern that can be read or written.
- Can be stored on the stack and passed as function parameters.
- Need to have a specific address space attached to them in LLVM IR.
To be very clear: All the above applies to the reference itself, not the type referred to.
There is no precedent for these kinds of restrictions in Rust's type system. At least for the initial unstable implementation, I would propose to not even attempt to fit these restrictions into the type system.
The Rust compiler should do its best to catch any problematic code at compile time (even if only post-monomorphization) and report a proper error. However, it is entirely expected that, at least initially, code produced by the Rust compiler will sometimes trigger LLVM assertions.
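For readers unfamiliar with the term: stable Rust can already produce post-monomorphization errors through associated constants, which are only evaluated once a generic is instantiated with a concrete type. The sketch below uses that existing mechanism with an arbitrary predicate (the type must be zero-sized). It only illustrates the error timing meant here and is not how the compiler would detect misuse of Wasm references. `AssertZeroSized` and `only_zero_sized` are hypothetical names.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

pub struct AssertZeroSized<T>(PhantomData<T>);

impl<T> AssertZeroSized<T> {
    /// Evaluated separately for each concrete `T`, i.e. after
    /// monomorphization rather than when the generic code is type-checked.
    pub const CHECK: () = assert!(size_of::<T>() == 0, "type must be zero-sized");
}

pub fn only_zero_sized<T>(_value: T) -> &'static str {
    // Referencing the constant forces its evaluation for this `T`.
    let () = AssertZeroSized::<T>::CHECK;
    "accepted"
}
```

The generic function type-checks for every `T`. A call such as `only_zero_sized(1u32)` is rejected only once that instantiation is actually compiled, and the error points at the failed assertion.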
The public API for Wasm references proposed here is as small as I could make it: just the ExternRef type is exposed, with the intention of allowing greater flexibility in the future when designing support for additional Wasm reference types.
My initial prototype implementation even used #[lang = "wasm_externref"] pub struct ExternRef instead of the current proposal. However, I decided to change the implementation to better prepare the compiler to support additional Wasm reference types in the future.
Wasm Table Semantics
The proposed Table design was driven by the desire to not expose the LLVM representation of tables (i.e. static [ExternRef; 0]) to Rust.
A static Table will be special-cased in the Rust compiler, to produce exactly the definition required by LLVM. The generic Table is hidden behind a type alias for now, to prevent it from being used with types other than ExternRef.
The greatest problem, for which I do not have a good solution, is that LLVM intrinsics that operate on Wasm tables require a reference to a static Table global as input that must be a compile-time constant.
Rust's current const system strictly prohibits references to statics, making it impossible to use const-generics to specify the table.
For the initial unstable implementation, I propose to not enforce the "compile time constant" part in the Rust compiler. In fact, this proposal defers dealing with this to the ecosystem, by making them declare the relevant LLVM intrinsics themselves, as can be seen in the usage example above.
Tables should only be used in statics. Whether and how this is enforced by the Rust compiler is left as a question for the future.
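One way the ecosystem might keep the table argument tied to a named static is a macro that accepts only an identifier rather than an arbitrary expression. The sketch below is an assumption, not part of this proposal: `table_set!`, `StandInTable`, `store_standin` and `load_standin` are hypothetical, and a `Mutex<Vec<..>>` stands in for the real table and intrinsic so that the example runs on any target.

```rust
use std::sync::Mutex;

/// Stand-in for `ExternRefTable` (hypothetical; the real type is opaque).
pub struct StandInTable(Mutex<Vec<Option<u32>>>);

impl StandInTable {
    pub const fn empty() -> Self {
        StandInTable(Mutex::new(Vec::new()))
    }
}

/// Stand-in for the `llvm.wasm.table.set.externref` intrinsic. The
/// `'static` bound rejects references to tables that are local variables.
pub fn store_standin(t: &'static StandInTable, i: u32, v: u32) {
    let mut slots = t.0.lock().unwrap();
    if slots.len() <= i as usize {
        // A real table would trap here; the stand-in grows instead.
        slots.resize(i as usize + 1, None);
    }
    slots[i as usize] = Some(v);
}

/// Stand-in for reading a slot back; `None` models a null or missing entry.
pub fn load_standin(t: &'static StandInTable, i: u32) -> Option<u32> {
    t.0.lock().unwrap().get(i as usize).copied().flatten()
}

/// Accepts only a single identifier for the table, so a call site cannot
/// pass a computed expression such as `pick_table(x)`.
macro_rules! table_set {
    ($table:ident, $i:expr, $v:expr) => {
        store_standin(&$table, $i, $v)
    };
}

pub static MY_TABLE: StandInTable = StandInTable::empty();
```

A call then looks like `table_set!(MY_TABLE, 2, 42);`. The macro only restricts the syntax at the call site: it does not prove that the identifier names a static, so this would be a convention rather than a guarantee.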
Future Additions
- Generic Table API: This will be needed once Rust supports other Wasm reference types. The main open question is whether there should be any trait bounds on the generic parameter of Table<T>. (If there should be, that could be done with an unsafe trait, or a trait that is automatically implemented for all Wasm reference types).
- Custom Extern Types: LLVM currently only supports a single ExternRef type, however the Wasm spec is evolving to allow other "extern" types to be imported. We could allow users to declare their own "extern" Wasm types, e.g. with a repr attribute on an extern type: extern { #[repr(wasm)] pub type MyExtern; }. Code like Option<NonNull<MyExtern>> would be expected to do the Right Thing™.
- (Typed) Function References: An untyped FuncRef type could be added to Rust relatively easily (similar to ExternRef), however it is unclear what the API (e.g. to call the function) would be. Once LLVM supports them, typed function references may be interesting to support. One option could be a new fn-pointer-only ABI, e.g. extern "wasm_ref" fn(ExternRef) -> u32.
- Wasm references in Aggregates: It would be nice if Wasm references could be stored in aggregate types (e.g. something like Result<MyWasmRef, SomeJsErrorRef>). This could be possible by not allowing references to such structs, and passing them component-wise as function parameters or return values. With support for custom Wasm heap types, such structs could also be created as objects on the Wasm heap.
References
- D122215: [WebAssembly] Initial support for reference type externref in clang
- D139010: [clang][WebAssembly] Implement support for table types and builtins
- Clang WebAssembly Extensions
- WebAssembly Spec
- WebAssembly GC Proposal MVP Docs
- LLVM WebAssembly CodeGen Test
- wasm-bindgen: Support for Reference Types
- Rust Issue: Support for WebAssembly externref in non-web environment
[1] I don't think I currently fit the "experienced Rust contributors" definition, but I do intend to create a prototype implementation in a branch. (I have a working, though not well-tested, prototype for one of the alternative ExternRef implementations.)