Pre-RFC: WebAssembly Heap Types

History on GitHub.

(This would probably be a good candidate for an experimental feature gate[1], but I think it is worth discussing the design and intended direction a bit first.)

Background: WebAssembly Storage Locations

(Feel free to skip to the "Proposal", especially if you are somewhat familiar with Wasm; I'll try to restate the problems in terms of Rust semantics there.)

Most Rust targets store all data in a single address space (with some using a separate instruction address space for function pointers). Wasm is more complicated, with several additional storage locations:

  • Linear Memory: This is what we conventionally think of as "memory" or "RAM": A sequence of bytes that can be read from or written to. This is where data on the Rust-heap and the Rust-stack is stored. Rust pointers and references are essentially integers representing an offset into linear memory. This is represented as addrspace(0) in LLVM.

  • Wasm locals (or Wasm-stack): Wasm values are either primitive numeric types, or opaque references. Wasm functions can declare (mutable) local variables that are "usable as virtual registers", and load/store Wasm values to/from them. Function parameters are considered a special case of local variable.

    Wasm values can also be pushed/popped to/from an "operand stack", to be consumed by further instructions.

    Primitive Wasm values may be loaded/stored to/from linear memory.

    It is not possible to create (Rust) pointers (or references) to Wasm locals. It is not possible to store opaque Wasm references in linear memory.

  • Wasm-heap: The Wasm-heap is entirely separate from Linear memory. The Wasm runtime manages the lifetime of objects on the Wasm-heap. Objects on the Wasm-heap can only be accessed via opaque references.

    LLVM currently has support for:

    • funcref: (Untyped, nullable) references to functions, represented as pointers in addrspace(20).

      Note that rustc currently generates function pointers in addrspace(0) (because it needs to support storing them in linear memory). They are represented by LLVM as an integer (similar to non-function references).

    • externref: (Untyped, nullable) references to objects (with an unknown layout), represented as pointers in addrspace(10).

    The Wasm spec is currently evolving to support more heap types:

    • All references to heap types will support being declared as either nullable or non-null.
    • References may be typed, i.e. they can be restricted to functions with specific signatures or specific imported extern types.
    • New heap types may be defined, by essentially describing their layout in terms of other Wasm types.
  • Tables: The fact that references to Wasm heap types cannot be stored in linear memory is quite restrictive. To work around this, Wasm has tables. Wasm references can be written to tables at specific indices, and then later read back from those indices.

    Tables are typed: each table only stores a specific reference type. A Wasm module may define an arbitrary number of tables. For all instructions, the table to operate on is a compile-time constant.

    In LLVM, a Wasm table is represented as a global variable in addrspace(1) whose type is a zero-length array of a Wasm reference type. A table may be "grown" at runtime.
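
To make the table rules above a bit more concrete, here is a minimal sketch that models a table in ordinary Rust. It is only a model: `ModelTable` and `OutOfBounds` are hypothetical names, entries are plain Rust values rather than Wasm references, and the real `table.grow` instruction also takes an initial value and can report failure, which is simplified away here.

```rust
/// Plain-Rust model of a Wasm table: a growable array of nullable entries
/// that all share one type. `R` stands in for a Wasm reference type.
#[derive(Debug)]
pub struct ModelTable<R> {
    slots: Vec<Option<R>>,
}

/// Stands in for the trap that Wasm raises on an out-of-bounds table access.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfBounds;

impl<R: Clone> ModelTable<R> {
    /// A table starts out with zero entries.
    pub fn empty() -> Self {
        ModelTable { slots: Vec::new() }
    }

    /// Models `table.size`.
    pub fn size(&self) -> u32 {
        self.slots.len() as u32
    }

    /// Models `table.grow` (simplified): appends `delta` null entries and
    /// returns the previous size.
    pub fn grow(&mut self, delta: u32) -> u32 {
        let old_size = self.size();
        let new_len = self.slots.len() + delta as usize;
        self.slots.resize(new_len, None);
        old_size
    }

    /// Models `table.set`: writes a (possibly null) reference at `index`.
    pub fn set(&mut self, index: u32, value: Option<R>) -> Result<(), OutOfBounds> {
        let slot = self.slots.get_mut(index as usize).ok_or(OutOfBounds)?;
        *slot = value;
        Ok(())
    }

    /// Models `table.get`: reads the (possibly null) reference at `index`.
    pub fn get(&self, index: u32) -> Result<Option<R>, OutOfBounds> {
        self.slots.get(index as usize).cloned().ok_or(OutOfBounds)
    }
}
```

Note that the model takes the table as a runtime `&self`, whereas in Wasm the table is fixed in the instruction itself. That difference is exactly what makes tables awkward to express in Rust.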

Motivation

Use of Wasm reference types is desirable for higher-performance Wasm interop, and smaller generated code.

Currently, tools like wasm-bindgen post-process the Wasm generated by rustc and LLVM, adding a shim around every exported function. The shim essentially writes every reference type parameter to a table, and then passes only the index in the table to the actual function. Every time the referenced object is actually accessed, the index is translated back into the actual object. It would be nice if this wasn't necessary or could be expressed directly in Rust.
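
As a rough illustration of the indirection described above, the following sketch models the shim in plain Rust. It is not wasm-bindgen's actual generated code: `HostObject`, `SHIM_TABLE`, `real_name_len` and `exported_name_len` are hypothetical names, and a `Vec` stands in for the Wasm table.

```rust
use std::cell::RefCell;

/// Stand-in for an opaque host object (an `externref` in real Wasm).
#[derive(Clone, Debug, PartialEq)]
pub struct HostObject(pub String);

thread_local! {
    /// Stand-in for the Wasm table that the shim writes references into.
    static SHIM_TABLE: RefCell<Vec<Option<HostObject>>> = RefCell::new(Vec::new());
}

/// The "actual function" as compiled from Rust: it only ever sees an index.
fn real_name_len(index: u32) -> usize {
    // Every access has to translate the index back into the object.
    SHIM_TABLE.with(|table| {
        table.borrow()[index as usize]
            .as_ref()
            .map(|object| object.0.len())
            .unwrap_or(0)
    })
}

/// The shim around the exported function: store the reference in the table,
/// call through with the index, then clear the slot again.
pub fn exported_name_len(object: HostObject) -> usize {
    let index = SHIM_TABLE.with(|table| {
        let mut table = table.borrow_mut();
        table.push(Some(object));
        (table.len() - 1) as u32
    });
    let result = real_name_len(index);
    SHIM_TABLE.with(|table| {
        // This model never reuses cleared slots; a real tool would.
        table.borrow_mut()[index as usize] = None;
    });
    result
}
```

With direct support for reference types, `real_name_len` could take the reference itself as a parameter, and both the table write and the lookup would disappear.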

Being able to use Wasm reference types directly in Rust would also reduce the necessity of tools like wasm-bindgen, making it possible to more easily use Rust+Wasm without them.

Goals

Define a minimal API and language support to make Wasm references and Wasm tables usable in Rust, to support experimentation by the ecosystem.

Focus on supporting (untyped) externref for now.

Proposal

New std Additions

// In `mod core::ffi::wasm` (or somewhere else):

extern {
    #[lang = "wasm_extern_ty"]
    type Extern;
}

// The methods on `Clone` would never be callable, so should probably be
// implemented as `unreachable!()`.
#[derive(Copy)]
#[repr(transparent)]
pub struct ExternRef(*const Extern);

mod private {
    #[lang = "wasm_table_ty"]
    pub struct Table<T>(/* magic */);
}

pub type ExternRefTable = private::Table<ExternRef>;

impl ExternRefTable {
    pub const fn empty() -> Self { /* magic */ }
}

Usage Example

use core::ffi::wasm::*;

static MY_TABLE: ExternRefTable = ExternRefTable::empty();

extern "C" {
    #[link_name = "llvm.wasm.table.set.externref"]
    fn wasm_store_externref(t: &ExternRefTable, i: u32, v: ExternRef);
}

#[repr(transparent)] pub struct MyRef(ExternRef);

pub fn my_store(i: u32, v: MyRef) {
    unsafe { wasm_store_externref(&MY_TABLE, i, v.0) };
}

Wasm Reference Semantics

References to objects on the Wasm heap (in short: Wasm references):

  • Cannot be stored in linear memory.
    • Implication: It is impossible to create pointers or references to a Wasm reference.
  • Do not have a size or alignment.
  • Cannot be part of aggregates.
  • Do not have a bit pattern that can be read or written.
  • Can be stored on the stack and passed as function parameters.
  • Need to have a specific address space attached to them in LLVM IR.

To be very clear: All the above applies to the reference itself, not the type referred to.

There is no precedent for this kind of restriction in Rust's type system. At least for the initial unstable implementation, I would propose to not even attempt to fit these restrictions into the type system.

The Rust compiler should do its best to catch any problematic code at compile time (even if only post-monomorphization) and report a proper error. However, it is entirely expected that, at least initially, some code emitted by the Rust compiler will trigger LLVM assertions.

The public API for Wasm references proposed here is as small as I could make it: just the ExternRef type is exposed, with the intention of allowing greater flexibility in the future when designing support for additional Wasm reference types.

My initial prototype implementation even used #[lang = "wasm_externref"] pub struct ExternRef instead of the current proposal. However, I decided to change the implementation to better prepare the compiler to support additional Wasm reference types in the future.

Wasm Table Semantics

The proposed Table design was driven by the desire to not expose the LLVM representation of tables (i.e. static [ExternRef; 0]) to Rust.

A static Table will be special-cased in the Rust compiler, to produce exactly the definition required by LLVM. The generic Table is hidden behind a type alias for now, to prevent it from being used with types other than ExternRef.

The greatest problem, for which I do not have a good solution, is that LLVM intrinsics that operate on Wasm tables require a reference to a static Table global as input that must be a compile time constant.

Rust's current const system strictly prohibits references to statics, making it impossible to use const-generics to specify the table.

For the initial unstable implementation, I propose to not enforce the "compile time constant" part in the Rust compiler. In fact, this proposal defers dealing with this to the ecosystem, by making them declare the relevant LLVM intrinsics themselves, as can be seen in the usage example above.

Tables should only be used in statics. If or how this is enforced by the Rust compiler is left as a question for the future.

Future Additions

  • Generic Table API: This will be needed once Rust supports other Wasm reference types. The main open question is whether there should be any trait bounds on the generic parameter of Table<T>. (If there should be, that could be done with an unsafe trait, or a trait that is automatically implemented for all Wasm reference types).

  • Custom Extern Types: LLVM currently only supports a single ExternRef type, however the Wasm spec is evolving to allow other "extern" types to be imported. We could allow users to declare their own "extern" Wasm types, e.g. with a repr attribute on an extern type: extern { #[repr(wasm)] pub type MyExtern; }. Code like Option<NonNull<MyExtern>> would be expected to do the Right Thing™.

  • (Typed) Function References: An untyped FuncRef type could be added to Rust relatively easily (similar to ExternRef); however, it is unclear what the API (e.g. to call the function) would be. Once LLVM supports them, typed function references may be interesting to support. One option could be a new fn-pointer-only ABI, e.g. extern "wasm_ref" fn(ExternRef) -> u32.

  • Wasm references in Aggregates: It would be nice if Wasm references could be stored in aggregate types (e.g. something like Result<MyWasmRef, SomeJsErrorRef>). This could be possible by not allowing references to such structs, and passing them component-wise as function parameters or return values.

    With support for custom Wasm heap types, such structs could also be created as objects on the Wasm heap.

References


  1. I don't think I currently fit the "experienced Rust contributors" definition, but I do intend to create a prototype implementation in a branch. (I have a working, though not well-tested, prototype for one of the alternative ExternRef implementations.)

Ok I like and agree with this

Great idea! Do you think there could also be a type to support multiple memories?

Something like

#[lang = "wasm_pointer_with_memory"]
struct Pointer<T, const Memory: usize>(*mut T);

Do you think there could also be a type to support multiple memories?

I haven't really considered it. Offering an API to read/write raw bytes should be relatively straightforward, similar to my current prototype for tables / globals.

The immediate problem I see with a type like the Pointer you proposed is that all accesses would have to go through the Pointer type. Implementing Deref<Target=T> for Pointer<T> would be impossible, because how would the &T know that it references a different Wasm memory?
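
A minimal sketch of what "all accesses go through the Pointer type" could look like, modelled in plain Rust. Everything here is an assumption for illustration: `Memories` uses two `Vec<u8>` as stand-ins for separate Wasm memories, and this `Pointer` drops the `T` parameter from the suggestion above and only reads and writes `u32` values.

```rust
/// Two modelled linear memories. In real Wasm these would be separate
/// memories of one module, selected by an index fixed in each instruction.
pub struct Memories {
    pub mems: [Vec<u8>; 2],
}

/// Hypothetical pointer that carries its memory index in its type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Pointer<const MEMORY: usize> {
    offset: usize,
}

impl<const MEMORY: usize> Pointer<MEMORY> {
    pub fn new(offset: usize) -> Self {
        Pointer { offset }
    }

    /// Reads a little-endian `u32` from memory number `MEMORY`.
    /// Returns `None` where real Wasm would trap (out-of-bounds access).
    pub fn read_u32(self, m: &Memories) -> Option<u32> {
        let end = self.offset.checked_add(4)?;
        let bytes = m.mems.get(MEMORY)?.get(self.offset..end)?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(bytes);
        Some(u32::from_le_bytes(buf))
    }

    /// Writes a little-endian `u32` into memory number `MEMORY`.
    pub fn write_u32(self, m: &mut Memories, value: u32) -> Option<()> {
        let end = self.offset.checked_add(4)?;
        let bytes = m.mems.get_mut(MEMORY)?.get_mut(self.offset..end)?;
        bytes.copy_from_slice(&value.to_le_bytes());
        Some(())
    }
}
```

The memory index lives only in the type parameter. A plain `&u32` produced from such a pointer would lose it, which is the `Deref` problem described above.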

You'd basically have the same problem as Pin Projections and Structural Pinning, except that you'd probably be even more limited.

I've continued experimenting with this and prototyping an implementation. One question that has come up, which I don't have a good answer to, is what to do about "references to references to the WebAssembly heap".

I've written a bit about the topic in WebAssembly Heap Reference Semantics in Rust - HackMD (and also started a thread about this specific question on the t-lang zulip).

How about mixing options 2 and 3? When a HeapRef<T> is a function parameter or self value, it is represented as a "fake reference". When it is saved as part of any data structure, it goes through option 3 instead: it is saved to a Wasm table on creation, and any time it is accessed we load it from the Wasm table (here we can have caching for subsequent accesses, etc.). That way we can put all the table management code (just cleanup, I suppose) into the drop glue.

Thanks for the suggestion and sorry for taking so long to reply.

This is something that I'm currently exploring. There are a few more things to consider (HeapRef<T> couldn't be Copy for example, because it would need to behave similarly to a reference-counted pointer). But this approach would probably best fit in with current Rust semantics.

A ref counter inside of the table?

externref is essentially reference counted at the Wasm runtime level. This is why it can't be put into linear memory: the runtime needs to be able to track its movement.

Tables of externref can only contain externref. Thus if the language runtime wants to reference count the table handle, that needs to be done in the linear heap, i.e. Arc<TableIndex>.
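
A small sketch of the `Arc<TableIndex>` idea, modelled in plain Rust. The names `OCCUPIED`, `TableIndex::allocate`, `is_occupied` and `SharedRef` are hypothetical, and only slot occupancy is tracked; the table entries themselves are left out.

```rust
use std::sync::{Arc, Mutex};

/// Models which table slots are occupied. In real Wasm this would be a
/// table of `externref`; here only the bookkeeping is shown.
static OCCUPIED: Mutex<Vec<bool>> = Mutex::new(Vec::new());

/// Owns one table slot. Unlike the `externref` itself, this is an ordinary
/// value that can live in linear memory.
#[derive(Debug)]
pub struct TableIndex(usize);

impl TableIndex {
    /// Claims a fresh slot (this model never reuses slots).
    pub fn allocate() -> Self {
        let mut occupied = OCCUPIED.lock().unwrap();
        occupied.push(true);
        TableIndex(occupied.len() - 1)
    }

    pub fn index(&self) -> usize {
        self.0
    }
}

impl Drop for TableIndex {
    /// Releases the slot once the last owner goes away.
    fn drop(&mut self) {
        let mut occupied = OCCUPIED.lock().unwrap();
        occupied[self.0] = false;
    }
}

pub fn is_occupied(index: usize) -> bool {
    OCCUPIED.lock().unwrap()[index]
}

/// The reference count lives in the linear heap, next to the index.
pub type SharedRef = Arc<TableIndex>;
```

Cloning a `SharedRef` only bumps the count in linear memory. The slot is released when the last clone is dropped.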

Thus Rust is best off handling externref as Box-like, despite the built-in runtime lifetime management of externref.

The way I would see it most likely working is some core::arch::wasm::externref type with the layout of usize; ABI handling for extern "wasm" which gets the actual externref value and stashes it into the table for incoming, gets it for outgoing; and a Drop implementation which removes it from the table.
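
Roughly, and only as a model in plain Rust, that could look like the sketch below. `HostValue`, `SlotTable`, `TABLE`, `ExternHandle` and `live_entries` are hypothetical names. `stash` and `get` stand in for what the ABI handling would do on the incoming and outgoing side, and a thread-local `Vec` stands in for the Wasm table.

```rust
use std::cell::RefCell;

/// Stand-in for the host value behind an `externref`.
#[derive(Clone, Debug, PartialEq)]
pub struct HostValue(pub &'static str);

/// Modelled table plus a free list of reusable slots.
#[derive(Default)]
struct SlotTable {
    slots: Vec<Option<HostValue>>,
    free: Vec<usize>,
}

thread_local! {
    static TABLE: RefCell<SlotTable> = RefCell::new(SlotTable::default());
}

/// Box-like handle with the layout of `usize`; it owns exactly one slot.
/// Deliberately neither `Copy` nor `Clone`.
#[derive(Debug)]
#[repr(transparent)]
pub struct ExternHandle(usize);

impl ExternHandle {
    /// Incoming side: stash the value in the table and keep only the index.
    pub fn stash(value: HostValue) -> Self {
        TABLE.with(|t| {
            let mut table = t.borrow_mut();
            let index = match table.free.pop() {
                Some(index) => index,
                None => {
                    table.slots.push(None);
                    table.slots.len() - 1
                }
            };
            table.slots[index] = Some(value);
            ExternHandle(index)
        })
    }

    /// Outgoing side: fetch the value back from the table.
    pub fn get(&self) -> HostValue {
        TABLE.with(|t| {
            t.borrow().slots[self.0]
                .clone()
                .expect("slot stays occupied while its handle is alive")
        })
    }
}

impl Drop for ExternHandle {
    /// Removes the entry from the table and recycles the slot.
    fn drop(&mut self) {
        TABLE.with(|t| {
            let mut table = t.borrow_mut();
            table.slots[self.0] = None;
            table.free.push(self.0);
        });
    }
}

/// Number of occupied slots, for inspection only.
pub fn live_entries() -> usize {
    TABLE.with(|t| t.borrow().slots.iter().filter(|s| s.is_some()).count())
}
```

Because the handle is just an index, it can be stored in structs and behind references like any other Rust value. The cost is a table access on every use.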

+1, this is basically what I'm primarily thinking about right now, and would be a feasible MVP, I believe.

(Potential) future extensions make things more complicated, though it's obviously entirely unclear whether those would ever be supported by Rust:

  • Nullable vs non-null references to the Wasm heap.
  • ~Arbitrary types on the Wasm heap (typed function references, GC types, type imports).
  • Options for avoiding tables unless really necessary, e.g. &Extern being represented as a Wasm-heap-ref when possible, and a table index when necessary.
