Non-flat memory support for WebAssembly

When Rust was initially designed, non-flat memory models such as x86's segmentation or the bank switching offered on many 8-bit CPUs seemed long dead, and Rust was designed around the idea of a single flat address space. However, Rust has become the flagship language of WebAssembly in recent years, and WebAssembly has plans to bring non-flat address spaces back from the grave, to allow fancy things such as linking two modules with no shared memory. At the moment, Rust would have a very hard time supporting a concept such as “a pointer to data in another linear memory” or “a function which resides in another module”. For this reason, I think it's important to put some thought into how Rust could be extended to support near and far pointers (I'm sorry for the Win16 and DOS flashbacks), along with near and far function calls.

This seems somewhat akin to the partitioned 128-bit address space proposal for RISC-V. Architectural enhancements of other current 64-bit architectures may adopt similar approaches. Thus it's a problem that Rust will eventually need to address for both small and large partitioned address spaces.

The only way “a function which resides in another module” can be defined in WASM is through a module import, which is already supported.

Yes, but this is going to get more complicated in the future when multiple linear memories are added. That other module might have its own separate address space, so when passing parameters to it, there will need to be a way to specify which address space pointers refer to. So calling a function from another module won't always be as simple as it is now.

The current multi-memory proposal (https://github.com/WebAssembly/multi-memory) does not contain any change to the call instructions that I'm aware of. The memory index component of access instructions is an immediate in the proposal anyway.

Note that the current WebAssembly proposal, as far as I can tell, requires specifying a constant memory index for any given load or store instruction. Whereas old-fashioned far pointers were 'fat pointers' – they included the segment index within the pointer, so there was a single far pointer type that could point to any segment – WebAssembly other-memory pointers would have to be thin, with a fixed memory index encoded as part of the type.
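To make the fat/thin distinction concrete, here is a rough Rust sketch. `FarPtr` and `MemPtr` are names made up for illustration, not existing or proposed APIs, and the 16-bit fields just mirror the old segment:offset scheme. The only point is where the memory index lives: in the value at run time, or in the type at compile time.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Old-style "fat" far pointer: the segment is stored in the value itself,
/// so one type can point into any segment, chosen at run time.
struct FarPtr {
    segment: u16,
    offset: u16,
}

/// Hypothetical "thin" other-memory pointer: only the offset is stored.
/// The memory index is a compile-time constant that is part of the type.
struct MemPtr<T, const MEM_IDX: u32> {
    offset: u32,
    _pointee: PhantomData<*const T>,
}

impl<T, const MEM_IDX: u32> MemPtr<T, MEM_IDX> {
    const fn new(offset: u32) -> Self {
        Self { offset, _pointee: PhantomData }
    }

    /// The index costs no storage; it is read back from the type.
    const fn memory_index(&self) -> u32 {
        MEM_IDX
    }
}

/// Sizes of the fat pointer and of a thin pointer into memory 1.
fn pointer_sizes() -> (usize, usize) {
    (size_of::<FarPtr>(), size_of::<MemPtr<u8, 1>>())
}
```

In this sketch, `MemPtr<u8, 0>` and `MemPtr<u8, 1>` are different types, so passing one where the other is expected fails to compile, whereas two `FarPtr` values with different segments are indistinguishable to the type checker.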

The constant-memory-index limitation is probably a good thing. One of the main use cases for multiple memories seems to be isolating sensitive data in its own memory, to prevent it from being read even if the code is exploited. In this case, it's good to know that the attacker can't just corrupt any old fat pointer in the main memory and change it to refer to the isolated memory.

In any case, adding support to LLVM should be pretty easy, and I expect the WebAssembly folks will do that themselves: LLVM already supports multiple numbered address spaces, so it's a straightforward mapping. But what should Rust do on the frontend?

The most straightforward approach would probably be to add special-purpose intrinsics, like

fn load_memidx<T, const MEM_IDX: usize>(ptr: *const T) -> T

and then build abstractions on top of that using regular Rust code.
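As a rough idea of what "abstractions on top" could look like, here is a sketch that runs on an ordinary host. Everything in it is an assumption for illustration: the two linear memories are faked with a static array, the intrinsic is simplified to load a single `u8` from an offset rather than a generic `T` from a pointer, and `MemRef` is a made-up wrapper name.

```rust
/// Stand-in for two separate linear memories (simulation only; a real
/// implementation would lower to a load with a memory-index immediate).
static MEMORIES: [[u8; 8]; 2] = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [10, 20, 30, 40, 50, 60, 70, 80],
];

/// Simulated memory-indexed intrinsic: reads one byte at `offset`
/// from the memory selected by the constant `MEM_IDX`.
fn load_memidx<const MEM_IDX: usize>(offset: usize) -> u8 {
    MEMORIES[MEM_IDX][offset]
}

/// Hypothetical pure-Rust wrapper built on the intrinsic. The memory
/// index is fixed by the type; only the offset is stored.
struct MemRef<const MEM_IDX: usize> {
    offset: usize,
}

impl<const MEM_IDX: usize> MemRef<MEM_IDX> {
    fn new(offset: usize) -> Self {
        Self { offset }
    }

    /// Reads the byte this reference points at, in its own memory.
    fn read(&self) -> u8 {
        load_memidx::<MEM_IDX>(self.offset)
    }

    /// Pointer arithmetic stays inside the same memory by construction.
    fn add(&self, count: usize) -> Self {
        Self::new(self.offset + count)
    }
}
```

With this, `MemRef::<0>::new(2).read()` and `MemRef::<1>::new(2).read()` use the same offset but read from different memories, and the wrapper never has to carry the index around at run time.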

But this does have the downside that every kind of memory access, including atomic operations, volatile loads/stores, and even SIMD in the future, would need its own memory-indexed intrinsic. Also, pure Rust wrappers can't currently replicate the ergonomics of references (although that really should be fixed someday).

Another possibility might be baking memory-index support into native pointer and reference types, and perhaps making existing intrinsics generic over memory indexes. What would the syntax look like? Maybe an attribute:

fn foo(_: #[mem_idx(42)] *const u8) {}

But the attribute would have to become part of the type. Do we have any other attributes that work like that, or would it be the first one? In any case, I'm not a rustc expert, so I don't know how difficult that would be to implement. And we'd still want a way to define pure-Rust functions as generic over address spaces, if only for the sake of the intrinsic wrappers (e.g. AtomicU<n>::load). Which means we'd need to support expressions inside the attribute, not just literals:

fn foo<const MEM_IDX: usize>(_: #[mem_idx(MEM_IDX)] *const u8) {}

Again, not sure how difficult that would be to implement. Allowing expressions would also be useful for some existing attributes like #[repr(align(N))].

An alternative approach: instead of putting an attribute on the pointer types, put one on the inner (pointed-to) types, similar to the existing layout properties, i.e. size and alignment. The compiler already has to choose instructions for each operation depending on the compile-time "size" property; extending this layer seems natural. We could have a trait similar to Sized, but maybe that is overkill.
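Here is a rough sketch of the trait flavor of that idea, written as ordinary library code rather than a compiler feature. All names (`MemoryResident`, `Ptr`, `Counter`, `SecretKey`) are hypothetical, and the linear memories are again simulated with a static array. The memory index is declared once on the pointed-to type, and the load picks the memory from it.

```rust
use std::marker::PhantomData;

/// Stand-in for two linear memories (simulation only).
static MEMORIES: [[u8; 4]; 2] = [[1, 2, 3, 4], [10, 20, 30, 40]];

/// Hypothetical layout-like property: which memory values of a type live in.
trait MemoryResident: Sized {
    const MEM_IDX: usize;
    fn from_byte(byte: u8) -> Self;
}

#[derive(Debug, PartialEq)]
struct Counter(u8); // assumed to live in the main memory (index 0)

#[derive(Debug, PartialEq)]
struct SecretKey(u8); // assumed to live in an isolated memory (index 1)

impl MemoryResident for Counter {
    const MEM_IDX: usize = 0;
    fn from_byte(byte: u8) -> Self {
        Counter(byte)
    }
}

impl MemoryResident for SecretKey {
    const MEM_IDX: usize = 1;
    fn from_byte(byte: u8) -> Self {
        SecretKey(byte)
    }
}

/// Offset-only pointer; it carries no memory index of its own.
struct Ptr<T: MemoryResident> {
    offset: usize,
    _pointee: PhantomData<T>,
}

impl<T: MemoryResident> Ptr<T> {
    fn new(offset: usize) -> Self {
        Self { offset, _pointee: PhantomData }
    }
}

/// The memory is selected from the pointee type, much like the compiler
/// selects an instruction width from the pointee's size.
fn load<T: MemoryResident>(ptr: &Ptr<T>) -> T {
    T::from_byte(MEMORIES[T::MEM_IDX][ptr.offset])
}
```

One consequence of this design, at least in this sketch, is that a given type can only ever live in one memory, which may or may not be acceptable compared with tagging the pointer type.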

Calls will still work the same way, but if a pointer (just an i32 from WebAssembly's perspective) can index into more than one linear memory visible from the module, then there would need to be some sort of distinction in the type system to prevent an offset into one memory from being used as a pointer into another.