Moving WebAssembly support forward

Those of you following Rust’s WebAssembly support probably know that I landed a pair of PRs in the last couple weeks that added a new rustc target, wasm32-experimental-emscripten. In this post I will explain first how this target is different from the older WebAssembly rustc target, wasm32-unknown-emscripten, and second what needs to be done next to move WebAssembly support in rustc forward.

Current WebAssembly Support

Rust has supported WebAssembly for a while now through the target wasm32-unknown-emscripten. This target works by having rustc produce LLVM bitcode files from Rust source files, then passing those bitcode files to Emscripten. Emscripten links the bitcode files with libc bitcode files it has produced, compiles the resulting bitcode to WebAssembly, then adds the JavaScript glue that emulates a file system, POSIX system calls, and other functionality necessary to make the WebAssembly usable. The way Emscripten compiles the bitcode to WebAssembly is by first compiling to asm.js using the Fastcomp JSBackend, then using the asm2wasm tool to compile the asm.js to WebAssembly.

The new wasm32-experimental-emscripten target does the exact same thing as the old target, except that it tells Emscripten to use the new WebAssembly LLVM backend to compile the bitcode to WebAssembly instead of using the JSBackend and going through asm.js. This is useful because in the future the WebAssembly LLVM backend will be the official, best-supported, most up-to-date path from LLVM bitcode to WebAssembly. It is also a big step toward cutting Rust’s dependency on Fastcomp, which currently forces us to update Rust’s fork of LLVM in lockstep with Emscripten’s fork.
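To make the difference between the two targets concrete, here is a rough sketch that models each compilation path as a list of stages. The `Target` and `Stage` enums and both functions are hypothetical names invented for this illustration; they are not rustc or Emscripten APIs, and the stage list is only a summary of the description above.

```rust
/// Hypothetical names for the two Emscripten-based rustc targets.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    Wasm32UnknownEmscripten,
    Wasm32ExperimentalEmscripten,
}

/// Hypothetical labels for the stages described in the post.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    RustcEmitsBitcode,
    EmscriptenLinksLibcBitcode,
    FastcompJsBackendToAsmJs,
    Asm2WasmToWasm,
    LlvmWasmBackendToWasm,
    EmscriptenAddsJsGlue,
}

/// Returns the stages a crate goes through for the given target.
pub fn pipeline(target: Target) -> Vec<Stage> {
    // Both targets start the same way: rustc emits bitcode and
    // Emscripten links it with its libc bitcode.
    let mut stages = vec![Stage::RustcEmitsBitcode, Stage::EmscriptenLinksLibcBitcode];
    match target {
        // Old path: bitcode -> asm.js -> WebAssembly.
        Target::Wasm32UnknownEmscripten => {
            stages.push(Stage::FastcompJsBackendToAsmJs);
            stages.push(Stage::Asm2WasmToWasm);
        }
        // New path: bitcode -> WebAssembly directly.
        Target::Wasm32ExperimentalEmscripten => {
            stages.push(Stage::LlvmWasmBackendToWasm);
        }
    }
    // Both targets end with Emscripten adding the JavaScript glue.
    stages.push(Stage::EmscriptenAddsJsGlue);
    stages
}

/// True if the target's path goes through asm.js.
pub fn goes_through_asm_js(target: Target) -> bool {
    pipeline(target).contains(&Stage::FastcompJsBackendToAsmJs)
}
```

The only stages that differ are the ones between linking the bitcode and adding the JavaScript glue, which is the whole point of the new target.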

Future WebAssembly Support

Emscripten is a large dependency, and as LLVM tool support for WebAssembly matures, Emscripten will no longer be the best way to support compilation to WebAssembly. Tip of tree LLVM and lld already support emitting and linking WebAssembly object files, so my next step locally will be to create another rustc target that emits WebAssembly object files directly and invokes lld to link them. @vadimcn has been experimenting with this locally, so with luck it won’t take me too long to get this new Emscripten-free target producing #[no_std] binaries. That leaves the question of how to supply all of the runtime functionality that Emscripten provides, but the short-term plan there is to borrow as much as necessary and possible from Emscripten and figure out what is left from there.

In order to upstream these changes, a couple things will need to happen.

  1. lld will need to be added to the source tree. For the foreseeable future, whatever “cc” happens to be on the host system will not be able to link WebAssembly objects, so we need to explicitly depend on the LLVM linker. I asked @alexcrichton about this and he said we would need to do some planning and preparation for this change.

  2. Rust’s LLVM will need to be upgraded. The current one is too old to be able to emit WebAssembly object files. Unfortunately, since WebAssembly support is a work in progress, Rust’s LLVM will need to be updated not just once but potentially many times. One undesirable workaround would be to cherry-pick all of the WebAssembly-specific changes to Rust’s fork, but otherwise we will have to do full updates. These will be tied to Emscripten’s LLVM updates as long as we still want to support asm.js and there is no wasm2asm tool. Hopefully @kripken can suggest a path forward for this.

How can we make this happen in an orderly fashion? Please discuss!

cc @brson @eholk

I didn’t think there was a notion of an object file in the wasm binary format. Has something changed?

And if not, why would lld need to be involved at all?

Edit

Also, grepping through the latest lld source from git, I don’t see any code that emits a wasm object file. This makes sense from what I understood, since a linker shouldn’t need to work with wasm files: the compiler can just emit a wasm binary module at the end, which is the only binary unit of execution/loading, etc.

Wasm modules may have references to external symbols. You may link together a module providing a symbol with a module using it, just like you’d link object files. Support for this hasn’t landed in the mainline lld yet.
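As a rough illustration of what "linking" means at the symbol level, here is a toy sketch. It is not how lld or any real wasm linker is implemented: `Module`, `Linked`, `LinkError`, and `link` are hypothetical names made up for this example, and the model tracks only symbol names. A real linker also has to merge code and data and fix up indices, which this sketch ignores.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, heavily simplified stand-in for a wasm module:
/// only the names of the symbols it provides and the ones it needs.
#[derive(Debug, Clone)]
pub struct Module {
    pub name: String,
    pub exports: Vec<String>,
    pub imports: Vec<String>,
}

impl Module {
    pub fn new(name: &str, exports: &[&str], imports: &[&str]) -> Module {
        Module {
            name: name.to_string(),
            exports: exports.iter().map(|s| s.to_string()).collect(),
            imports: imports.iter().map(|s| s.to_string()).collect(),
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum LinkError {
    /// Two input modules both define the same symbol.
    DuplicateSymbol { symbol: String, first: String, second: String },
}

#[derive(Debug, PartialEq, Eq)]
pub struct Linked {
    /// Symbol name -> name of the module that defines it.
    pub defined: BTreeMap<String, String>,
    /// Imports that no input module provides; in this model they
    /// would have to be supplied by the host (for example, JS).
    pub unresolved: BTreeSet<String>,
}

/// Resolves every import against the exports of the other modules.
pub fn link(modules: &[Module]) -> Result<Linked, LinkError> {
    // Pass 1: collect all definitions and reject duplicates.
    let mut defined: BTreeMap<String, String> = BTreeMap::new();
    for m in modules {
        for sym in &m.exports {
            if let Some(first) = defined.get(sym) {
                return Err(LinkError::DuplicateSymbol {
                    symbol: sym.clone(),
                    first: first.clone(),
                    second: m.name.clone(),
                });
            }
            defined.insert(sym.clone(), m.name.clone());
        }
    }
    // Pass 2: any import without a definition stays unresolved.
    let mut unresolved = BTreeSet::new();
    for m in modules {
        for sym in &m.imports {
            if !defined.contains_key(sym) {
                unresolved.insert(sym.clone());
            }
        }
    }
    Ok(Linked { defined, unresolved })
}
```

For example, linking a `math` module that exports `add` with an `app` module that imports `add` and `print` resolves `add` to `math` and leaves `print` as an unresolved import.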

Well sure, but for rust I don’t understand the situation where you would ever need to:

  1. emit a bunch of wasm “object” files
  2. coalesce all the symbols into a unified single wasm binary

At compile time, everything should be present statically (dynamic linking for wasm is not supported yet, so not even going to imagine what that would mean exactly), so you should be able to emit a single wasm binary as the final artifact.

I’m having trouble understanding why lld would ever need to be invoked in the rust case.

Also, even if for some reason we are emitting a bunch of wasm “object” files (also, why? This just can’t be performant), the wasm binary spec isn’t that complicated; there are no relocations, etc. I’m sure even one of the Rust crates for parsing a wasm binary could be repurposed for simple symbol coalescing, given a list of wasm files.

Also what do you mean by “mainline lld”? Is there another repo I’m unaware of (if so I’m definitely very interested in poking around at it)?

For the record, I’m just looking at the git mirror, which is updated every 5 minutes: https://github.com/llvm-mirror/lld

That's right, but you need a linker to take all the individual objects and link them into that final binary. Although this could be done with a Rust-specific tool, doing it with lld means this part of our toolchain is supported by the wider LLVM developer community, not just the Rust community. This is the same reason rustc uses the system C compiler or Emscripten as linkers for its other targets.

The reasons for having wasm object files are the same as for having x86 or arm object files; they allow projects to have incremental compilation and static libraries.

@sbc100 (not affiliated with Rust) is working on lld for wasm and plans to upstream it soon. You can take a look at his repo here, but be warned it is still a work in progress.

I did not think that this is the usecase that wasm was intended for, but I could be wrong.

So, this is just technically incorrect, for a few reasons, but it's not too important to get into (and somewhat off topic.)

With regard to incremental compilation, this in particular is what I'm most perplexed by. I assumed that incremental compilation would be using llvm bitcode for the various crate dependencies of a rust library/executable, etc. This glob of ir would then be submitted to llvm for whole program optimization and importantly, potential inlining.

But if I'm understanding correctly, it sounds like you are proposing that each dependency be compiled separately into a wasm module (which is then out of scope for further LLVM optimizations and inlining), and then finally these are all submitted to the wasm "linker", which simply coalesces and dedups all these functions and symbols into a single binary (which again, I don't think will be hard at all).

This in particular seems like precisely not the usecase that wasm was designed for, namely, that all major optimizations are performed, and then the simple wasm instructions are dumped into a final, complete module. By outputting wasm modules per crate, and then "linking" them, this seems to obstruct important inlining and whole program optimization opportunities that would otherwise have been present.

However, my unfamiliarity with low level details of rust compilation might belie all of the above; I dunno.

Lastly, I can't seem to find anything on LLVM mailing lists, phabricator, or elsewhere that LLD is planning on adding a wasm backend. All I could find is this phabricator issue adding LLVM support for wasm (to emit wasm binaries, which is all that should be required): ⚙ D31099 [WebAssembly] Improve support for WebAssembly binary format

Are you sure this is on the roadmap for lld (it probably is, it's just weird I can't find anything)?

@m4b But this is the way rustc works today, isn’t it? It first compiles all the deps to rlibs (LLVM bitcode and/or native code, or in our case WASM) and then links them together into an executable. I think you’re looking for this: https://github.com/rust-lang/rust/issues/38913

@tlively What is the intended way to link JS to the WASM? Emscripten is a show stopper for me now as it can’t link JS code from a downstream crate (downstream as main crate depending on do_html_stuff crate which also contains JS code for DOM access).

Well it’s certainly not “linking any bitcode” together, unless you mean passing it to LLVM to produce a native object file, which then gets turned into an executable by the system linker, but I don’t think that’s what you meant (because that’s what I said above).

That issue is for embedding MIR into rlibs, no?

For some reason I thought that when LTO was passed, or in the new incremental version, it was using bitcode, which was then passed to LLVM; not compiling crates into static libs with native code and just linking them. Grepping through binaries in target, I’m seeing bitcode in places, as well as native code, so I dunno, looks like a mix of the two.

Everything you’ve just said equally applies to traditional ISAs. And yet, we still use separate object files and linkers. Sometimes compilation speed is more important than execution speed. Other times, the reverse is true, and that’s when you’ll turn to whole program optimization.

LLVM bitcode is not always the best format for object files. For one thing, it is not compatible between LLVM versions. For another, webassembly toolchains need not all be LLVM-based. I’m sure that gcc will soon implement a wasm backend as well. Surely, we’ll want to be able to mix toolchains and languages in the same program.

Well, there are various reasons why object files are used, or must be used, for traditional ISAs that aren’t an issue for wasm; but this is starting to get far afield :laughing:

So, I’m pretty sure that rust stores bitcode and HIR/MIR in .rlibs for each crate. This together with the final compilation stage should be sufficient to generate a single wasm output, (also enabling incremental compilation), as I said, without the need for a linker.

The only scenario I imagine it would be useful is in the case of wasm generated from a native .c library, and it isn’t possible to include it in the compiler IR obviously.

Eventually I would like to be able to have Rust and JS interact through Rust's FFI functionality, but there's some design work to do to determine what the JS glue code should look like. For example, there's a trade off between how much set up work the emitted code does and how flexible its interface is. This is complicated by the fact that there is no standard mechanism for dynamically linking wasm modules yet. However, some of this design work needs to be done before we even start thinking about FFI, because JS will be needed to provide even more basic functionality required by the compiler intrinsics and std. I will write up a proposal for this design (as an RFC?) once I am able to experiment with the possibilities locally.

@m4b you can find more information about WebAssembly object files and linking here: https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md

Interesting, thanks for the link! I think this is why I couldn't find it on webassembly.org and the spec, because it's not official and in a different repo.

So I had no idea they were adding relocations and other stuff; will be interesting to see how they mess it up and the world has another broken relocation system :wink:

This paragraph causes me some alarm though:

These conventions are not part of the WebAssembly standard, and are not required of WebAssembly-consuming implementations to execute WebAssembly code. Tools producing and working with WebAssembly in other ways also need not follow any of these conventions.

It's an interesting avenue for sure though, we'll see how it turns out.

My comments about not needing a wasm linker for most cases still stand though, I think.

Also, if you do go the object-file-everything route, will std in the target be shipping as a wasm object file? And how is this going to work with generics?

Awesome summary @tlively. Thank you.

I think we can start working toward distributing lld with Rust, and using it for an experimental wasm backend. This is a long-term goal for many reasons (cc @japaric). I’d suggest we name it rust-lld, distribute it with every host configuration (even when unused) and put it in /bin, though I know @alexcrichton is inclined to put it in rustlib/$target/bin like we do with gcc on Windows.

And we can go ahead and initiate the LLVM upgrade at any time. We have to agree with @kripken on an LLVM commit and have him do the emscripten upgrade while we start the rust LLVM upgrade. I’ll send an email to both of you to make sure he’s aware of this thread.

@tlively How close is your port to being feature complete? After thinking a bit more I’m a little concerned that it could be premature to upgrade llvm right now. I have the impression that the wasm backend and lld are not fully working yet, so it seems quite likely we will be in the position of doing another llvm upgrade in short order before users can really use the target. Do you think that would be the case?

@brson Yes, I think that would definitely be the case. I think it would be best for me to work locally and find any obvious bugs before we go through all the work of upgrading LLVM upstream. This will give @sbc100 time to upstream his work on lld as well. I think we will be in a better place to do the upgrade in about two weeks.

That being said, perhaps we can start an upgrade now, because subsequent upgrades would only be small incremental changes. That might be too much overhead, though.

I think that might be worth clarifying. Emscripten contains a bunch of things, one of which is fastcomp and the asm.js backend there. Maybe that's what you mean by "emscripten" in that sentence?

Moving Rust to use the wasm backend can avoid the fastcomp dependency (which would certainly be nice!) but I'd recommend you still use emscripten to drive the wasm backend, as it provides a lot of things (system libraries, Web API integration, etc.) that otherwise you'd need to do all yourself.

Btw, while getting rid of the fastcomp dependency is nice, getting rid of the LLVM dependency might be even nicer :wink: which is what mir2wasm does. That should be even simpler than the wasm backend (fewer and smaller dependencies), and as a bonus should easily win on compile times. Would be nice to see experimentation in both.

Back on topic, for the shorter-term issue here, if rust wants to keep support for the asm.js backend while adding wasm backend support, then we'd need to update LLVM in fastcomp. I can assist there, but don't have time to do the LLVM upgrade itself (unless this can wait a while until I do).

Actually, it looks like there is. I don't know if quality of the emitted code is comparable to emscripten, but maybe it'll be enough to fill the gap for now?

Another thing we could do is to have a special build of rustc based on emscripten's LLVM, while the regular build uses newer LLVM and supports asm.js via wasm2asm. Unfortunately, this would put even more load on Rust's CI infrastructure.

Sadly that wasm2asm tool is not in a usable state. It was an experiment which ran into some issues, and we've hoped someone would have time to work on it some more, but that never happened.