Rust as a library language


#1

Dear Rust community,

As a library author, one of my goals is to have the library be usable by many consumers. Concretely, the library should be written in a language that can be executed on many different runtimes, such as the web, the JVM, the CLR (.NET), and also on small runtimes like those of Rust, C and C++. Ideally, libraries that don’t do any IO, for example a maths library, can be compiled to any runtime.

I’m wondering whether (safe) Rust is the ideal candidate for a ‘library language’, which can compile to many runtimes and run on them directly.

I feel like a big differentiator between runtimes is in whether or not there is a GC. Targeting a GC runtime requires that the source language does not do any pointer arithmetic, which, correct me if I’m wrong, safe Rust does not, while targeting a non-GC runtime means the source language must explicitly deallocate memory, which Rust can do. These two parts give me the impression that Rust could target most runtimes. Does that make sense?

Concretely, would it be feasible to compile safe Rust to JVM byte-code? Looking at samples of safe Rust code, it seems to map well to a GC language, but the Rust compiler’s MIR seems quite a bit farther away. What are your thoughts?

Thanks a lot for sharing your thoughts,

Remy


#2

Safe Rust absolutely does do pointer arithmetic, e.g. you can subset a slice using indices (let subslice = &orig[start..end];).

I’m not sure whether this should interfere with a runtime which performs GC. After all, Rust doesn’t need to use that GC, so AFAICT it could just entirely bypass it when compiling to such runtimes. And interoperation with other languages (specifically, accessing foreign types) is problematic and can’t be performed completely automatically anyway, so I don’t think using them from within Rust would be worse than e.g. using them from within C. (It might well require unsafe and the creation of types hiding that unsafety, though.)

I have a gut feeling that problems will start to stick out in other aspects, but to be honest, I have no idea off the top of my head where they might be. (I think you have to have experience contributing to the Rust compiler itself in order to be able to tell which particular implementation details might be painful to transfer to the JVM or the CLR.)

By the way, people have already proposed using/experimenting with alternative backends, such as Cretonne (whatever its name is this week) because LLVM has several pain points, and sticking to it generates quite a bit of technical debt in the architecture of rustc. So I assume it’s far from impossible.


#3

Do you propose something that is actually happening in the wild?

What you proposed is similar to what Graal is trying to achieve. However, compiling to different runtimes is not feasible because the standard library is not portable (unless you fully include it, which is the approach used for WebAssembly).

In most cases, making a binding is the correct approach.


#4

Slices are just views on sequences right? That seems more like ‘index arithmetic’, which is different from pointer arithmetic where you can for example read the second byte of a 32-bit integer by manipulating the pointers.

No, Rust does need to use the GC. Suppose your Rust code allocates a big array on the JVM, and then allocates and deallocates memory using that array. If the JVM code then references some memory from this array, that reference would break when the Rust code deallocates the memory. Instead, you should have the Rust code allocate using the GC, and remove the Rust deallocations.

You mean that not all runtimes share the same primitive types? I think the 4 types in WASM are supported by all runtimes, and you could use those to present other smaller types as well.

You can compile the parts of the standard library that are used and include those, as long as they don’t do native calls. This is similar to what ScalaJS does.

A binding has the disadvantage that you destroy platform portability. Suppose you’re writing a JVM application, which normally runs on any machine with a JVM, then you interface with a Rust library through the JNI, and suddenly your application only runs on machines with an architecture that the Rust library was compiled for.


#5

You seem to be confusing pointer arithmetic with type punning. A &[T] is made of a pointer to the first element and a length. If you create a subslice, it will also have a pointer into the contents of the original one, and a length. It’s not manipulating indices.
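
To make that concrete, here is a minimal sketch in safe Rust. The function names are made up for illustration. It only observes the addresses involved; it does not do anything with them.

```rust
use std::mem::size_of;

/// Hypothetical helper: borrows `orig[start..end]` and reports how far (in bytes)
/// the subslice's data pointer sits from the original's, plus the subslice length.
pub fn subslice_layout(orig: &[u32], start: usize, end: usize) -> (usize, usize) {
    // Bounds-checked: panics if the range does not fit inside `orig`.
    let sub = &orig[start..end];
    // Both slices carry a data pointer; the subslice's points into `orig`'s storage.
    let byte_offset = sub.as_ptr() as usize - orig.as_ptr() as usize;
    (byte_offset, sub.len())
}

/// Size of a `&[u32]` reference itself: a data pointer plus a length.
pub fn slice_ref_size() -> usize {
    size_of::<&[u32]>()
}
```

For a five-element `[u32]`, `subslice_layout(&v, 1, 4)` reports an offset of 4 bytes (one `u32`) and a length of 3, so the subslice really is "pointer into the middle plus length" rather than a pair of indices stored next to the original.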

You are describing interaction with foreign types. Rust code doesn’t intrinsically need to use the GC for Rust-native types.

No, I am not talking about primitive types. These concerns are most prominent when dealing with more complex types, e.g. “objects” (or whatever a language might call them) which require memory management, dynamic dispatch, etc.


#6

While I think this would be interesting, another area where I think Rust really shines is allowing development of libraries which compile to your CPU’s native ISA (i.e. the normal rustc -> LLVM pipeline), but support safe bindings for use in other languages, e.g. ones which can automatically build safe bindings for and link to a Rust static or shared library. Or that is to say, Rust is great at embedding in other language environments.

That doesn’t sound quite like what you’re after, but in that regard I think Rust is an ideal candidate for cases where you have apps developed in several languages, potentially on several platforms (different OSes, desktop, mobile) and want to have shared functionality between them.

Things like Helix and Neon come to mind. I am not aware of similarly high-level safe bindings for the JVM or CLR, though.


#7

Someone did write a wasm -> JVM bytecode compiler and then did rust -> wasm -> jvm. I don’t have the repo handy but they used the regex crate.

A fun hack, not something you’d want to do in production settings.


#8

This one I think:


#9

Rust works well with JVM via JNI, and with C# via P/Invoke. Almost all higher-level languages can call C via FFI, and Rust is callable via C FFI, so it is pretty interoperable.
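
As a rough sketch of what "callable via C FFI" looks like on the Rust side (the function name is hypothetical, not from any real library):

```rust
/// A function with the C calling convention, so a C-compatible caller can invoke it.
///
/// Assumption: to actually export it from a cdylib or staticlib under this exact
/// symbol name, you would also add a `#[no_mangle]` attribute (spelled
/// `#[unsafe(no_mangle)]` in the 2024 edition). It is left out here so the
/// snippet compiles unchanged on any edition.
pub extern "C" fn mathlib_add(a: i32, b: i32) -> i32 {
    // Wrapping arithmetic avoids an overflow panic, which must not unwind
    // across an FFI boundary.
    a.wrapping_add(b)
}
```

A plain C signature like this is the kind of thing P/Invoke can bind to. JNI expects its own function naming and an extra environment argument, so on the JVM side you would normally add a thin wrapper layer on top.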

Rust is a lower-level language than Java, so AFAIK there isn’t a straightforward mapping between Rust and JVM semantics (e.g. I don’t think you could use JVM’s objects and references for Rust’s memory allocation and pointers, you’d most likely need to allocate a huge byte array and pretend it’s the memory, like asm.js/WASM does).
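
To illustrate the "huge byte array pretending to be memory" idea, here is a toy sketch. It is written in Rust only to keep the thread in one language; on the JVM the backing store would be a `byte[]`. All names are invented for the example, and the bump allocator never frees anything, which a real implementation would have to handle.

```rust
/// Toy linear memory: "pointers" are just indices into one flat byte array.
pub struct LinearMemory {
    bytes: Vec<u8>,
    next_free: usize,
}

impl LinearMemory {
    pub fn new(size: usize) -> Self {
        LinearMemory { bytes: vec![0; size], next_free: 0 }
    }

    /// Bump-allocates `len` bytes and returns their address, or None if out of space.
    pub fn alloc(&mut self, len: usize) -> Option<usize> {
        let end = self.next_free.checked_add(len)?;
        if end > self.bytes.len() {
            return None;
        }
        let addr = self.next_free;
        self.next_free = end;
        Some(addr)
    }

    /// Stores a u32 at `addr` in little-endian order (panics if out of bounds).
    pub fn store_u32(&mut self, addr: usize, value: u32) {
        self.bytes[addr..addr + 4].copy_from_slice(&value.to_le_bytes());
    }

    /// Loads a little-endian u32 from `addr` (panics if out of bounds).
    pub fn load_u32(&self, addr: usize) -> u32 {
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&self.bytes[addr..addr + 4]);
        u32::from_le_bytes(buf)
    }

    /// Loads a single byte, e.g. the second byte of a stored u32 at `addr + 1`.
    pub fn load_u8(&self, addr: usize) -> u8 {
        self.bytes[addr]
    }
}
```

The host GC only ever sees one big array, so it cannot tell which parts of it are live, and host code cannot hold an ordinary object reference to anything stored inside.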


#10

I believe what you’re thinking of is that compacting garbage collectors can move allocated objects around when a GC event occurs. If you handed out a pointer to a GC’d object and that object is moved, your pointer is now invalid. The issue in this case isn’t pointer arithmetic; it’s just that the pointer has been invalidated. Most languages with a compacting GC also have a way to pin an object to a fixed memory address, usually for the purposes of C FFI. For example, in C# you can use the System.Runtime.InteropServices.GCHandle.Alloc() method to pin an object.


#11

Note that slices often only contain a pointer to the interior of the object, something typically not supported by GCs.