Impediments to transpile Rust to C?

matthieum · August 1, 2018, 5:44pm

In the last week or so, the point has been made multiple times that Rust could never unseat C because it just was not portable enough.

This is indeed a fair point: rustc is currently bound to LLVM, which supports far fewer platforms than C compilers do.

There are two initiatives that I know of for transpiling Rust to C:

the mrustc compiler, written in C++ and pinned to Rust 1.19.0;
the LLVM-to-C backend, which has been maintained on and off.

Transpiling could be a way to support exotic platforms: a good transpiler to C89 or C11 (depending on availability) would vastly improve the portability of Rust.

However, what is not clear to me is which impediments could prevent translating Rust code into equivalent C code.

I know of at least one such impediment: strict aliasing. Rust supports casting a *mut f32 to *mut i32, modifying the integer, then using the modified float. In C this is undefined behavior and requires copying instead (e.g. with memcpy); the saving grace is that many compilers can disable strict aliasing (gcc and Clang have -fno-strict-aliasing), though that means the produced C code is not standard-compliant.
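A minimal sketch of the pattern described above, flipping the sign bit of an f32 by writing through an i32 view of its storage. This is well-defined in Rust; a direct C translation through an int32_t pointer would violate strict aliasing, so a transpiler would presumably have to emit the copy-based form instead (modelled here by the second function). Both function names are hypothetical, for illustration only.

```rust
/// Hypothetical helper: negate `x` by writing through an `i32` view of its bytes.
pub fn negate_in_place(x: &mut f32) {
    // Reinterpret the f32's storage as i32 (same size and alignment).
    let bits = x as *mut f32 as *mut i32;
    // SAFETY: `bits` points to 4 valid, aligned, exclusively borrowed bytes,
    // and every bit pattern is both a valid i32 and a valid f32.
    unsafe {
        *bits ^= i32::MIN; // i32::MIN is 0x8000_0000: only the sign bit set
    }
}

/// Hypothetical copy-based equivalent, closer to what a standard-compliant
/// C backend would emit (memcpy into an integer, modify, memcpy back).
pub fn negate_by_copy(x: f32) -> f32 {
    f32::from_bits(x.to_bits() ^ 0x8000_0000)
}
```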

Are there any other known issues in transpiling Rust to C?

mcy · August 1, 2018, 7:02pm

Is this something Rust wants? My impression was that a much more realistic goal was unseating C++ (at least, I usually bill Rust as "the C++ killer" to my friends).

IMO the real value of mrustc is as the bootstrap path for Rust. Bootstrapping through OCaml is a pain (as my friend trying to build Coq discovered), so being able to bootstrap via clang++ is a huge plus.

Wait, is this actually allowed? I feel like this falls into one of the "sketchy transmutes are UB" rules (the rules themselves are, admittedly, somewhat too sketchy for comfort).

Also, maybe I'm a bit jaded, but I often feel like the "C standard" is de facto "what clang supports" and (to a lesser extent nowadays) "what gcc supports". "Standard by vendor consensus" seems to be commonplace these days.

hanna-kruppe · August 1, 2018, 8:15pm

The biggest I know of is signed integer overflow: undefined in C, but defined to wrap around in Rust. While some compilers have an option to make it wrap (gcc and clang have -fwrapv), there's nothing in standard C.
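To make the gap concrete: Rust's wrapping_add is defined for all inputs. One standard-C-friendly lowering (an assumption about how a transpiler might handle it, not something mrustc is known to do) performs the addition on the unsigned type, where wraparound is defined in C, and converts the result back. The Rust below models that strategy and checks it agrees with Rust's semantics. Note that in C the conversion back to the signed type is itself implementation-defined before C23, which ties into the next point.

```rust
/// Rust semantics: signed addition that wraps on overflow.
pub fn add_rust(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Model of a C-friendly lowering: add as unsigned (wraparound is defined
/// for unsigned types in C), then reinterpret the bits as signed.
pub fn add_via_unsigned(a: i32, b: i32) -> i32 {
    (a as u32).wrapping_add(b as u32) as i32
}
```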

There's also a host of implementation-defined behavior that makes it theoretically difficult to translate Rust code into C that is guaranteed to behave the same as the Rust code under all conforming C compilers. For example, the standard guarantees neither IEEE 754 floating point nor (at least up until C11) that signed integers are represented in two's complement. I'm unsure whether any of these are a big problem for the real targets people want to port Rust to, though.
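For contrast, Rust pins these representations down on every target: integers are two's complement and f32 is IEEE 754 binary32. The sketch below simply exposes those bit patterns; a C translation would have to assume them (or check them with static assertions) rather than get them from the standard. The function names are illustrative only.

```rust
/// Two's complement: -1 is all ones, guaranteed on every Rust target.
pub fn minus_one_bits() -> u32 {
    (-1i32) as u32
}

/// IEEE 754 binary32 encoding of 1.0 (sign 0, exponent 127, mantissa 0).
pub fn one_point_zero_bits() -> u32 {
    1.0f32.to_bits()
}
```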

hanna-kruppe · August 1, 2018, 8:17pm

It's explicitly and deliberately allowed. Type punning is very useful in systems programming, strict aliasing makes it quite difficult, and we have a much better source of aliasing information.

mcy · August 1, 2018, 8:19pm

Really? I thought the behavior was "panic in debug but who the hell knows on release".

Amazing!

I'm kind of surprised I thought both of these were UB... I'm becoming convinced that the list of UB in the nomicon needs some love.

comex · August 1, 2018, 9:40pm

In Rust, integer overflow for both signed and unsigned integers is defined to either (1) wrap around or (2) panic. The choice of which is implementation-defined, but it’s never UB.
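Which of the two happens for a plain `+` depends on whether overflow checks are enabled (rustc's `-C overflow-checks`, on by default in debug builds). Code that needs one specific behavior regardless of build profile can ask for it explicitly with the standard methods, each of which a transpiler could map to a fixed C idiom. A small sketch (the wrapper function is hypothetical):

```rust
/// Explicit overflow behaviors, independent of the `overflow-checks` setting.
pub fn overflow_variants(a: u8, b: u8) -> (u8, Option<u8>, (u8, bool), u8) {
    (
        a.wrapping_add(b),    // always wraps modulo 256
        a.checked_add(b),     // None on overflow
        a.overflowing_add(b), // wrapped result plus an overflow flag
        a.saturating_add(b),  // clamps to u8::MAX
    )
}
```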

Another, more pragmatic, obstacle is conditional compilation on the Rust side, e.g. with #[cfg(…)]. In many common cases, this could be translated directly to C #if/#endif, but there’s nothing stopping you from writing silly things like

```rust
#[cfg(target_pointer_width = "64")]
type Foo = String;
#[cfg(target_pointer_width = "32")]
type Foo = i32;
```

…where the rest of the code can have a completely different interpretation (different type inference results, different trait choices, etc.) depending on the cfg value. Real-world uses won’t be quite that silly, but it’s easy to imagine some code providing two separate implementations of some data structure depending on the pointer width, where the two implementations may be intended to have the same API but could still have subtly different behavior in the type system.
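A sketch of that less silly case, with hypothetical names: two cfg-selected implementations behind the same API that nonetheless behave differently. rustc strips the inactive `cfg` branch before type checking, so a translator working from the compiler's internal representation only ever sees one of the two modules.

```rust
// Hypothetical example: one API, two cfg-selected implementations.
#[cfg(target_pointer_width = "64")]
mod imp {
    pub type Word = u64;
    pub fn pack(hi: u32, lo: u32) -> Word {
        ((hi as u64) << 32) | lo as u64
    }
}

#[cfg(not(target_pointer_width = "64"))]
mod imp {
    pub type Word = u32;
    pub fn pack(_hi: u32, lo: u32) -> Word {
        lo // the high half does not fit, so it is silently dropped
    }
}

pub use imp::{pack, Word};

/// Width of `Word` in bits; differs between the two configurations.
pub fn word_bits() -> u32 {
    Word::BITS
}
```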

The easiest solution is to just transpile the whole crate twice, once assuming 64-bit pointers and the other assuming 32-bit, then concatenate both outputs into one C file with a giant #if/#else/#endif surrounding them. Or, if you want to be extra clever, run the two outputs through diff, and translate the diff itself to #if blocks. However, that still doesn’t result in 100% portable C code; it assumes pointers are either 64-bit or 32-bit.
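The concatenation step of that approach could look roughly like the following. This is a minimal sketch, assuming a hypothetical post-processing helper that receives the two already-transpiled outputs as strings; it selects between them with the C preprocessor via UINTPTR_MAX from stdint.h and, like the approach itself, still assumes 32- or 64-bit pointers. The diff-based refinement is not shown.

```rust
/// Hypothetical helper: wrap two transpiled outputs (one per pointer width)
/// in a single C file, letting the C preprocessor pick the right one.
pub fn amalgamate(out_64: &str, out_32: &str) -> String {
    let mut c = String::new();
    c.push_str("#include <stdint.h>\n");
    c.push_str("#if UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFu\n");
    c.push_str(out_64);
    c.push_str("\n#elif UINTPTR_MAX == 0xFFFFFFFFu\n");
    c.push_str(out_32);
    // Any other pointer width is rejected at C compile time.
    c.push_str("\n#else\n#error \"unsupported pointer width\"\n#endif\n");
    c
}
```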

And it doesn’t address other cfg keys, such as target_os. For most portable Rust crates you could probably just define target_os = "unknown" (or something) and disallow any code that makes OS-specific assumptions. But if you have portable Rust code that uses, say, libc, you can’t necessarily convert it to portable C code that uses libc. To do so, you’d need a sophisticated mechanism to allow for “constant expressions” that aren’t actually known at compile time, such as sizeof(some_libc_struct); and even that can’t work correctly with some particularly obtuse code, such as if the constant expression appears in a generic parameter and something dispatches on it using specialization.
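A sketch of why such a size cannot simply be left symbolic, under stated assumptions: `Timespec` below is a hypothetical stand-in for a libc struct (whose real size varies by platform), and since specialization is unstable, an inherent impl on one specific const-generic value stands in for "dispatching on it". rustc must know the concrete number to decide whether `describe` even exists for the buffer type, whereas a portable C output would want to write sizeof(...) and let the C compiler decide.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a libc struct; for the real libc types the
// size differs between platforms, but rustc needs a concrete number here.
#[repr(C)]
pub struct Timespec {
    pub tv_sec: i64,
    pub tv_nsec: i64,
}

pub const TS_SIZE: usize = size_of::<Timespec>();

pub struct Buffer<const N: usize> {
    pub bytes: [u8; N],
}

// An impl selected by the *value* of N: whether `describe` exists for
// `Buffer<TS_SIZE>` depends on the size computed above.
impl Buffer<16> {
    pub fn describe(&self) -> &'static str {
        "exactly 16 bytes"
    }
}

pub fn make() -> Buffer<TS_SIZE> {
    Buffer { bytes: [0; TS_SIZE] }
}
```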

With all that said, I’m actually quite eager to see a Rust-to-C transpiler! I think the obstacles I mentioned wouldn’t be that big a deal in practice, and packaging certain Rust crates as “single-file C libraries” could further encourage adoption. Compare to the SQLite “amalgamation” distribution, where they concatenate the entire library into one big .c file, just because it’s easier to integrate into applications that way. That’s for a project that’s originally written in C, so the alternative would “just” be adding N files into your build system rather than 1 – but they still think that’s inconvenient enough to be worth making the amalgamation. On the other hand, with a Rust transpiler, the choice would be between adding one C file or integrating a whole new compiler toolchain (rustc) into your build: the C file provides a much larger advantage.

CAD97 · August 1, 2018, 11:37pm

I think the best way to transpile Rust to C would probably to reuse the rustc frontend and transpile MIR to C. MIR is more explicit on everything (for example, I think drops and unwinding are represented explicitly?) so less would have to be done for the transpilation. Of course, MIRI already is capable of interpreting MIR.
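As an example of what MIR makes explicit: in the function below, the two drops at the end of the inner block are implicit in the source, but in MIR they appear as separate drop terminators in reverse declaration order, which a C backend could emit as ordinary function calls. `rustc --emit=mir` dumps the MIR for inspection (its format is unstable). The `Noisy` type is hypothetical and only records the order in which values are dropped.

```rust
use std::cell::RefCell;

/// Hypothetical type that logs its name when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

pub fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _a = Noisy { name: "a", log: &log };
        let _b = Noisy { name: "b", log: &log };
    } // implicit here; explicit drop(_b) then drop(_a) in MIR
    log.into_inner()
}
```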

MIR is still unstable, but I could see MIR -> C being successful so long as rustc provides a way to get at the MIR.

mcy · August 2, 2018, 1:09am

comex:

With all that said, I’m actually quite eager to see a Rust-to-C transpiler! I think the obstacles I mentioned wouldn’t be that big a deal in practice, and packaging certain Rust crates as “single-file C libraries” could further encourage adoption. Compare to the SQLite “amalgamation” distribution, where they concatenate the entire library into one big .c file, just because it’s easier to integrate into applications that way. That’s for a project that’s originally written in C, so the alternative would “just” be adding N files into your build system rather than 1 – but they still think that’s inconvenient enough to be worth making the amalgamation. On the other hand, with a Rust transpiler, the choice would be between adding one C file or integrating a whole new compiler toolchain (rustc) into your build: the C file provides a much larger advantage.

Now that is good motivation; far better than "Rust supports less targets than C" (though maybe I'm spoiled, because I get to pretend non-x86_64 targets don't exist).

Yeah, I think that writing a Rust frontend at this point is about as pleasant as writing a C++ parser. I think at some point the frontend (like other large phases of the compiler) is meant to be spun out into its own crate, iirc, with rustc just being a command-line convenience for calling into the "compile all the things" query. A stable MIR subset would be interesting, too.

josh · August 2, 2018, 2:25am

I find Rust much more reasonable as a C replacement than C++ is; neither has a runtime, but C++ has more facilities that incur runtime overhead that I can't easily compile out, and more bits written in C++ expect those facilities available.

So, I do tend to say that Rust can go anywhere C can.

comex · August 2, 2018, 2:28am

Depends. As a potentially relevant example, here's someone lamenting that thanks to Rust, it's no longer possible to maintain up-to-date Firefox on certain… rather ancient… platforms, namely OS/2, Solaris 10, and Mac OS X 10.4:

Hypothetically, if there was a well-supported Rust-to-C transpiler, Firefox could support an alternate build path using it, for the sake of weird operating systems like those. On the other hand, maintaining that build path might take more effort than just porting LLVM+Rust to target those platforms...

josh · August 2, 2018, 2:28am

Rather than fixing this by transpiling to C, why not handle it by porting LLVM to those platforms? That would provide more value for those platforms. And in practice, C exposes sufficiently many non-portable details that a Rust port by way of C would still need to know details of the target platform.

What current-generation, still-maintained platforms does LLVM not support, that people actively want to target? I know a few less common embedded platforms with support in GCC but not in LLVM, but only a few.

mcy · August 2, 2018, 2:34am

Huh, what are those facilities? I tend to approach C++ from a post-C++14 mindset, where I can essentially just write a Rust dialect that plays fast and loose with aliasing.

In my view, supporting a modern browser on ancient platforms that no longer receive security updates (or, really, using such a platform at all) is a security footgun. But that's just my opinion.

comex · August 2, 2018, 2:37am

Don't forget about proprietary platforms with SDKs under NDA, such as game consoles. These days they're all using Clang/LLVM anyway, but AFAIK the code to support those platforms isn't necessarily upstream, and the vendor won't necessarily give you source code to their LLVM fork; even if they did, it might not be up-to-date, so you might not be able to build up-to-date Rust against it.

And then there are Apple platforms...

mcy · August 2, 2018, 2:46am

Aha! This is not something I'd thought about!

Well, iirc for not-macOS we're screwed because Rust is not blessed in Apple's clang, which you must use to produce anything of substance for an Apple platform that isn't rando-macos-native-app, which, AFAIK, you can already do just fine (modulo objc glue, I guess...?).

josh · August 2, 2018, 3:22am

Fair enough. If we support working with unpatched LLVM, we might be able to solve that issue with an LLVM bitcode backend, and then feeding the bitcode to the native LLVM, assuming the SDK doesn’t have an ancient version.

rpjohnst · August 2, 2018, 4:01am

Based on things like my own experimentation or the claims from the Chucklefish AMA, console SDKs aren’t anywhere near divergent enough from upstream LLVM to cause problems.

madmalik · August 2, 2018, 8:58am

A C-backend could be very nice for a certain subset of rust users.

But I don’t think it’s important enough to be a high-priority goal for the compiler team. So (imo) it should start with MIR and maybe wait until MIR is stabilized (if MIR ever gets stabilized; I haven’t kept up with that).

Chasing changes in MIR or even Rust itself increases the maintenance burden, and that could be a death sentence for a niche project. And a half-working C backend that is glued to an old version of Rust could do more harm than good.

Centril · August 2, 2018, 9:01am

Re. stabilizing MIR, I don’t think we should ever do that. By “stabilizing” MIR, I mean that we would ship a 1.0 and then never ship a 2.0. I don’t think that’s a good thing to do, and it would heavily constrain language design moving forward. However, I’m not opposed to eventually versioning MIR according to semver, as long as it is understood that breaking changes may be made when Rust needs them.

madmalik · August 2, 2018, 12:31pm

That was a poor choice of words on my part. What I actually meant is that it might be a bad idea to base a tool on a compiler implementation detail as long as it’s not part of the compiler itself.

Treating MIR as an API that is documented, versioned, and changed with some consideration for external users is quite stable in my book.

Centril · August 2, 2018, 12:35pm

Giving semver guarantees about MIR is probably not something I’d advocate just now, but as rustc continues to be crate-ified such versioning could become increasingly viable in the future I suppose…
