Should it be possible to translate (“transpile”) a Rust program to C11?


#1

Should it be possible to write a program that, given a Rust program, spits out a C11 program that does the same thing? Is it possible today?

This question affects many language points such as the memory model and how volatile reads and writes work. In particular, AFAICT, LLVM’s volatile read and write primitives cannot be expressed in ISO C11, so a translator could not implement them faithfully.

My own opinion is that it would be a useful experiment to build such a translator because it would be the easiest path towards an alternate Rust implementation not based on LLVM. This, in turn, would lead us to define Rust’s semantics in a more meaningful way than “whatever LLVM does” or “whatever rustc does.”

I also think that Rust should avoid needlessly diverging from C’s semantics. That is, Rust should prefer to avoid making design decisions that make translating Rust to C11 impossible. Otherwise, mixed Rust/C programs cannot be formally reasoned about, which is problematic in some cases since all Rust programs are a mix of Rust and C.

To be clear, I don’t think that Rust should be forced to do whatever C does. If there are things that Rust does that are clearly better and can’t be expressed in C, I’d rather Rust do the better thing than do the C-compatible thing. I would love to see a list of such things, if there are already such features.

Again, I bring this up because I saw a proposal that Rust’s semantics for volatile reads and writes be defined to be whatever LLVM does, which cannot be expressed (AFAICT) in C. I’ve also seen the discussion on the memory model where some people are proposing that Rust guarantee things that would be very expensive (AFAICT) to guarantee in a Rust program translated to C.

Note I’ve been kicking around the idea of writing a Rust-to-C translator so that I can use formal methods tools that have been (or are being) written for C to analyze Rust programs.


#2

Look on the net for explanations of why targeting C is a bad idea. Unless you’re using CompCert, C is a broken target. And LLVM already targets most important CPUs (and if it doesn’t, then consider writing a target for LLVM instead of a Rust->C translator).


#3

I am indeed targeting CompCert and tis-interpreter.


#4

Rust is supposed not to have strict aliasing of any kind.


#5

One might be able to handle that by compiling all pointers/references to void* and all loads to memcpy. This seems like it might be a major perf hit, though (lack of alignment info etc.).
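A minimal sketch of what “all loads become memcpy” means, written in Rust for illustration rather than as translator output. The helper names `store_f32`, `load_i32`, and `roundtrip` are hypothetical; the assumption is that the emitted C would have the same shape, with `void*` in place of `*const u8` and `memcpy` in place of `ptr::copy_nonoverlapping`.

```rust
use std::mem::size_of;
use std::ptr;

/// Store an `f32` at an untyped address by copying its bytes
/// (the analogue of `memcpy(dst, &value, sizeof value)` in C).
///
/// Safety: `dst` must be valid for writes of 4 bytes.
pub unsafe fn store_f32(dst: *mut u8, value: f32) {
    unsafe {
        ptr::copy_nonoverlapping(&value as *const f32 as *const u8, dst, size_of::<f32>());
    }
}

/// Load an `i32` from an untyped address by copying bytes into a local.
/// No alignment or type information about `src` is used.
///
/// Safety: `src` must be valid for reads of 4 bytes.
pub unsafe fn load_i32(src: *const u8) -> i32 {
    let mut out: i32 = 0;
    unsafe {
        ptr::copy_nonoverlapping(src, &mut out as *mut i32 as *mut u8, size_of::<i32>());
    }
    out
}

/// Write 1.0f32 into a byte buffer, then read the same bytes back as an i32.
pub fn roundtrip() -> i32 {
    let mut cell = [0u8; 4];
    unsafe {
        store_f32(cell.as_mut_ptr(), 1.0);
        load_i32(cell.as_ptr())
    }
}
```

Here `roundtrip()` yields `0x3f80_0000`, the IEEE 754 bit pattern of `1.0f32`. Because every access goes through a byte copy, the compiler cannot assume anything about the pointee type or its alignment, which is where the possible performance hit comes from.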


#6

I think it’s a bad idea to talk about “strict aliasing.” Instead it’s much clearer to use the C11 approach of effective types, which says (simplified) that it is undefined behavior to read from memory as though it is type T unless you previously wrote a value of type T to that memory. In Rust, one can’t even do that without using transmute or other unsafe things. Rust doesn’t define what happens when you use transmute and then modify the transmuted memory so I think that’s even stricter (AFAICT) than C’s effective type rule.


#7

As I hinted above, whether that is necessary or not depends on if/how Rust defines the semantics of modifying a transmuted value. It currently seems to be unspecified.


#8

Strict aliasing is exactly the “a read of type T must match a write of type T” restriction. Rust does not have it: mistyped reads (raw pointers and casts are enough, no transmute needed) are unspecified, not undefined.


#9

I’m not sure I understand what you mean/how it’s not answered by @arielb1’s comment. @arielb1’s comment was basically saying that C compilers are totally free to reorder lines A and B,

int* x = ...;
float* y = (float*)x;

*y = 1.0; // A
int z = *x; // B

but, in the Rust equivalent, lines C and D (probably) can’t be reordered:

let x: *mut i32 = ...;
let y = x as *mut f32; // must be *mut, since we write through y

unsafe {
    *y = 1.0; // C
    let z = *x; // D
}

In particular, this means a translation of the latter to C cannot be the former. While all of this is of course unspecified as you say, I believe the thoughts of @nikomatsakis are to drive this sort of aliasing analysis via &mut/&, rather than the types pointers point to. E.g. see his comments on https://github.com/rust-lang/rust/issues/27774. (That said, the conservative translation to void*/memcpy won’t ever be wrong, just possibly less efficient than it needs to be.)
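For concreteness, here is a self-contained version of the Rust snippet, assuming the reading discussed above in which the write at C and the read at D are not reordered. The function name `mistyped_read` and the local `slot` are made up for the example.

```rust
/// Write an f32 through a cast raw pointer, then read the same
/// memory back through the original i32 pointer.
pub fn mistyped_read() -> i32 {
    let mut slot: i32 = 0;
    let x: *mut i32 = &mut slot;
    let y = x as *mut f32; // same address, different pointee type

    unsafe {
        *y = 1.0; // C: store the bit pattern of 1.0f32
        *x        // D: load those bits as an i32
    }
}
```

If D really must observe the store at C, the result is the bit pattern of `1.0f32`, i.e. `0x3f80_0000`. A C compiler applying type-based alias analysis to the direct C equivalent would be allowed to return the old value instead.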


#10

I’d like to see Rust have a better documented/specified set of semantics (independently of LLVM’s semantics), but I don’t think using C11 as the basis makes much sense. The only requirement to make C interop work is that extern functions be considered opaque.


#11

I don’t think there’s actually much difference between casting and transmuting. It seems like casts are just syntactic sugar for transmuting one pointer type to another.

As far as unspecified vs. undefined behavior goes, my interest is mostly in proving the correctness of code that doesn’t have instances of either, and more generally code that avoids using unsafe at all.


#12

I think borrowing from C and C++ at least in certain areas makes sense. For example, a lot of effort has been put into C++ atomics, including making them work in C. And, it would be useful for C/C++ atomics to interoperate with Rust’s atomics so that Rust and C code can use them for building IPC mechanisms.

I already mentioned atomics as an additional thing to consider. Also, either rustc needs to understand C’s volatile in the FFI or the Rust programmer using FFI bindings needs to be very careful to use the volatile read/write functions when reading/writing to volatile variables. Actually, I think that Rust’s extern "C" means that a quite large subset of C is embedded in Rust.
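As a rough illustration of “being careful to use the volatile read/write functions”, here is a sketch using `std::ptr::read_volatile` and `std::ptr::write_volatile`. The local `reg` is only a stand-in for a variable that C code would declare `volatile`; in real FFI code the pointer would come from the C side, and the function name `volatile_roundtrip` is made up.

```rust
use std::ptr;

/// Write to and then read from a location using volatile accesses only.
pub fn volatile_roundtrip() -> u32 {
    // Stand-in for storage that C would declare as `volatile uint32_t`.
    let mut reg: u32 = 0;
    let p: *mut u32 = &mut reg;

    unsafe {
        // Each call is a volatile access; a plain `*p = ...` or `*p` would not be.
        ptr::write_volatile(p, 0xABCD);
        ptr::read_volatile(p)
    }
}
```

Nothing in the type of `p` records that the pointee is volatile, so the discipline rests entirely on the programmer (or on a wrapper type) routing every access through these two functions.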


#13

FYI, I am interested in working on such a transpiler.

For strict aliasing, while theoretical concerns are interesting, I think a practical answer is to compile with -fno-strict-aliasing. Given that many C programs compile with the option, performance shouldn’t suffer too much. (I believe this is better than @huon’s idea of compiling to void*.)

My understanding of the CompCert Memory Model, Version 2, is that CompCert C is close to the -fno-strict-aliasing dialect of C.


#14

I don’t think there’s actually much difference between casting and transmuting. It seems like casts are just syntactic sugar for transmuting one pointer type to another.

Rust pointers are “just numbers”, and in a much stronger sense than C pointers (for example, they never become indeterminate values). Casts (that is, pointer-to-pointer casts) don’t affect the number.
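A small sketch of the “casts don’t affect the number” point, which also compares a cast against a transmute between the same two pointer types. The function name `pointer_numbers` is hypothetical, and the example only looks at thin pointers to sized types.

```rust
use std::mem;

/// Return the address of one value as seen through the original pointer,
/// a cast pointer, and a transmuted pointer.
pub fn pointer_numbers() -> (usize, usize, usize) {
    let value: i32 = 7;
    let p: *const i32 = &value;

    // Pointer-to-pointer cast: changes the pointee type only.
    let by_cast = p as *const f32;

    // Transmute between two thin raw pointer types of identical size.
    let by_transmute: *const f32 = unsafe { mem::transmute::<*const i32, *const f32>(p) };

    (p as usize, by_cast as usize, by_transmute as usize)
}
```

All three components of the returned tuple are the same address, so for this case the cast and the transmute agree.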

Also, either rustc needs to understand C’s volatile in the FFI or the Rust programmer using FFI bindings needs to be very careful to use the volatile read/write functions when reading/writing to volatile variables.

Rust does not really specify volatile memory accesses (except for atomic accesses, which have a few different properties). They all must be done via a foreign ABI call, which behaves the same as any other.

I already mentioned atomics as an additional thing to consider.

And, it would be useful for C/C++ atomics to interoperate with Rust’s atomics so that Rust and C code can use them for building IPC mechanisms.

Rust indeed uses a C11-style memory model for atomic instructions. Actually, what you want in order to use Rust and C atomics together is ABI-compatibility, not memory model compatibility (e.g. Linux-style atomics with volatile and barriers can work together with C11 atomics).
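To show what the C11-style model looks like on the Rust side, here is a minimal release/acquire handoff using `std::sync::atomic`. It stays within one Rust process and says nothing about ABI compatibility with C; the function name `publish_and_read` and the value 42 are arbitrary.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// One thread publishes a value with a release store; the caller waits
/// for the flag with acquire loads and then reads the value.
pub fn publish_and_read() -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let data = Arc::clone(&data);
        let ready = Arc::clone(&ready);
        thread::spawn(move || {
            data.store(42, Ordering::Relaxed);
            // Release: the store to `data` above happens-before any
            // acquire load that observes `true` here.
            ready.store(true, Ordering::Release);
        })
    };

    // Acquire: once we see `true`, the write to `data` is visible too.
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    let seen = data.load(Ordering::Relaxed);

    producer.join().unwrap();
    seen
}
```

The orderings correspond to `memory_order_relaxed`, `memory_order_release`, and `memory_order_acquire` in C11, which is the sense in which the two models line up.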

Actually, I think that Rust’s extern "C" means that a quite large subset of C is embedded in Rust.

About in the same sense that a large subset of x86 assembly is embedded in Rust, or in C for that matter (after all, you can totally JIT code and call it).