Impediments to transpiling Rust to C?

I think producing target-independent C from Rust should not be a goal of a Rust-to-C transpiler. At least not without removing or forbidding various features, like macros.

Edit: And even then probably still not a reasonable goal.

I was not intending, at all, to transpile generic Rust code to generic C code, but instead to transpile Rust code for one specific target to C code appropriate for this specific target, so #[cfg], macros, or even build.rs are not issues.

So do I. But before launching such an initiative I first wanted to get an idea of the kind of obstacles that should be expected.

There are multiple issues:

  1. An LLVM backend for each and every target is a substantial effort compared to a single generic C backend,
  2. Neither backend is a one-time cost: an LLVM backend must adapt to changes in LLVM IR, and a C backend must adapt to changes in its own input, so volunteers must step up each time a backend needs adapting. It's unclear whether the Rust community by itself is willing to take on the challenge of maintaining N LLVM backends, whereas a generic C backend seems well within our capacity,
  3. There are proprietary platforms where getting LLVM to work would require an NDA, or might not be possible at all.

I am not expecting to see a transpile-once experience, where the transpiled code can then be used on multiple platforms. Instead, I would expect a C backend which tailors its output based on the target; so the Rust port would be expected to know all the details.

Rust macros? Why would that be an issue? Rust macros expand to Rust code before transpiling to C, no?

Any reason to start a new initiative and not just extend mrustc?

For now, I have NOT started any initiative; I am just enquiring about the difficulties, although it appears that many participants in this thread are getting excited and derailing the conversation quite a bit.

As for why not "just" extend mrustc, I would say that there's a complete difference in terms of goals:

  • mrustc is a full blown alternative Rust compiler,
  • a C backend is just a backend.

Maintaining mrustc requires a significant expenditure of effort, as all features of rustc must be faithfully ported. The 1.19.0 release it targets is 9 versions behind, and more than a year old, attesting to the difficulty [1]. At this point, I am afraid that mrustc is a Tier 4 experience, and although I admire the technical challenge and appreciate the goal of providing a separate compiler implementation (Trusting Trust and all that), it does not seem to solve the problem of porting to exotic platforms in a satisfactory manner.

On the other hand, a C backend should be much simpler, feature-wise. By nature, it does not need to deal with front-end concerns such as compile-time computation, type inference, borrow-checking, etc. By the time it gets its input, that input has been proven to conform to the language rules, and only the artifact remains to be generated.

The significantly smaller subset of tasks to be accomplished has me hopeful that such a backend (whether directly called from Rust, or called with LLVM IR) would be able to keep up with the pace of Rust development and provide a Tier 3 or even Tier 2 experience.

[1] I would notably point out that mrustc has no equivalent to MIRI, AFAIK, and must duplicate all the hard work going into type inference, specialization, ...

Last I checked, mrustc didn’t actually perform borrow checking, and assumed the Rust code passed to it was already borrow-checked by rustc. So they aren’t duplicating all of rustc just yet. (Which really just furthers your claim that it’s a huge amount of work to do it all).

Apologies if I implied that. That wasn't my intent.

Yes, I see where you're coming from. That makes sense.

How is this a big issue? One can implement wrapping signed integer arithmetic in plain C, so we would just need a support C library that provides wrapping_add, and that's what we would lower the Rust code to.
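
As a rough illustration of what such a support function could look like, here is a minimal sketch for 32-bit integers. The names rust_u32_to_i32, rust_wrapping_add_i32 and rust_wrapping_sub_i32 are made up for this example and do not come from any existing library. The idea is to do the arithmetic on the unsigned type, where wrap-around is defined, and convert back without relying on an implementation-defined cast.

```c
#include <stdint.h>

/* Hypothetical helper: maps a uint32_t bit pattern to the int32_t with the
   same two's complement representation, avoiding the implementation-defined
   result of casting an out-of-range value to a signed type. */
static int32_t rust_u32_to_i32(uint32_t u) {
    if (u <= (uint32_t)INT32_MAX) {
        return (int32_t)u;
    }
    /* u is in [2^31, 2^32 - 1]: subtract 2^31, then move into the negative range. */
    return (int32_t)(u - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}

/* Sketch of what i32::wrapping_add might be lowered to. */
static int32_t rust_wrapping_add_i32(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b; /* unsigned arithmetic wraps modulo 2^32 */
    return rust_u32_to_i32(sum);
}

/* Sketch of what i32::wrapping_sub might be lowered to. */
static int32_t rust_wrapping_sub_i32(int32_t a, int32_t b) {
    uint32_t diff = (uint32_t)a - (uint32_t)b;
    return rust_u32_to_i32(diff);
}
```

Whether a given C compiler turns this back into a single add instruction is up to its optimizer; the sketch only shows that the semantics can be expressed without undefined behaviour.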


@comex

And it doesn’t address other cfg keys, such as target_os.

A C support library can do this as well. Rust doesn't really do anything magical here.

Also, we could initially require transpiled C code to be compiled with gcc... -DRUST_TARGET_OS=darwin and call it a day.

But if you have portable Rust code that uses, say, libc, you can’t necessarily convert it to portable C code that uses libc. To do so, you’d need a sophisticated mechanism to allow for “constant expressions” that aren’t actually known at compile time, such as sizeof(some_libc_struct).

I don't understand this point. libc could be converted to C code (it's just extern functions), and if the values of the cfg() macros are known, it should generate a C wrapper that is 100% ABI-identical with that of the platform; anything else is a bug in libc, which is why this is actually tested by the ctest crate on every libc commit.


@josh

Rather than fixing this by transpiling to C, why not handle it by porting LLVM to those platforms?

Because porting LLVM to all platforms that every C compiler was ever able to target is a huge amount of work.


@matthieum

Translating Rust or MIR to C is probably going to run into problems very similar to those that translating LLVM-IR to C did.

IIRC the main problem is that MIR, LLVM-IR, etc. are all target dependent, which means that if the Rust toolchain generating the MIR or LLVM-IR does not support a particular target (or generates them for a slightly different target), the generated C code probably won't compile with the C compiler that the end user wants to use.

I think this means that we cannot generate C from MIR because macros are expanded well before that. We would have to generate C directly from Rust, expanding Rust macros to C macros and maybe even expanding every "portable" operation to a call into a C support library.

We don't even have to provide this C support library, we can just transpile to C and expect the user to provide it for the platform they want to use. Such support libraries will probably be huge.

Rust macros are more powerful than C macros, so how could this work in general?

I do not think converting Rust to C that can then be built on any arbitrary target will ever work without cutting down what can be converted. Even C is target dependent: it exposes knowledge of the platform it is being compiled for via pre-processor macros.

You would also likely have to forbid procedural macros and custom derives if you want to produce arbitrarily portable C.

Of course everything can be emulated, but it's a question of overhead. Addition and subtraction are easy assuming two's complement (just do them on the equivalent unsigned type), but AFAIK multiplication and left shifts will need complex workarounds which will lower to multiple instructions, unless you start using C-compiler-specific built-ins or get lucky and the C compiler recognizes that the complex C expression you've chosen implements signed two's complement mul/shift and lowers it to a single instruction.
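
For what it's worth, here is one possible shape for the multiplication and shift cases, as a sketch only. It assumes 32-bit operands and widens to unsigned long long, which is never subject to integer promotion, before truncating. The function names are invented for this example, and nothing here guarantees that a particular C compiler will collapse the result into a single instruction.

```c
#include <stdint.h>

/* Hypothetical helper: same-bits conversion from uint32_t to int32_t that
   avoids an implementation-defined out-of-range cast. */
static int32_t rust_u32_to_i32(uint32_t u) {
    if (u <= (uint32_t)INT32_MAX) {
        return (int32_t)u;
    }
    return (int32_t)(u - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}

/* Sketch of i32::wrapping_mul: multiply in a wider unsigned type, keep the
   low 32 bits. The low bits of a product are the same for signed and
   unsigned two's complement operands. */
static int32_t rust_wrapping_mul_i32(int32_t a, int32_t b) {
    unsigned long long wide =
        (unsigned long long)(uint32_t)a * (unsigned long long)(uint32_t)b;
    return rust_u32_to_i32((uint32_t)wide);
}

/* Sketch of i32::wrapping_shl: the shift amount is masked to the bit width,
   and the shift happens on an unsigned value so no signed overflow occurs. */
static int32_t rust_wrapping_shl_i32(int32_t a, uint32_t n) {
    unsigned long long wide = (unsigned long long)(uint32_t)a << (n & 31u);
    return rust_u32_to_i32((uint32_t)wide);
}
```

The widening step is there because narrower unsigned types can be promoted to signed int before the operation, which would reintroduce the overflow problem on some targets.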

I don't think there's much choice.

Yes, it does mean that some support would need to be available for the particular target, whether in rustc or LLVM, but there is no reason it should not be possible to provide the "specification" at run-time via a configuration file or flags, so that anyone can provide the "specification" of their platform.

On the other hand, translating generic Rust to C will only get harder and harder as Rust gains more compile-time capabilities. For example, any ability to reason about the size of a struct and make compile-time decisions based on it (such as the number of such structs you can fit) requires the rustc compiler to be aware of the final struct layout for the target. It's unavoidable.
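
To make the struct-size point concrete, here is the C analogue of such a decision, as a small sketch with made-up names (struct record, BUFFER_BYTES, RECORDS_PER_BUFFER). In hand-written C the compiler for the final target resolves the expression itself. A transpiler running after rustc has already evaluated the equivalent Rust constant would instead emit a literal number, which is only correct if rustc's idea of the layout matches the C compiler's.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type; its size depends on the target's alignment and
   padding rules, so it is not known without a target in mind. */
struct record {
    uint8_t  tag;
    uint32_t value;
};

enum { BUFFER_BYTES = 4096 };

/* Evaluated by the C compiler for its own target. */
#define RECORDS_PER_BUFFER (BUFFER_BYTES / sizeof(struct record))

/* The array length is a compile-time decision driven by the struct size. */
static struct record buffer[RECORDS_PER_BUFFER];

static size_t records_per_buffer(void) {
    return sizeof(buffer) / sizeof(buffer[0]);
}
```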

Aren't a lot of those targets just missing platform support in std and the underlying libs rather than missing architectural support in LLVM?

LLVM does not support OS/2 yet, and it is not the only missing platform either. I think AVR is still in the works, for example.

Ah, I thought OS/2 being on x86 meant that LLVM codegen/no_std would work for it already, but apparently it doesn’t?

A platform is more than just the CPU it runs on. Each OS also imposes its own system API to access the underlying hardware and its own C ABI, for example.

For example, the “target” is different between linux, darwin, and windows, even when all run on x86.

The desire to have tail-call elimination in Rust eventually.

I am not sure TCE would be that much of an issue.

While TCE is transparent to the user, there is no TCE in assembly, so at some point in the pipeline someone needs to transform the call to eliminate it. In both the cases of transpiling to C and of compiling to assembly, this transformation needs to happen before writing down the final artifact.

It might require a bit of smarts in the transpiler if the transformation was not already done on its input, but there should be sufficient literature and examples, with so many languages doing it, so I’d expect it not to be too difficult… though I may be wrong of course.

TCE is basically transforming a call, e.g.

call foo
ret

into a jump:

jmp foo

The former keeps some space on the stack in use by the caller function (in this case, just the return address); the latter doesn’t.

C supports function calls, but doesn’t have a primitive that’s guaranteed to translate to a jump to a function. So you can’t do ‘true’ tail calls in portable C. You can simulate them with a trampoline function, though (at some cost to performance).
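
A minimal sketch of the trampoline idea, using a pair of mutually tail-recursive functions as the example. All names here (struct thunk, step_fn, is_even_step, and so on) are invented for illustration. Each step returns a description of the next call instead of making it, and a driver loop performs the calls, so the C stack does not grow with the number of steps.

```c
#include <stdbool.h>
#include <stddef.h>

struct thunk;
typedef struct thunk (*step_fn)(unsigned long n);

/* Either "call next(n)" (next != NULL) or "finished with result". */
struct thunk {
    step_fn next;
    unsigned long n;
    bool result;
};

static struct thunk is_odd_step(unsigned long n);

/* Instead of tail-calling is_odd(n - 1), return a thunk describing that call. */
static struct thunk is_even_step(unsigned long n) {
    if (n == 0) {
        return (struct thunk){ NULL, 0, true };
    }
    return (struct thunk){ is_odd_step, n - 1, false };
}

static struct thunk is_odd_step(unsigned long n) {
    if (n == 0) {
        return (struct thunk){ NULL, 0, false };
    }
    return (struct thunk){ is_even_step, n - 1, false };
}

/* The trampoline: bounces between steps in a loop at constant stack depth. */
static bool trampoline(struct thunk t) {
    while (t.next != NULL) {
        t = t.next(t.n);
    }
    return t.result;
}

static bool is_even(unsigned long n) {
    return trampoline(is_even_step(n));
}
```

The cost mentioned above shows up as an indirect call and a struct copy per step, where a real tail call would be a single jump.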

There are various transformations which can be applied.

For example, functional languages tend to use recursion extensively instead of loops, and there TCE is necessary to avoid blowing up the stack. If you recognize the pattern, you can simply transform the chain of recursive calls into a traditional for/while loop.
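
As a small sketch of that transformation, here is a self-recursive tail call next to the loop it could be rewritten into. The function names are hypothetical and gcd is just a convenient example. The recursive call becomes a reassignment of the parameters followed by a jump back to the top.

```c
/* Tail-recursive form: the recursive call is the last thing the function does. */
static unsigned long gcd_recursive(unsigned long a, unsigned long b) {
    if (b == 0) {
        return a;
    }
    return gcd_recursive(b, a % b);
}

/* Loop form a transpiler could emit: same logic, constant stack usage
   regardless of whether the C compiler performs TCE itself. */
static unsigned long gcd_loop(unsigned long a, unsigned long b) {
    while (b != 0) {
        unsigned long next_b = a % b; /* compute new arguments first */
        a = b;
        b = next_b;
    }
    return a;
}
```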

So, there is a mildly expensive way of guaranteeing TCE (trampoline), and there are better performing (but more specialized) ways of obtaining it, so I don’t think it should be too much of an issue, although it may not be as straightforward a translation.

Am I missing instances of TCE that would be fully impossible to translate to ISO C?

I expect that the most problematic part would be the runtime implementation: specifically, panicking, various intrinsics, and allocation. All of those seem to me to be quite platform-dependent, and the C language doesn’t provide abstractions to cover them.

This is especially true for many of those more obscure platforms - a compilation to C per se doesn’t actually help very much here because you still need to figure out how to implement Rust-specific features (such as panicking, etc.) on each of those platforms.

Edit: I suppose you could implement rudimentary panicking using setjmp. However, (1) it seems hacky and prone to violate soundness, and (2), again, especially on more unusual platforms it might not be available or might entail various platform-specific quirks in behaviour.
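
To show roughly what I mean, here is a bare-bones sketch of setjmp-based panicking. Every name in it (rust_panic, current_handler, checked_div, div_or_default) is made up for the example. Note that it runs no destructors for frames between the panic and the handler, which is exactly the kind of soundness hazard mentioned above.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical runtime state: the innermost active "catch" point, if any. */
static jmp_buf *current_handler = NULL;
static const char *panic_message = NULL;

/* Sketch of a panic entry point: jump to the handler, or abort if none is set. */
static void rust_panic(const char *msg) {
    panic_message = msg;
    if (current_handler != NULL) {
        longjmp(*current_handler, 1);
    }
    abort();
}

/* Example of lowered code that may panic. */
static int checked_div(int a, int b) {
    if (b == 0) {
        rust_panic("attempt to divide by zero");
    }
    return a / b;
}

/* Rough analogue of catching a panic: returns fallback if checked_div panicked. */
static int div_or_default(int a, int b, int fallback) {
    jmp_buf handler;
    jmp_buf *saved = current_handler; /* allow nesting by restoring it later */
    current_handler = &handler;
    if (setjmp(handler) == 0) {
        int quotient = checked_div(a, b);
        current_handler = saved;
        return quotient;
    }
    /* Reached via longjmp: intermediate frames were skipped without cleanup. */
    current_handler = saved;
    return fallback;
}
```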