Bundle zig-cc in rustup by default

Motivation

Having a C compiler available is necessary for building many crates, for example those implementing HTTPS, TLS, and compression/decompression.

For HTTPS, even if rustls is used, it still pulls in ring, whose build script compiles C and assembly code.

For compression/decompression, the most widely used algorithms, such as zstd, zlib/deflate, and xz, are implemented in C and require a C compiler to build.

This means that many Rust users have to install a C compiler after running into confusing build-script errors, and it makes cross-compilation a nightmare.

For users who want maximum performance with cross-language LTO, this also means they have to be careful to use the same LLVM version for the C toolchain and the Rust toolchain.

And finally, an external C compiler makes sandboxing much harder, since build scripts have to invoke a potentially untrusted external command.

Proposed solution

I propose that we distribute a C compiler via rustup by default, preferably zig-cc/cargo-zigbuild.

zig-cc has an excellent cross-compilation story, and cargo-zigbuild contains some workarounds for zig-cc that rustup could include.

The path to the wrapped zig-cc should be exposed via an environment variable during build-script execution, so that cc-rs can pick it up and use it for compilation.
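As a minimal sketch of that lookup order, assuming a hypothetical `RUSTUP_ZIG_CC` variable (no such variable exists today; the precedence shown is also just an illustration, not cc-rs's actual logic):

```rust
// Hypothetical sketch: how a build script (or cc-rs itself) might resolve
// the C compiler, preferring a rustup-provided zig-cc wrapper.
// RUSTUP_ZIG_CC is an invented name, not an existing interface.
use std::env;

fn resolve_c_compiler() -> String {
    // 1. An explicit CC override always wins.
    if let Ok(cc) = env::var("CC") {
        return cc;
    }
    // 2. Fall back to a rustup-bundled zig-cc wrapper, if exposed.
    if let Ok(zig_cc) = env::var("RUSTUP_ZIG_CC") {
        return zig_cc;
    }
    // 3. Otherwise, the platform default.
    "cc".to_string()
}

fn main() {
    println!("using C compiler: {}", resolve_c_compiler());
}
```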

The bundled zig-cc should use the same LLVM version as rustc, so that users can enable cross-language LTO without pain.

If build scripts are compiled to WASI for sandboxing, the sandbox could still allow invoking the zig-cc bundled with rustup, since it is trusted.

I would also propose that Rust support using zig-cc as the linker, to simplify cross-compilation, including targeting older versions of glibc.
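As a sketch, if rustup shipped target-specific zig-cc wrapper scripts, opting in could look like an ordinary `.cargo/config.toml` entry (the wrapper name here is invented; today cargo-zigbuild achieves a similar effect with its own wrapper scripts):

```toml
# Hypothetical: `zig-cc-aarch64-linux-gnu` would be a rustup-provided wrapper
# around `zig cc -target aarch64-linux-gnu`; no such wrapper exists today.
[target.aarch64-unknown-linux-gnu]
linker = "zig-cc-aarch64-linux-gnu"
```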

By bundling zig-cc with rustup, we could potentially eliminate build scripts for certain projects: C source code placed in a pre-defined directory within the crate could be compiled automatically when the crate is compiled.
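Purely as an illustration of that last idea (none of these manifest keys exist; they are invented for this sketch), a crate might declare its C sources declaratively instead of shipping a build.rs:

```toml
# Invented manifest section: Cargo would compile everything under csrc/
# with the bundled zig-cc and link it into the crate, no build.rs needed.
[c-sources]
dir = "csrc"
include = ["csrc/include"]
```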

Alternative solution

Alternatively, crate owners/maintainers can use c2rust to transpile the C code to Rust.

The bzip2 maintainers have done this, and it is working OK for them, despite some performance loss for now that could be clawed back later.

I have also tried transpiling zstd, but found a few unsupported intrinsics.

11 Likes

Can you please elaborate on why shipping zig-cc is better than clang? Rust has many targets, and clang can target them too.

2 Likes

For example, when you try to compile for a -musl target on a glibc system, with clang you have to install additional packages (at least the header files and a -I path) for it to work.

Another example is compiling against an older glibc version.

Suppose you are on a system with glibc 2.37 and you are targeting glibc 2.16: with clang (and even cargo, for now), you have to install glibc 2.16 or use a container.

With zig-cc you can accomplish both with ease: you can cross-compile to musl on a glibc system without installing anything, and you can build for glibc 2.16 while on a system with glibc 2.37.
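For concreteness, this is roughly what that looks like with cargo-zigbuild today (zig and cargo-zigbuild installed separately; the glibc-version suffix on the target triple is cargo-zigbuild's syntax):

```shell
# cross-compile to musl from a glibc host, with nothing else installed
cargo zigbuild --target x86_64-unknown-linux-musl

# build against glibc 2.16 while running on a glibc 2.37 system
cargo zigbuild --target x86_64-unknown-linux-gnu.2.16
```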

1 Like

I recommend taking a look at this post explaining zig-cc's cross-compilation power:

https://andrewkelley.me/post/zig-cc-powerful-drop-in-replacement-gcc-clang.html

1 Like

I believe that an improved cross-compilation story for Rust in general, targeting at least the 3 big desktop/laptop OSes (Linux, macOS, and Windows) and the 2 big mobile OSes (Android and iOS), as well as their associated architectures (x86-64, AArch64), would be quite beneficial for the ecosystem.

Would the zig-cc compiler be able to provide all that in one fell swoop w.r.t. cross compilation of C code?

2 Likes

AFAIK, zig-cc makes cross-compilation to Linux targets really easy and reliable, regardless of the host environment.

I have cross-compiled to aarch64/armv7 Linux musl/gnu from x86_64 gnu Linux, and to x86_64 musl from x86_64 gnu Linux as well.

For cross-compiling to macOS, I think zig-cc could work, but I haven't tried it myself.

For cross-compiling to Windows, AFAIK zig-cc can handle the GNU target, but for MSVC, cargo-xwin is needed.

From a quick look at cargo-xwin, I think cross-compiling to Windows MSVC is mostly a licensing issue around distributing the required headers and files.

From my understanding, the main values zig cc provides are (correct me if I'm wrong here):

  1. Easy building of compiler-rt.
  2. System header files, allowing you to compile C files that need those headers.
  3. System library stub files like this one, allowing you to link to system libraries.
  4. Cross-compiling linker.
  5. The integration / plumbing to make things work well.

If we are to go down this road for the Rust project, I don't think the solution is to bless cargo-zigbuild or similar; I'd be more in favour of a partial integration of each of the benefits above:

  1. Already served by compiler-builtins.
  2. cc-rs could be the one bundling the necessary system headers (possibly behind a feature flag).
  3. rustc/rustup could be bundling/distributing library linker stubs.
  4. rustc already bundles lld, can be chosen with -Clinker=rust-lld.
  5. Will probably be hard to take real advantage of, we'll have to do a bunch of work to add support for rustc target triples etc. anyhow.

Does that make sense?

(Btw, I'd recommend all interested parties here to follow the t-lang/interop channel on Zulip)

6 Likes

For Windows we have raw-dylib now to generate import libraries. It should be possible to generate .tbd from raw-dylib for macOS too and stub ELF dylibs for Unix targets. This could then be handled in the libc crate.
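For reference, this is what raw-dylib looks like on Windows today: rustc synthesizes the import library from the declaration alone, with no .lib file on disk. The snippet is a sketch of the mechanism; the extern block only compiles into the program on Windows.

```rust
// `kind = "raw-dylib"` tells rustc to generate the import library itself
// instead of looking for kernel32.lib at link time.
#[cfg(windows)]
#[link(name = "kernel32", kind = "raw-dylib")]
extern "system" {
    fn GetCurrentProcessId() -> u32;
}

#[cfg(windows)]
fn main() {
    println!("pid: {}", unsafe { GetCurrentProcessId() });
}

#[cfg(not(windows))]
fn main() {
    println!("raw-dylib is a Windows-only link kind at the moment");
}
```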

That is not enough for Unix targets. While on Windows you directly invoke the linker, on Unix targets you invoke a linker driver like gcc or clang which in turn invokes the actual linker like lld. The linker driver passes a lot of platform specific flags to the actual linker like security mitigations and where to find system libraries.

That's something I didn't think of.

It'd be easier than bundling zig-cc directly, and ziglang/glibc-abi-tool on GitHub (a repository that collects glibc .abilist files for every version, plus a tool to combine them into one dataset) contains a bunch of glibc headers that could be useful.

Didn't find the ones for musl though.

Yes that could work very well combined with cc-rs distributing headers.

Though I still think there's value in distributing a C compiler, zig-cc or clang, because that would

  • enable any env with rustup installed to compile rust software pulling in C/C++ code
  • enable easy cross-language LTO; I believe this is important for further improving the performance of Rust

I still have a fantasy about compiling C/C++ down to Rust MIR for whole-program Miri and further optimization, which could also be very useful.

2 Likes

Hmmm so it looks like zig-cc is still quite useful during linking, even with cc-rs bundling libc headers.

Maybe rustup could bundle zig-cc and support it for linking the way rust-lld is supported, and then pass the path of zig-cc to build scripts as an environment variable?

2 Likes

To me this is a great idea. Zig has already done the hard work of getting all the headers and libc binaries working.

Zig-cc is open source, so Rust doesn't have to distribute the exact build Zig ships. The package can be customised to share LLVM, use Rust's compiler-rt, etc.

It's not enough to have headers in the cc crate, since you still need a C compiler, a linker, and copies of libc.

Cargo doesn't automatically configure anything for cross-compilation, which makes it tricky despite excellent cross-compilation support in rustc itself. If Zig's linker and sysroot were made official, hopefully Cargo could configure them automatically and make simple cross-builds just work.

13 Likes

If I understand correctly, zig-cc checks whether you use any newer glibc features and links in any needed polyfills. This is a technically remarkable achievement and a massive ergonomic benefit! I have not yet seen any explanation of the license of this polyfill code. If the code is derived from glibc, then it is LGPL-2.1 and statically linked into your binary. This would have massive knock-on effects if it were GPL; since it is LGPL, it is probably not technically a problem. Of course, I'm not a lawyer; talk to yours before making any decisions. But at many workplaces, statically linking anything that remotely touches *GPL* requires an explicit conversation with a lawyer.

3 Likes

AFAIK it still is a problem if you statically link the LGPL code. You have to provide a way to relink the executable/dylib so that the LGPL-licensed code can be swapped out. This is trivial if the LGPL code is entirely in a cdylib, but not when statically linking.

The extra bits of glibc and compiler runtime that zig-cc links have special licensing exceptions for exactly such purpose:

https://www.gnu.org/licenses/gcc-exception-3.1-faq.html

7 Likes

LGTM. This is good news for crates that ship some C/C++ files. Zig can provide a more predictable environment.

1 Like

I am concerned that bundling a C compiler with Rust will lead to subtle ABI incompatibilities, particularly on systems where the expectation is that the OS provides the C compiler. For example, both Linux and BSD distributions expect that they can make system-wide changes, like altering the width of time_t, by changing what their C compiler does and then rebuilding everything they ship. We'd be taking on the responsibility of monitoring all supported targets for such changes.

2 Likes

We already have the problem of having to monitor all targets for ABI changes, and if anything, bundling import libraries for glibc would make it better.

The libc crate is the equivalent of libc's header files in the C world. Glibc expects the libc headers and the libc.so version to match at compile time, as it makes extensive use of symbol versioning. If glibc were to change the ABI of pthread_mutex (which it has done on some architectures in the past), it would bump the symbol version of pthread_mutex_lock from, say, GLIBC_2.X to GLIBC_2.Y and set the latter as the default symbol version. If the libc crate doesn't get updated, it contains the right type definitions for the GLIBC_2.X symbol, but because GLIBC_2.Y is the new default in libc.so, the executable links against GLIBC_2.Y instead, producing an ABI incompatibility.

If we bundled both the libc crate and import libraries for glibc, we could update the symbol version linked against at the same time as we update the libc crate to use the new type definitions. Ideally we would also add raw-dylib support for ELF, so that the libc crate itself can tell rustc the right symbol version to link against.
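You can see this versioning machinery directly with binutils. The commands below are illustrative only: output varies by distro, and the libc path shown is the Debian/Ubuntu x86_64 location, which is an assumption about your system.

```shell
# which versioned glibc symbols an existing binary imports
objdump -T /bin/ls | grep GLIBC_ | head

# the default (and any compatibility) versions glibc exports for one symbol
objdump -T /lib/x86_64-linux-gnu/libc.so.6 | grep ' pthread_mutex_lock'
```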

As for non-linking related ABI incompatibilities (like the calling convention), that is unavoidable when rustc itself is not bundled as part of the OS as rustc can't just reuse the ABI handling of the system C compiler given that the latter doesn't expose any public way for getting this information.

5 Likes

I'm having trouble making any sense of what you're proposing here because as far as I know there is no such thing as an import library for ELF shared objects. It's strictly a Windows DLL concept. On ELF-based systems, the data used by ld at link time to prepare an executable to be dynamically linked against libc.so.6 comes directly from libc.so.6 -- the same file that ld.so will use at load time.

As such, to do the thing it sounds like you want to do, the libc crate would have to bundle a complete copy of GNU libc on glibc-based Linux, a complete copy of FreeBSD libc on FreeBSD, etc. This would be a huge amount of extra work for the crate's maintainers, and none of the upstreams would be willing to help because from their perspective the only supported usage is to use the system C compiler to link against the system C library, and if the resulting binary doesn't work on systems other than the one it was linked on, too bad, we don't support that. We barely have enough developers to support anything. (I cannot possibly overstate how under-resourced glibc upstream in particular is.)

1 Like

Mach-O has .tbd YAML files, which serve an identical purpose to import libraries on Windows. And for ELF we can generate dynamic libraries with exactly the same symbols as the real dynamic library, except omitting all code and data. The linker doesn't look at the code of the dynamic libraries it links against, only at the exported symbols. As such, the stub dynamic library rustc could generate is indistinguishable from the real dynamic library as far as the linker is concerned, and would effectively act the same way as import libraries on Windows and .tbd files for Mach-O.

2 Likes

... but that would be even more work than bundling the entire C library??? You'd have to write the tool that generates those stubs, keep it up to date, test it, and rerun the entire process every time you would otherwise have updated the bundled library...