pre-RFC: global source replacement in Cargo for OS bindings


#1

Problem

Rust’s libc has a big portability problem resulting from libc’s decision to use FFI for binding the system C library. It does make cross-compiling a breeze, but it also makes libc effectively own the complete set of libc bindings for every version of every operating system that will ever run Rust. This has led to bloat in the libc crate. Worse, it’s led to several situations that libc simply cannot currently support:

  • libc can’t use some of FreeBSD 12’s new features without sacrificing support for FreeBSD 11 (it’s not a problem in C thanks to ELF symbol versioning, but libc doesn’t provide any way to select which set of symbols to use).

  • libc can only support one OpenBSD version at a time, because OpenBSD has no backwards- or forwards-compatibility guarantees.

  • Adding libc support for minor forks like Bitrig requires creating a whole new target_os in Rust. That’s way too much work, especially for minor forks like pfSense that will change no more than a handful of syscalls.

  • Building Rust programs for proprietary OS forks is basically impossible. The vendor would have to, at the very least, fork libc and fork every single dependent crate to depend on the forked version.

  • Programs that try to use newly introduced library functions will compile fine on older OSes that don’t have those functions, but fail at runtime when they can’t link them. It would be better to fail at compile time.

Over in the libc issue tracker people (including myself) have proposed a number of half-baked solutions. But the root cause of the problem is that the libc crate effectively owns the C library headers for every supported operating system. I propose a different solution: a small update to Cargo that would allow OS vendors to take ownership back from rust-lang/libc.

Solution

The solution is simple: Cargo should allow OS vendors to override crates.io’s libc crate. Other crates, for example ncurses, might need to be overridable too. OS vendors should be able to distribute their modified crates through whatever channel is customarily used for that OS, such as apt-get or pkg. When building, Cargo should check a global config file for any OS-provided packages, and prefer using those over crates.io’s. The global config file would look something like this:

/usr/local/etc/cargo/config:

[source]

[source.freebsd-org]
directory = "/usr/local/cargo/vendor"

[source.crates-io]
replace-with = "freebsd-org"

With such an override in place, Cargo would be able to correctly build crates on OSes that rust-lang/libc doesn’t even know about. rust-lang/libc would also be absolved of maintaining bindings for minor forks like Bitrig. For OSes with versioning practices that libc can’t handle, like FreeBSD and OpenBSD, libc could either drop support entirely or else maintain bindings for one nominal version only. Users of a different version would rely on their OS-provided libc replacement. Cross-compiling would work as before – when targeting the nominal version. When targeting any other version, the build host would need additional information pointing to the target’s vendored libc. This need not be stored in the global Cargo config file; it could be placed in the local Cargo.toml file or $HOME/.cargo/config instead.
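To make the lookup concrete, here is a minimal Rust sketch of the order in which config files might be consulted. The walk from the current directory upward and the $HOME/.cargo/config step mirror what Cargo already does; the global locations and the `global_config_path` helper are hypothetical, taken from this proposal, and ranking the global file last is my assumption rather than anything Cargo implements.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical global config location from this proposal.
/// Cargo does not currently read any such file.
fn global_config_path(target_os: &str) -> PathBuf {
    match target_os {
        "freebsd" => PathBuf::from("/usr/local/etc/cargo/config"),
        _ => PathBuf::from("/etc/cargo/config"),
    }
}

/// Candidate config files, highest precedence first (assumed ordering):
/// project-local files walking up from `cwd`, then the per-user file,
/// then the proposed OS-wide file.
fn config_search_order(cwd: &Path, home: &Path, target_os: &str) -> Vec<PathBuf> {
    let mut candidates: Vec<PathBuf> = cwd
        .ancestors()
        .map(|dir| dir.join(".cargo").join("config"))
        .collect();

    // Avoid listing the per-user file twice when `cwd` is below `home`.
    let user = home.join(".cargo").join("config");
    if !candidates.contains(&user) {
        candidates.push(user);
    }

    candidates.push(global_config_path(target_os));
    candidates
}
```

With this ordering, a cross-compiling user could still point at a different vendored libc from a project-local or per-user file, while native builds would fall through to whatever the OS vendor installed.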

Implementation

Implementing this RFC would require just 2-3 changes to Cargo, and zero to libc.

  • Firstly, Cargo would need to grow a global config file. The exact location would be OS-dependent and hard-coded into Cargo’s source code: /etc/cargo/config on GNU/Linux, /usr/local/etc/cargo/config on FreeBSD, etc. Cargo already knows about multiple config files; it just doesn’t look in any global locations.
  • Secondly, Cargo would need to allow modified vendored dependencies in that global config file. This could be done in either of two ways:
    • Cargo could relieve source replacement of the assumption that vendored crates must be exactly the same as crates.io crates. That’s the option assumed by my examples.
    • Or, Cargo could allow the [patch] section in the global config file. Currently Cargo only seems to search for [patch] in the local Cargo.toml file.
  • Thirdly, Cargo could optionally allow either or both of the [patch] and [source.crates-io] sections to be platform-specific. That would ease cross-compiling to OSes that use a vendored libc. The syntax would look like this:
[source.freebsd-org]
directory = "vendor/freebsd"

[source.openbsd-org]
directory = "vendor/openbsd"

[source.'cfg(target_os = "freebsd")'.crates-io]
replace-with = "freebsd-org"

[source.'cfg(target_os = "openbsd")'.crates-io]
replace-with = "openbsd-org"
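As a rough illustration of how the platform-specific sections might be resolved, the following Rust sketch models the example config and picks a replacement source for a given target. Everything here is hypothetical: the `Replacement` type, `example_config`, and `resolve_crates_io` are names invented for this sketch, and a plain `target_os` string stands in for a full cfg(...) expression.

```rust
use std::collections::HashMap;

/// One `[source.'cfg(target_os = "...")'.crates-io]` entry (hypothetical syntax).
struct Replacement {
    target_os: &'static str,    // simplified stand-in for a cfg(...) expression
    replace_with: &'static str, // name of the source to use instead of crates-io
}

/// The example config above, expressed as data.
fn example_config() -> (Vec<Replacement>, HashMap<&'static str, &'static str>) {
    let replacements = vec![
        Replacement { target_os: "freebsd", replace_with: "freebsd-org" },
        Replacement { target_os: "openbsd", replace_with: "openbsd-org" },
    ];
    // Source name -> vendored directory.
    let mut sources = HashMap::new();
    sources.insert("freebsd-org", "vendor/freebsd");
    sources.insert("openbsd-org", "vendor/openbsd");
    (replacements, sources)
}

/// Returns the vendored directory that replaces crates-io for `target_os`,
/// or `None` when no replacement applies and crates-io is used as-is.
fn resolve_crates_io(
    target_os: &str,
    replacements: &[Replacement],
    sources: &HashMap<&'static str, &'static str>,
) -> Option<&'static str> {
    replacements
        .iter()
        .find(|r| r.target_os == target_os)
        .and_then(|r| sources.get(r.replace_with).copied())
}
```

Targets with no matching section, such as Linux in this example, would keep resolving against crates.io unchanged.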

#2

The problem is that we ship via rustup a libstd compiled against a particular libc version (which can be inlined into libstd, etc.).

Right now we can’t easily recompile libstd, so with this proposal libstd would have been compiled against a different libc version than the rest of the crates in your dependency graph. That’s bad because libstd would then probably use incorrect C APIs, leading to undefined behavior.

We would have to either dynamically link libstd against libc, so that it can be replaced by the user “somehow”, or be able to re-compile libstd.

If we are able to re-compile libstd easily, then a simpler alternative could be to just add a cfg(target_os_version) that either the target description file or the user can set. If the version is not known, then cfg(target_os_version = "unknown") could be true. Otherwise, some other string would be true and crates like libc could be made portable across versions.

There could be many ways to specify the version:

  • in the json target description file
  • via an environment variable, e.g., CARGO_TARGET_X86_64_UNKNOWN_FREEBSD_VERSION=11 cargo build --target=x86_64-unknown-freebsd
  • directly after the target triple, using a special delimiter: cargo build --target=x86_64-unknown-freebsd.11
  • using a different flag: cargo build --target=x86_64-unknown-freebsd --target-version=11
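One way to experiment with this idea today, without any rustc changes, would be a build script that turns the version into a cfg. The sketch below is only an approximation of the proposal: cfg(target_os_version) does not exist in rustc, the environment variable name is the hypothetical one from the list above, and `version_cfg_directive` and `emit_from_env` are names invented here. The `cargo:rustc-cfg=` line is the existing build-script mechanism for setting a custom cfg.

```rust
/// Hypothetical environment variable from the example above.
const VERSION_ENV: &str = "CARGO_TARGET_X86_64_UNKNOWN_FREEBSD_VERSION";

/// Builds the line a build script would print to set the cfg for this crate.
/// Falls back to "unknown" when no version was supplied, as suggested above.
fn version_cfg_directive(version: Option<&str>) -> String {
    let v = version.unwrap_or("unknown");
    format!("cargo:rustc-cfg=target_os_version=\"{}\"", v)
}

/// What a build.rs `main` might call: read the variable and emit the directive.
fn emit_from_env() {
    let version = std::env::var(VERSION_ENV).ok();
    println!("{}", version_cfg_directive(version.as_deref()));
}
```

A crate built this way could then gate items with #[cfg(target_os_version = "11")]. A build script only affects the crate it belongs to, though, so unlike a cfg defined by rustc itself this would not reach libstd or other crates in the graph.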

If we could recompile libstd easily with some special flags for the OS version, I don’t think this would be a hard problem to solve. But because we currently can’t, we would have to ship a precompiled libstd for each OS version with rustup somehow, e.g., by having different target triples per version, more build jobs, etc., significantly increasing the cost of solving this problem.


#3

Also, if we were easily able to re-compile libstd, you could just patch the libc dependency in the dependency graph for all crates and for libstd, just like one does with normal crates.


#4

To be clear, is it a problem for libstd to use a different version of libc than the rest of our crates, assuming that all of the bindings are actually correct? For example, could libstd use the FreeBSD-11 compat fstat, while other crates linked into the same application use the FreeBSD 12 fstat? If that’s not a problem, then I don’t think we’ll be any worse off than we are now.

Adding a target_os_version would still require major effort in libc and other crates. And it won’t work for minor and/or proprietary forks. For minor and proprietary forks, the only thing that will work is to take the responsibility for libc bindings away from rust-lang/libc. Of course, we needn’t worry about rustup on those OSes.

What is a target description file? Is that a rustup thing?


#5

If all of the bindings that libstd uses are correct, then it does not matter if other crates use a different libc version, as long as those bindings are correct.

The interesting case, however, is when the bindings become incorrect due to a breaking change in the platform API, targeting a slightly different fork of the platform, etc. In that case, you don’t just want the libc bindings to be correct for all crates except libstd; you want them to be correct for all crates, with no exceptions.

For example, could libstd use the FreeBSD-11 compat fstat, while other crates linked into the same application use the FreeBSD 12 fstat?

That would work, but as you mention, not all platforms provide backwards-compatible symbols. For this particular example, if libc were to always use the FreeBSD-11 compat fstat, and you wanted to use the FreeBSD 12 fstat in all other crates, you wouldn’t need any special feature for that. You can manually override the libc dependency with whatever crate you want using cargo patch (https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#overriding-dependencies). That is, you could override the libc dependency with libc itself, but with a specific Cargo feature enabled (e.g. freebsd12-api).

What is a target description file?

I think these are called target specifications. I mean these files: https://github.com/rust-lang/rust/tree/master/src/librustc_target/spec and the .json files that xargo uses: https://github.com/japaric/xargo#compiling-the-sysroot-for-a-custom-target

They let you specify pretty much everything about your target, and you can create a new one and use it to recompile everything starting at libcore (using either rustc if you recompile rustc_target, or xargo if you use a json file).


#6

I don’t think it should be a requirement for rustup to work on all versions of a platform that doesn’t provide backwards-compatibility. It certainly doesn’t do that now. As I see it, rustup should target exactly one version of each platform. For FreeBSD, it should target 11 (and also work on 12). For OpenBSD it would only work on the latest version. Users who want to use Rust on older versions of OpenBSD would need to install it through the package manager instead of Rustup, just as they do now.


#7

What about users who want to cross-compile Rust from, e.g., Linux, to different versions of OpenBSD?


#8

I think we should move in the direction of being able to easily recompile std. In the extreme, std should be just a crate (or collection of crates) that is shipped together with the compiler (with some special rules, e.g. regarding using or defining unstable items). Doing so would solve issues like this one and more. IIRC, plans for integrating parts of cross functionality into Cargo include the ability to recompile std.


#9

@gnzlbg with my proposal, users wanting to cross compile to multiple versions of OpenBSD will need to get each version’s copy of libc, put it somewhere on their build machine, and then put the paths in their $HOME/.cargo/config file as I describe in the last section of my post.

EDIT: actually that won’t work because, as you pointed out, the resulting binaries would be using an incompatible version of libstd. So yes, replacing libstd is a requirement for cross-compiling to multiple versions of OpenBSD. However, it’s not a requirement for self-compiling to multiple versions of OpenBSD, since rustc can be installed through the package manager instead of through rustup.


#10

@gnzlbg with my proposal, users wanting to cross compile to multiple versions of OpenBSD will need to get each version’s copy of libc, put it somewhere on their build machine, and then put the paths in their $HOME/.cargo/config file as I describe in the last section of my post.

@asomers how does this differ from just using [patch] in the Cargo.toml or .cargo/config to override crates in the dependency graph? To me it appears to achieve the same, using a different syntax than what we have today.


#11

It’s quite similar. The differences are:

  1. It adds a global Cargo config file, relieving users of manually adding the OS-provided libc to their $HOME/.cargo/config.
  2. Cargo currently doesn’t look for [patch] in the .cargo/config. Right now, you have to add [patch] to each and every crate’s Cargo.toml.
  3. [patch] doesn’t allow replacing multiple versions of a dependency; only a single version. [source] allows replacing multiple versions.
  4. Cargo doesn’t currently allow platform-specific qualifiers in either [patch] or [source] sections.

#12

First, I like the proposition of @asomers.

Personally, I would differentiate between the support of multiple OSes or OS versions provided by Rust (the facility for the user to cope with multiple versions directly) and the support by rustup of multiple OSes or OS versions.

If correctly documented, it is perfectly acceptable for rustup to provide only a subset of what is possible. For FreeBSD, for example, it could provide direct support for FreeBSD-11 only.

As for libstd on versions unsupported by rustup, I think Rust should rely more on third parties, like OS package distributions. In the same way that an alternative libc source crate could be provided by the OS itself, an alternative libstd crate (in compiled form) could be provided too.


#13

@asomers @semarie I agree, I think I had a different expectation of what was the problem that this extension was supposed to solve. I tend to only cross-compile to these platforms, while you two actually use the Rust toolchains on these platforms directly, so I was a bit confused about the use case.

I like the proposal of extending source to solve this problem.

I am not sure how this would interact with the more general ability to cross-compile libstd with specific features once xargo is integrated into cargo. I think that we’d want for libc to support multiple platform versions anyways using something like cfg(target_os_version). Once that is done, platforms can ship a libstd via the package manager that’s already compiled for the appropriate OS version, and platforms should be able to specify the OS version globally somehow, so that all crates are compiled with cfg(target_os_version) properly defined. If we had that, then using [source] to override this wouldn’t be necessary AFAICT.


#14

Don’t forget about OSes and versions that rust-lang doesn’t even know about, like the proprietary ones. Fixing Rust on those platforms requires something like [source]. And if we’re going to do that anyway, then cfg(target_os_version) doesn’t provide as much additional value.


#15

Don’t forget about OSes and versions that rust-lang doesn’t even know about, like the proprietary ones. Fixing Rust on those platforms requires something like [source].

Yes, for platforms that libc can’t know about then you want [source] to replace it with something else everywhere.

And if we’re going to do that anyway, then cfg(target_os_version) doesn’t provide as much additional value.

This is not strictly required, but supporting multiple versions of a platform in libc upstream wouldn’t be that much work if we had the tools to do that: the number of breaking changes across platform versions isn’t typically large, and new breaking releases don’t happen that often.