Pre-RFC: target extension (dealing with breaking changes at OS level)

Some time ago, I opened an issue on libc about OpenBSD: the upcoming 6.2 release has a (small) breaking change at the OS level, and libc has to deal with it.

The problem is that the Rust language seems to be built on the assumption that an OS will not make breaking changes (as is the case for Linux). But at least all the BSDs (FreeBSD, NetBSD, OpenBSD, DragonFly) do so from time to time.

The LLVM representation for the BSDs includes an OS version in the triple. It is meant to represent the fact that openbsd5.4 is (possibly) not binary compatible with openbsd5.5.

So I started to think about extending the Target representation to include the OS version. Maybe it isn’t the right way to solve the problem. I would like to discuss that.

As I have already written a big part of the Motivation section for an RFC, which explains the problem in depth (I hope) and presents several examples (past and future) of breaking changes at the OS level, I link it here: Target extension. It is only a WIP page, and it lets me write down ideas and see how they could be implemented.

The purpose would be to have a stable way to represent breaking changes between versions of an OS, and so to let crates like libc represent them.

Thanks for your feedback or your ideas on how to resolve this particular problem.

Cc: @alexcrichton

macOS has a concept of a target OS version too (it affects e.g. the dynamic library loader), but in the case of macOS it is more like a min-max range (min set by -mmacosx-version-min, max set by the target/version of the SDK used).

Hi!

@semarie, as you mention in your Target extension document, we have the same problem in FreeBSD: 64-bit inodes were just committed to HEAD, which will become FreeBSD 12.0 in more than a year. Therefore, I’m particularly interested in this topic as well.

I like your proposal. Just one question: what if there is no ABI breakage in a new OS version? For instance, AwesomeBSD 5 is compatible with AwesomeBSD 4, however, AwesomeBSD 6 introduces a breakage. What about limiting the targets to $arch-unknown-awesomebsd-4 and $arch-unknown-awesomebsd-6, i.e. Rust on AwesomeBSD 5 would use the former target?

@dumbbell, I will update the WIP Target extension document to reflect the actual state of the ino64 branch. Thanks.

About what to do in the absence of ABI breakage: I think it would be a complex task for Rust developers to keep track of API/ABI changes across a whole OS. I would say it isn’t their job to check whether an OS breaks something or not. But maybe it would be possible to add some magic to the rustc CLI in order to pick the right target?

There would be mostly two different cases:

  • Target not using target_os_version (and/or target_env_version), or more exactly using an empty version: there is only one version, so it is expected to work as it does now.
  • Target using target_os_version (and/or target_env_version): the compiler uses it to expose a new attribute for conditional compilation and to pass a complete LLVM triple to LLVM.

Note that it affects only compilation: the resulting ELF could be tried on another platform version (but it could SEGFAULT somewhere at runtime due to ABI breakage). LLVM will compile it with the accurate triple, and libc will use the right attributes for conditional compilation.

For libc, operators (or predicates) able to work on version ranges would permit a simple approach for AwesomeBSD 5:

// operator version (proposed syntax, not valid today) - I am also
// thinking about not modifying the attribute syntax so deeply,
// by using predicates like lt() and ge() instead
#[cfg(all(target_os = "awesome", target_os_version < "6"))]
stuff_in_awesome4;

#[cfg(all(target_os = "awesome", target_os_version >= "6"))]
stuff_in_awesome6;

It would be possible to have only one change in libc, and to make stuff_in_awesome4 visible in Awesome 4 and 5.

You are right, there is no need to skip versions which keep ABI compatibility. The new operators would give enough flexibility and it makes no difference for the resulting executable.

More questions/comments below:

OS vs. environment version

I don’t understand the difference between OS and environment version. Could you please clarify what they are, or do you have an example of a target environment version?

Triple parsing

Let’s take one of your examples: x86_64-unknown-openbsd6.1. How would you parse this string to extract the OS/env version? Should we match it against /[0-9.]+$/? I believe librustc_back doesn’t need that because it has a mapping between triples and structs, but I ask just to be sure the format is well defined.

Version comparison

Having the version in a String makes sense because it doesn’t look like a number. However, comparing those strings using a generic syntax such as < or lt() hides some kind of magic. Why should those strings be compared with a version-comparison function and other strings with a regular string-comparison function?

Therefore, should target_os_version and target_env_version be a specific OSVersion type instead?
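To make the question concrete, here is a minimal sketch of what such a dedicated type could look like. The name OsVersion, its fields, and the rule that missing components count as 0 are all assumptions made for illustration; nothing like this exists in rustc.

```rust
use std::str::FromStr;

/// Hypothetical version type. Fields are compared in declaration order,
/// so the derived `Ord` is a numeric major/minor/micro comparison.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct OsVersion {
    major: u32,
    minor: u32,
    micro: u32,
}

impl FromStr for OsVersion {
    type Err = String;

    /// Accepts "MAJOR", "MAJOR.MINOR" or "MAJOR.MINOR.MICRO";
    /// components that are left out default to 0 (an assumption).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let mut parts = [0u32; 3];
        let mut count = 0;
        for piece in s.split('.') {
            if count == 3 {
                return Err(format!("too many components in {:?}", s));
            }
            parts[count] = piece
                .parse::<u32>()
                .map_err(|_| format!("bad component {:?} in {:?}", piece, s))?;
            count += 1;
        }
        Ok(OsVersion { major: parts[0], minor: parts[1], micro: parts[2] })
    }
}

fn main() {
    let a: OsVersion = "6.2".parse().unwrap();
    let b: OsVersion = "6.10".parse().unwrap();
    // Compared as plain strings, "6.10" sorts before "6.2" ...
    assert!("6.10" < "6.2");
    // ... while compared as versions, 6.2 comes before 6.10.
    assert!(a < b);
    println!("{:?} < {:?}", a, b);
}
```

The two assertions in main show the mismatch: a regular string comparison and a version comparison disagree on the same pair of values, which is the "magic" a generic < operator would have to hide.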

OS vs. environment version

The distinction comes from the LLVM Triple, which is why I included it in the document (I assume LLVM knows such things better than me :smile:).

The OS version is a lot more common than the environment version.

Let’s take two examples to see the difference:

  • x86_64-unknown-openbsd6.1 : “6.1” is the version of the OS “openbsd”. There is no environment and no environment version (both are empty).
  • i686-unknown-linux-musl : here, for Linux, there is no OS version. But an environment exists: “musl” (an alternative libc, different from glibc). The environment version is also empty. But if musl decides to make a breaking change some hypothetical day, it would be possible to target it with an environment version.

From the LLVM Triple specification, the format is ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT, where OPERATING_SYSTEM and ENVIRONMENT can contain a version number: “MAJOR”, “MAJOR.MINOR”, or “MAJOR.MINOR.MICRO”.
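As a rough illustration of that format, the sketch below splits a triple into its fields and peels a trailing version off the OS and environment components, in the spirit of the /[0-9.]+$/ match suggested earlier. ParsedTriple, parse_triple and split_version are hypothetical names, and the code assumes architecture, vendor and OS are always present; LLVM's real parser handles many more shapes.

```rust
/// Splits a trailing run of digits and dots off a triple component,
/// e.g. "openbsd6.1" -> ("openbsd", "6.1"). An empty second element
/// means the component carries no version.
fn split_version(component: &str) -> (&str, &str) {
    let name = component.trim_end_matches(|c: char| c.is_ascii_digit() || c == '.');
    (name, &component[name.len()..])
}

/// Hypothetical parsed form of ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT.
#[derive(Debug, PartialEq)]
struct ParsedTriple<'a> {
    arch: &'a str,
    vendor: &'a str,
    os: &'a str,
    os_version: &'a str,
    env: &'a str,
    env_version: &'a str,
}

/// Returns `None` when fewer than three fields are present.
fn parse_triple(triple: &str) -> Option<ParsedTriple<'_>> {
    let mut fields = triple.splitn(4, '-');
    let arch = fields.next()?;
    let vendor = fields.next()?;
    let (os, os_version) = split_version(fields.next()?);
    // The environment is optional; an absent one is left empty.
    let (env, env_version) = split_version(fields.next().unwrap_or(""));
    Some(ParsedTriple { arch, vendor, os, os_version, env, env_version })
}

fn main() {
    let t = parse_triple("x86_64-unknown-openbsd6.1").unwrap();
    assert_eq!((t.arch, t.vendor), ("x86_64", "unknown"));
    assert_eq!((t.os, t.os_version), ("openbsd", "6.1"));
    assert_eq!((t.env, t.env_version), ("", ""));

    let t = parse_triple("i686-unknown-linux-musl").unwrap();
    assert_eq!((t.os, t.os_version), ("linux", ""));
    assert_eq!((t.env, t.env_version), ("musl", ""));
    println!("{:?}", t);
}
```

Only the OS and environment fields are scanned for a version, so the digits in an architecture name such as x86_64 are left alone.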

Triple parsing

In fact, I didn’t intend to parse the LLVM triple: librustc_back doesn’t require it. But if parsing were required, I think I would rely on LLVM code.

Currently, the target passed to rustc is statically compared against the list of built-in targets, and once a target string matches, an already-populated Target struct is returned.

Here, for OpenBSD 6.1 on i686, is a Target with target_os_version and target_env_version:

Target {
  llvm_target: "i686-unknown-openbsd6.1".to_string(),
  target_endian: "little".to_string(),
  target_pointer_width: "32".to_string(),
  data_layout: "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128".to_string(),
  arch: "x86".to_string(),
  target_os: "openbsd".to_string(),
  target_os_version: "6.1".to_string(),
  target_env: "".to_string(),
  target_env_version: "".to_string(),
  target_vendor: "unknown".to_string(),
  linker_flavor: LinkerFlavor::Gcc,
  options: base,
}

For the problem of dealing with breaking changes, having target_os_version (and target_env_version) defined as attributes (for conditional compilation) is enough.

But having to add new targets to librustc_back at every release (and for every architecture of the OS) could be problematic. Here I dunno the right way. Having a kind of “template” and parsing the triple could help.
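One way to picture the “template” idea is a function that fills in a target from an architecture and a version, instead of one hand-written entry per release. This is only a sketch: TargetSketch is a hypothetical, trimmed-down stand-in for the Target struct shown above, and openbsd_target is not an existing librustc_back function.

```rust
/// Hypothetical, trimmed-down stand-in for the `Target` struct.
#[derive(Debug, PartialEq)]
struct TargetSketch {
    llvm_target: String,
    target_os: String,
    target_os_version: String,
}

/// Template: one function covers every (architecture, version) pair.
/// `llvm_arch` is the architecture as spelled in the LLVM triple.
fn openbsd_target(llvm_arch: &str, os_version: &str) -> TargetSketch {
    TargetSketch {
        llvm_target: format!("{}-unknown-openbsd{}", llvm_arch, os_version),
        target_os: "openbsd".to_string(),
        target_os_version: os_version.to_string(),
    }
}

fn main() {
    let t = openbsd_target("i686", "6.1");
    assert_eq!(t.llvm_target, "i686-unknown-openbsd6.1");
    assert_eq!(t.target_os, "openbsd");
    assert_eq!(t.target_os_version, "6.1");
    println!("{:?}", t);
}
```

A real template would also have to fill in the data layout, pointer width and linker options, which differ per architecture but not per OS release.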

Version comparison

I agree that operators would hide some magic. But to be usable as attributes (and so in conditional compilation), target_os_version and target_env_version have to be strings (if I correctly understood this part – please correct me if I am wrong).

An alternative to the “magic” would be to have specific predicates like version_lt(), version_le(), version_gt(), and version_ge().

Thank you for the environment version example! It's much clearer now.

In fact, I didn't intend to parse the LLVM triple: librustc_back doesn't require it.

I know it's not required in the Rust compiler. My question was more about defining the syntax. But you answered it with the LLVM triple specification link :slight_smile:

But to be usable as attributes (and so in conditional compilation), target_os_version and target_env_version have to be strings (if I correctly understood this part -- please correct me if I am wrong)

Ok, I was not aware of that. If this is a requirement, then the version_*() predicates seem better to me than a generic comparison operator plus some magic.

If strings are not required, then it would make sense to have a specific type. Otherwise, this doesn't look very "rusty" to me.

Does it have to be added to the triple? Can’t it be a separate option?

@kornel, I am unsure what you mean by “triple”.

  • If it is about the LLVM triple, the OS version is already there. LLVM has the capability to target a specific OS version (and I would say that if they include it in the triple definition, it means it is useful).

  • If it is about the Rust target, having the OS version inside the struct is as much a requirement as having the OS name (target_os attribute) inside the Target: it is used at compile time for conditional compilation. In fact, I found only one occurrence of target_os being used directly as a member of the Target struct rather than as an attribute.

Otherwise, although forgetting to pass the option would result in a compiler error, it is already possible to make some use of target_os_version through the --cfg rustc flag.
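A minimal sketch of that workaround follows. Here target_os_version is not a built-in cfg; it exists only because the command line defines it, and only equality tests are possible with the current cfg syntax. The fallback function is an addition for the sketch so that the file still compiles when the flag is left out, and recent compilers may warn about the unknown cfg name.

```rust
// Build with:  rustc --cfg 'target_os_version="6.1"' main.rs
// `target_os_version` is a user-supplied cfg, not a built-in one.

#[cfg(target_os_version = "6.1")]
fn targeted_version() -> Option<&'static str> {
    Some("6.1")
}

// Fallback used when the flag is omitted, so the code still compiles.
#[cfg(not(target_os_version = "6.1"))]
fn targeted_version() -> Option<&'static str> {
    None
}

fn main() {
    match targeted_version() {
        Some(v) => println!("compiled for OS version {}", v),
        None => println!("no target_os_version was passed"),
    }
}
```

Range tests such as "older than 6.2" cannot be written this way, which is where the proposed operators or predicates would come in.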

I checked the source code again. In src/librustc/session/config.rs, the default configuration is a CrateConfig, which is a HashSet<(Name, Option<Symbol>)> (a Name is also a Symbol).

A Symbol is an interned or gensymed string (src/libsyntax_pos/symbol.rs).

I would be strongly in favor of having some way to tell what OS version I’m targeting, specifically for Windows. Knowing what version I am targeting means I can decide what symbols I can link to normally, and what symbols I have to dynamically load, and what fallbacks I have to implement. Since I don’t have to worry about breaking changes, it’s pretty much just a matter of what the minimum version is, so I know what functionality I can rely on and what I have to implement fallbacks for.

Sorry if I misunderstood the proposal, but I’m assuming it will be used as cargo --target=x86-unknown-openbsd6.1. I’m wondering why not cargo --target=x86-unknown-openbsd --min-os-version=6.1 or something like that.

This is mainly because it’s different for macOS and openbsd. For macOS it would be weird to have --target=x86_64-apple-darwin10.12.0, since target version is a range (it tries to be backwards compatible with the min version, but opts into behaviors of the max version), and --target=x86_64-apple-darwin-from-10.8.5-to-10.12.0 would make that even weirder.

The exact interface between the CLI tools (rustc and cargo) and target selection isn’t settled for now… I don’t have the whole thing in mind.

The current selection is based (for rustc) on static matching of the --target value against some Target value (I expect cargo to only pass the string verbatim to rustc).

In fact, the --target value could be simpler than the Target it selects. Some examples:

  • --target=x86-unknown-openbsd could select some “default” Target (the latest official version: 6.1 at the time of writing) - or some “automatic” selection (the running version if run on OpenBSD)

  • --target=x86-unknown-openbsd6.1 would be available too, targeting OpenBSD 6.1 whatever the host is.

  • --target=x86_64-apple-darwin10 would target a specific macOS major version, and Target could contain some additional linker argument to pass -mmacosx-version-min=xxx

  • and there is always --target=file.json to target a custom version

The exact list of available versions should be decided by people involved in the OS: for OpenBSD, I know major.minor is important. But for NetBSD, I assume just major is enough to ensure API/ABI compatibility (but it should be verified). And for macOS I dunno :smiley:

Having an extra flag is also a valuable idea. The drawback I see is that it would have to be added to every tool that already uses --target. So it would be better to extend the --target syntax (--target=x86_64-apple-darwin/min=10). Additionally, not all platforms have the concept of a “minimal version” (for OpenBSD there is no guarantee that two consecutive versions will be ABI or API compatible).

It seems simpler to me to have a practical (so not exhaustive) list of built-in targets, possibly with some “generic” target that selects one specific target (latest or running version).
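The selection rule could be sketched roughly as follows. The list of built-in targets and the resolve_target function are hypothetical, the versions are only illustrative, and the list is assumed to be kept in ascending version order so that the last match is the latest release.

```rust
/// Hypothetical list of built-in targets, in ascending version order.
const BUILTIN_TARGETS: &[&str] = &[
    "x86_64-unknown-openbsd6.0",
    "x86_64-unknown-openbsd6.1",
];

/// Resolves a `--target` value: an exact built-in name wins; otherwise
/// an unversioned "generic" name selects the last (latest) entry whose
/// name is the request followed directly by a version number.
fn resolve_target(requested: &str) -> Option<&'static str> {
    if let Some(exact) = BUILTIN_TARGETS.iter().copied().find(|t| *t == requested) {
        return Some(exact);
    }
    BUILTIN_TARGETS
        .iter()
        .copied()
        .filter(|t| {
            t.strip_prefix(requested)
                .map_or(false, |rest| rest.starts_with(|c: char| c.is_ascii_digit()))
        })
        .last()
}

fn main() {
    // The generic name falls back to the latest listed version.
    assert_eq!(
        resolve_target("x86_64-unknown-openbsd"),
        Some("x86_64-unknown-openbsd6.1")
    );
    // An explicit version is matched as-is, whatever the host is.
    assert_eq!(
        resolve_target("x86_64-unknown-openbsd6.0"),
        Some("x86_64-unknown-openbsd6.0")
    );
    assert_eq!(resolve_target("x86_64-unknown-netbsd"), None);
}
```

Taking the first match instead of the last would give an "oldest listed version" default; selecting the running version would need a host query that this sketch leaves out.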

@kornel: Could you please give more details on how this works?

On FreeBSD, an executable compiled for FreeBSD 11.0 will work on FreeBSD 12.0. For instance, an executable built on/for FreeBSD 11.0 will use the 32-bit inode compatibility ABI on FreeBSD 12.0. However, it can't use the 64-bit inode ABI even if it's available.

Therefore, the FreeBSD target version would also be a minimum. But in this case this minimum is enough: no need to specify the maximum version to support.

If I correctly understood the Status Update, ABI compatibility isn't 100% guaranteed for executables compiled for FreeBSD 11 and running on FreeBSD 12:

For instance, third-party APIs which pass struct stat around are broken in backward and forward-incompatible way. [...] Due to expansion of the basic types dev_t, ino_t and struct dirent, the impact is not limited to one part of the system, but affects:

  • kernel/userspace interface (syscalls ABI, mostly stat(2), kinfo and more)
  • libc interface (mostly related to the readdir(3), FTS(3))
  • collateral damage in other libraries that happens to use changed types in the interfaces. See, for instance, libprocstat, for which compat was provided using symbol versioning, and libutil, which shlib version was bumped.

You’re right, in some cases, an executable built for FreeBSD 11.0 won’t work on FreeBSD 12.0. But this is the exception, not the common case. ABI breakages usually come with a compatibility shim to mitigate them.

My point was about the fact that on FreeBSD (unlike OS X apparently) an old executable won’t use a new feature if it’s available (except if the code explicitly handles that case). I’m curious about this possibility in OS X.

When building on macOS you set two things:

  • SDK version (“max” version). It’s mostly source-level changes, but it also affects runtime. Sometimes, when macOS makes backwards-incompatible changes, binaries compiled with an older SDK keep the old behavior. Recompiling with a newer SDK opts in to the newer behavior. Programs are generally compiled for the very latest SDK, unless they depend on deprecated APIs.

  • Deployment target (“min” version). That’s the oldest macOS version that the program will run on. It affects code generation and linking (every .o file should be built with the same min version). Binaries will likely crash immediately if run on an OS older than the deployment target. If the target had to have only one version, this should be it.

Historically there was also a split between targeting no-GC or GC-compatible runtimes and libraries.

  • SDK version ("max" version). It's mostly source-level changes, but it also affects runtime. Sometimes, when macOS makes backwards-incompatible changes, binaries compiled with an older SDK keep the old behavior. Recompiling with a newer SDK opts in to the newer behavior. Programs are generally compiled for the very latest SDK, unless they depend on deprecated APIs.

Is this implemented through "universal binaries"? I.e., are several executables (each one targeting a version) bundled in the same file?

  • --target=x86-unknown-openbsd could select some "default" Target (the latest official version: 6.1 at the time of writing) - or some "automatic" selection (the running version if run on OpenBSD)

If the version is missing, I would target the oldest supported version. Backward compatibility is often possible and this would make the behavior of --target deterministic and consistent across versions of Rust (where new versions would bring more target_{os,env}_version).

No, the Mach-O fat binary structure can only distinguish between CPU architectures (x86_64 / i686 / arm64 etc).
