#[repr(Interoperable_2024)]

I am unusually receptive to "we should be using WebAssembly in a lot more places" arguments, but that doesn't mean that every major Linux distribution should be replacing all their separately compiled dynamic libraries with WebAssembly modules, or that they could do so anytime soon even if they wanted to.

I absolutely think that we should learn from, among other things, WebAssembly Interface Types. Interface Types also has constraints we don't, however, such as needing to communicate between modules with no shared memory, or translate between string formats. That's true of "safe", and it's especially true of "rust-dynamic".

5 Likes

I appreciate that for some platforms the C interface may not be very clearly defined or documented. However, I think we should separate attempts to shore up target-specific documentation from attempts to standardize something new. The set of people prepared to help work on extern "safe" is not necessarily the same as the set of people who want to research and document C calling conventions for every target. I think we should define extern "safe" atop extern "C", and then separately people can improve documentation for extern "C". Many languages already know how to speak C FFI, and they can build their support for extern "safe" upon that; further documentation for the C calling convention will not help those languages support extern "safe".

I really, really don't think we should use a non-Open-Source license. Apache 2.0 helps ensure there won't be any patent concerns (which I don't think is likely regardless), and a non-Open license will only hamper adoption and contribution. (Leaving aside that a project under the banner of the Rust project should not be non-Open.)

I think we can treat that as a social concern, rather than trying to make it a legal one. Ultimately, our bigger challenge will be getting people to adopt this at all, rather than getting people to not extend it.

5 Likes

I agree, I just want to make sure that no-one tries something shady. If someone wants to create their own, completely separate spec based on something we come up with, I'm OK with that. I'm not OK with EEE (although Apache 2.0 does protect against patent nonsense).

I suspect that I'll be in the minority here, but while most people will be good, it takes just one bad actor to make life painful for everyone else.

1 Like

I'm with @ckaran on this one. I think that extern "safe" should bear a striking resemblance to the native C platform convention, but it should be defined completely on its own terms with no outsourced references to documents with lower quality standards. I am really tired of dealing with C ABI messes; C Isn't A Programming Language Anymore - Faultlore is good reading for some of the things that can go wrong here.

Of course we should make it easy for people with an existing C FFI implementation to adapt to extern "safe" conventions, and the best way to do that would be through non-normative notes that point out places where the extern "safe" convention diverges from extern "C" and/or the C ABI as interpreted by the major compilers. (There will always have to be such notes because the major compilers can't even agree between themselves what the C ABI is.)

6 Likes

The main reason I'd like to avoid duplicating existing C ABI documentation into the documentation for extern "safe" is that if extern "safe" disagrees with the C ABI for a type that exists in the C ABI, that's a bug in extern "safe".

That said, I'm all for pointing to precise documentation (e.g. abi-aa/aapcs64.rst at main · ARM-software/abi-aa · GitHub for ARM64 and https://raw.githubusercontent.com/wiki/hjl-tools/x86-psABI/x86-64-psABI-1.0.pdf for x86-64 on UNIX), and for calling out cases of targets that don't have available documentation that precise, and for calling for the creation of such documentation. But, for instance, all the tier 1 targets have this level of documentation available.

4 Likes

I disagree. extern "safe" can look remarkably similar to the C ABI, but it doesn't have to be a superset of it. We can continue to have extern "C" as a separate thing.

Beyond that, if the C ABI on a platform continues to evolve, what happens if safe conflicts with the updates on the C ABI front? What is the resolution process when there are two different groups working independently of one another? After all, C and C++ are closely related, and even there C++ still isn't a proper superset of C, which can lead to the kind of weirdness that we're trying to avoid here.

ABI versioning needs to look linear; referencing a different standard that can continue to evolve independently of what we're developing will lead to issues down the road. The only standards I'd be willing to reference are ones that are forever frozen, and even then I'd be concerned about ensuring the longevity of those documents, etc.

3 Likes

Per @scottmcm's suggestion, I started an issue for this at #[repr(Interoperable_2024)] · Issue #165 · rust-lang/lang-team · GitHub.

I should clarify something: I don't necessarily expect that extern "safe" supports every type that extern "C" does, because (for instance) we could consider whether extern "safe" should support unsafe types like raw pointers. (We should consider that carefully.)

Rather, I'm saying that if there's a type that both extern "C" and extern "safe" support, it should be passed in the same way. And I'm also saying, at least for the initial versions, that we should define extern "safe" in terms of other types that currently exist in extern "C" (e.g. "slices are passed as a usize length argument followed by a pointer argument"), rather than in terms of platform registers directly in ways extern "C" can't express. That's going to be critical for getting other languages (particularly those that already have C FFI support) to support extern "safe". And it will also mean that languages without support for extern "safe" can still use C FFI to interoperate with extern "safe". Even C will be able to call and be called by extern "safe" ABIs without any special compiler support.
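To make the "slices are passed as a usize length argument followed by a pointer argument" example concrete, here is a minimal sketch of what such a lowering could look like on top of plain extern "C". The function names (sum_u32, sum_u32_lowered, call_through_ffi) are hypothetical, and the argument order simply follows the example above; nothing here is a settled part of any extern "safe" proposal.

```rust
/// The safe function an author would actually write.
pub fn sum_u32(values: &[u32]) -> u64 {
    values.iter().map(|&v| u64::from(v)).sum()
}

/// Hypothetical lowered form of `sum_u32`: the slice becomes two C-level
/// arguments, a `usize` length followed by a pointer. Because it uses the
/// plain C calling convention, any language with C FFI could call it.
pub extern "C" fn sum_u32_lowered(len: usize, ptr: *const u32) -> u64 {
    // Assumption for this sketch: an empty slice may arrive with any pointer,
    // including null, so it is handled without touching `ptr`.
    let values: &[u32] = if len == 0 || ptr.is_null() {
        &[]
    } else {
        // SAFETY: the caller promises that `ptr` points to `len` initialized,
        // properly aligned `u32` values that stay alive for this call.
        unsafe { std::slice::from_raw_parts(ptr, len) }
    };
    sum_u32(values)
}

/// What a caller-side shim would do: split the slice into its two parts.
pub fn call_through_ffi(values: &[u32]) -> u64 {
    sum_u32_lowered(values.len(), values.as_ptr())
}
```

A C caller would see this as a function taking a size_t-sized integer and a const uint32_t pointer, which is why no special compiler support is needed on the C side.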

I can absolutely imagine ways we could efficiently pass things around that don't restrict themselves to C compatibility; for instance, imagine an x86 ABI returning Result<T, E> by putting either T or E in rax and an error flag in a flags register. (I don't know if that'd be a win or not, it's meant as an example.) But the downside of that is a much more challenging adoption by other languages, which would have to create dedicated support for this ABI rather than just leveraging their C FFI support.

If the C ABI on a platform evolves compatibly, nothing we do is going to conflict with it. If it changes incompatibly, that arguably creates a different platform.

I'm hoping that extern "safe" influences C ABI, and may be considered an extension of it, such that C compilers eventually add native support for safe types using the same conventions that extern "safe" proposes. However, if the C ABI were to add (say) slices in a way that didn't match what extern "safe" did, I'd likely propose that the subsequent version of "safe" try to reconcile the differences to improve interoperability.

C ABI for an established target is effectively frozen; it could add things but could never change or remove things.

That's a valid concern, given that such documents have disappeared and been relocated at least once that I know of. At the very least, we should mirror copies of such documents when we can.

2 Likes

The proposal as written there seems to be solving a very different problem. It sounds like you're looking to build one data-structure layout representation that's identical across all targets, so that data can be passed cross-architecture?

That's a valuable goal, and a very different goal from the notion of interoperability between languages on the same platform. Interoperability between languages would benefit from a well-specified "native" representation that matches and extends the C layout. Interoperability between targets/platforms requires specifying a layout that's independent of endianness, type sizes, and similar. In particular, it's not clear how you'd pass a slice in memory between two platforms that disagree on the size of usize and pointers. Or is your intention to only support interoperability between targets that agree on endianness and pointer size (since those are likely to be the same between targets that share memory between each other)?

In any case, if that's your goal, I think I see why you're pushing back on the notion of matching the C ABI, since there isn't one C ABI to match. Any kind of target-portable layout would have very different design constraints than an ABI designed to work between languages. (For instance, an ABI for cross-language safe calls should ideally be as efficient as possible by matching native types, while an ABI for cross-target portability will likely need to use a layout that's non-native for at least one target.)

2 Likes

(I'm going to call extern "safe" #[repr(Interoperable_2024)] or #[repr(Interoperable_XXXX)] below because my brain keeps going to safe code, not safe layout)

I'm comfortable with this provided the method is fully and unambiguously defined.

I'm comfortable with this if the specification is fully and unambiguously defined.

I'm not saying that this statement will be false, but I'm cautious about this statement. Given that the ABI is unspecified in some places, I strongly suspect that compilers will need to be modified to get them to work with extern "Interoperable_XXXX". That said, I do agree with your implication that it will be easier to modify a compiler to produce extern "Interoperable_XXXX" if extern "Interoperable_XXXX" looks a lot like extern "C".

Agreed! My concern is if someone decides to 'fix' some ambiguity in the C ABI that introduces an incompatibility with safe...

If I could upvote this to infinity, I would!

I disagree here. This implies that what works in Interoperable_2024 might break in Interoperable_2027 due to how C decided to handle slices. I'm comfortable with the group that defines Interoperable_XXXX making breaking changes because if there are breaking changes, they'll have been thought about in terms of what all languages need, not just what C needs.

I agree that it can't change or remove things that are already there. However, it can also add things that are incompatible with Interoperable_XXXX causing the breaking changes discussed earlier.

Beyond that, as you mentioned earlier:

I agree with you on this point, and it's part of why I'm so interested in a new ABI. Those bad things could be completely removed, making the ABI FAR safer than the C ABI is.

Agreed! If we are going to use external documents, I'd like to see them incorporated into the main body, perhaps as appendices. This will require that we have the necessary rights to reproduce the documents, but as long as we can get those rights, mirroring/incorporating will be a good way to go[1]

The other concern I have with trying to incorporate those external documents is that we're going to be doing a lot of cherry picking to get the good parts of the C ABI. That could lead to a very disjointed and incoherent document as we have sections that say something like 'go to Appendix A.1.b.2 paragraph 3 line 5 for the definition of Foo', when we could have much more easily defined it within the main body of our own document.


  1. Ideally, we'll have a git repo with all of the documents in it, with everyone mirroring it everywhere they can. Alternatively, any of the following may be willing to host the documents in perpetuity:

    ↩︎
2 Likes

I wrote it badly then. Are you willing to take a stab at cleaning it up? I meant this:

Not this:

Yeah, I'm sorry for confusing you. I really did mean different languages on the same platform. The only (modest) change in platform would be the difference between compiling for the latest and greatest version of a platform, and one that is a few years out of date. So same target triple, but different machines.

As for what I'm pushing back on... well, here are my concerns:

  1. I want a well-defined, unambiguous spec. I am concerned that not all C ABIs are written to that standard, if they are written anywhere at all. I do not want to deal with a situation where to implement the spec means reverse engineering a particular compiler/linker/loader to see what it does.
  2. I want the right to publish our spec under our license. If we want to incorporate or mirror documents written by others, then we need the rights to do so. As far as I know[1], an ABI itself cannot be legally encumbered, but the documentation surrounding it definitely can be, so we can't just grab a copy and go.
  3. I want to get rid of the 'bad' portions of the C ABI. Just because we can pass raw pointers around in the C ABI doesn't mean it's a good idea to do so.
  4. I want to be able to update and improve whatever we produce without having to worry about the C ABI conflicting with us in some way where we're forced to break our own stability guarantees to maintain compatibility.

I am not concerned with a binary that conforms to the Windows/x86_64 ABI conventions magically running on Linux/mips64, etc. It may be possible to come up with the One ABI To Rule Them All (OABITRTA), but given how different architectures can be, I think it would be a bad idea to even attempt, as it would force some chips to run unnecessarily slowly just because they weren't designed with the given ABI in mind.


  1. I am not a lawyer, and nothing I say constitutes legal advice. If you want legal advice, go find a competent lawyer. ↩︎

4 Likes

So it's important to note that ABI is (at least!) two separate questions:

  • How is a type laid out in memory ("layout")?
  • How are arguments and returns passed between subroutines ("calling convention")?
  • (Do you support multiple-entry ("semicoroutines") or multiple-exit ("coroutines") directly, or only via subroutine-shaped trampolines?)

Layout (#[repr(stable-preview-1)]) is the easy part. You take your size, stride, minimum alignment, preferred alignment, and maybe other metadata (such as niches) of your fields, and write a formal algorithm to arrange fields into a record. Then you define how to translate algebraic sum types into tagged unions, and you're done.
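As a rough illustration of the "formal algorithm to arrange fields into a record" step, here is a minimal sketch that lays fields out in declaration order, C-style, and then builds a tagged union from a tag plus the largest variant. All names (FieldLayout, RecordLayout, layout_record, layout_tagged_union) are hypothetical, and the sketch deliberately ignores stride, preferred alignment, and niches, which a real specification would have to address.

```rust
/// Size and alignment of one field, in bytes.
/// Assumption: `align` is a non-zero power of two.
#[derive(Debug, Clone, Copy)]
pub struct FieldLayout {
    pub size: usize,
    pub align: usize,
}

/// Result of laying out a record: one offset per field, plus total size and alignment.
#[derive(Debug, Clone)]
pub struct RecordLayout {
    pub offsets: Vec<usize>,
    pub size: usize,
    pub align: usize,
}

/// Round `value` up to the next multiple of `align` (a power of two).
fn round_up(value: usize, align: usize) -> usize {
    (value + align - 1) & !(align - 1)
}

/// Place fields in declaration order, padding each to its alignment,
/// then pad the total size to the record's own alignment.
pub fn layout_record(fields: &[FieldLayout]) -> RecordLayout {
    let mut offset = 0;
    let mut align = 1;
    let mut offsets = Vec::with_capacity(fields.len());
    for field in fields {
        offset = round_up(offset, field.align);
        offsets.push(offset);
        offset += field.size;
        align = align.max(field.align);
    }
    RecordLayout { offsets, size: round_up(offset, align), align }
}

/// Translate a sum type into a tagged union: the tag comes first, followed by
/// a payload area large and aligned enough for every variant.
pub fn layout_tagged_union(tag: FieldLayout, variants: &[RecordLayout]) -> RecordLayout {
    let payload = FieldLayout {
        size: variants.iter().map(|v| v.size).max().unwrap_or(0),
        align: variants.iter().map(|v| v.align).max().unwrap_or(1),
    };
    layout_record(&[tag, payload])
}
```

For example, fields of (size, align) = (1, 1), (4, 4), (2, 2) land at offsets 0, 4, and 8, giving a 12-byte record with 4-byte alignment, which is what a C compiler typically produces for a struct of uint8_t, uint32_t, uint16_t.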

Calling convention (extern "stable-preview-1" fn) is the hard part. This defines how you take your fancy records and arrange them in registers and on the stack when passing them / returning them by value. (As far as calling convention is concerned, passing "by-reference" is just passing a pointer by value.) Here I'd personally highly recommend either delegating to "the default target triple calling convention" (i.e. extern "system", the calling convention the OS uses, or extern "C", the calling convention that system C uses by default) or choosing an existing calling convention and using that as the target. For all its flaws, the System V ABI is widely used and tested, and improvements to clarity developed as part of this initiative would probably be well received.

extern "stable-preview-1" doesn't have to directly use the chosen target calling convention, though. It would instead describe a lowering from the higher level types to the lower level kinds used by the calling convention specification (roughly, what kinds of registers a value fits in). Perhaps going beyond this will become useful later on once the new ABI has users, but the benefit of diverging from standard register use is marginal enough (and complicated enough! everything is a trade-off for register allocation) that it needs specific work, which can and should be done after a spec built on top of the existing calling convention is working.
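The "lowering from higher level types to lower level kinds" idea can be sketched as a small table-driven function. Everything below is an assumption made for illustration: the HighType and LowKind enums, the set of supported types, and the choice to lower slices and strings as a length followed by a pointer (borrowed from the slice example earlier in the thread). The existing target calling convention would then decide which registers or stack slots each kind ends up in.

```rust
/// Hypothetical high-level types the new ABI might accept in signatures.
#[derive(Debug, Clone, PartialEq)]
pub enum HighType {
    U8,
    U32,
    U64,
    Usize,
    Ref(Box<HighType>),   // &T
    Slice(Box<HighType>), // &[T]
    Str,                  // &str
}

/// Hypothetical low-level kinds that an existing calling convention
/// already knows how to place.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LowKind {
    Int { bytes: u8 },
    PointerSizedInt,
    Pointer,
}

/// Lower one high-level type to the sequence of low-level kinds that carry it.
pub fn lower(ty: &HighType) -> Vec<LowKind> {
    match ty {
        HighType::U8 => vec![LowKind::Int { bytes: 1 }],
        HighType::U32 => vec![LowKind::Int { bytes: 4 }],
        HighType::U64 => vec![LowKind::Int { bytes: 8 }],
        HighType::Usize => vec![LowKind::PointerSizedInt],
        HighType::Ref(_) => vec![LowKind::Pointer],
        // Assumed order for this sketch: length first, then data pointer.
        HighType::Slice(_) | HighType::Str => {
            vec![LowKind::PointerSizedInt, LowKind::Pointer]
        }
    }
}

/// Lower a whole parameter list by concatenating the lowering of each parameter.
pub fn lower_signature(params: &[HighType]) -> Vec<LowKind> {
    params.iter().flat_map(lower).collect()
}
```

The core document would only need to define something like lower; per-target documents could stay focused on where each LowKind goes.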

If we settle on demanding that any "rust-dynamic-X" should be a superset of some extern "safe-Y", we could also give it two version numbers, e.g. "rust-dynamic-Y-Z", where the first version number always describes the version of the safe ABI it is a superset of.

1 Like

The main issue seems to be the following case: somebody finds that the ABI as implemented by clang and the ABI as implemented by rustc are incompatible in one very specific case. What should we do?

a) Consider clang's version to be the ground truth and change Rust's extern "C" ABI there.

b) Read some external documentation. If the documentation uniquely specifies who does the "right" thing, change extern "C" to that (this likely means the documentation has been changed). If it is underspecified, maintain the current behavior to avoid breakage.

c) Like b) but maintain our own normative documentation.

From an FFI perspective a) is the best and c) is the worst option. However, option a) has the greatest impact on pure Rust codebases. In particular, if we demand that extern "safe" and extern "C" be compatible, we might have to break our stable ABI there. If we allow extern "safe" and extern "C" to diverge, we could make different choices here. On the other hand, in practice, chances are high that other languages will run into problems in such cases as well.

I personally think the best way forward is this: the extern "safe" ABI should be fully specified. When a specification for a new version of an extern "safe" ABI is created, it should not purposely create incompatibilities with the existing C ABI. To ensure this, the specification may include a specific C ABI specification and define the extern "safe" ABI in terms of this C ABI (e.g. the data type "u32" should behave like the C data type "unsigned int"). However, the specification must clarify any ambiguities in the referenced C ABI. If uncertainties are discovered later, the safe ABI should be adjusted to clarify that the current behavior is correct, but the extern "C" ABI might break compatibility with extern "safe" to improve FFI interoperability.
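A statement like "u32 should behave like unsigned int" is something an implementation can at least partially check per target. Here is a minimal sketch using the C type aliases in std::ffi; the helper names are hypothetical. Matching size and alignment is necessary but not sufficient for two types to be passed identically, so this is a sanity check rather than a proof of ABI compatibility.

```rust
use std::ffi::c_uint;
use std::mem::{align_of, size_of};

/// True when the two types agree on size and alignment on the current target.
pub fn same_size_and_align<RustTy, CTy>() -> bool {
    size_of::<RustTy>() == size_of::<CTy>() && align_of::<RustTy>() == align_of::<CTy>()
}

/// Checks the example mapping from the text: `u32` against C's `unsigned int`.
/// This holds on the common 32-bit and 64-bit targets, but not on targets
/// where `unsigned int` is 16 bits wide.
pub fn u32_matches_unsigned_int() -> bool {
    same_size_and_align::<u32, c_uint>()
}
```

A specification that pins such mappings down per target would let this kind of check run in every implementation's test suite.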

2 Likes

Incidentally, Clang itself handles these types of corner cases with a flag, -fclang-abi-compat=VERSION (where VERSION is a version of Clang). Whenever they find a discrepancy with GCC, they fix it, but keep the fix conditional, using the flag to support people who need compatibility with existing binaries built with older versions of Clang.

Which is slightly horrifying, but what else could they do?

Most of the things it affects are C++-specific, but there are also some things on x86 involving vector types.

I wonder if there will ever be a demand for Rust to have ABI support for "C but with an old Clang ABI"...

Oh, and to make things even spicier, some targets don't adopt the fixes – typically ones where Clang is dominant like Darwin and PS4, but it's an ad-hoc list for each fix. That would imply that GCC should be the one to change behavior, but GCC has no similar flag as far as I know; I can't tell whether they're even aware of these cases.

5 Likes

I want the interoperable spec to handle both of these cases initially. This feels like a reasonable minimum viable product for 2024.

Maybe for 2027? If this is really, really easy to do, with no disagreements from any interested parties, I'm fine with it being in 2024. I'm really aiming for 2024 being 'we have a solid base, and everyone that is interested in getting on the train is on board'. For everyone involved with Rust, this is relatively easy; my concern is that we need other languages/compilers/chip vendors onboard as well, or this will become a Rust-only thing.

I actually like this approach quite a bit! Yes, it will be more complex, but if we have a standard model[1] that then gets lowered to an actual calling convention, we can add in different target triples without affecting the core document. We can then have specialist experts for particular target triples that are responsible for defining the lower documents. My vote is that we start with this approach immediately, and lower to the tier 1 platforms (which, IIUC, already have well-defined layouts and calling conventions, which will make 2024 a little easier to define).


  1. This feels like an abstract machine, but that term is a little loaded, and I want to make certain that it doesn't confuse anyone that's reading the spec. Yet another item on the todo list I guess... ↩︎

This is another reasonable approach, but you have more faith in humanity than I do. I have slowly come to live by the quote "Never underestimate the power of human stupidity", both because I have lived it, and because I have been the cause of it! I strongly suspect that even with good documentation, someone is going to get the numbers in "rust-dynamic-Y-Z" backwards and cause themselves (and by extension us) headaches. And that assumes that they bothered to read the documentation in the first place.

The only reason I'm pushing for lockstep increases in version numbers is because it's reasonable to assume that most computer programmers can figure out how to compare two integers and equate safe-X with rust-dynamic-X. I have concerns with anything more complex than that.

1 Like

I hadn't thought of the possibility of users getting the numbers backwards. Do you think the risk would be significantly lower if it were "rust-dynamic-Y.Z", so that there would be "safe-1" with "rust-dynamic-1.0", ..., "rust-dynamic-1.7"; "safe-2" with "rust-dynamic-2.1"; etc.? I don't hear of users trying to use "Rust 53.1" instead of Rust 1.53, but maybe I don't get out enough. :+1: on not underestimating "human stupidity" (although perhaps "human error" might be the preferred nomenclature).
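For what it's worth, the dotted naming is easy to handle mechanically. Here is a minimal sketch, assuming the purely hypothetical string format "rust-dynamic-Y.Z" where Y is the safe ABI version and Z is a revision within it; the type and function names are made up for illustration, and the compatibility rule is the usual semver-style one rather than anything agreed in this thread.

```rust
/// Parsed form of a hypothetical "rust-dynamic-Y.Z" ABI name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DynamicAbi {
    /// Y: the version of the safe ABI this is a superset of.
    pub safe_version: u32,
    /// Z: the revision of the dynamic ABI within that safe version.
    pub revision: u32,
}

/// Parse names of the form "rust-dynamic-Y.Z"; anything else yields `None`.
pub fn parse_dynamic_abi(name: &str) -> Option<DynamicAbi> {
    let rest = name.strip_prefix("rust-dynamic-")?;
    let (major, minor) = rest.split_once('.')?;
    Some(DynamicAbi {
        safe_version: major.parse().ok()?,
        revision: minor.parse().ok()?,
    })
}

/// Name of the safe ABI that a given dynamic ABI extends.
pub fn safe_abi_name(abi: DynamicAbi) -> String {
    format!("safe-{}", abi.safe_version)
}

/// Semver-style rule assumed for this sketch: same safe version, and the
/// provider's revision is at least the one the consumer requires.
pub fn satisfies(provided: DynamicAbi, required: DynamicAbi) -> bool {
    provided.safe_version == required.safe_version && provided.revision >= required.revision
}
```

With this scheme, "rust-dynamic-1.7" maps to "safe-1" and can serve a consumer built against "rust-dynamic-1.0", while "rust-dynamic-2.1" cannot.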

Okay, then my second idea would be calling it "safe-Y-rust-dynamic-Z". But yes, this can be discussed once this ABI is defined.

There's little point in qualifying the minor version of an ABI - following both semver and general ABI versioning rules, there is no observable difference between 1.0 and 1.1 (Except that 1.1 supports things that 1.0 didn't). I would just express this as version 1.