Discussion: Editions in Rust-GCC (and other Rust compilers)

Please note that the foundation does not set the technical direction for the project. Anything we may or may not want in a specification is for the project members and community to decide.

10 Likes

My apologies, I misspoke.

1 Like

Note that a complete standard is really hard, and the cases mentioned above are corner cases. I'm not sure Rust will be able to do better than C++ in this regard, and even if it does, it will be because Rust is a better-designed language than C++, not because we did a better job on the standard.

1 Like

I agree that it is really hard, but I still believe that it's necessary. We've got proposals like #161 (also see ghosts) which are trying to add formal verification to Rust, the older RustBelt project, and the ongoing work with Ferrocene, all of which are doing their best to make strong statements about Rust code based on what Rust is. The problem is that they need to make assumptions about what Rust actually is for the formal verifiers to work[1]. What's more, every so often I see comments about how some behavior of rustc doesn't conform to Rust the language; case in point. All of this makes me tear my hair out, because I don't know what Rust actually is from a formal point of view.

Up to now Rust has been small enough that we've been able to get away with it. There was only one compiler, and you could pin to a particular version, so if there was ambiguity you at least had a 'standard' as defined by whatever that version of the compiler accepted and produced. But given that gccrs is working towards becoming a thing, that won't cut it any more. WE CANNOT ALLOW AMBIGUITY IN THE LANGUAGE SPEC OR IT WILL LEAD TO A FRACTURED ECOSYSTEM! I already have code that can only be compiled under certain versions of gcc, errors out under other versions of it, and fails under every version of any other compiler. This is beyond 'change your compiler flags'; it comes down to fundamental differences in how each compiler family and version interprets what C or C++ is. I don't want that to happen to Rust.

Our best shot is to figure out what Rust is right now, while there is only one compiler out there for it, with one more on the way. If we wait until everyone and their brother have created compilers before deciding what Rust is, it will be too late, and we'll have decades of arguments about what it is. Let's not do that.


  1. Fortunately, very few assumptions, and most of those are edge cases, but those edge cases can be important and lead to pain. ↩︎

1 Like

@ckaran, I think @CohenArthur doesn't want to fragment the ecosystem either, and is taking a path that somewhat reduces the risk of doing so. While I agree that we need to take care and that we need clearer specifications in more cases, I also don't think it's helpful, in this thread, to add additional stridence to an assertion that @CohenArthur is already attempting to comply with.

There's already work going on to try to improve Rust specifications, and there's already work going on to try to establish frameworks and guardrails to avoid ecosystem fragmentation. But in this thread, I think it'd be appropriate to focus on the original question that was asked, which has effectively become "which editions are considered part of a Rust implementation", to which the answer is "all of them". We don't need a full specification of Rust to be able to start answering individual questions about what we consider to be a valid Rust implementation.

10 Likes

You're right, and I apologize. @CohenArthur, I'm sorry if I've been pushy or offended you, I've let my exceedingly strong feelings on this subject get the better of me.

3 Likes

I'll just note that the stability guarantees mean that the definition of Rust is "all observable behaviour of the stable compiler, unless explicitly specified otherwise".

What about bugs in the compiler?

I know that sounds flippant, but the reason I'm bringing it up is that it means we can't fix known bugs when they come up. #94295, which I brought up earlier, is precisely that kind of issue, as it shipped in stable rustc. It's better to have a true spec, and to fix rustc when it is out of line with the spec.

1 Like

#94295 isn't a bugfix; it is a change of behaviour, and it was treated as such. It's just that the breakage risk was estimated to be small enough to be worth the trade for better diagnostics. However, if crates.io and GitHub didn't exist, the risk would quite likely have been impossible to quantify, and the change would never have been implemented.

A better example would be soundness bugs, which are explicitly allowed to be fixed with breaking changes. For example, consider #90838. The fix is technically a backwards-incompatible change, but the bug it addresses violates soundness guarantees, so the breakage is acceptable (this specific case was less likely to arise in practice since it was a regression between releases, but a similar bug existing since 1.0 is easy to imagine).

For an example of significant breakage due to unsoundness, consider #52898, which stemmed from a change in observable behaviour in the presence of UB. That specific change wasn't considered breaking, since no guarantees are given about UB. The culprits (mem::zeroed and mem::uninitialized), though, were deprecated but not removed, since removal would cause unacceptable breakage in the ecosystem (actually, mem::zeroed wasn't even deprecated, since it's possible to use correctly). Maybe removal will happen sometime in the future.
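
As a minimal illustration of why nothing can be promised about such code (a sketch of mine, not taken from the issue itself): zero-initializing a reference violates Rust's validity rules, so whatever a particular compiler release happens to do with it sits outside every compatibility guarantee.

```rust
use std::mem;

fn main() {
    // Undefined behaviour: a zeroed `&u32` is a null reference, which Rust
    // forbids. Recent compilers flag this with the `invalid_value` lint and
    // may even panic at runtime; that is an observable change, allowed
    // precisely because the program never had defined behaviour to begin with.
    let r: &u32 = unsafe { mem::zeroed() };
    println!("{}", *r);
}
```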

Regarding your specific example, I don't think it proves what you think it proves. IMHO that's the kind of behaviour which would very likely remain unspecified even if Rust had a 2000-page fully formal spec. It is a nice illustration that, no matter how hard you try to specify everything, stuff will inevitably slip through the cracks and cause future issues. At the same time, your hypothetical fully formal spec would be so huge and complicated that few people would read it in full, and nobody would know all of it or anticipate all of its consequences.

We have numerous examples of huge specs from the older languages. C, C++, Common Lisp, Scheme, Algol, Fortran: all of them were plagued by incompatibilities despite the existence of full specs, mitigated only by the separation of platforms and by the de facto dominance of one or a handful of compilers. In some sense the spec even encouraged incompatibilities, since it encouraged the proliferation of implementations, and incompatibilities between implementations are inevitable.

4 Likes

That's roughly the strategy we used for rust-analyzer.

For IDEs, "supporting a common subset" is probably fine, though we still see a bunch of cases where people change their perfectly valid code to fit the subset of Rust that rust-analyzer understands, which is not really great.

For compilers, I am a bit torn. Overall, I do think it's important to support all editions and a single ecosystem. But there's at least the issue that rustc defaults to the 2015 edition rather than the latest one, which is a footgun for newcomers...
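
A hedged illustration of that footgun (my own example): a bare `rustc` invocation still defaults to the 2015 edition, so source that relies on newer edition features is rejected unless the edition is passed explicitly, while Cargo projects pick it up from Cargo.toml.

```rust
// `async` only became a keyword in the 2018 edition, so `rustc main.rs`
// (defaulting to the 2015 edition) rejects this file with E0670, while
// `rustc --edition 2021 main.rs` accepts it.
async fn ping() {}

fn main() {
    let _future = ping(); // never awaited; this only needs to compile
}
```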

1 Like

I agree. I think it's really important that we have well-greased mechanisms for making changes that theoretically could affect compatibility, while testing if they practically affect compatibility.

I agree on both points. That said, I disagree about your original statement:

This implies that the compiler is the spec, which means that if anyone wants to create their own compiler, they have to clone all of rustc's behaviors, bugs and all. If we plan on doing that, then we need to decide which version of the compiler defines Rust... and then immediately start Rust 2.0, because once the compiler defines the spec, we can't change the compiler without changing the spec at the same time.

A large chunk of the issues in the C spec[1] had to do with different ideas all being merged into whatever 'C' was supposed to be. Since there were so many implementations around by the time people started really trying to hammer out the details, conflicts arose as different vendors tried to push their own ideas onto the standard. I'm trying to avoid that. I want gccrs, and any other new compiler effort, to have the stability of a spec. Will it be perfect? Probably not. Is it worthwhile? In my opinion, yes.

All that said, I agree with what @josh was implying in his earlier message: I started to move this thread off the original topic and original question, for which I apologize. If anyone wants to start a new topic of discussion on the need or desirability of a Rust spec, or even if people want to message me directly, I'm up for it, but I don't want to be (further) guilty of topic hijacking.


  1. One of my old coworkers helped define ISO/IEC 9899 (can't remember which version). He had a lot of stories to tell about why this or that part of the standard was so... unusual... ↩︎

Well, there's https://rust-lang.github.io/rfcs/1122-language-semver.html#what-is-a-compiler-bug-or-soundness-change.

1 Like

Thank you, that's a PERFECT example of the kind of issue I'm talking about! The Underspecified language semantics section in that same RFC even lists some of the known issues with Rust, some of which could lead to soundness issues. Now imagine that (at some point in the future) rustc- and gccrs-compiled code can be built separately but linked together. If they have made different decisions about some of these underspecified portions of the language, what happens when execution crosses the boundary from one side to the other? You could decide that you're going to treat it like an FFI interface, force #[repr(C)], create bindings, etc., but as an end user that would anger me no end[1], and it still wouldn't solve the problems!

In short, as painful as it will be to write, as painful as it will be to maintain, and as even more painful as it will be to read and adhere to, we need a specification.


  1. Yes, Rust doesn't have a stable ABI, there are all kinds of other details, yadda, yadda, yadda. That's why we need a spec! ↩︎

A spec still wouldn't provide a stable ABI. All a spec would do is formalize exactly what guarantees rustc makes. Even with a spec, you wouldn't be able to link together #[repr(Rust)] code compiled by multiple different Rust compilers. While having a specification is beneficial for multiple things, it doesn't mean that we'd guarantee any more than we already do.

13 Likes

That's true. I was making the assumption that the spec would eventually define what #[repr(Rust)] means as well, but I never stated it. Thank you for the catch!

We could define it without a spec too, and a spec doesn't have to define it. I, for one, want it to stay undefined.

I understand why you want to do this, but I disagree. If we leave the representation unspecified, then the only way to link code compiled by different compilers is to treat it as an FFI boundary. Now, it may be that using #[repr(C)] is sufficient for all of Rust's needs; I can't comment on that, as I don't know enough at that level to be certain. But if it isn't sufficient, then we need to discuss and define the spec for #[repr(Rust)] so that object files produced by different compilers can all be linked together.
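
To make the FFI-boundary option concrete, here is a minimal sketch of mine (using only stable Rust features) of what crossing between two independently compiled Rust artifacts looks like today: everything at the boundary is pinned down with #[repr(C)] and extern "C", exactly as if the other side were C.

```rust
// Sketch of a boundary between code built by two different Rust compilers,
// treated exactly like a C FFI boundary: only `repr(C)` data and `extern "C"`
// functions cross it, because `repr(Rust)` layout is unspecified.
#[repr(C)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

// Exported with an unmangled symbol so the other compiler's output can call
// it without knowing anything about this compiler's name mangling.
#[no_mangle]
pub extern "C" fn point_length(p: Point) -> f64 {
    (p.x * p.x + p.y * p.y).sqrt()
}

// Declared here, defined in the artifact produced by the other compiler.
extern "C" {
    fn point_scale(p: Point, factor: f64) -> Point;
}
```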

Why do I think this is important? Because of deeply embedded hardware like FPGAs and ASICs, where the vendor has their own proprietary C compiler that can only target their chipset, and which is slightly broken. You need it to talk to their hardware, but you don't want to use it for 95% of your code, so you do something hacky with the linker that makes things work[1]: your good compiler handles 95% of the code, and the vendor's compiler handles the code that only it can truly deal with.

Whether or not you think this is important is entirely up to you. I can understand it if most people think that this is a low-priority item at best, and should be left out of the spec. In that case, if it is at all possible, I'd like the initial version of the spec to say something like '#[repr(Rust)] is not defined by this version of the Rust specification. Compilers MUST NOT link object files generated by other compilers that use #[repr(Rust)] for external linkage. Compilers MAY link object files generated by other versions of the same compiler. Compilers SHOULD link object files generated by the same version of the same compiler.' THAT SHOULD NOT BE COPIED INTO THE SPEC LITERALLY! I'm sure there is better wording that can make it far clearer what is being specified, what is expected, how errors are reported, etc., etc., etc. I only intended this to give a flavor of what I want.


  1. This happened when I was a very, very junior employee: a very senior, very smart coworker of mine did this. To this day, I have no idea how it all worked, just that there was a chunk of the code that was off-limits to anyone but him, and that somehow the two chipsets talked to one another. ↩︎

1 Like

As an alternative to defining the repr, there could be a defined metadata section in the rmeta/rlib describing the layout, maintaining the status quo that you must have those available to link against, not just a standard archive. That way the actual layout is up to the compiler of the library.

Though that seems a trivial thing to define compared to the other metadata that must be passed between compilers, like macros, proc-macros, and generics.

6 Likes

I don't think we should specify or stabilize the default repr(Rust). I do think we should specify a repr(safe) that's a subset of that and a superset of C, allowing things like Box, Vec, slices, &str, String, and eventually boxed dyn traits.
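
For contrast, a sketch of the status quo that such a repr(safe) would improve on (my own example; repr(safe) does not exist today): rich types like Vec currently have to be flattened into repr(C) pieces by hand before they can cross a stable boundary.

```rust
// Status quo: a `Vec<u8>` cannot cross a stable boundary as-is because its
// layout is `repr(Rust)`, so it is manually decomposed into a `repr(C)`
// pointer/length/capacity triple. A `repr(safe)` along the lines proposed
// above would let the `Vec` itself cross the boundary.
#[repr(C)]
pub struct ByteBuffer {
    ptr: *mut u8,
    len: usize,
    cap: usize,
}

pub fn into_boundary(mut v: Vec<u8>) -> ByteBuffer {
    let buf = ByteBuffer {
        ptr: v.as_mut_ptr(),
        len: v.len(),
        cap: v.capacity(),
    };
    std::mem::forget(v); // ownership now lives on the other side of the boundary
    buf
}

/// # Safety
/// `buf` must have come from `into_boundary` and must not be reconstituted twice.
pub unsafe fn from_boundary(buf: ByteBuffer) -> Vec<u8> {
    Vec::from_raw_parts(buf.ptr, buf.len, buf.cap)
}
```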

14 Likes