Experience report contributing to rust-lang/rust

Based on everything said here, I think the most immediately actionable improvement for me (and anyone else who mainly makes small contributions to std) would be an easy, documented way to build std and run tidy with one of the toolchains that can be installed through rustup. That would make the barrier to contributing pretty similar to that of any other large Rust project.

13 Likes

As someone who has had to, for many years across multiple projects, languages, and industries, dive into and maintain others' software, written over multiple years with no consistent formatting or style, I could not disagree more.

My take is that unless you are the only person who will EVER work on the code base, auto-formatting should be used. The specifics of the style don't matter all that much, but consistency does.

17 Likes

Moderation note: Let's please not let this thread derail into a general argument about source formatting. Running rustfmt has already been acknowledged, personal opinions aside. I think that's enough for this particular topic.

11 Likes

This seems unfortunate to me and imposes unnecessary limitations. I also don't see this as applicable long-term.

Ideally, I'd like to see three separate projects similar to what @matklad described:

  1. The compiler - rustc
  2. The Standard library - this includes the interface with the Rust users. In other words, all the traits and vocabulary types that define a common language for the users.
  3. Runtime - this includes the interface with the underlying environment - the OS, the hardware, etc.

Each part above deals with separate concerns. Such a separation would expand the supported use cases, such as:

  1. Specialised / hobby OSes and hardware - without forcing them to fit into the mainstream supported platforms.
  2. Competing implementations, just like C/C++ have. This allows supporting different design tradeoffs without splitting the community. (I reckon Rust, the language, might need to enhance a few of its facilities to handle this gracefully.) Ideally, I should be able to change the runtime while remaining compatible and without being affected by coherence.

Finally, this would allow us to better formalize the interface points between the separate parts and have better defined boundaries. This in turn means that there would not be any git submodules. Instead, the compiler would have an explicit dependency on a released version of std in its Cargo.toml.

Because "std-aware cargo" is a thing that's happening and that everyone seems to agree we want, I figured it was a given that std would move to another repo once it literally was "just another crate". I assumed that was out of scope for this thread and the real "should we monorepo?" question intended here instead breaks down to:

  • does it make any sense to split away std in the short/medium-term, before "std-aware cargo" is fully functional? (presumably without turning it into yet another submodule)
  • can the runtime be put in a separate repo from the compiler? (without yet another submodule)
  • same question for other components like the lexer, parser, backtraces, chalk, polonius, etc.

As someone who doesn't currently contribute to Rust, I have no idea which of these are even possible to do, much less desirable. Perhaps it just boils down to the ongoing "librarification" of the compiler, and there's no special shortcut available at the moment.

This sounds like a really awesome setup. If there is agreement on this plan, I can probably contribute some time to make it happen.

I wonder if we can then also do lighter-weight CI on the runtime parts.

4 Likes

I think this could definitely be improved:

$ # use a tmpfs and let's ignore disk IO for now
$ cd /tmp

$ # baseline
$ time git clone https://github.com/rust-lang/rust.git
Cloning into 'rust'...
remote: Enumerating objects: 1154546, done.
remote: Total 1154546 (delta 0), reused 0 (delta 0), pack-reused 1154546
Receiving objects: 100% (1154546/1154546), 490.76 MiB | 22.38 MiB/s, done.
Resolving deltas: 100% (935929/935929), done.

1m 15s

$ # clean up
$ rm -rf rust

$ # shallow clone
$ time git clone --depth=1 https://github.com/rust-lang/rust.git
Cloning into 'rust'...
remote: Enumerating objects: 22424, done.
remote: Counting objects: 100% (22424/22424), done.
remote: Compressing objects: 100% (21033/21033), done.
remote: Total 22424 (delta 1579), reused 6515 (delta 914), pack-reused 0
Receiving objects: 100% (22424/22424), 12.79 MiB | 3.89 MiB/s, done.
Resolving deltas: 100% (1579/1579), done.

8s

This could probably also be extended to the submodule handling in src/bootstrap/bootstrap.py. Does the build system rely on having all commits available (e.g. for revision counting)?

2 Likes

Unfortunately, this isn't quite true, because std/alloc/core are special: they can (and do) implement language items, which form a permanently unstable API between the compiler and std (and friends), and that API can and does change.

Intrinsics (i.e. "call some functionality the compiler provides") are probably the most tricky and variable lang item group, but even the lang items that allow e.g. implementing on primitive types can and do change.

You know how sometimes nightly doesn't have rustfmt/clippy because they haven't been updated to match refactors yet? If std (and friends) were a completely separate repository from the compiler, this would happen for std as well for every change to lang items.

3 Likes

To quote myself from above:

Having lang items is precisely the point I've been making. We already have an informal and unstable interface between these constituent parts because these are separate pieces of code that need to interact with each other in well defined ways.

As an aside, the lang items related to memory allocation would hopefully go away once the global/system allocator APIs are fleshed out. I'd consider them mostly part of the runtime, whereas abstractions such as Box that rely on them are part of the std component. If I develop a new OS and want to support Rust, I would need to implement the runtime component for my new OS, but the std component providing the Box abstraction should work out of the box (excuse the pun..)
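
To illustrate the split between a swappable allocator "runtime" and the unchanged Box abstraction, here is a minimal sketch using the stable `GlobalAlloc` trait and the `#[global_allocator]` attribute. `CountingAlloc`, `ALLOCATIONS` and `allocation_count` are hypothetical names made up for this example; the allocator just forwards to the system allocator and stands in for whatever a new OS would actually provide.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a platform-specific "runtime" allocator.
/// It forwards every request to the system allocator and counts allocations.
struct CountingAlloc;

/// Number of allocations served so far (hypothetical bookkeeping for the demo).
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Delegate the real work; a new OS would put its own logic here.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Swap the allocator for the whole program; Box, Vec, String etc. are untouched.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Returns how many allocations the stand-in runtime has served so far.
fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocation_count();
    // black_box keeps the optimizer from eliding the heap allocation.
    let boxed = black_box(Box::new(42u64));
    let after = allocation_count();
    println!("boxed value: {}", boxed);
    println!("allocations observed around Box::new: {}", after - before);
}
```

The code that uses Box does not change at all when the allocator is replaced, which is roughly the kind of boundary being argued for here. It does not cover the rest of what a runtime would need (threads, I/O, panics and so on).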

1 Like

Sure, we can say "make lang items a 'more formal' API surface," but that's much easier said than done. In a world where std is in a separate VCS world from the compiler, how would you orchestrate a cooperative change to the lang item interface or the intrinsics interface?

I'd say it's not a viable system if you have a broken state required because someone decided it's a better idea to have the size_of intrinsic return Option<usize> rather than usize (say, so that the intrinsic returns None for extern types and the library handles that edge case directly).

In the current, single-repo setup, that's just putting the #[cfg(bootstrap)] for the library in the same PR that does the language change. With std in a separate repo, you're requiring either std or rustc to push a commit to master that doesn't work until the other pushes the other half of the work. How are you going to get that tested through bors?

Putting core/alloc/std in a separate VCS history from the compiler makes that disconnect, and effectively means freezing the lang item / intrinsic interface. That's not a tractable decision that can be made.

The solution isn't to bifurcate the VCS tracking of the two intimately connected pieces. The better solution is to make testing the "std world" not require building the rest of the tree, but still VCS track them in sync for those #[cfg(not(bootstrap))] changes that are required.
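
The single-PR workflow described above can be sketched roughly like this. It is only an illustration: `size_of_intrinsic` and `lib_size_of` are hypothetical stand-ins (the real intrinsics live in core and are not callable like this on stable), and the `Option<usize>` signature is the hypothetical change from the example, not an actual proposal. In rust-lang/rust the `bootstrap` cfg is set when the library is built with the older stage0 compiler; in an ordinary build it is not set, so the `not(bootstrap)` items are the ones compiled here (a recent toolchain may warn that `bootstrap` is an unexpected cfg name).

```rust
// Old intrinsic signature, as still provided by the bootstrap (stage0) compiler.
// Hypothetical stand-in implemented in terms of std::mem::size_of.
#[cfg(bootstrap)]
fn size_of_intrinsic<T>() -> usize {
    std::mem::size_of::<T>()
}

// New intrinsic signature introduced by the same PR (hypothetical change):
// `None` would mean "size not known", e.g. for extern types.
#[cfg(not(bootstrap))]
fn size_of_intrinsic<T>() -> Option<usize> {
    Some(std::mem::size_of::<T>())
}

// Library wrapper for the old compiler: just forwards the value.
#[cfg(bootstrap)]
pub fn lib_size_of<T>() -> usize {
    size_of_intrinsic::<T>()
}

// Library wrapper for the new compiler: handles the edge case in library code.
#[cfg(not(bootstrap))]
pub fn lib_size_of<T>() -> usize {
    size_of_intrinsic::<T>().expect("size of this type is not known")
}

fn main() {
    // The public library API is identical under both configurations.
    println!("size of u64 via the library wrapper: {}", lib_size_of::<u64>());
}
```

Both halves land in one commit, so every revision in the history builds with both the old and the new compiler. That is the property that gets lost if the two sides live in separate repositories.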


(Side note: does std itself (libstd, not liballoc/libcore/stdarch) actually require any lang items or intrinsics, or are those confined to liballoc and the lower levels?)

2 Likes

Off-hand I know that std defines lang_start and f32/f64_runtime. ETA: also the panic runtime.

The moderator has already asked that we not turn this into a discussion on the merits of autoformatting. I know my opinion is in the minority, I have heard the argument before, and I am willing to run rustfmt to contribute to the rust repository.

6 Likes

How does anyone orchestrate an API change between two pieces of software? /sarcasm

You simply stated that change is not tractable without providing any reasoning as to why. The current design doesn't mean we couldn't have a slightly different future design to integrate the separate pieces. This is not rocket science. We do have multiple crates in our ecosystem and we use semver to solve exactly this issue of dependencies. Why would the compiler be any different? It is a piece of software just like any other piece of software. And as others have mentioned, rustc is already in the process of being separated into libraries.

Other languages (e.g. C and C++) do manage to separate the compiler from the runtime and that opens up useful use cases that I have mentioned above. In fact, Rust relies on that design choice and allows the user to link to an alternative C runtime that is statically linked instead of glibc.

If Rust wants to succeed as a general purpose systems language it would need to support such use cases.

It's a matter of priorities. As other people have mentioned, there's a lot we can do to improve the experience of contributing to std (the original topic of this thread) without splitting things into separate repos. While formalizing the runtime interface is an admirable goal, it's also a lot of work, and we don't want to block other improvements on it in the meantime.

1 Like

The work on splitting e.g., librustc and libsyntax into several other crates has little to do with actually wanting it to become a set of reusable components and to be moved out of the compiler. Rather, this is primarily about exploiting pipelining to get more parallel builds as well as getting better incremental recompilation for the compiler itself. That is, make it easier to hack on the compiler. Moving things out of the repo, or using semantic versioning, would be quite counter-productive to that. For rustc, I think a "live at HEAD" approach is preferable.

1 Like

I think we've gotten the bors queue under control again. We had an unfortunate CI outage for a few days that we had to work through, as well as an insufficient number of rollups, but it should be all good now.

EDIT: the bors queue is empty now, this is your opportunity, use it well. :P

I was not referring specifically to that. Counter-example: Chalk, as a potential future trait solver, is a separate project.

I disagree wholeheartedly that this is counter-productive. I've provided motivation for why such a separation is essential for certain use cases. Systems programming (Rust's domain) has a lot to do with greater user control. Case in point: as a user, I should be able to control the various trade-offs made in the language's runtime.

There was a desire long ago to implement std without any reliance on libc and instead use syscalls directly (on Linux). This could have been implemented simply as an alternative but compatible Rust runtime implementation. That would remove the very high barrier to entry in the current implementation, which chose a different trade-off. Rust should be able to support this without bifurcating the ecosystem.

I totally agree that it is a matter of priorities and I do not disagree that this longer term vision is indeed quite a bit of work. As I said, library-ification of the project is already a goal as stated by Niko, which is already being worked on. I just suggested what the final outcome of that could be and why it is beneficial to Rust to have this goal in mind.

(It's relevant to the experience wrt. hacking on rust-lang/rust, but perhaps the topic should be split for the discussion of poly-repo, etc.)

I don't think this is a counter-example. Chalk is still an experimental trait solver that is not used by rustc, so it can iterate much more quickly out of tree. Once it does become the trait solver, I think it will become problematic if it stays out of tree: that would separate it from the integration tests we run on CI and would force reviews to deal with larger, less incremental changes. It would also mean having to jump through hoops when making changes that span several parts of the compiler.

I understand that this is a priority to you, but greater control for hobby OSes and facilitating alternative implementations have costs. And to me, it's not a priority to facilitate alternative implementations & such (especially when there's not a formal specification of the language -- so there would be issues with stability guarantees) when it is to the detriment of day to day hacking on Rust, and based on my experience working on the compiler, I'd say that it would be just that because I now have to jump between more repositories to check the incoming PRs and to make changes myself.

I think there is disagreement re. what "library-ification" means. Niko has indeed expressed the view that we should e.g., move to a poly-repo setup, try to move more of the compiler to stable Rust, etc., but that is controversial (as there are great aspects of the current setup that we'd lose), and it is not something that has been decided and adopted as a project goal.

At the end of the day, the Rust developers set the priorities and goals for the project, and I do understand that the resources we have aren't unlimited.

I'm just trying to suggest that it is worthwhile to consider not only how our current design informs our decisions but also how our decisions can inform how we evolve our design based on user needs.

I agree that it would be more difficult to hack on the compiler with the current unstable interfaces and the current design & setup. I can't speak for Niko, but it stands to reason that his desire to move to a poly-repo setup with stable Rust aligns with the latter statement above. It would produce a better, more efficient design that would cater to more use-cases, would provide more guarantees to the users (stable, well defined interfaces) that could be reused in other tooling, and would result in a better experience overall when working on the compiler.

I'm all for designing good boundaries in rustc's internal APIs, and in fact, I'm working quite a bit on that myself. That doesn't have much to do however with making those boundaries stable and using semver as a way to give versioning guarantees. All that would do is to make iteration and improvements to the compiler way more cumbersome, as we need to make what would amount to breaking changes several times per week, and that would either make semver meaningless, or it would slow down our development. I also don't accept that moving into a polyrepo or semver would improve the designs in the slightest.

Meanwhile, using stable Rust internally in the compiler would mean that we couldn't dogfood new language features internally in the compiler. That would notably reduce our chances to gain better test coverage, risking more bugs in those features, and it would deprive us of seeing how well the features are designed in the compiler as well, so shipping them would take more time.

You're suggesting that this would "cater to more use-cases". To me, it de-prioritizes existing ones, in favor of what I consider more marginal ones (e.g., alternative implementations). I think Rust is already sufficiently stable, and more of it would be harmful to its evolution.

We do have problems in terms of e.g., build times slowing down iteration, but those problems can be improved with e.g., our move to GitHub Actions, as they will roughly halve build times (and they already are improving the situation re. rollups and getting notified of failures quicker). Other improvements to rust-lang/rust include taking more advantage of pipelining and making cargo check work on more crates.

3 Likes