TL;DR: Circular dependencies between the respective testing of separate git repositories are bad when landing is gated on tests.
Case study:
I’m making a breaking change to an #[unstable] API in https://github.com/rust-lang/rust/pull/46952. No problem, I’ll update its usage in the repository at the same time so that ./x.py test passes. Some of these changes turn out to be in doctests under src/doc/nomicon which is a submodule. No problem I’ll submit a PR to that repository first. But that PR can’t land because that repository has its own CI that runs the same doctests with Rust Nigthly. So I’d need to get my changes in Rust Nightly to be able to land my nomicon PR. But to do that I first need to land my nomicon PR.
I seem to be stuck.
But there’s a trick: for changing which commit hash the src/doc/nomicon submodule points to I don’t need that commit to be in the master branch, I just need it to be reachable somehow when one clones the repository. So I can ask someone with write access to rust-lang-nursery/nomicon to create a new branch there, then create another PR that goes into that branch instead of master. I ask for that PR to be merged without testing (which is not too bad because it’s not master) and now I can push a submodule update to my rust-lang/rust PR that Travis-CI won’t fail to fetch. When the original PR will land and reach Nightly, I’ll then be able to land the nomicon PR into nomicon master.
Some time passes while my original PR is discussed and review. In the meantime, some other change lands into nomicon and the submodule is updated on rust-lang/rust master. Now my original PR has a merge conflict. To solve that I need to do the whole temporary branch dance all over again.
All in all, this kind of change require jumping through many more hoops than I feel should be necessary.
I think this can be fixed in two ways. For nomicon and other submodules whose content rely on unstable features:
Stop running tests for them in ./x.py test. They would become like third-party projects that use unstable features: they might break in a new Nightly, and it’s up to them to update at their own pace.
Or, stop gating PRs to these repository on passing tests with Nightly. Move to testing them exclusively when updating what commit a submodule points to. At this point it becomes debatable why they’re in a separate repository at all rather than directly in rust-lang/rust.
(Some submodules like Cargo for example don’t have this problem because they only use stable features.)
We’ve been wrestling with similar problems around tools like rustfmt, that rely on unstable features.
In my opinion, the best way forward is this:
Allow things to land in the compiler, changing the “testing state” for the repos that they break to broken.
When the state is moved to broken, we create an issue and cc the author to fix the problem over in the affected repos.
e.g., I broke API X, so I should at least try to change the usage of X in the rustinomicon and rustfmt
Right now, we automaticaly change testing state for tools to broken (do not test). We do not do the second step. This may not be ideal, perhaps it should be a manual step to mark the tool/dependent-repo as broken (and, at the same time, file the issue). This avoids “accidental breakage”, where the problem is actually a bug in rustc that is just not being revealed until the integration test.
@kennytm do you think it’d be easy-ish to integrate the nomicon testing into the automatic management the rls/rustfmt already has? It seems that these are the same use cases build-system-wise at least (from a desired how-the-breakage-workflow-looks point of view)
The implication of that would be: if the nomicon example is broken, it can only be known after the fact, so “accidental breakage” cannot be prevented from entering the tree.
I think that’d be fine yeah, we’d just want to do the same “guarantee it runs on beta” and I’d imagine that any breakage would hopefully be easy to fix.
Yeah, applying the same rules to nomicon and other submodules as we do to tools is probably the right solution for now.
Long term, I’m not sure what to do. This has been the bane of my life for the last few months and is a barrier to entry for potential contributors and maintainers of tools. We’re doing lots of nibbling at the problem (the auto-toolstate stuff, libsyntax on crates.io, etc.) which is making improvements and we should keep doing so, but it doesn’t address the fundamental problem.
Monorepo is the obvious option. The upside is synchrony. The downsides is that it makes contributing to the various “downstream” packages much more onerous, and contributes to the backlog on the bors queue (many more patches to land).
I’d personally prefer to see us move libsyntax to crates.io and improve the tooling, so that people can – once they complete changes to compiler – go and make a PR to bump the RLS (and whatever else) to accommodate those changes.
We do have to make sure we stay on top of it though, since otherwise it’s going to be a pain to move forward.
The main technical question that is (afaik) as yet unresolved is just how much work it is to do that publishing. We haven’t really assigned anyone to be “on top” of it, right? Though I think maybe @alexcrichton did some kind of quick-and-dirty experiment there?
I think this would be necessary but not sufficient. IIUC RLS, clippy, and others need more than just libsyntax. For your suggestion to work, more of the compiler -- and even parts of cargo -- would need to move to crates.io. Not sure what the long-term plans are here, but I agree they could help.
Can’t we have allow_fail builds for the tools, so that introducing a change in master that breaks a tool can be done, but the author of the PR gets instantly notified before the merge by bors about it?
Something like, this PR can be merged, and it passes all tests, but it is going to break tool x, y, and z.
That way, the author can already start working to fix those issues, and once nightly breaks the tool the PRs are ready to go.
Yes we could run the “tools” jobs in the CI. However, if we use allow_fail the Travis status is simply green and no one will notice, so it cannot simply be allow_fail.
A minor disadvantage is that this will also double the resource needed to smoke-test a regular PR before merging, though I think it doesn’t really matter.
@nikomatsakis FWIW I've got a small script that is auto publishing the rustc-ap-syntax crate nightly on crates.io. So in that sense now we've got libsyntax on crates.io already! I still think we'll want to migrate it away from unstable features eventually, but it seems ok to not need to do that just yet (the unstable features it relies on aren't changing as much as the AST is).
Yeah one thing we've discussed in the past is that rustc will have a set of queries that are "reserved for rls" which are sort of the public API that the rls uses. It's all still unstable but it should hopefully be a drastically less-breaking surface area relative to today.
This is more or less unrelated to the discussion, but I feel strongly about it, hence the comment. I don't think it will be a good idea to expose any internals of Cargo, and that instead tools should rely on declarative description of the project, such as the one produced by cargo metadata. I totally understand that it is less work now to just link to Cargo and hijack rustc with a custom thing, but I believe long term this will result in suboptimal architecture of tools.
We could theoretically monorepo if we had multiple queues where e.g. anything that only touched the books would involve a book-only queue. You can’t merge conflict too often, and the times it’s possible (e.g. @bors is working on a PR that modifies a book and something else) we could have the bots synchronize.
I don’t know if Travis and AppVeyor have anything comparable, but CircleCI has a way to coordinate multiple builds in parallel (https://circleci.com/docs/2.0/workflows/), and you can hack together ways to skip/enable builds depending on what portion of the monorepo has changed.