The case for a new release channel: testing

Yeah, there's a tension here between "I want the latest version of this feature" and "I want the stable version of everything else". I suppose I'm thinking of this less as "we're going to do a call for testing for feature X" and more as "I really want this feature, I want a way to test-deploy it internally so I can give feedback". I think the former is tough to pull off in practice because it requires an active driver for each feature regardless of whether there's interest in it, which I think realistically won't happen. Basically, I feel like we want the process here to support "on-demand testing" rather than "solicited testing".

The point of the channel would be to test specific features that need more widespread testing. The primary work there is getting the widespread testing; turning features on or off will be a relatively small part of maintenance and curation there. Every feature enabled there could have a specified end date, and we'd expect the people most interested in stabilizing a specific feature to pay close attention to whether there are enough experience reports by that time.

So the idea would be that if there's a feature I want to help test, I would first need to advocate for it to be added to the "testing set", and then wait for that to happen? I think that makes sense for very large features that we really want a "call for testers" for, but doesn't work quite as well for features that aren't necessarily useful for many users, but rather for few but large users. The example I'm thinking of here is something like cargo's patch-in-config, which is probably not that useful for most individual users of Rust, but is likely very useful for many large organizations using Rust in the context of their internal build systems.

Yes.

I think testing could support either use case; either way, I think we'd get useful experience reports.

Yup, I agree. I like it if we can find a good process for adding/removing features from the set. Some questions:

  1. Getting a feature added to the testing set should require approval, but what kind? An FCP seems like overkill, but single-maintainer may be too little? Maybe two is a good number?
  2. Features should only be approved for a certain amount of time, but how long? We could always cut the period short if enough evidence is gathered, so maybe err on the "long" side. But at the same time, if it's too long, we'll probably end up accumulating a decently large set of features over time, which likely reduces the set's usefulness. As a starting point, how about six weeks to match a release cycle?
  3. Once the time limit for a testing feature expires, how is the feature removed from the set? Ideally this process is automated, or at least a reminder to remove is automatically written to the tracking issue by some kind of bot. Is this something that can easily be added to the existing bot infrastructure?
  4. A "nice to have" bordering on "important" is to plan for the continuity of a feature if the decision is "stabilize". That is, if we decide to stabilize a feature, we probably want to make sure that there is always some non-nightly channel that allows the use of the feature. So, the feature should not be removed from the testing set until the next beta is released. Otherwise, users of the feature may have to revert changes they've deployed in the gap between testing and beta, which seems unfortunate. Thoughts?

(Disclaimer: all of this is focusing on the question of "how" and ignoring the question of "if"; this shouldn't be taken as agreement yet.)

  1. I'd suggest using the normal rfcbot process, just without paying attention to the usual 10-day delay after consensus. For instance, this is a tool we can apply during a regular weekly meeting, in which case if there are enough people present, we'd FCP and check the boxes for everyone present, and it'd immediately go into FCP, at which point we proceed.
  2. It'll depend on the feature and how much usage we expect it to get. Generally, I'd guess either 6 weeks or 12 weeks would work, for any feature that actually requires changing Rust code to take advantage of it. For features that just involve passing an option and seeing the result, as little as 2 weeks may suffice. That said, given your point about continuity, we may want an amount of time that'll be slightly longer than the time until the next release, so something like 8 weeks might be preferable.
  3. One approach would be for the feature-enablement configuration to be in a file together with dates, and CI could compile in only those features whose dates haven't passed. That way, if we forget to prune an entry, it's still disabled on time. That also makes it easy for the scripts that generate meeting agendas to add near-future dates to the agenda.
  4. I'd hesitate to push that too hard. I understand the desire for continuity, but anyone using the testing channel is helping with an experiment, and needs to understand that they may need to roll back any changes they make. For that matter, we may end up changing or evolving a feature on the basis of experience reports.
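
To make point 3 concrete, here's a minimal sketch of a date-gated feature list. Everything in it is an assumption for illustration: the `TestingFeature` struct, the `enabled_features` and `expiring_by` helpers, and the dates are hypothetical, not existing rustc or CI code. Dates are plain (year, month, day) tuples so the example needs no external crates.

```rust
/// A calendar date as (year, month, day). Tuples compare field by field,
/// so ordinary `<=` gives chronological order.
type Date = (u16, u8, u8);

/// One entry in the hypothetical testing-set file.
struct TestingFeature {
    name: &'static str,
    /// Last day on which the feature is still enabled on `testing`.
    expires: Date,
}

/// Features that should be compiled in today. An entry nobody pruned
/// simply stops matching once its date has passed.
fn enabled_features(set: &[TestingFeature], today: Date) -> Vec<&'static str> {
    set.iter()
        .filter(|f| today <= f.expires)
        .map(|f| f.name)
        .collect()
}

/// Still-enabled features that expire on or before `cutoff`, e.g. for a
/// script that adds upcoming end dates to a meeting agenda.
fn expiring_by(set: &[TestingFeature], today: Date, cutoff: Date) -> Vec<&'static str> {
    set.iter()
        .filter(|f| today <= f.expires && f.expires <= cutoff)
        .map(|f| f.name)
        .collect()
}

/// Example data; the feature names and dates are made up.
fn example_set() -> Vec<TestingFeature> {
    vec![
        TestingFeature { name: "patch_in_config", expires: (2021, 3, 1) },
        TestingFeature { name: "portable_simd", expires: (2021, 4, 12) },
    ]
}
```

With the example data, a build on 2021-03-02 would enable only `portable_simd`, even though the `patch_in_config` entry is still in the file.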

We should also have a loose upper bound on the number of features we're testing concurrently.

Ack.

  1. That's a good point. In that case I think FCP without the 10d delay is a good way to go so it can re-use existing mechanisms.
  2. I suppose the natural follow-up here is whether there should be different time periods for different features, and how those periods are decided. It feels easier to pick one and say that all changes have the same maximum testing time, and then acknowledge that features can be stabilized sooner than the maximum testing time if sufficient evidence is present. I like your proposal of 8w.
  3. That's a super interesting proposal. I suppose this would then need to be some kind of build script, but I don't have enough insight into the current build process for rustc/cargo to say how easy/hard that would be to implement.
  4. Yeah, this one is also a balancing act. The reason I propose it is that I think it'd reduce the friction if a feature is stabilized. You're completely right that anyone testing it would need to be prepared to roll back or adapt to changes, but if the decision is to stabilize, I think it'd be worthwhile to say that "the testing window will be extended to the next beta release date".
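
As a rough sketch of how points 2 and 4 could combine: every feature gets the same maximum window, and a "stabilize" decision stretches it to the next beta if that falls later. All names here are hypothetical, and days are counted as plain integers from an arbitrary day zero to avoid pulling in a date library.

```rust
/// Proposed maximum testing period from the discussion above.
const MAX_TESTING_WEEKS: u32 = 8;

/// Day on which a feature leaves the testing set.
///
/// `start_day` and `next_beta_day` are day numbers counted from an
/// arbitrary day zero (a simplification for this sketch).
fn testing_window_end(start_day: u32, stabilizing: bool, next_beta_day: u32) -> u32 {
    let default_end = start_day + MAX_TESTING_WEEKS * 7;
    if stabilizing {
        // Keep the feature available until a beta carries it.
        default_end.max(next_beta_day)
    } else {
        default_end
    }
}
```

For a feature added on day 0 with the next beta on day 70, the window would end on day 56 by default and on day 70 if the decision is to stabilize.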

Returning to the question of whether something like this should happen, I think the primary question is "why a new channel?", which then deconstructs into:

  • Why not unstable features on stable/beta? This is already possible with RUSTC_BOOTSTRAP, but it's probably not something we want to encourage. Users should be able to rely on the stability guarantees of stable, which immediately goes out the window if there's an escape hatch to get unstable features there anyway. Arguably, RUSTC_BOOTSTRAP should be removed and bootstrapping should happen with testing. As for "why not beta", the argument is that beta should be exactly equal to the stable that is to come. If that's not the case, we end up conflating the role of beta and undermining users' willingness to run beta since it means also opting out of the stability guarantees.
  • Why not use nightly? As I've tried to articulate earlier in this thread, nightly means opting into too many other things at the same time, such as changes that landed yesterday and thus haven't baked for very long. There is also the concern that as the rate of change increases for Rust with increasing adoption, it'll become increasingly hard to find a nightly that has no known problems. In some sense, that's what releases are for — finding a slice in time where everything works correctly, and where even if they're discovered not to, to commit to backport changes to restore correctness. It feels unfortunate to place that burden on testers.

I think I'm seeing this one differently. From things like const generics, we've seen features that end up needing extra warnings to discourage people from trying them out when they're not ready yet. So we seem to get on-demand testing with existing nightly mechanisms.

Whereas with the (very good) success of getting people off using nightly regularly, I feel like solicited testing is more of the problem. We're getting more and more features that are "nice, but not essential", which means people are unlikely to move to nightly to try them out. It makes me wish for "please try these out" calls to get more experience reports.

(But there's probably a bunch of stuff in the middle too.)

I'm wondering about the choice to base testing off of beta. You mentioned that one of the disadvantages of nightly is its lower QA, and that might not be acceptable inside companies.

I'd argue that beta is not suitable either. The beta channel has a reliability story that drastically changes over the course of six weeks: at the start of a cycle it's as reliable as just picking the nightly of the day of the beta cutoff, then as the weeks go by it becomes more and more reliable as backports are landed. Near the end of the release cycle it's practically as reliable as stable, and finally when a new cycle starts it reverts back to be as reliable as nightly.

I'm also wondering whether you'd want fixes to tested features to be backported to testing (and thus beta). I would see no problem in backporting the changes to testing if it wasn't tied to beta, but landing backports for unstable features on beta makes me a bit hesitant.

The issue is that the rustc built during a bootstrapping stage is responsible for compiling its own standard library (which necessarily uses many unstable features). Removing RUSTC_BOOTSTRAP would have to be reconciled with the fact that, currently, when building stable rustc, the final standard library is built by stable. I don't know enough about the rustc bootstrap process to comment on challenges with building the standard library using a separate testing channel rustc built before building the stable compiler, but I can assume that similar issues would have presented themselves if the same was tried using the existing nightly channel.

RUSTC_BOOTSTRAP should cease to exist wrt. users and user crates imo, but I don't think it's feasible to eliminate it entirely (except by providing a new mechanism for stdlib use).

Oh, yeah, that's a very good point. I suspect there's a difference between "features wanted by many individual users" and "features wanted by few but large users", and the latter is maybe what I'm getting at. Take, for example, the "patch section in .cargo/config" feature — that's unlikely to be useful or even interesting to most individual users, but it's likely to be very useful and important to "large" users that, say, integrate with other build systems. That's not to say the feature is more or less important overall, but more to indicate that different features have different target user populations, and those user populations in turn have different requirements for how they use Rust.

To continue the example, individual Rust users are probably more willing (arguably too willing?) to jump on nightly to get features they really want (like const generics), which means that those features are likely to get a lot of testing right off the bat. Features that aren't interesting to individual users, but are still important for many users indirectly through larger orgs, don't get that surge of testing, because such users generally can't jump on nightly, and thus can't test. The voices of the "real" user base behind such features are also hard to get at, since they may not know that the pain they're feeling in their developer experience with the org build tools stems from a particular upstream feature.

All that said, I agree with you (all) that having a specific list of features that are available on a hypothetical testing channel is a good idea. Both because it discourages "just use testing" and because it means we don't expose things that aren't quite ready for wide-scale testing just yet.

You're right, and I don't have a good answer for this. There's a balancing act here between wanting to not have to wait months after a feature lands on nightly to test it, and wanting the stability that comes with longer baking times. I chose beta for the proposal mainly because it doesn't impose a 9 week (6w + 1/2 6w in expectation) wait time, but also at least gets backports (more on that below). I think if testing forked off of stable instead, that wouldn't necessarily diminish testing, but it would sort of artificially delay how long it takes to gather evidence, and some tester might "lose steam" because they have to sit idle for two months.

There is the option to say that testing gets cut from beta once it's at least X weeks old (and on any subsequent beta releases), which might help mitigate some of the concern, though that is also something users could opt into themselves by choosing when they update their testing (but not sure if that's better).

I think there should be no guarantee that fixes to unstable features get backported to beta — it's too onerous a requirement I think. Maybe there could be an option for people to contribute such backports, but I don't think it should be the general expectation. Of course, this then raises the issue that users may have to wait a while to test feature fixes on testing, but I think this is a fundamental choice between backporting nightly fixes to beta (and thus testing) and having testing come with some amount of stability guarantees. Either testing draws eagerly from nightly or fixes are backported — there isn't really an in-between (I don't think). And I think "draw from nightly" would be too much of a repellent to make testing useful in the first place.

I don't have a good proposal here either. Specifically, I don't know how we can have a way to opt out of the stability guarantees for a stable compiler but just for building stdlib/the compiler. I wish I did. We could require that the Rust build process build two versions of the compiler that are identical except for a flag for whether unstable features should be allowed, and then use the version with the flag set to compile the stdlib and then ship it with the other version (that does not have the flag set). But it'd be painful and I'm sure I'm missing something.

I think that discussion is somewhat orthogonal to the proposal though; ultimately, we probably do not want to encourage the use of RUSTC_BOOTSTRAP in any form outside of building stdlib, which then lands us back into this proposal.

So, I think probably more people use nightly as their daily driver than you might expect. This is true for large corps. too (as of rustconf2019 in the "enterprise in rust" unofficial meeting it was anyway), although we probably should strive to lower the amount of this (both for users and companies), and perhaps this is a good way to do so.

I also will note that I agree that this only makes sense as a way to gather feedback on a small set of "greenlit" features. If it can use any feature, it's basically nightly without the name nightly.


That said, I think it would be useful as a tool that project groups needing wider ecosystem feedback could use.

  • One case that's immediately obvious to me, as it's a group I participate in, is the portable SIMD work. Once things are further along, getting it on a testing channel and asking crates with SIMD-accelerated functionality to try porting their SIMD routines to it would be very valuable to uncover any API holes, pain points, or confusion that could help guide some of the docs.

  • Another interesting example might be "custom test frameworks", chosen mainly because it occupies a strange place where it would almost certainly be widely used if it were stable, but is almost never used because it's a nightly feature.

    (Note: As it stands IMO this feature needs a project group to help before it can stabilize — there's many issues to untangle, not all of them technical, so testing doesn't make sense for it as it stands, however after all that, putting it on a testing channel for broader feedback seems like it might help kick the tires a great deal)

Admittedly, for these cases, none of this does that much that we don't accomplish by writing a blog post asking for usage feedback of what's on nightly...

(Aside from, of course, allowing corporations which have outright banned nightly usage to sidestep this, which seems, well, I guess it's fine, since it's a substantially limited nightly-like — maybe I'm being too cynical here though — especially because I mostly like the idea)


From things like const generics, we've seen features that end up needing extra warnings to discourage people from trying them out when they're not ready yet. So we seem to get on-demand testing with existing nightly mechanisms.

I think this is going to be true for some features, but not for others. One thing about const generics is that it existed in the same shape as unstable for a long time, and shipped using a subset of that shape.

It also is one of the first things you hit as a limitation for rust generics if you are expecting C++ templates. The second there is probably specialization, and my gut is that specialization/min_specialization will see the same thing as you describe.

Thing is, I don't really know that this applies more widely. That is, for some features they'll benefit from the "spotlight" of being on testing, for some they probably won't since people will use nightly just to test them.

I mentioned test frameworks as an example of a major feature that I think people would use if stable, but won't rush to nightly just to use. I strongly suspect there are others, but... am unsure which they might be; most of the things in that "would use, but won't use nightly for" category I can think of are small nice-to-haves.

Probably any new types of proc macro (have we stabilized all of the places where you can use them?) would fit here as well, but I don't pay attention closely to the macro ecosystem.

RUSTC_BOOTSTRAP should cease to exist wrt. users and user crates imo, but I don't think it's feasible to eliminate it entirely (except by providing a new mechanism for stdlib use).

This is a can of worms I don't think we should open here. The absolute minimal smallest version of this still took two years and lots of heated argument to push through: https://github.com/rust-lang/cargo/issues/7088.

That said I agree it isn't feasible to get rid of RUSTC_BOOTSTRAP for the compiler itself, since otherwise there's no way to build the stable compiler with stable.

I also will note that I agree that this only makes sense as a way to gather feedback on a small set of "greenlit" features. If it can use any feature, it's basically nightly without the name nightly.

Big :+1:, about half the rustdoc flags only exist for docs.rs and I would really hate for people to start using them and getting mad if we removed them.

I mentioned test frameworks as an example of a major feature that I think people would use if stable, but won't rush to nightly just to use. I strongly suspect there are others, but... am unsure which they might be; most of the things in that "would use, but won't use nightly for" category I can think of are small nice-to-haves.

Most library features I expect - things like impl Iterator for [T; N] and doc(cfg).

If the testing channel is exactly the same as beta, except that testing additionally allows the -Zallow-features argument, then I guess nobody would use beta anymore, since testing is strictly superior from a user's perspective. That makes me wonder if it wouldn't be better to just allow the -Zallow-features argument on beta. You already gave an answer to this:

Could you explain why this is bad? I understand that the beta branch shouldn't be modified when turning it into stable, but just disabling an argument doesn't seem like a big deal to me. The other thing you mention is the opting out of stability guarantees, but again this doesn't seem like a problem to me. Users who care about stability can simply choose not to use the -Zallow-features argument, so nothing changes for them. They can't get unstable features breaking their code by accident.

Depends what you mean by "greenlit".

In my reading of the original post, the difference between 'nightly' and 'testing' is supposed to be that 'testing' would have a higher stability of implementation, due to receiving beta backports.

Some feature flags gate functionality that is half-implemented and full of compiler crashes; allowing those would make a mockery of any implementation stability guarantee. But others gate functionality that works perfectly, and just hasn't garnered consensus on whether it should exist or what final name/syntax it should have. The latter category includes lots of standard library feature flags, for example (but also some core language features).

In theory, 'testing' could allow a relatively long list of features with a stable implementation, rather than a short list of high-priority things to test. I don't know if it should. But it could, while still being meaningfully different from nightly.

There is also a psychological aspect. Perhaps your large corporation is willing to use nightly, but based on posts in this forum, it seems clear that many are not, at least in part just because "nightly" sounds like an unstable/buggy implementation.

So, it's not just -Zallow-features that I'm proposing to allow on testing. Rather, testing will allow all "greenlit" unstable features, but will require that you opt into a particular subset as specified by -Zallow-features. If we were to use beta for this, it would basically mean allowing unstable features on beta, which is not what beta is for — beta is specifically for testing the upcoming stable release. And as I mentioned in the original post, I think it'd be unfortunate if lots of users moved from nightly to beta, since they'd then be using older versions of unstable features, while also muddling the feedback for beta with things that are really about unstable features.

You could argue the same thing for stable: why not just allow unstable features on stable with a -Z flag? And the reason is that we really want to ensure that users can rely on the stability guarantees of Rust on the stable channel — if it's trivial to opt into unstable features on stable, then inevitably people will recommend that (think StackOverflow answers), and we'll end up with a host of reports of "Rust is broken/isn't stable" from those who don't realize quite the implication of #![feature] and friends. This carries over to beta: the beta is going to be the next stable release, and the more we can get everyone to treat it that way, the more likely it is that users will a) test it the way they test stable and b) provide feedback specifically about regressions from stable-to-beta. If beta also has additional capabilities, it will be treated differently than stable, which in turn means the testing won't be as accurate.

What's the difference? Beta (and stable) already support unstable features (by using RUSTC_BOOTSTRAP). The proposed -Zallow-features argument is just another, more granular way to enable unstable features. When stable is branched off of beta, this argument can simply be disabled, so RUSTC_BOOTSTRAP is again the only way to opt into unstable features on stable. This is very unlikely to cause regressions, unless people use -Zallow-features (which is not possible on stable). But it should be made clear that by using this argument, you opt out of Rust's stability guarantees. I'm thinking of a warning such as:

Thank you for trying out unstable features. These might still contain bugs, so please report any bugs you encounter on <https://github.com/rust-lang/rust/issues>. Also note that unstable features might have breaking changes in a future release.

This also applies to a testing channel. There are two conflicting desires: The Rust team wants a tight feedback loop on new features, to detect bugs quickly, so it's best for them if many people use nightly. However, users want a mature compiler with very few bugs, so it's better for them to use stable. This is solved by making it appealing for some users to use nightly, by allowing unstable features there.

When unstable features become available on testing (or beta), two things will happen:

  1. Some nightly users will move to testing, because it has all the features they need while being more mature
  2. Some stable users will move to testing, because it has some features they would like to use, while still being relatively mature.

Effect 2 is the one the Rust team wants, but effect 1 is undesired. So it seems like we have to strike a good balance about which unstable features to allow, to minimize effect 1 and maximize effect 2. On the other hand, effect 1 might actually be a good thing if it leads to increased user satisfaction, leading to broader adoption of Rust in the industry.

This is a good point: With a testing channel, it is easier to categorize bug reports correctly.

Your last paragraph makes a lot of sense, thank you for explaining!

To be clear, -Zallow-features does not enable anything. It is itself an unstable feature, and will only work on nightly or if RUSTC_BOOTSTRAP is already enabled. Rather, -Zallow-features is an allow-list that means only the listed unstable features can be used even if unstable features are generally allowed (by using nightly or RUSTC_BOOTSTRAP). You can already use -Zallow-features on nightly, and you can use it on both stable and beta with RUSTC_BOOTSTRAP.

The proposal is that testing would require -Zallow-features, so that you cannot use testing just to have all unstable features on a more stable release. This is to encourage the use of testing only for targeted testing of a particular feature (or small set of features), as opposed to "I need a bunch of unstable features", which is better suited for nightly.
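
Spelled out as code, the rule might look like the sketch below. This is only a model of the behavior described above, not how rustc implements it; `Channel::Testing`, the `greenlit` list, the `Session` struct, and `may_use` are hypothetical names.

```rust
#[derive(Clone, Copy)]
enum Channel {
    Stable,
    Beta,
    Testing, // hypothetical new channel
    Nightly,
}

/// The parts of a compiler invocation that matter for feature gating.
struct Session<'a> {
    channel: Channel,
    /// Whether RUSTC_BOOTSTRAP is set.
    rustc_bootstrap: bool,
    /// The -Zallow-features list, or `None` if the flag was not passed.
    allow_features: Option<&'a [&'a str]>,
}

/// True if `feature` appears in `list`.
fn listed(list: &[&str], feature: &str) -> bool {
    list.iter().any(|f| *f == feature)
}

/// Whether `feature` may be used in this session.
fn may_use(session: &Session, feature: &str, greenlit: &[&str]) -> bool {
    match session.channel {
        // Nightly: everything is allowed unless an allow-list narrows it.
        Channel::Nightly => session.allow_features.map_or(true, |l| listed(l, feature)),
        // Stable/beta: nothing unless RUSTC_BOOTSTRAP is set; the
        // allow-list can only narrow further.
        Channel::Stable | Channel::Beta => {
            session.rustc_bootstrap
                && session.allow_features.map_or(true, |l| listed(l, feature))
        }
        // Testing (as proposed): the feature must be greenlit AND the user
        // must have named it explicitly in -Zallow-features.
        Channel::Testing => {
            listed(greenlit, feature)
                && session.allow_features.map_or(false, |l| listed(l, feature))
        }
    }
}
```

The only asymmetry is the default when no allow-list is given: on nightly that means "everything", on testing it would mean "nothing".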

For your case 1, I don't think users will jump to testing "because it has all the features they need while being more mature". It is going to be more cumbersome to use (they must specify -Zallow-features), it will only have selected features, and only for a limited amount of time (see @josh's suggestion), and if changes are made to those features, the fixes won't make it to testing for a while. The idea is that most users who need unstable features should be encouraged to stay on nightly. The testing channel is only for users who aren't willing to adopt nightly (for lack of stability) but want to help test a particular unstable feature (likely because they really need it).

For your case 2, I don't think the goal is for stable users to move to testing in the general sense. Rather, testing should be for if there's a particular feature we'd like users who are generally only willing to use stable to test out. Chances are users would only try out testing for short periods of time (~the 8 week timeline for a feature test that @josh suggested).

Actually I would like to argue just that. I too have noticed the two purposes of nightly that you mention in the original post, and as a user I would very much like to have "nightly, but 6 weeks old". For discussion purposes let's say this is the unstable channel. Of course stability can vary a lot depending on the feature in question, but I don't see why the decision for what features to enable isn't put in the hands of users, especially if features are categorized by how stable the rust team feels they are.

I suppose this option is best for users that just want the best Rust compiler and aren't actively involved in the community or testing features (but will incidentally serve that purpose if something starts crashing). Is it really so bad to offer this as an option?

Regarding "muddying the waters" of feedback, I don't think this is a significant concern; it is easy enough to ask that people reproduce any bugs originally discovered on unstable channel on nightly to see if they haven't been squashed in the meantime.

Given that I've seen a not insignificant number of issues opened that existed only because people were using an out of date distro Rust compiler rather than rustup managed stable, I wish I shared your optimism.

The other thing is that feature flags and vendor prefixes really aren't that big of a salt when the reward is cool new toys. They'll get stuck in a build script (make kind, not cargo kind), or otherwise forgotten until suddenly everything breaks. (Hopefully loudly, if you're lucky.)

The unstable channel is nightly. It's honestly shocking how reliable nightly is and how few regressions you'll run into, no matter what your update cadence is. rustup even prevents you from updating to a nightly that lacks the tooling you have installed, and finds the most recent one that does have it. There's no good reason to not just use the nightly channel if you need to (slash want to) use unstable features, unless you're a big conglomerate that runs the risk of accidentally allowing people to rely on unstable features you didn't intend to make available.
