Idea: Test suite as part of stabilization


#1

I feel like, when we go to stabilize a feature, I am always left with a vague uncertainty: how well was this feature implemented? How closely does its behavior adhere to my expectations, and to what I remember of the RFC (which of course may not be what the RFC actually said)? In theory we reserve the right to fix bugs, but some bugs – if they stick around long enough – have a way of becoming features, so it’s really best to get things right up front.

Therefore, I was considering adopting a requirement (perhaps via RFC, though that may not be necessary) that – when it comes time to consider stabilizing a language feature – we present a clear test suite for the feature. (Ideally, we’d have a directory of tests named after the RFC or feature or something like that, for easy reference.) Then we can review the tests and observe the behavior first hand, while also trying to consider edge cases that might have been overlooked.

Thoughts?


#2

You know I love tests.

So I’m in favor in spirit. The obvious downside is making the stabilization process harder still. But we’ve both long wanted a better-organized specification and reference test suite, and this could help us inch in that direction. So part of the stabilization would be to also link the reference documentation for a feature to the reference tests for a feature.

I’m inclined to be more ambitious and suggest things like automatic coverage measurement, or fuzz/quickcheck requirements, but just reorganizing future tests by feature and thinking hard about it is a good step.

cc @chriskrycho re linking the reference and test suite.


#3

So yes. I’ve found this problem to be more general – specifically, I would like to know, for a given part of the language, what tests it. And it’s hard to do. Basically, I consider our test suite “write only” – if I’m unsure, I just add more tests. That seems bad.

My first thought was to organize everything by directory. But the truth is that there are many ways you might want to categorize tests, and a single test often tests many things. So then I thought we should allow labeling tests with a category:

// category: foo bar

And have some tool that will tell you all the tests in a given category.
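As a rough illustration, such a tool could be a short script that scans the test files for `// category:` comments and builds an index. This is only a sketch under my own assumptions – the directory layout and function name are hypothetical, not an existing tool:

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches a comment like: // category: foo bar
CATEGORY_RE = re.compile(r"^\s*//\s*category:\s*(.+)$")

def collect_categories(test_dir):
    """Map each category label to the test files that declare it.

    `test_dir` is assumed to be the root of the test suite; labels are
    whitespace-separated words after `// category:`.
    """
    by_category = defaultdict(list)
    for path in sorted(Path(test_dir).rglob("*.rs")):
        for line in path.read_text(encoding="utf-8").splitlines():
            m = CATEGORY_RE.match(line)
            if m:
                for label in m.group(1).split():
                    by_category[label].append(path.name)
    return dict(by_category)
```

Asking the index for a category (say, `collect_categories("tests")["borrowck"]`) would then list every test tagged with it, regardless of which directory the test happens to live in.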

I’ve also had the more general thought that it’d be nice to be able to do (relatively) reliable cross-referencing between files in the codebase. For example, I sometimes embed test names into the code, or other paths, only to find that someone later renames the test. I was imagining instead that you could use hashtags like #foo in a comment, and a tidy script would complain if a hashtag is used in only one file in the code. It might also allow you to write a filename, like #foo-bar.rs, which would be ok so long as a file with that name exists.

So, this way, I could add a comment in the code like

// For tests, see the #borrowck-foo-bar.rs test.

and then be sure that there is a file with that name. (If the file is renamed, the renamer will have to fix the link; if it is deleted, they can fix the comment in some way.)
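The tidy check described above could be sketched along these lines. Again, this is a hypothetical sketch, not an existing script – the tag syntax, the `.rs`-only scan, and the function name are all my own assumptions:

```python
import re
from collections import defaultdict
from pathlib import Path

# A hashtag is `#` followed by a word (dashes allowed), optionally
# ending in `.rs` to name a file, e.g. #borrowck or #foo-bar.rs.
HASHTAG_RE = re.compile(r"#([A-Za-z][\w-]*(?:\.rs)?)")

def check_hashtags(root):
    """Return complaints for hashtags that appear in only one file.

    A tag that names an existing file (e.g. #foo-bar.rs) is always
    allowed, since the file itself is the other end of the link.
    """
    files = sorted(Path(root).rglob("*.rs"))
    names = {p.name for p in files}
    uses = defaultdict(set)  # tag -> set of files mentioning it
    for path in files:
        for m in HASHTAG_RE.finditer(path.read_text(encoding="utf-8")):
            uses[m.group(1)].add(path.name)
    complaints = []
    for tag, where in sorted(uses.items()):
        if tag in names:
            continue  # the tag refers to a real file: ok
        if len(where) < 2:
            complaints.append(
                f"hashtag #{tag} appears in only one file: {sorted(where)[0]}")
    return complaints
```

Running this over the tree would catch exactly the failure mode above: if someone renames `borrowck-foo-bar.rs`, the `#borrowck-foo-bar.rs` tag no longer matches a file, and the check flags the stale comment.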


#4

Oh, and for categories:

I had some thoughts, but one of them was to tie them to chapters in the reference manual. This would also encourage us to at least draw up a relatively complete table of contents for the reference manual (I understand that the reference has become much more complete lately, so there may not be so much need for work on that front).


#5

This reminds me of the organization of the web-platform-tests repository – it’s subdivided into directories named after the section of the specification being tested. The html/ top-level directory contains tests for all sections of the HTML specification; that breaks down into a directory per top-level section, each of which has subdirectories for subsections, and so on.

Since the idea is that the tests in that repository are verifying properties of the specification, it’s usually straightforward to figure out where the main body of tests for a particular web platform feature reside. The only difficulty is figuring out where to place tests that are verifying the interaction of two separate features.


#6

I, like Brian, am always thrilled by tests. Like documentation, I would love to consider them an essential part of stabilization. I would be fine with a very informal test suite, in the sense that it’s not a formal spec to pass or anything like that. As long as it gets across that (a) the feature is tested at all and (b) the tests look like they cover reasonable corner cases, I think we’re good to go.

To me, extra pieces like organization by feature, flagging tests as normative, etc., would be the cherry on top.


#7

Yeah, we’re making slow but steady progress in that direction (and should make a bunch more in the next three weeks, as I plan to tackle it steadily during some time off).

We already gate merging the reference based on its tests passing; given the nightly version is in-tree it seems like it should be possible to gate merging features unless there’s some degree of associated coverage, which might include the reference’s doctests (though that’ll never be even remotely sufficient of course).


#8

Yeah, I’ve been wanting to check in with you and try to coordinate a push. I saw that you had opened issue 9, which frames a “coverage push” in terms of RFCs. That’s a great idea, I think, but there is lots to cover that has never been the subject of an RFC – so I wonder if it would make sense to try to draw up a table of contents independent from RFCs, to start, and also try to drive participation in this way?

(As a simple example, the basics of structs, enums, the trait system, etc, all predate the RFC period.)


#9

Let’s do it! The reason that one is framed in those terms is its connection to RFC 1636, specifically this section. But I agree entirely that there’s more to do which predates the RFC process! I should be able to finish triaging the RFCs themselves in the next three weeks or so: I have some time off and some of it is allocated to that task.


#10

Great! Actually, if you want, maybe we can schedule a time to just spend an hour or two trying to get started on something?