Discussion: guidelines (or rules!) for annotating types for readability

See here for context.

Rust has really, really powerful type inference within functions, but there is a tendency to golf types as much as possible, and there is almost no guidance on how a principled author should go about inserting types to maximize readability. I'm hoping we can use this thread to brainstorm what that guidance might look like.

Ideally, I'd like to see specific rules that say:

  • When to write let x: T for concrete T.
  • When to write K::<T>::blah in expression position.
  • Things that are consequences of the type system that are probably universally bad (outside of macros), like <_>::foo().
  • Misc other places where types can appear but don't usually have to, like 0u8 and closure ascriptions.
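To make the list above concrete, here is a small sketch of my own (not from any style guide) showing each annotation site in one place:

```rust
fn main() {
    // `let x: T` with a concrete type, even though inference would succeed:
    let port: u16 = 8080;

    // Turbofish in expression position:
    let words = Vec::<String>::new();

    // Suffixed literal instead of a separate annotation:
    let mask = 0u8;

    // Closure ascription on the parameter and return type:
    let double = |n: i32| -> i32 { n * 2 };

    // `<_>::foo()` compiles when the type is inferable from the binding,
    // but is arguably never clearer than naming the type outright:
    let zero: i32 = <_>::default();

    assert_eq!(double(i32::from(port)), 16160);
    assert_eq!(mask | 1, 1u8);
    assert_eq!(zero, 0);
    assert!(words.is_empty());
}
```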

These rules should have the general goal of improving readability by inserting type information that a human reader could not instantly deduce from context. Admittedly, that makes them pretty hard to nail down, and it competes with a second useful requirement: that the rules be enforceable by a linter such as clippy.

I have way, way, way more experience reviewing C++ code than I do reviewing Rust, so I would appreciate it if folks who read a lot of Rust on the daily can weigh in on heuristics.

Thoughts?

2 Likes

I personally don’t think you should ever write types unless you’re forced to.

11 Likes

This code is from an integration test of pharos. Note how the two calls to isis.observe( 5 ) return channels over different types. Rustc is actually capable of inferring those because of the asserts, but you will get compile errors if you mix them up somewhere, and tbh it's quite puzzling at first how it knows the type. I think it would be obscure not to annotate here.

// Send different types of events from same observable, and send a struct with data rather than just an enum
//
#[ test ]
//
fn types()
{
	block_on( async move
	{
		let mut isis = Godess::new();

		// Note that because of the asserts below type inference works here and we don't have to
		// put type annotation, but I do find it quite obscure and better to be explicit.
		//
		let mut shine_evts: Receiver< NutEvent> = isis.observe( 5 );
		let mut egypt_evts: Receiver<IsisEvent> = isis.observe( 5 );

		isis.shine().await;
		isis.sail ().await;

		let shine_evt = shine_evts.next().await.unwrap();
		let egypt_evt = egypt_evts.next().await.unwrap();

		assert_eq!( NutEvent{ time: "midnight".into() }, shine_evt );
		assert_eq!( IsisEvent::Sail                    , egypt_evt );
	});
}
1 Like

Honestly, I think this problem can be solved with tooling. For example, a JetBrains IDE with IntelliJ Rust will show inferred type hints inline with declarations. This includes backflow of type information as well, and sometimes is actually "better" than rustc's (in that it can cross "type must be fully known at this location").

Sometimes it fails to infer types (typically around closures or heavy generics usage), at which point I add (usually temporary) explicitly typed bindings to help it out.

There's no reason the RLS, rust-analyzer, or other language intelligence servers can't offer similar capabilities. Any graphical IDE (i.e. one that can differentiate between help text and code) can display the toasts.

If you want to require more concrete typing points, I find enforcing function line lengths (e.g. can't be more than ~20 LoC) to be more effective than trying to decide what points need to be explicitly typed. The functions naturally require explicit typing at their edges due to the design of Rust.


The one point I will add though is that "orthogonal" code paths (such as logging, asserts, etc.) when present shouldn't be required for type inference if at all possible. You should be able to comment out all of the "side effects" not involved in the computation of the main result and still compile.
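A minimal sketch of that failure mode (my own example, not from any of the posts above): here the target type of a `collect` is only pinned down by the assert that follows it, so commenting out the "side effect" would break the build unless the binding is annotated.

```rust
fn main() {
    let nums = vec![1, 2, 3];

    // Without this annotation, deleting the assert below would make the
    // `collect` call ambiguous and the function would stop compiling:
    let doubled: Vec<i32> = nums.iter().map(|n| n * 2).collect();

    // "Orthogonal" check; the main computation should not depend on it
    // for type inference.
    assert_eq!(doubled, vec![2, 4, 6]);
}
```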

2 Likes

I would tend to agree, in fact, I think we need more, not less inference.

The one exception I'd make is for unsafe { ... }. I think it's useful to have some more typing annotations there but through reviews, not language rules.

1 Like

rust-analyzer in fact already displays type hints in vscode and emacs.

5 Likes

Which is very nice. I eventually found them a bit invasive, though (personal preference), so I disabled that. The tooltips are still quite nice. I guess this is a data point for "seeing too many types can make readability suffer".

Why? I have never heard a compelling reason for this other than enabling prototyping, which feels extraordinarily weak compared to "I need to skim O(1kloc) worth of code for something and having types available makes this faster."

It could, but it can't. Most code is not loaded into an IDE... most code I read (hell, that everyone in my office reads) is either in a code browsing system or in code review software like Gerrit. In these situations, there is no hope of having type information attached to the code that the author did not write, and the people reading it will have significantly less context (or literally no context, if the reviewer is reviewing for style rather than local correctness... which I do for C++. A lot. It is not uncommon where I work for reviewers to ask the author to perform type deduction for them.) The expectation that your code needs special software to be viewed is a really, really bad expectation, and I write all my Rust in IntelliJ.

Seeing a gigantic iterator type full of anonymous types can make readability suffer, yes, which is why we have inference and existentials. I also think this is a red herring: perhaps what bothers you is that your editor is inserting types without caring much about how it affects formatting... I've found that IntelliJ is pretty careless here, so I just turn the hints off and take care to write type annotations where they are helpful.
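To illustrate the "existentials" point with an invented example: `impl Trait` lets a signature carry useful type information without spelling out the full adaptor tower.

```rust
// The concrete return type here is something like
// Map<Filter<Copied<Iter<'_, i32>>, {closure}>, {closure}> — writing that
// out would hurt readability far more than it helps, so the existential
// `impl Iterator` names only what the reader needs to know.
fn evens_squared(xs: &[i32]) -> impl Iterator<Item = i32> + '_ {
    xs.iter().copied().filter(|n| n % 2 == 0).map(|n| n * n)
}

fn main() {
    let v: Vec<i32> = evens_squared(&[1, 2, 3, 4]).collect();
    assert_eq!(v, vec![4, 16]);
}
```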

5 Likes

And there's no reason those code viewing platforms can't be patched to support rust-analyzer's type hints. (Other than developer time prioritization of course.)

The other problem is that everyone is going to want different amounts of type information from the code (editor). As an example right here, Centril turned theirs (rust-analyzer) off and I keep mine (IntelliJ) on. (I loathe visual alignment and thus the toasts don't unalign anything.)

I'd almost compare this to spaces-versus-tabs (I know I know just bear with me for a bit). In a space-indentation world, everyone's view of the code, in all editors, smart or dumb, is going to be the same. In a tab-indentation (space-alignment) world, everyone can set their indentation width to whatever they want, and this setting is per-user per-environment. Of course, "dumb" environments making the "correct" choice will display a horizontal tab eight columns wide, which few people like, but you can still configure any decent environment to display the tabs your desired way. It even offers new paradigms such as displaying indentation with emoji! (...sorry)

As is probably clear from that rant, I'm on team tabs for indentation, spaces for alignment. Spaces for both is better than using tabs in alignment (because that removes the benefit of tabs). But spaces are what everyone seems to have agreed on, and what all my editors default to, so going against the grain is more effort than it's worth. And I suspect an unsettling number of tabs-camp people still use spaces because they hit the tab key and think it gives them a tab even as the editor expands it to spaces for them.

Bringing it slightly back on topic: the "best" solution is not dealing with formatting at all, and letting the computer deal with it for you. It's a purely automatable task (except in the edge cases of ad-hoc grammars (macros) and exceptionally repetitive code (such as graphics) (which ideally reduces to macros again)), let the machine do it, focus developer energy on other things, etc.

I draw the parallel because type information is very similar in how personal it is. Someone with little experience with the language and HM type inference will likely want more type hints than someone who spent their formative developer years in Haskell. Any solution you check into version control will be available identically everywhere, but it's also impossible to (easily) tweak how much is there.

I know it's not a useful answer, but I think the answer is don't let those 1kloc blobs pass code review. Require functions with a "reasonable complexity" cap (existing lint: clippy::cognitive_complexity) and liberally use subfunctions.

(Case study: I actually triggered that lint in a monster of a function. I split it into its three semantic steps in functions within the function, and it benefitted greatly from it. This included, by necessity, typing a location at the edge of the functions that hadn't been explicitly typed prior.)
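A toy reconstruction of that kind of split (invented names, not the actual function from the case study): extracting a helper forces the intermediate value to be typed at the new function boundary.

```rust
// Before the split this was one long inferred pipeline; the helper's
// signature now documents the intermediate type (`Vec<&str>`) for free.
fn split_words(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

fn word_lengths(text: &str) -> Vec<usize> {
    let words = split_words(text);
    words.iter().map(|w| w.len()).collect()
}

fn main() {
    assert_eq!(word_lengths("hello rust world"), vec![5, 4, 5]);
}
```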

There can't be "one true style" for when to use explicit type hints (other than "at least when rustc requires") for the same reason rustfmt's "bulldozer style" reformatting doesn't work in all cases: these subjective metrics are context sensitive and vary by use case and user.

I sympathize with the idea that source code should stand on its own and be simple to consume as a plain text file. But honestly, I don't fully agree. We're long past the age of editing plain text files; everyday text editing tasks are almost as complicated as the most advanced IDE. Google Docs and Microsoft Word are amazingly complicated bits of software to "just" edit a text file. Sure, their WYSIWYG editor embeds more information than our plain-text, monospaced, simple source files, right? Well, there's definitely a lot more context to the process than just the source, otherwise rustc would be a no-op and I'd be very confused by the compile times. If I need a specific few gigs of software so I can submit my assignment and read the feedback (no I'm not salty at all why would you think I am), why can't we incrementally improve code viewing experiences with portable intelligence engines like rust-analyzer?

Source code should be intelligible in plain text. We have to interact with legacy environments all the time, after all. But we have technology (insert SpongeBob gif)! Rather than limit ourselves to the lowest common denominator, we should find ways to bring these complicated tasks into the locations where we have the intelligence system at our fingertips to assist us in understanding the code.

And this isn't a "it'll exist in the future" kind of thing. It exists now in rust-analyzer. You can add it to GitHub with SourceGraph(*). It's a better solution than any manual one will ever be, so why can't it be the solution?

(*) Results not guaranteed. They used to include the RLS by default with the language server integration but iirc disabled it due to performance issues. This was before rust-analyzer even existed. Also, Firefox will yell at you that it has permission to execute remote code, because it does; that's how they do their plugins for now. Installing a SourceGraph plugin is the same risk factor as a Firefox extension (though with some additional promised sandboxing), just from a different curator's platform.

4 Likes

Or cost. Indexing code is expensive and getting quota to index a complicated language may be untenable.

I think that viewing this as a personal preference is a problem. Go got a fair bit of the way towards selling "there should be one way this code should look." I hate writing Go, but it is far, far easier to review than C++ and Rust, because I don't need to cite people for all their personal choices that deviate from a One True Style (that being the Google style guide, for C++).

I think I was unclear. When I said "1kloc" I meant a directory with 1kloc of total code.

Again, my original comment: 90% of the code I, and all my coworkers, read is in the in-house code viewer or in-house code review tool (which, as far as indexing is concerned, are pretty sophisticated!), and yet the cost of type inference is way higher than trying to teach reasonable habits about writing down types.

I don't believe this assertion, because I get the impression no one has given it a serious try... the total lack of rules of thumb being strongly indicative... and hence why I started this thread. Unfortunately, it seems to have turned into a very unproductive holy war. I'd like to keep on the original topic, rather than continue this argument, and let the topic lock naturally if folks aren't interested.

5 Likes

How do you patch github's PR UI?

1 Like

Yes, but there will always be platforms where this expectation is in vain. Want to paste something on GitHub gists or pastebin? Good luck persuading the pastebin developers to support Rust type inference just for fun. What if I need (or, heck, want) to read code in plaintext, or on a remote server in a more limited editor?

I hate installing lots of tooling for simple tasks, and IDEs are what I probably hate the most for this reason. I don't feel like I should need to install IDEs or any similarly fancy software to type infer all my code unless I want to compile it, in which case I'll obviously be willing to install the full (but still minimal) toolchain.

That said, I generally use relatively few type annotations, but I think the current state of Rust is almost perfect. I wouldn't want to modify type inference extensively. For example, in the past some argued for global type inference so that not even fn items need typing. I think that's a bad idea, because interface boundaries are the perfect place to be able to see type information (and, ultimately, context).

I also very much don't consider Rust a "quick prototyping" language; I still actually prefer to use it even for prototyping, I'm just willing to hammer out all those Vec<_>s when I do. It's not that hard, and the gains trump the – for me – very minor annoyance. I love seeing how stuff fits together, and I found that to be important not only at the point where the software is already complex and huge, but also in the beginning, where ideas are just about to start brewing in my head.
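For readers unfamiliar with the `Vec<_>` idiom mentioned above, here is a minimal sketch (my own example): the partial annotation names the container while leaving the element type to inference.

```rust
fn main() {
    // `Vec<_>` tells the reader (and `collect`) the container type,
    // but lets rustc infer the element type (`usize` from `len()`).
    let lens: Vec<_> = ["a", "bb", "ccc"].iter().map(|s| s.len()).collect();
    assert_eq!(lens, vec![1, 2, 3]);
}
```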

2 Likes

I think we're just coming from different starting places. In my mind, needing to write the types at all is what needs to be justified, in yours, not writing them is. For example:

I don't think having the types written in the source code makes reading the source code easier. I think it makes it harder.

For what it's worth, I also don't use IDE features that include the type inline. I do let () = and have the compiler tell me, if I really need to work out what the type of some exact thing is somewhere. But that's pretty rare.
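A quick sketch of that `let () =` trick for anyone who hasn't seen it (my own example): deliberately ascribe the unit type so the compiler error reports the real type.

```rust
fn main() {
    let xs = vec![1i64, 2, 3];
    let total = xs.iter().sum::<i64>();

    // Uncommenting the next line produces an error along the lines of
    // "expected `i64`, found `()`", which tells you `total: i64`
    // without any IDE support:
    // let () = total;

    assert_eq!(total, 6);
}
```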

2 Likes

It's not exactly this, but GitHub is moving in this direction; see Navigating code on GitHub - GitHub Docs and GitHub - github/semantic: Parsing, analyzing, and comparing source code across many languages

This seems like a false equivalence to me, but not one we have any chance of resolving satisfactorily.

I'm speaking from my personal experience and that of dozens of coworkers. Maybe you're different but the numbers say my case is more common.

1 Like

I actually ran into a counterexample to this last weekend. I was trying to prototype some stuff with warp, including web page endpoints, JSON API endpoints, and websockets. It was and still is kind of excruciating, because warp is quite type heavy. Exploration is a mixture of reading generated docs, reading original source code (with hidden re-exports for some reason), reading examples and trying to put them into context with the former two, and so on.

If there were some guideline people could use about where additional type information might provide the most value, examples all over could take advantage of this.

With regard to the idea to get all UIs and tools to understand Rust, I find that quite unlikely, to put it mildly. Does play.rust-lang.org provide type hint functionality? Not even the auto-generated rustdoc pages contain type hints, and they actually run through the compiler and are tested.

So I'm a bit disappointed this attempt to increase readability for some people is hindered by the idea that IDEs and tools could take over the task of informing the reader of the code.

2 Likes

I agree it's not equivalent; that's exactly my point! I don't think that people are having the same discussion, because they come from different places. We have different starting assumptions going into the conversation.

Anecdotes are not data. My stuff is anecdotes too! I'm not going to say "well I've been a developer for 25 years and so my anecdote is better than yours." It is all just anecdotes.

These two things tie together, I think: I've spent a significant amount of time in dynamically typed languages, and so I'm very comfortable without being able to see the type. It hasn't even been possible for me at times! I've also spent a significant amount of time with statically typed languages that do no inference, and I've been frustrated at how long types can obscure meaning, and make code harder to read.

That all being said, because I believe that this is a personal thing, that's part of why I don't think we'll ever come up with a universal guideline; what's helpful to you may be obscuring for me, and what's helpful to me may be opaque to you.

3 Likes

I do agree with others that it will be hard to come up with a universal set of guidelines. A complication I see is that the problem of invisibly involved types usually has a couple of solutions:

  • Explicit type annotations.
  • Refactoring into smaller functions.
  • Breakup of code and using semantically useful binding names.
  • Using hinting functions for common cases (to_string, to_vec).
  • In the future maybe type ascription?
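To illustrate the "hinting functions" bullet with a small example of my own: `to_vec` and `to_string` name the target type in the method itself, so no annotation or turbofish is needed.

```rust
fn main() {
    let slice = [1, 2, 3];

    // These need a type hint somewhere, either on the binding or as a turbofish:
    let a: Vec<i32> = slice.iter().copied().collect();
    let b = slice.iter().copied().collect::<Vec<i32>>();

    // The method names themselves carry the type information:
    let c = slice.to_vec();
    let s = "hello".to_string();

    assert_eq!(a, c);
    assert_eq!(b, c);
    assert_eq!(s.len(), 5);
}
```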

What I'm wondering is if it might help to start out with just trying to find the problematic patterns where there is a lack of easily deducible types. As in: Could we come up with a set of lints, or something to determine an opaqueness-qualifier. Because I kind of think while the solutions are hard to guideline in a universal manner, the problem patterns themselves might be more common.

An IDE could then also associate these areas with actions like "extract into a named binding" or "add a binding annotation".

As an example: In a prototype (that didn't go anywhere) I had some functions that performed fallible operations (using ?), but some of the values they dealt with in the "happy path" were Results themselves. Because this constantly caused small confusions, I added some bindings showing that things are a Result.

In this case the problem/solution split looks to me as follows:

Issue:

  • A fallible function operates on concrete result types in the happy path and behavior becomes non-obvious where they mix.

Solutions I could've used:

  • Make more bindings with clear names (what I did, plus some restructuring).
  • Put all operations on the happy result values in separate well named functions.
  • Type ascription might have helped.
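A minimal sketch (with invented names, not the actual prototype code) of the pattern described: a fallible function whose happy path itself carries `Result` values, with annotated bindings making visible which layer each `?` unwraps.

```rust
use std::num::ParseIntError;

fn parse_both(a: &str, b: &str) -> Result<i32, ParseIntError> {
    // The annotations make it obvious these are still Results,
    // not already-unwrapped integers:
    let left: Result<i32, _> = a.parse();
    let right: Result<i32, _> = b.parse();

    // Only here do the `?`s actually unwrap them.
    Ok(left? + right?)
}

fn main() {
    assert_eq!(parse_both("2", "3"), Ok(5));
    assert!(parse_both("2", "x").is_err());
}
```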

The problem itself might be heuristically detectable. It's certainly possible to come up with a guideline of when it becomes problematic. It's hard to give a universal solution, but it should be possible to determine the general pros and cons of the solutions, and evaluate those pros and cons in the context of the actual problem.

And even if we fail to find a set of problem/solution pairs that has some consensus and usefulness, there'd still at least be a set of "things to watch out for during reviews" with the possibility to highlight them with lints.

I want to say that I do agree with others in the thread that "make smaller functions and use function signatures for type information" is always a good starting point.

Given all of the above I'm wondering if it makes sense to think of it more as a set of "Type Clarity Guidelines" instead of focusing on annotations.

I mean, the only reason I can't give you data is that I don't have the clearance to exfiltrate it, but I do have the data. Not that that really changes your point.

I see a lot of people express some feeling of this form, which is fine, but the not-so-recent trend of stapling type annotations onto fully dynamic languages (going back at least as far as the Closure Compiler for JavaScript) is actual evidence that "code should be explicitly typed within reason" is all but required for large software projects (yes, there are exceptions, but I do not know of a single software project on the magnitude of a general-purpose operating system, internet browser, or LLVM-scale optimizing compiler without this guiding principle).

2 Likes

Having survived writing OCaml in an industrial setting, I couldn't disagree more :slight_smile:

My experience is that, very often, understanding code requires you to simulate type inference in your head. In many examples, this is both wasteful and error-prone. That's doubly true in Rust, since we have traits, inference driven by return types, etc.

So, whenever a type is not trivial from context, I tend to annotate it quite often.

6 Likes