Pre-RFC: Nicer Types In Diagnostics

Hi all, and @pnkfelix (mentioned in one of the issues as someone who might want to collaborate on this). I am drafting an RFC for the continuation of the following:

It's hard to figure out what diagnostics will be convenient to read, so I've included a draft implementation of this feature.

Summary

When types are printed in diagnostics, each type's fully qualified type incurs a cognitive load on the reader. For example, printing the type std::vec::Vec<std::vec::Vec<std::vec::Vec<u16>>> could have been shortened to Vec<Vec<Vec<u16>>>. This proposal seeks to remedy the situation, by omitted the type's path when possible, however it also seeks to prevent ambiguity.

Motivation

The expected outcome is that it will be easier to read type errors, but without losing the ability to pin-point the exact problem to the user. Currently, a fully qualified type path precisely indicates a type, and we need to make sure that stripping the path does not create ambiguity.

Guide-level explanation

First, for let's define how each non-qualified type falls into either of three categories:

  • Types that are imported by default from prelude.
  • Types that are imported in the scope in which the type errors occurs, and are not prelude types.
  • All other types that don't fall into the first two categories.

In the most reduced from of types printing setting named minimal, we differentiate between the three categories.

  • For prelude types, as long as the type is not occluded by an import with the same name, we omitted the path entirely.
  • For imports that are available in the current scope, we also omit the type path, but introduce a via imports line to the type error (see below), illustrative of the use statement that imports the type.
  • For all other types, as long as the type is not occluded by an import with the same name, we omit the path, but also introduce a similar reachable by line to the error message.

Consider the following example:

fn test3(v: &Vec<Vec<Vec<u16>>>) {}

fn main() {
    use std::collections::HashMap;

    let ip: std::net::IpAddr = "0.0.0.0".parse().unwrap();
    test3(vec![vec![(String::from("x"), ip, HashMap::new())]]);
}

The output will be:

7 |     test3(vec![vec![(String::from("x"), ip, HashMap::new())]]);
  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected reference, found struct `Vec`
  |
  = note: expected type `&Vec<Vec<Vec<u16>>>`
             found type `Vec<Vec<(String, IpAddr, HashMap<_, _>)>>`
            via imports `use std::collections::HashMap;`
           reachable by `use std::net::IpAddr;`

Instead of the uniform, default setting thus far:

 --> src/case2.rs:7:11
  |
7 |     test3(vec![vec![(String::from("x"), ip, HashMap::new())]]);
  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected reference, found struct `std::vec::Vec`
  |
  = note: expected type `&std::vec::Vec<std::vec::Vec<std::vec::Vec<u16>>>`
             found type `std::vec::Vec<std::vec::Vec<(std::string::String, std::net::IpAddr, std::collections::HashMap<_, _>)>>`

For the transition period, there is also a by-import setting that limits reduction of type paths for imports, preventing the the reachable by segment.

7 |     test3(vec![vec![(String::from("x"), ip, HashMap::new())]]);
  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected reference, found struct `Vec`
  |
  = note: expected type `&Vec<Vec<Vec<u16>>>`
             found type `Vec<Vec<(String, std::net::IpAddr, HashMap<_, _>)>>`
            via imports `use std::collections::HashMap;`

Also for the transition period, the three settings can be controlled via a crate-level type_diagnostic feature. For example:

#![feature(type_diagnostic)]
#![type_diagnostic = "minimal"]

More examples in a very early draft rustc implementation of the RFC.

Reference-level explanation

The feature will be implemented via the following steps:

  • Produce an imports map from librustc_resolve back to librustc.
  • The imports will be a map from the module concept of resolve to the DefId that are defined in that scope, with their respective Idents.
  • The parent scope will be included and walkable from each HirId in librustc/ty/print/pretty.rs, so gathering up the uses in scope will be efficient.
  • Code paths that emit errors will need to pass the HirId of the relevant scope, which will be stored in TLS for pretty.rs to catch.

Very early implementation draft in rustc.

(TBD: corner cases)

Drawbacks

We may choose not to do this because people are already too accustomed to groking the current and long uniform paths.

Rationale and alternatives

One alternative is that instead of by-import line, we quote the actual span of the use statement that brought the type. However this may be hard to read, and the use lines may be too numerous. Also, this does not apply to reachable via.

The impact of not doing this is a newcomer's roadblock issue, as was already realized in previous discussions.

Prior art

(TBD: needs research, am not aware of other languages doing something like this)

Unresolved questions

  • How those prints should be formatted exactly, to make them most readable?
  • How do we modify the code for it? Is TLS an okay solution or should we do a fuller rewrite of the pretty-printing engine?
  • Can or should we rely on Spans instead of HirId?
  • Should the default for the setting be kept uniform, should we change the default, or get rid of the setting altogether?

Future possibilities

From an implementation standpoint, the import_map can be used by analysis tools for cross-crate important analysis, say, for determining if there is dead code in a set of packages comprising a workspace.

Also, tools such as rustfix may use the extra printed lines to augment the use pseudo-section of a module.

27 Likes

I like this. Can the "via imports" lines be omitted too if the type names are unambiguous?

Awesome RFC!

Maybe use a rustc flag instead? (which could be exposed via cargo eventually on stable)

What about type aliases? I wouldn't want to see them expanded to their RHS when reading diagnostics.

I would look at Haskell (GHC) which AFAIK does this (including for aliases).

I'd say let's iterate during implementation based on real diagnostics.

Imo TLS in a pretty printer is pretty confusing so ideally it would be avoided but perhaps not at first?

Change it! The current outputs with all these long paths are like the worst part of rustc's otherwise good diagnostics.

I can attest to this also being a roadblock issue for experts as well. :slight_smile:

4 Likes

Let's say we want to drop the via imports for use std::collections::HashMap; from the example above - we would have to iterate all the importable types from all crates (including the local one) to verify that HashMap is singular. It's interesting, and may be do-able.

However, in this case, the via imports appearing would be an indication that there's more than one HashMap available for import, and it may confuse users causing them to wonder "where is this other HashMap lurking". It's a bit hard to explain to users.

You mean via Cargo's rustflags and not the per-package Cargo.toml? If so, developers can have this globally enabled in their development environment regardless of what they are working on. However we would have to push for people to try this feature out. Maybe we can have it both - enable it implicitly in rustc's source via the feature so that once it hits beta all rustc devs will see it in action. It's a big change that may require more eyes looking at the newer diagnostics before we ship it out.

Me too, however I'm not versed in how the type system is implemented - I guess that meta-data regarding alias expansion will need to be kept somehow so that "de-expansion" can happen during pretty-printing of types. Otherwise, trying to fit unrelated aliases on expanded types in order to compress them may be a hard problem (computationally?). It's also a bit orthogonal to this RFC, and we can do it separately.

Haskell is a good example, however it does so partially, for prelude items and for non-qualified imports (equivalent to use in Rust). However, it doesn't do reduction for types that are not imported (i.e. the reachable by bits of this RFC - for that part I haven't found yet a prior art).

Let's see an example:

import Data.HashMap.Strict (HashMap, fromList)
main = do
    let map = (fromList [(1, "x")] :: HashMap Int String)
    show $ (fromList [(1, map)] :: HashMap Int String)

Results in:

    • Couldn't match type ‘HashMap Int String’ with ‘[Char]’
      Expected type: HashMap Int String
        Actual type: HashMap Int (HashMap Int String)
    • In the second argument of ‘($)’, namely
        ‘(fromList [(1, map)] :: HashMap Int String)’
      In a stmt of a 'do' block:
        show $ (fromList [(1, map)] :: HashMap Int String)
      In the expression:
        do let map = (fromList ... :: HashMap Int String)
           show $ (fromList [(1, map)] :: HashMap Int String)

For qualified imports, the type errors in GHC show the full qualified paths, as long as the types are not accessible via a non-qualified import.

So the following:

import qualified Data.HashMap.Strict (HashMap)
import Data.HashMap.Strict (fromList)

main = do
    let map = (fromList [(1, "x")] :: Data.HashMap.Strict.HashMap Int String)
    show $ (fromList [(1, map)] :: Data.HashMap.Strict.HashMap Int String)

Results in:

    • Couldn't match type ‘Data.HashMap.Strict.HashMap Int String’
                     with ‘[Char]’
      Expected type: Data.HashMap.Strict.HashMap Int String
        Actual type: Data.HashMap.Strict.HashMap
                       Int (Data.HashMap.Strict.HashMap Int String)
    • In the second argument of ‘($)’, namely
        ‘(fromList [(1, map)] :: Data.HashMap.Strict.HashMap Int String)’
      In a stmt of a 'do' block:
        show
          $ (fromList [(1, map)] :: Data.HashMap.Strict.HashMap Int String)
      In the expression:
        do let map
                 = (fromList ... :: Data.HashMap.Strict.HashMap Int String)
           show
             $ (fromList [(1, map)] :: Data.HashMap.Strict.HashMap Int String)
   |
10 |     show $ (fromList [(1, map)] :: Data.HashMap.Strict.HashMap Int String)
   |             ^^^^^^^^^^^^^^^^^^^

Which is similar to what we had so far in Rust. Adding a non-qualified import of HashMap in addition to the qualified import works as expected.

In C++, Clang and GCC differ. Consider the non-qualified import equivalent using namespace, in this example:

#include <map>

using namespace std;
void test(map<char,char> *m) { }

int main() {
    map<char,int> first;
    test(&first);
}

GCC emits:

test.cpp: In function ‘int main()’:
test.cpp:8:10: error: cannot convert ‘std::map<char, int>*’ to ‘std::map<char, char>*’
    8 |     test(&first);
      |          ^~~~~~
      |          |
      |          std::map<char, int>*
test.cpp:4:27: note:   initializing argument 1 of ‘void test(std::map<char, char>*)’
    4 | void test(map<char,char> *m) { }
      |           ~~~~~~~~~~~~~~~~^

Clang 8 emits:

a.cpp:8:5: error: no matching function for call to 'test'
    test(&first);
    ^~~~
a.cpp:4:6: note: candidate function not viable: no known conversion from
      'map<char, int> *' to 'map<char, char> *' for 1st argument
void test(map<char,char> *m) { }

If std::map is used instead, Clang emits std::map in the diagnostics, even if it is imported via using namespace.

Sure.

Agreed. Types are being pretty printed via Display trait, and TLS usage is already rampant there, so adding more fields to it may not be a big deal. The alternative is using a newtype wrapper to add context to the printing.

Yes, but perhaps in stages, following my suggestion to enable this for rustc source itself first?

Same here.

I love this, thank you for working on this! Also cc @ekuber as the empathetic compiler czar.

1 Like

I fully support the idea, especially showing short types for unambiguous names from the prelude.

I don't see value in distinction of via imports vs reachable. I generally don't care how a name got there, only whether it's the one I expect.

I'd be fine with just a list of paths in case I really needed to double-check them. I'm not sure how to call that list, but let's say:

  = note: expected type `&Vec<Vec<Vec<u16>>>`
             found type `Vec<Vec<(String, IpAddr, HashMap<_, _>)>>`
             these are `std::collections::HashMap`, `std::net::IpAddr`

4 Likes

I think longer diagnostics as opt-in is something you'd do mostly on a per-developer basis and not something you check into CI, and sometimes you want to switch so I was thinking cargo check --u or something to get uniform errors. (Eventually tho, first it could just be rustc ... -Z uniform-errors.

Agree; I will take any incremental improvement in this area :slight_smile: I think mentioning it as future work and thinking about it a bit would be good tho.

Yep; now you have some text to paste into the prior-art :stuck_out_tongue:

Yep; in my experience, non-qualified imports are more common tho so in practice I usually saw the reduced paths in error messages when I was writing Haskell.

Yeah it was similar for the AST wrt. Display; I've removed some of the Display impls from the AST now and I think it got cleaner overall.

Seems smart to try it out for rustc developers first, agree.

1 Like

Excellent RFC, will definitely make compiler type errors easier to read.

I think that there should be a part in the RFC concerning renaming of imports:

use std::collections::HashMap as Maperoony;

I think that the renamed version should be used in diagnostics in the short form and added as an:

imported by `use std::collections::HashMap as Maperoony;`
3 Likes

This is an option I thought about initially, i.e. listing the fully expanded path. However I realized that emitted use statements have the advantage that 1) you can combine imports into "use trees" such as below and thus make the output more concise 2) in the reachable by case, you can copy the use statements into the code.

  = note: expected type `&Vec<Vec<Vec<u16>>>`
             found type `Vec<Vec<(String, IpAddr, HashMap<_, _>)>>`
             via imports `use std::{collections::HashMap, net::IpAddr};`
3 Likes

I really like this :slight_smile:

One additional thing to consider: for generating replacements for cargo fix this is also useful information.

I've written clippy lints before where I've wanted to offer some replacement code to auto-fix which uses whatever the best path to a type is in scope, and instead I generally end up just giving people fully qualified paths because they are definitely correct. It would be great to expose this information to that context too, so I can write something like token_for_symbol(expr_ty) in my replacement snippet. e.g. https://github.com/rust-lang/rust-clippy/blob/c0b2411f06a417266baaac2f0ea431138da33bf3/clippy_lints/src/default_trait_access.rs#L55

2 Likes

The "compression" of use trees is clever, but I'm not sure if that's the right place to do it, if the error needs to clearly communicate types to user who's apparently confused about types.

Use statements are absolute and can be pasted into most scopes, so for purpose of copying into code, there's no difference between via and reachable.

3 Likes

I assume that we introduced the ability to specify use trees in the code given that it's clear enough to users. If so, it should be clear enough in messages as well?

The difference is that copy paste from a combined reachable and via may not work because this single import statement may be duplicating what's already being imported in the scope. The separate reachable are only stuff that are not already imported in the scope. If we are combining them, then you'd have to remove the duplicates after pasting.

The difference is that users write trees of types if and when they choose to, rather than be given one to parse. And custom code can use a multi-line syntax (IMHO more readable), but such syntax would be too long for an error code.

I thought the aim was to use short types only when they're unambiguous. If I have std::result::Result and crate::Result in the same file, I would not want to have just Result in the error message. So I don't expect clash to be a problem, because full paths are needed only for names shortened in the message, and only unambiguous names should be shortened in the message.

1 Like

Would it be within the scope of this RFC to disambiguate between incompatible types when the fully qualified type name is the same? I.e. when there are multiple imcompatible versions of a crate. Or should that be it's own thing.

2 Likes

I'll wait for more opinions on this.

Good point. crate:: being absent from types in errors, means that types in errors are not fully qualified even with the current rustc. In fact, when uniform paths were implemented, the diagnostics did not change accordingly (see last note at: Misleading `use foo::bar` error message · Issue #61942 · rust-lang/rust · GitHub ). @petrochenkov perhaps can provide more recent update on this. So to prevent ambiguity, we need to care for types at the crate root which shadow prelude names. For example:

struct Result(u32);
fn test(_: std::result::Result<(), ()>) { }

fn main() {
    test(Result(42))
}

Current rustc is emitting:

  = note: expected type `std::result::Result<(), ()>`
             found type `Result`

Which we may need to change:

  = note: expected type `std::result::Result<(), ()>`
             found type `crate::Result`

With the RFC implemented we can have this as:

  = note: expected type `Result<(), ()>`
             found type `crate::Result`

Note that we cannot shorten crate:: here because of two types bearing the same name.

However if there's no Result defined or imported from a non-prelude definition in the crate's root, then we can shorten the path, but only for the prelude type, in the next example:

mod submod {
    pub struct Result(u32);
    pub fn test(_: std::result::Result<(), ()>) { }
}

fn main() {
    submod::test(submod::Result(42))
}

And get (with the RFC implemented):

   = note: expected type `Result<(), ()>`
              found type `crate::submod::Result`

As a side note, I would have renamed the RFC to 'Non-uniform Type Paths In Diagnostics' if it wasn't already the case :slight_smile:.

Anyway, it would be a worthy goal to solidify the rule as to when a non-uniform type is printed in an error without the note that strictly says where the type came from, and my current opinion about it is that only prelude types should be printed shorted and note-less.

We are currently emitting a note such as Perhaps two different versions of crate foo are being used?, so it's less important right now, but I agree it can be made clearer. I'd leave it out for a future change.

1 Like

For the Result case IMHO the ideal would be:

   = note: expected type `std::result::Result`
              found type `crate::Result`

i.e. if there are two types called Result involved, don't shorten either of them (no exceptions for prelude, imports, and such).

1 Like

Yeah, I see that it makes sense. I'll update this when I post the RFC.

Personally I would use the locally available name (the one that would work in the code) as the short version and the one being shadowed with the fully qualified path. Also, we should try and leverage span labels when encountering confusing cases with multiple elements named the same thing:

   |
LL | struct Result(u32);
   |        ----------- the found type defined here
<...>
  = note: expected type `std::result::Result<(), ()>`
             found type `Result`

It would also be interesting to carry the crate version information when relevant.

  = note: expected type `foo::bar::Baz` (foo 0.9.0)
             found type `foo::bar::Baz` (foo 1.0.0)

Finally, it would also be interesting to look at how to leverage type arguments to clean up the output:

  = note: expected type `Foo<A, B>`
             found type `Bar<C, B>`
             where A: String,
                   B: usize,
                   C: foo::bar::Baz<usize, usize>

I think the rule of thumb is that we should always use the short name that would work in the code if we were to replace the primary span, and use longer paths only for disambiguation.

20 Likes