Editor compatibility and the new error format

So, I am wondering more and more about the buffering question. I don’t think it changes the fundamental calculus, but it does further shift the balance. In particular, if we modify rustc to avoid flushing in the middle of errors – or, at minimum, to flush only at particular points, such as after the --> – then this means that customizable editors can likely use multi-line regular expressions with some reliability.

This is sort of a hypothesis. The datapoints I have are based purely on emacs. There, I found that multi-line regexes worked perfectly if I ran the compiler like rustc foo.rs >& ~/tmp/foo; cat ~/tmp/foo, but otherwise they worked…unreliably. Some runs they matched, some runs they did not. When I dug into the code, I found that each time emacs got some text from the input stream, it would check if that text represented a complete line. If so, it would run the regex on the text once and never run it again. If it did not represent a complete line, it would buffer more. (More or less.) So this basically fits the hypothesis: a regex that spans several lines can only match if all of those lines have arrived by the time the check runs. I imagine other editors will use similar logic, because it’s kind of hard to know just when you should “re-run” the regex across text when new data arrives (but maybe they use some more sophisticated scheme).
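To make the failure mode concrete, here is a rough Python simulation of the matching behaviour described above. It is a sketch of my reading of that logic, not emacs's actual code: the function name scan_chunks, the regex, and the sample error text are all made up for illustration, and "complete line" is taken to mean "the buffered text ends with a newline".

```python
import re

# Hypothetical multi-line pattern: an "error" header line followed by a
# "--> file:line:col" line. Not the regex any real editor ships with.
ERROR_RE = re.compile(r"^error.*\n\s*--> (\S+?):(\d+):(\d+)", re.MULTILINE)


def scan_chunks(chunks):
    """Buffer text until it ends with a newline, run the regex once on
    that text, then drop it so it is never scanned again."""
    matches = []
    pending = ""
    for chunk in chunks:
        pending += chunk
        if not pending.endswith("\n"):
            continue  # incomplete line: keep buffering
        matches.extend(m.group(1, 2, 3) for m in ERROR_RE.finditer(pending))
        pending = ""  # already-scanned text is discarded
    return matches


# Illustrative compiler output, not copied from a real rustc run.
OUTPUT = "error: mismatched types\n  --> foo.rs:3:5\n"

# Whole error arrives in one read (as with "cat" on a finished file).
whole = scan_chunks([OUTPUT])

# A flush after the first line splits the error across two reads.
split = scan_chunks(["error: mismatched types\n", "  --> foo.rs:3:5\n"])
```

Under these assumptions, whole contains the file, line, and column, while split is empty: the header line gets scanned on its own, fails to match, and is gone by the time the --> line shows up.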

So roughly how I see it now is this. Here is the set of use cases:

  1. Run from the command line.
  2. Run from the command line and pipe the output through less or into a file.
  3. Use from an editor like emacs that displays the raw compiler output but “hyperlinks” file names.
  4. Use from an editor that completely digests the input and shows it “inline”.
  5. Use from an unconfigured editor or tool with some default regex.

I think that 2, 3, and 5 are somewhat in tension. If we default to a “traditional” output format when not writing to a tty, that disadvantages 2 and 3, because that traditional format is less readable (by definition, else we should just use it all the time). However, case 3 can be recovered if we have an environment variable, since then emacs (or whatever) can set the environment variable globally and always request “new-style” output. Case 2 is still disadvantaged unless the user knows to set that variable globally; otherwise there will just be subtle differences between what they see on the terminal and what ends up in the pipe or file.
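The tty default plus environment-variable override could look roughly like the sketch below. This is only an illustration of the selection logic: the variable name RUSTC_ERROR_FORMAT and the values "new" and "traditional" are placeholders I made up, not anything rustc defines.

```python
import os
import sys

# Hypothetical variable name, for illustration only.
FORMAT_ENV_VAR = "RUSTC_ERROR_FORMAT"


def choose_format(is_tty, env):
    """Pick an output format: an explicit request in the environment
    wins; otherwise fall back on whether we are writing to a tty."""
    requested = env.get(FORMAT_ENV_VAR)
    if requested in ("new", "traditional"):
        return requested
    return "new" if is_tty else "traditional"


# What the current process would get.
current = choose_format(sys.stderr.isatty(), os.environ)
```

With this logic, an editor in case 3 sets the variable once and always gets new-style output, an unconfigured tool in case 5 gets traditional output through the pipe, and a user in case 2 gets traditional output in less unless they know to set the variable too.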

It’s unclear what Case 4 wants, but I guess it wants to be able to request either traditional or JSON output, depending on how good the output is. Traditional output would make it easier to grep around, and since you are re-displaying the errors, you don’t care if they are less readable. JSON gives you the full details, which is best of all, but it’s kind of unstable (something we should start a distinct thread about).

This is obviously not a proposal. I just want to write out the canonical description of the space and the tradeoffs that I see. (Basically a refined version of my previous attempt.)
