Stabilizing json compiler message output

We’ve touched on the json format a couple of times here and here. There are some great discussions there, though in this particular thread I wanted to talk about where we need to go to stabilize the json format so that tools and editors can start relying on it.

Currently, that json format is still under a -Z unstable-options flag. Over a month ago, we switched the test suite to using the json as its main format, and its been happily chugging along. Seems like a good time to revisit making the json format stable.

Example error:

error[E0381]: use of possibly uninitialized variable: `x`
  --> uninit.rs:15:9
   |
15 |     foo(x);
   |         ^ use of possibly uninitialized `x`

error: aborting due to previous error

Same error, in json:

{"message":"use of possibly uninitialized variable: `x`","code":{"code":"E0381","explanation":"\nIt is not allowed to 
use or capture an uninitialized variable. For example:\n\n```compile_fail\nfn main() {\n    let x: i32;\n    let y = x; 
// error, use of possibly uninitialized variable\n}\n```\n\nTo fix this, ensure that any declared variables are 
initialized before being\nused. Example:\n\n```\nfn main() {\n    let x: i32 = 0;\n    let y = x; // ok!\n}\n```\n"},
"level":"error","spans":[{"file_name":"uninit.rs","byte_start":546,"byte_end":547,"line_start":15,"line_end":15,
"column_start":9,"column_end":10,"is_primary":true,"text":[{"text":"foo(x);","highlight_start":9,"highlight_end":10}],
"label":"use of possibly uninitialized `x`","suggested_replacement":null,"expansion":null}],"children":[],"rendered":
null}
{"message":"aborting due to previous error","code":null,"level":"error","spans":[],"children":[],"rendered":null}

Features of the json format

The json compiler message format is built on:

  • “message”: The compiler message text
  • “code”: The corresponding error code, if one, and associated code explanation, if any
  • “level”: The level of the message (error, warning, note, help, etc.)
  • “spans”: The areas we are going to be highlighting in any associated source. The primary span is the main span associated with the error message. Additionally, labeled spans will call out points of interest in an associated source snippet.
  • “children”: Any associated submessages. Eg) A deprecated warning may carry with it supplemental notes about how long the feature will be supported before it’s removed.
  • “rendered”: The rendered results of a suggestion

You can see the full structs here: https://github.com/rust-lang/rust/blob/master/src/libsyntax/json.rs#L74-L146

One json message is emitted per compiler message.

There has been some discussion on adding the rendered full message to the json. We can support this using an additional field for the full rendered error (or perhaps two fields, one for styled text and one for simple text).

For tools/editors authors - are there fields you are missing from the above set (assuming the full rendered messages are included)?

I would have liked it if the json format carried more semantic meaning.

We have different “kinds” of errors. One of the most prominent examples is expected X got Y. But there are other special ones like “unresolved name” or “type X undefined or not in scope” that offer zero or many suggestions.

Suggestions currently can’t be “paired”. For example if you have a suggestion that says “move x to y”, you currently need two suggestions, one for removing “x” and one for inserting it at “y”.

Suggestions have no hint to whether the compiler knows for a fact that this is the replacement, or if it just found one possible solution. E.g. the "unresolved name X, did you mean Y" could in the future offer a list of all of the closest matches, but that list might only contain a single item, but even that single item might be the wrong one. Tools like rustfix won’t be able to reliably differentiate between guesses and suggestions.

cc @killercup @Manishearth @llogiq @mcarton

2 Likes

I agree with @ker

Perhaps we should wait for @killercup’s rustfix to get developed further and see what it needs to finalize the format? compiletest is a very simple usage of the json errors; rustfix is more involved and would be better posed to determine how useful the error format is.

That would be great! Also, if anyone wants to help out and get this to a point where it's not just a POC… I would love that :slight_smile:

What does that mean, in particular? Write a JSON Schema file that defines the output? JSON is a pretty "open" format, and you could probably "stabilize" this meaning "those fields will be there, and these other fields may be there". This way, you can later add completely different structures and expand this over time (with e.g., alternatives for suggestions).

@killercup - great question, and I think one that corresponds to what is getting brought up earlier in this thread.

Here what I’m trying to say is that we need to stabilize the JSON fields we already have. We should feel free to add additional fields, so long as it is structurally compatible (ie we’re not renaming or removing any stable fields).

How exactly this formalism happens is up to us. Having a description of the fields and their expected structure I expect is good enough for the RFC, though perhaps that’s best described as a schema, that works for me :slight_smile:

@Manishearth/@ker

While I know it’s probably tempting to make this a catch-all for things tools have wanted to do for a long time, like fix-ups, I want to make sure we can nail down the json we have today before we add anything new. Perhaps I shouldn’t have left it quite so open ended on my first post. More “are there required things we need before we can even consider this v1”?

Having fix-ups and more semantic errors are definitely cool, but I see those as perhaps a v2 piece that we can work on immediately after stabilizing v1.

FWIW, we’re going to end up going through a similar process on the IDE side, where we get in v1 of the features which are just the bare-bones we need, then go to v2, v3, etc… and layering on additional functionality.

The point is that what rustfix needs may require changing the fields in a non-backcomapt way, perhaps by grouping suggestions or something.

While I know it's probably tempting to make this a catch-all for things tools have wanted to do for a long time, like fix-ups, I want to make sure we can nail down the json we have today before we add anything new.

My point is that we're stabilizing it off of compiletest, which is perhaps the simplest consumer of this API; and tells us nothing about how this API could be used. IMO we should include a couple of other consumers before deciding on a stable v1.

I presume this will block stabilizing the new error format for humans?

@Manishearth - oh totally, I didn’t mean “let’s lock down on only what compiletest needs” but rather “let’s see what everyone needs in the v1”. If suggestions are broken now (as you hinted at), let’s see if we can tackle those. If, instead, they kinda work now but could work better with some additional fields, then that can be v2. This thread can also be a good kicking off point for defining v2 also, which is nice.

@djc - Not technically, as the blocker for the new error format is that the format is regex-parse-able. Having said that, I think the spirit of stabilizing of the new error format is that editors are ready for it. If editors need to move to the json to fully be ready, I’d like to have the json format (at least the v1) stabilized and out from behind the -Z unstable-options for them.

I had suggested distinguishing "fix"es from "suggestion"s earlier. The requirements for the latter are far more lenient than for the former, which must be applicable in an automated manner after all. Bonus points for an easy API that allows to do this while keeping the original formatting.

Also this doesn’t preclude allowing to have both multiple fixes and/or suggestions. A tool (e.g. rustfix) should offer the user the choice if presented with multiple fixes by the compiler/lint.

The simpler and maybe more robust approach is to reformat the changed code using something like rustfmt. We use this strategy for refactorings in intellij rust.

1 Like

I’m happy stabilising as is. Since we can always add fields and there is no penalty for having large amounts of data it seems there is no down side to stabilising something that is useful now, as long as we don’t see problems (only extra things we might want).

One thought is that if the data gets big, or requires significant time to compute (I’m thinking significant for an IDE, rather than relative to build times), then we may want to make some fields opt-in (e.g., you can signal the compiler if you don’t want the rendered field). But since that would be opt-in by necessity, it would always be backwards compatible to add later.

I am not sure, including the full error explantion text is a good idea. The message can be very long and is usually not directly used.

I think, It would be better to get the explanation through a distinct rustc call.

@llogiq - it sounds like the “fix” can be a separate field then, and we don’t need to change suggestions to be an array? The latter is the breaking change that I think others were hinting add, but that might be avoidable using another field later on.

@nrc - Right, I was thinking about the json communication protocol. I think we’re okay so long as this json format is only used for compiler errors, where you’ve already waited for the compile to finish, a few ms longer isn’t going to hurt. For interactive protocols, we’re going to want something slimmer. One could be a subset of the other so we don’t duplicate parsing code, which is a nice to have.

@Uther - we’ve had a few requests from tools/editor plugin authors to provide both. You won’t always have access to run rustc twice (eg if it’s a big project, you won’t want to wait). Whereas it may be possible to just run the terminal emitter into a buffer, like we do for tests, and then put that buffer into the json.

Having fix as a separate field would nicely solve the issue. Otherwise rustfix will need some heuristics whether a fix can be automatically applied, which sounds quite hard.

Maybe there should be an option to include the full explanation text. So, the users that don’t need it or does not bother doing two distinct calls, would not have to suffer the extra length.

@Uther - yeah there will very likely be another version of this for the needs of more interactive tools. In this version of the json, the focus is more offline so we’ve got a bit of leeway. Tools should be able to safely ignore the fields they don’t want to consume. I suspect different tools will ignore different fields.

Just to recap:

  • We are shooting for multiple stages. Stage 1 is just to get us going.
  • IDEs/interactive tools will needed a lighter format, but that can be based on this one.
  • Fixes can be a separate field.
  • We should finish implementing the proposed json format, which means putting the rendered output in there (which we don’t have yet)
  • Ideally, we’d have other tool consumers of the json format before we standardize a Stage 1.

Since getting tools to use the json format can be a catch-22, @Manishearth and anyone else - do you know of tools who might try it out? Sorry if someone volunteered earlier and I missed it…

No idea about other tools. rustfix is the big one, and of course compiletest. Clippy is a producer, not a consumer, so it doesn’t matter much here. IDE jump-to-error functionality might like this, but I don’t know of anything that provides this specifically for Rust (the sublime package just regex-scrapes filelines)

The build-cargo package of the atom editor (that you linked in the OP) has a flag to make it parse the json instead of regexing the terminal output. It is planned to add a fix button to every message to be able to apply suggestions one by one.

I don’t think we support all kinds of json errors that are produced currently. But we gladly take suggestions.

One more follow up. I’m getting feedback from a couple people who are working on IDEs that the current json format is working okay. One contributor, who is working on vscode, mentions that the json being behind a flag makes it a little tricky to enable without causing builds to be reset (think “cargo build” from the IDE, then “cargo run” from the commandline having to build again because the flags changed)

Here’s my current proposal. Let’s take what we have in the json now and call that v1. We can move out the rendered content portion and add it in, say, v2. We’ve already been using it in compile-test, a bit with IDEs switching to it, and ker’s tool.

Since it looks like we have ~3 tools of different styles using the current json format, with possibly others we don’t know about, it feels like we may be close to just calling this v1 and then working with those tools authors on what v2 will look like.

Are there any objections to stabilizing what we have as is and then building from there?