pre-RFC: Enable setting of source file, line & column

Various languages, even those without the C preprocessor, have a compiler directive like #line 27 "foobar.pl". This sets the line number and optionally the filename. This is important for generated code, or for code transpiled from another language. The compiler can then report warnings and errors relative to the original source (and maybe the Rust location in parallel).

My suggestions are macros that do this as a side effect, but vanish into thin air. That way they could be called anywhere (except in strings and comments.) If the filename is not absolute, then it is relative to the directory of the Rust file.

  • Either we could choose a syntax similar to C. There would be 1-3 parameters for line, column and file. If the 2nd is a str, column defaults to 1. If line is the symbol = it is unchanged.
#[line(49)] // line
#[line(49, 27)] // line, column
#[line(=, 27)] // column
#[line(49, "genrust/spec.yaml")] // line, file
#[line(49, 27, "genrust/spec.yaml")] // line, column, file
  • Or we extend the 3 macros we already have, but called with an argument.
file!("genrust/spec.yaml")
line!(49)
column!(27)

I think this second option is cleaner.


This is already possible with proc-macros, you attach the correct span to the output code and the compiler will point warnings at the right place. (RFC 3200 will extend this capability to non-Rust source files).


I mean when you are generating a TokenStream from a proc-macro it is possible to get the same effect by setting spans.

I guess one use case that doesn't cover is when you're generating code prior to running the compiler, in build.rs or some external tool. In that case it would be possible to implement a proc-macro that you generate into your generated code which looks for attributes like this and uses them to set correct spans pointing back to the original source, but that needs RFC 3200 too to be able to load the spans.


Ok, I vaguely get your idea (though from the doc of Span I don't see how to actually set that information.) Still, whichever way this works, sounds complicated.

And, yes I'm talking about the case of pre-generated Rust files.

There's currently no way to create a custom span. You need to already have a span for the correct location; that's primarily what RFC 3200 would enable (getting spans into non-source files).

Note that the thing you propose should be just attributes, not macros. There is no point in expanding trivial macros where all you want is simple metadata.

Personally I think such a feature would be useful. However, I'm wary of introducing extra stuff to the default attribute namespace (yes, I always say this, but in this case the names are so simple that collisions with userspace macros and macro attributes are even more likely). I'd prefer something like #[rust::source(line = 49, column = 5, file = "gen/spec.yaml")].


I'd say the attributes fit fairly well into the new #[diagnostic::] namespace, since this part of the span info is for diagnostics (it wouldn't impact the resolution part of the span). Except that since proc macros can (unstably) read that span info, it wouldn't be "ignorable" the way #[diagnostic::] is intended to be (as are the other "tool attributes"). (Which is what allows an unrecognized/uninterpreted tool attribute to be not an error, since they're never permitted to impact semantics.)

So honestly I think we'd end up with spans containing three bits of span information — the "real" span location for what source it comes from, the "assigned diagnostic" span location, and the resolution location.

If the attributes have the semantic of setting span info for the decorated syntax only and then resetting afterwards, then attributes absolutely make sense. The way that #file style works, though, is that it's set and impacts any following lines until it's set again. That's the behavior that functionlike macros would have.

The only generically useful functionality I can really see existing is an attribute to set the textual span of some syntax to be a section of a file, e.g. #[diagnostic::span("file", line:col..line:col)], and every token covered by the annotation gets the same span pointing to that whole region. Basically, the way any tokens generated by a macro (call_site() or mixed_site() hygiene) point to the entire macro invocation. Assigning spans at a finer scope seems impractical to stuff declaratively into the Rust grammar.

C preprocessor directives have the "advantage" that they can be placed between any two tokens of the actual language grammar. Rust macros deliberately don't have that level of syntactic freedom; function-like macros can only show up in expression, item, pattern, statement, or type position, and attribute macros can only decorate items, statements, or (unstably) expressions.

Also keep in mind that C and C++ have historically just shown the file:line:col reference and maybe a snippet of the blamed line, whereas Rust tries a lot harder to show specific source regions in errors. Arbitrary span manipulation makes this significantly more difficult; even macros can easily make errors look weird when combining nontrivial error span assignment with nontrivial span provenance. (E.g. how can you point at an expression a + b when the three tokens' spans come from three different files?)
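To make the a + b problem concrete, here is a minimal sketch. The Span struct and the join function are hypothetical simplifications written for this illustration, not the compiler's actual types. It shows why a diagnostic cannot underline one contiguous region when the tokens' spans point into different files:

```rust
/// A hypothetical, simplified span: a byte range inside one named file.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    file: String,
    start: usize,
    end: usize,
}

/// Try to build one span covering both inputs.
/// Returns `None` when the spans live in different files, because no single
/// contiguous region of any one file can cover both.
fn join(a: &Span, b: &Span) -> Option<Span> {
    if a.file != b.file {
        return None;
    }
    Some(Span {
        file: a.file.clone(),
        start: a.start.min(b.start),
        end: a.end.max(b.end),
    })
}
```

With a from file1 and + from file2, join has nothing sensible to return, so an error about the whole expression would have to pick one token or fall back to the generated file.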

Yes, I think it would be useful. I use HTML templates compiled to Rust source, and it's always annoying to get errors pointing to a temp file, not the HTML source.

Currently proc macros aren't enough for this: they can't be easily run for an arbitrary non-Rust file, and even when they're running they still don't support making Spans out of thin air, so they can't be used for sources in languages syntactically incompatible with Rust (such as processing a bunch of HTML templates in a folder).

Yeah, attributes being proper part of the AST make it hard. In C thanks to the preprocessor being super naive and text based, you can do:

#line file1
a
#line file2
+
#line file3
b

Rust could support sourcemaps — an external file that defines mapping between input<>output spans. This is commonly used in JS transpilers. However, I'm worried about complexity of such solution. In my experience sourcemaps are difficult to generate correctly and their tooling is brittle.

In this case a naive line-based AST-independent preprocessor directive actually isn't too bad. It's trivial to use in string-based code generators. It's simpler than source maps or piping code through proc macros and their spans.
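As a rough sketch of how simple such a line-based scheme could be, the function below maps every generated line to an origin. The //#line marker syntax, the map_lines name, and the choice to give the directive line the origin it announces are all assumptions made for this illustration; nothing like this exists in rustc today.

```rust
/// Origin of one generated line: (source file, source line).
type Origin = (String, u32);

/// Map every line of `generated` to its origin, C-style: a directive names the
/// file and line of the line that follows it, and later lines count up from there.
/// `default_file` is used until the first directive appears.
fn map_lines(generated: &str, default_file: &str) -> Vec<Origin> {
    let mut file = default_file.to_string();
    let mut next_line: u32 = 1;
    let mut origins = Vec::new();
    for text in generated.lines() {
        if let Some(rest) = text.trim_start().strip_prefix("//#line ") {
            // Hypothetical directive: `//#line <number> "<file>"` (file optional).
            let mut parts = rest.trim().splitn(2, ' ');
            if let Some(n) = parts.next().and_then(|s| s.parse::<u32>().ok()) {
                if let Some(name) = parts.next() {
                    file = name.trim().trim_matches('"').to_string();
                }
                // The directive line itself gets the origin it announces.
                origins.push((file.clone(), n));
                next_line = n;
                continue;
            }
        }
        origins.push((file.clone(), next_line));
        next_line += 1;
    }
    origins
}
```

A string-based generator only needs to print one marker line before each chunk it emits, which is the appeal compared with producing a separate map file.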


I think specifying source spans should be an option available for this feature, but it must not be focused on spans. In particular, naming the attribute #[span] is wrong, because there may be no well-defined source span associated with the specific Rust span, like in the examples above.

Forcing the generator to specify spans would imply that there is a correspondence between Rust spans (which carry the attribute) and source file spans, and in general there is no reason to expect such a well-defined correspondence. The only thing which is likely to be definitely defined is the source file name (but it also should be optional, because it would often be the same as the generated Rust file name, bar extension). In most cases one can also specify some source line & column which corresponds to the start of the following generated Rust code, but a span mapping may be too ambitious, or burdensome for the code generator to output.

E.g. let's say I'm transpiling C or Java code, and the source has something like add(a, b). I want to transpile the add function on these objects as the overloaded + operator in Rust. Now I get the same issue @CAD97 mentioned above, where in a + b I cannot properly define spans for tokens. The span of the + operator would have to be either the span of add or the span of add(a, b) and in any case it would interact very weirdly with the spans of a and b (either directly preceding them, or containing them, despite + being infix between them). And that's the case where I at least can assign spans. If I need to do significant code restructuring (as in translations of C switch blocks), there may be no spans I could assign at all.

Wow, seems I've stirred something here. I don't think I want an attribute that affects just the following item; otherwise we'd have a ratio of maybe 20:1 in the generated source. Also, I hadn't considered that function-like macros can only go in certain syntactic positions. These should be possible anywhere.

I'm really thinking in terms of compiler directive. Then there's a choice of pointing a + b to add(a, b). If there's an error on a, the reported position would be somewhat skewed. But that's down to how much effort the generator's author wants to put in.

Would the following be better?

file!{"genrust/spec.yaml"}
line!{49}
column!{27}

Or seeing as they are very different from other macros (much as I'd hate to do that) a new syntax:

file!!("genrust/spec.yaml")
line!!(49)
column!!(27)

Or make it a raw macro:

file!#("genrust/spec.yaml")
line!#(49)
column!#(27)

So could you please explain exactly what that means to you? So we can be sure we're discussing the same thing.

One more alternative to an attribute or macro is the "source maps" used in browsers.


I gave an example of where I come from in the 1st paragraph of this thread. Since Rust errors also report columns (and it has a column!() accessor,) I extended my suggestion with that.

I haven't seen that in transpiled TypeScript or minified JavaScript (where it would defeat the purpose of minification).

Can you give an example of what it could look like to instrument generated Rust code?

What do you mean? Sourcemaps are used all the time. Usually the actual map file is separate and linked to with a special comment:

main.js

//# sourceMappingURL=main.js.map

Or http header:

main.js HTTP response

HTTP/1.1 200 OK
...
Content-Type: text/javascript
SourceMap: main.js.map

For Rust, something similar could be done. The code generator would produce foo.rs.map alongside foo.rs and use an attribute or macro to link the two:

foo.rs

#![diagnostic::source_map("./foo.rs.map")]
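For a feel of what the compiler side might do with such a map, here is a minimal lookup sketch. The Entry layout and the lookup function are hypothetical and deliberately much simpler than the JavaScript source map format; they only illustrate mapping a position in foo.rs back to the original source.

```rust
/// One mapping entry: from this generated position onward (until the next
/// entry), code originates at the given position in `file`.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    gen_line: u32,
    gen_col: u32,
    file: String,
    src_line: u32,
    src_col: u32,
}

/// Find the entry covering a generated position.
/// `entries` must be sorted by (gen_line, gen_col).
fn lookup(entries: &[Entry], line: u32, col: u32) -> Option<&Entry> {
    // Index of the first entry that starts strictly after the position.
    let idx = entries.partition_point(|e| (e.gen_line, e.gen_col) <= (line, col));
    // The covering entry is the one just before it, if any.
    idx.checked_sub(1).map(|i| &entries[i])
}
```

Positions that precede the first entry have no mapping, in which case a diagnostic would presumably fall back to the location in the generated file.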

While you have an example, you haven't specified exactly what it does. I can guess, but it'd be preferable if I don't have to.

"Set the line number" isn't unambiguous. If I use line!(100), then is it:

  • All code after the line! instruction is reported as being on line 100. (I set the reported line to 100!)
  • The line where line! occurs is set to be reported as line 100, and each following line reports as one higher.
  • The line following line! is set to be reported as line 100, and each following line reports as one higher.
  • The line following line! and any other preprocessor-directive macros is set to be reported as line 100, and each following line reports as one higher.
  • Any of the above, but restoring normal spans at some point (lexical scope, some macro expansion scope, etc).

Any of these are "setting" the line number. It's even more interesting for column!; e.g. does the column get set for the first character in the word column!, the first character after the closing parenthesis, the first column in the following line (potentially ignoring other preprocessor directive macros), or something else?
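To show how much these readings diverge, here is a small sketch of the first three. The Reading enum and reported_line function are hypothetical names made up for this illustration; the directive is assumed to sit alone on physical line directive_at, and the choices for the directive's own line are arbitrary.

```rust
/// Three of the readings listed above for a hypothetical `line!(n)` directive.
#[derive(Debug, Clone, Copy)]
enum Reading {
    /// Every line from the directive onward reports as `n`.
    Constant,
    /// The directive's own line reports as `n`; later lines count up.
    DirectiveLine,
    /// The line after the directive reports as `n`; later lines count up.
    FollowingLine,
}

/// Reported line for physical line `actual`, given a directive on physical
/// line `directive_at` that sets `n`. Lines before the directive are unchanged.
fn reported_line(reading: Reading, directive_at: u32, n: u32, actual: u32) -> u32 {
    if actual < directive_at {
        return actual;
    }
    let offset = actual - directive_at;
    match reading {
        Reading::Constant => n,
        Reading::DirectiveLine => n + offset,
        // Under this reading the directive's own line is left unchanged
        // (an arbitrary choice made here for illustration).
        Reading::FollowingLine if offset == 0 => actual,
        Reading::FollowingLine => n + offset - 1,
    }
}
```

For a line!(100) on physical line 10, physical line 12 reports as 100, 102, or 101 depending on the reading, so the proposal has to pick one explicitly.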

Plus, what even happens if an error occurs before macro expansion? (E.g. syntax errors, or even module name resolution in some configurations.) What if a macro expands to include a call to one of these macros?

External sourcemaps are probably the only solution which doesn't lead to endless problems. There's essentially zero chance Rust adds preprocessor-style macros (eliminated early, before parsing) for this. trace_macros! is never going to be stable as is, for similar reasons: it is tied to an (increasingly) false notion of source-order translation.


Source maps are specifically meant to help debugging minified or transpiled code. FWIW, they also worked fine with Google’s GWT, which is/was a Java-to-JS transpiler and (partial) standard library implementation.


This sounds like a good idea with solid justification but a lot of prior art. I don't know if the specific syntax really needs to be litigated in this thread without summarizing that prior art first. The file format could be lifted wholesale from js/ts and it would work. Making the generated rs files a giant hodgepodge of code and mappings seems less useful (sometimes I read generated code), but it wouldn't be the end of the world (I am familiar with the concept of sed).

So +1 overall for turning this into a real RFC; I'd love to know what other languages are doing for this though.

I think the RFC would have pretty limited scope:

  • Add an attribute for specifying the sourcemap file
  • Add a compiler flag (and subsequently a manifest option) telling the compiler to refer to sourcemaps when generating debuginfo

It's mostly the Rationale and alternatives and Prior art sections I'm thinking should be fleshed out. This thread has a number of reactions to the very concrete proposal of attributes or macros or whatever and I think less consideration for the overall idea. The overall idea seems fairly amenable to the problem solving strategy of "look up how this is done elsewhere and pick the best way", but the RFC author would need to handle that back-and-forth.
