pre-RFC: Supporting #line directives, either directly or with an extension


#1

I’m working with someone who is using m4 to build a cross-language state machine, which generates millions of lines of Rust code. m4’s -s option emits cpp-compatible #line directives that look like this:

#line 42 "generator.m4"

Such a directive specifies the line number, and optionally the filename, of the line that follows it; rustc would then increment the line number as usual until it sees another such directive.
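As a rough illustration of these semantics (not a claim about how rustc would implement them), the sketch below strips whole-line directives of the form #line N or #line N "file" from generated source and records where each remaining line came from. The function names parse_line_directive and map_lines are hypothetical, and the parser assumes filenames contain no escaped quotes.

```rust
/// Where a line of generated code originally came from.
#[derive(Debug, Clone, PartialEq)]
pub struct Origin {
    pub file: String,
    pub line: u32,
}

/// Parses `#line N` or `#line N "file"`; returns None for any other line.
/// Assumption: the filename contains no escaped quotes.
pub fn parse_line_directive(text: &str) -> Option<(u32, Option<String>)> {
    let rest = text.trim_start().strip_prefix("#line")?;
    // Require whitespace after the keyword so e.g. `#lines` is not a directive.
    if !rest.starts_with(|c: char| c == ' ' || c == '\t') {
        return None;
    }
    let mut parts = rest.trim().splitn(2, char::is_whitespace);
    let line: u32 = parts.next()?.parse().ok()?;
    let file = match parts.next().map(str::trim) {
        None | Some("") => None,
        Some(f) => Some(f.strip_prefix('"')?.strip_suffix('"')?.to_string()),
    };
    Some((line, file))
}

/// Pairs every non-directive line of `source` with its origin.
/// Before the first directive, lines are attributed to `default_file`
/// starting at line 1; directive lines themselves are dropped.
pub fn map_lines(source: &str, default_file: &str) -> Vec<(String, Origin)> {
    let mut file = default_file.to_string();
    let mut next_line: u32 = 1;
    let mut out = Vec::new();
    for text in source.lines() {
        if let Some((line, new_file)) = parse_line_directive(text) {
            // The directive describes the line *after* it.
            next_line = line;
            if let Some(f) = new_file {
                file = f;
            }
        } else {
            let origin = Origin { file: file.clone(), line: next_line };
            out.push((text.to_string(), origin));
            next_line += 1;
        }
    }
    out
}

fn main() {
    let generated = "fn a() {}\n#line 42 \"generator.m4\"\nfn b() {}\nfn c() {}\n";
    for (text, origin) in map_lines(generated, "out.rs") {
        // Prints out.rs:1, then generator.m4:42 and generator.m4:43.
        println!("{}:{}: {}", origin.file, origin.line, text);
    }
}
```

Note that only the line number is reset by a directive without a filename; the current file name carries over, which matches how the directive is described above.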

Tracking down errors in the generated code becomes much easier if Rust can attribute that code to the appropriate part of the generator. Thus, I’d like to suggest an RFC to (optionally) handle #line directives.

This doesn’t need to happen by default; it would be fine if it required some special include_generated!("...") macro or similar.

Due to the large number of tools that already know how to generate #line directives, it would help if Rust could parse those directives unmodified, rather than requiring Rust-style #[line(...)] syntax.


#2

Question: do these get processed when inside a macro invocation, where this syntax could already mean something?

As for turning them on, I believe the lexer already has a special case for hashbangs (#!), so it doesn’t seem unreasonable to extend that to additional lexer directives. Something along those lines could be used to turn on #line support.

As a point of comparison, D supports #line (and only #line) for basically the same reason. This was how I wrote an entire project in a custom literate dialect of D, using #line directives to map source locations back to the real source code.


#3

Ideally, yes. This should happen entirely prior to lexing, as you mention. However, I’d be perfectly fine with this not being the default behavior and only happening when specifically requested. That should alleviate any backward-compatibility concerns.

That sounds ideal.


#4

Why would you do it inside comments? Not even C does that.

It still makes me nervous, on the basis that I’d want this to be as transparent as possible. I’m not a fan of processors where certain “magic patterns” which used to be valid suddenly become invalid. That said, I’m not sure whether it’d be practical to handle #line during parsing as opposed to lexing.


#5

Interesting; just tested that, and sure enough it doesn’t. I’ll check back with the person doing the code generation, who requested this, to see what they meant.

Considering this is intended for generated code, it seems reasonable for the code generator itself to just not generate code with that problem.


#6

My point is that macros already allow you to use this syntax. If you have macro code that uses # line, then copy and paste it into a context that undergoes this processing, it suddenly stops working, probably in a really obscure way.
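To make the concern concrete, here is a small macro_rules! macro (the name echo_line_tokens is purely illustrative) that already accepts #line 42 "generator.m4" as input. Under today's lexer this is just the tokens #, line, 42 and "generator.m4", so it compiles; a pass that treated every line beginning with #line as a directive would consume that middle line before the macro saw it.

```rust
// Matches `# <ident> <literal> <literal>` and echoes the tokens back as text.
macro_rules! echo_line_tokens {
    (# $kw:ident $num:literal $file:literal) => {
        format!("{} {} {}", stringify!($kw), $num, $file)
    };
}

fn main() {
    // Valid Rust today: the macro receives ordinary tokens.
    let s = echo_line_tokens!(
        #line 42 "generator.m4"
    );
    println!("{}", s); // prints: line 42 generator.m4
}
```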


#7

I believe @josh is saying that since the #line directive could be made available only through a new include macro, it doesn’t pose a compatibility problem: technically there would be two Rust dialects that are nearly identical and differ only in #line support. (I think this proposal is very reasonable.)


#8

If we want to support tracking errors in generated Rust source files, why not use something like JavaScript source maps? They’re more complex, but also more powerful than #line directives. TypeScript and other languages that compile to JS use that format to support debugging.


#9

That wouldn’t help the OP, as they are using a tool that knows how to emit #line directives but not source maps. When someone comes along with a tool that can generate Rust and knows how to emit source maps but not #line directives, then we should think about source maps.

(Also, source maps are optimized for an entirely different usage context. “Minification” of Rust is not something that should ever be necessary.)


#10

Following up on this: we don’t need #line to work inside a comment after all; that was a misinterpretation. Likewise, it shouldn’t work inside a string. It should work anywhere else, though.
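One way to honor "not inside comments or strings" is to carry lexer-like state across line boundaries and only accept a directive when a line begins in ordinary code. The sketch below is an assumption-laden illustration, not a proposed rustc design: directive_lines is a hypothetical name, and it deliberately ignores raw strings (r"..."), byte strings, and char literals such as '"', which a real implementation would have to handle. Line comments need no tracking, because they always end before the next line starts.

```rust
/// Scanner state carried across line boundaries.
#[derive(Clone, Copy, PartialEq, Debug)]
enum State {
    Code,
    Str,               // inside a "..." string literal
    BlockComment(u32), // inside /* */; Rust block comments nest
}

/// Returns the 0-based indices of lines that are `#line` directives,
/// skipping lines that begin inside a string literal or block comment.
/// Assumption: raw strings, byte strings and char literals are not handled.
fn directive_lines(source: &str) -> Vec<usize> {
    let mut state = State::Code;
    let mut found = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        // A directive only counts if the line starts in ordinary code.
        if state == State::Code && line.trim_start().starts_with("#line ") {
            found.push(idx);
            continue;
        }
        let bytes = line.as_bytes();
        let mut i = 0;
        while i < bytes.len() {
            match state {
                State::Code => {
                    if bytes[i] == b'"' {
                        state = State::Str;
                    } else if bytes[i..].starts_with(b"//") {
                        break; // line comment: ignore the rest of the line
                    } else if bytes[i..].starts_with(b"/*") {
                        state = State::BlockComment(1);
                        i += 1;
                    }
                }
                State::Str => {
                    if bytes[i] == b'\\' {
                        i += 1; // skip the escaped character
                    } else if bytes[i] == b'"' {
                        state = State::Code;
                    }
                }
                State::BlockComment(depth) => {
                    if bytes[i..].starts_with(b"/*") {
                        state = State::BlockComment(depth + 1);
                        i += 1;
                    } else if bytes[i..].starts_with(b"*/") {
                        state = if depth == 1 { State::Code } else { State::BlockComment(depth - 1) };
                        i += 1;
                    }
                }
            }
            i += 1;
        }
    }
    found
}

fn main() {
    let src = [
        "fn main() {",
        "#line 10 \"gen.m4\"",
        "    let s = \"",
        "#line 99 inside a string",
        "\";",
        "    /* outer /* nested */",
        "#line 5 inside a comment",
        "    */",
        "#line 20 \"gen.m4\"",
        "}",
    ]
    .join("\n");
    // Prints [1, 8]: the lines inside the string and the comment are skipped.
    println!("{:?}", directive_lines(&src));
}
```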


#11

Hi, I’m a coauthor of that document, and I don’t think that source maps are the right fit here.

The constraints that source maps were designed for aren’t really applicable here.

  • Source maps are separate from the generated code, because actual users of a website don’t care about them and don’t want to wait for larger downloads over the network. The generated code links to its source map via a URL, which can be relative and often breaks accidentally (e.g. if one of the files is moved and the other isn’t moved with it). The topic at hand doesn’t need to optimize for network traffic, and so shouldn’t take on this headache.

  • Despite that care for reducing network traffic, source maps are UTF-8 encoded text, because at the time working with binary data on the web was a huge pain and not supported everywhere. So, in an attempt to keep the text format compact, the actual debug info is represented as base64-encoded variable-length quantities (VLQs). Inheriting this mess would be rather unfortunate.

  • For this topic’s use case, I don’t think we even care that much about being compact, so inline pragmas seem fine.
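For a sense of what the second point above involves, here is a minimal sketch of the base64 VLQ encoding used for each field of a source-map mapping segment: the sign is moved into the lowest bit, and each base64 digit carries 5 data bits plus a continuation bit. The function names are illustrative and not taken from any particular source-map library.

```rust
/// The base64 alphabet used by source maps.
const BASE64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes one signed value as a base64 VLQ.
fn encode_vlq(value: i32) -> String {
    // Move the sign into the least significant bit.
    let mut v: u64 = ((value.unsigned_abs() as u64) << 1) | u64::from(value < 0);
    let mut out = String::new();
    loop {
        let mut digit = (v & 0b1_1111) as usize; // 5 data bits per digit
        v >>= 5;
        if v != 0 {
            digit |= 0b10_0000; // continuation bit: more digits follow
        }
        out.push(BASE64[digit] as char);
        if v == 0 {
            return out;
        }
    }
}

/// Decodes a single base64 VLQ; returns None on malformed input.
fn decode_vlq(s: &str) -> Option<i64> {
    let mut acc: u64 = 0;
    let mut shift = 0u32;
    for c in s.bytes() {
        let digit = BASE64.iter().position(|&b| b == c)? as u64;
        if shift >= 60 {
            return None; // too many digits for this sketch
        }
        acc |= (digit & 0b1_1111) << shift; // least significant group first
        shift += 5;
        if digit & 0b10_0000 == 0 {
            let magnitude = (acc >> 1) as i64;
            return Some(if acc & 1 == 1 { -magnitude } else { magnitude });
        }
    }
    None // input ended while the continuation bit was still set
}

fn main() {
    // Encodes to A, C, D and gB respectively.
    for v in [0, 1, -1, 16] {
        let enc = encode_vlq(v);
        println!("{} -> {} -> {:?}", v, enc, decode_vlq(&enc));
    }
}
```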

Additionally, Rust has extant support for a different (and much much much better designed) debugging format: DWARF. DWARF’s .debug_line information maps addresses to {source, line, column, …}. Another possibility would be to maintain a source-to-source mapping on the side (Rust to m4, in this case) and do a post-processing pass on the DWARF that rustc emits, replacing references to the generated Rust with references to the appropriate m4 source. This experimentation could be done outside of rustc, as a library.
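The remapping step of that post-processing idea can be sketched independently of any DWARF library. This assumes the .debug_line table has already been decoded into rows (a real tool might use a DWARF parsing crate such as gimli) and that the generator recorded a side table from generated Rust locations to m4 locations; LineRow, SourceMap and remap_rows are hypothetical stand-ins, and writing the rewritten table back into the object file is not shown.

```rust
use std::collections::HashMap;

/// A decoded .debug_line row, reduced to the fields this sketch needs.
/// Hypothetical: a real tool would obtain these from a DWARF parser.
#[derive(Debug, Clone, PartialEq)]
struct LineRow {
    address: u64,
    file: String,
    line: u64,
}

/// Hypothetical side table: (generated file, line) -> (m4 file, line).
type SourceMap = HashMap<(String, u64), (String, u64)>;

/// Points rows for generated Rust at the m4 source instead.
/// Rows without an entry in the side table are left unchanged.
fn remap_rows(rows: &mut [LineRow], map: &SourceMap) {
    for row in rows.iter_mut() {
        if let Some((file, line)) = map.get(&(row.file.clone(), row.line)) {
            row.file = file.clone();
            row.line = *line;
        }
    }
}

fn main() {
    let mut map = SourceMap::new();
    map.insert(("gen.rs".to_string(), 1200), ("machine.m4".to_string(), 42));
    let mut rows = vec![
        LineRow { address: 0x1000, file: "gen.rs".to_string(), line: 1200 },
        LineRow { address: 0x1010, file: "main.rs".to_string(), line: 7 },
    ];
    remap_rows(&mut rows, &map);
    for r in &rows {
        // The first row now points at machine.m4:42; the second is unchanged.
        println!("{:#x} {}:{}", r.address, r.file, r.line);
    }
}
```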

Overall, I think #line "some_file" 42, #[line("some_file", 42)], or line!("some_file", 42) would be the best fit here. Such pragmas are simple to generate and simple to process, and limited in scope.