Println! - use named arguments from scope?

We can already do this:

let message = "Hello, world";
println!("{message}", message=message);

So that got me wondering why println! can't use named arguments from enclosing scopes:

let message = "Hello, world";
println!("{message}");

I really love the f-string literals from Python 3, which do exactly this, e.g.

message = "Hello, world"
print(f"{message}")

I would be interested in preparing an RFC on the subject if it's open for discussion.

Apologies if this has been discussed before, I had a brief search but didn't find any previous threads.

EDIT 31/07/19:

It seems there's reasonable interest in this proposal. We have identified that it should be possible to implement this through a transformation in which

println!("{message}")

effectively becomes shorthand for

println!("{message}", message=message)

This would generalise to all named identifiers which didn't have a named argument explicitly specified, and would apply to all of the format macros (print!, format!, write!, etc.)
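To make the transformation concrete, here is a minimal sketch using the explicit named-argument spelling, which is the form the shorthand would expand to. The function name `describe` and its parameters are hypothetical, chosen only for illustration.

```rust
// Hypothetical helper: formats two values using explicit named arguments.
// Under the proposal, `name = name, count = count` would be implied by the
// placeholders `{name}` and `{count}`.
fn describe(name: &str, count: usize) -> String {
    format!("{name} has {count} items", name = name, count = count)
}

fn main() {
    let message = "Hello, world";
    // The explicit form that `println!("{message}")` would be shorthand for.
    println!("{message}", message = message);
    println!("{}", describe("cart", 3));
}
```

The same rewriting rule would apply unchanged to every macro built on the format machinery, since they all share the same named-argument syntax.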

To make it easy for readers to follow the discussion, below is the summary of pros and cons from the thread. Please post your thoughts and I will update this list:

Pros:

  • Marginally shorter for simple use cases, e.g. println!("{}", foo) could be println!("{foo}")

  • [Opinion] Named identifiers are easier to read than {} or {0}, {1} etc., and this proposal would make named identifiers more convenient to use.

  • Straightforward to implement (see fstrings proof-of-concept crate!)

Cons:

  • Slightly increases complexity of the macros for users.

  • We'd be changing the behaviour of macros that already exist in stable Rust. I don't see how this can be a breaking change, but this still may cause issues. I'm also unsure if this can be feature-gated?

  • We can already implement this in a third-party crate. Given that, do we really need to force this proposal on everyone by changing the core language? I would argue this is a simple enough change that it's worth embracing in the core.

  • Error messages: At the moment the statement println!("{message}") will generate a compiler error reading: there is no argument named 'message'.

    Assuming message is not a valid identifier in the scope, the error would instead be the one from println!("{message}", message=message), which reads: cannot find value 'message' in this scope.

    This new error also seems understandable to me, but worth considering, especially for new users of Rust.

11 Likes

Note: Python f-strings allow arbitrary expressions within the brackets. For example, print(f"{1+2}") works and prints 3. We wouldn't do that for Rust's format!.

However, I see no issues with essentially implying label=label when the label is missing. That said, I don't know whether it fits with Rust's model of hygienic identifiers. I think it does, but I am unsure.
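For readers unfamiliar with the hygiene concern, here is a small sketch of how macro_rules! hygiene treats local variables. The macro name `add_ten` and the function `demo` are made up for illustration. An identifier resolves according to where it was written, which is why a name written in the caller's format string would be expected to resolve in the caller's scope.

```rust
// Hypothetical macro: its internal `x` is distinct from any `x` at the call site.
macro_rules! add_ten {
    ($e:expr) => {{
        let x = 10; // belongs to the macro's own syntax context
        $e + x
    }};
}

fn demo() -> i32 {
    let x = 1;
    // The `x` passed in refers to the caller's `x` (1), not the macro's `x` (10).
    add_ten!(x)
}

fn main() {
    println!("{}", demo()); // prints 11
}
```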

1 Like

Note: Python f-strings allow arbitrary expressions within the brackets. For example, print(f"{1+2}") works and prints 3. We wouldn't do that for Rust's format!.

Indeed. I did actually wonder whether my proposed syntax was unsupported because the whole space had been left open for format! etc. to extend in directions like arbitrary expressions in the future. While arbitrary expressions can be nice, I wouldn't argue there's a need for them. I do wonder why you say that we wouldn't want to support them?

As an aside, while pondering this further this afternoon, I realised that this kind of pattern, which I see a lot:

write!(buf, "{}", data)

could, with what I'm proposing, be reduced to the shorter

write!(buf, "{data}")

which seems very nice to me.
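As a rough sketch of the two spellings side by side (the helper name `render` is hypothetical), the positional form and the explicit named form below produce the same text; the proposal would let the second drop its `data = data` argument.

```rust
use std::fmt::Write;

// Hypothetical helper that writes the same value twice into a buffer.
fn render(data: u32) -> String {
    let mut buf = String::new();
    // Positional form, common today.
    write!(buf, "{}", data).unwrap();
    buf.push(' ');
    // Named form with an explicit argument; the proposal would shorten
    // this to write!(buf, "{data}").
    write!(buf, "{data}", data = data).unwrap();
    buf
}

fn main() {
    println!("{}", render(7));
}
```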

2 Likes

The big problem with full expressions is that it kills separation of the lexer and parser (making parsing much more complicated) unless you artificially neuter it. Just some examples:

f"{"+"}"
f"{ 1 /* } */ }"
f"{ '}' }"

Sure, these cases may be deliberately antagonistic. But these edge cases have to be considered in the design of something like this.
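To illustrate why these cases are awkward, here is a deliberately naive sketch that looks for the end of an interpolation by counting braces alone. The function name `naive_close` is hypothetical, and this is not how any real lexer is written; it only shows where the simple approach goes wrong.

```rust
// Naive scan: returns the byte index of the `}` that brings the depth back to
// zero, knowing nothing about string literals, char literals, or comments.
fn naive_close(src: &str) -> Option<usize> {
    let mut depth = 0usize;
    for (i, c) in src.char_indices() {
        match c {
            '{' => depth += 1,
            '}' => {
                depth = depth.checked_sub(1)?;
                if depth == 0 {
                    return Some(i);
                }
            }
            _ => {}
        }
    }
    None
}

fn main() {
    // The interpolation really ends at index 6, but the scan stops at the `}`
    // inside the char literal.
    println!("{:?}", naive_close("{ '}' }"));
    // Likewise, the `}` inside the comment is mistaken for the closing brace.
    println!("{:?}", naive_close("{ 1 /* } */ }"));
}
```

Handling these inputs correctly means the scanner has to understand char literals and comments of the embedded language, not just braces.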

And when it's the difference between format!("1 + 2 = {}", 1+2) and format!("1 + 2 = {1+2}"), I'd take the former for the simpler rules it allows.

Plus, you have to figure out how to handle :specifiers for expression bodies instead of the simple rule today of optional single identifier.

Allowing arbitrary expressions in the format specifier doesn't make sense when you have a place to put temporaries right there.
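A short sketch of that point, with hypothetical function names: the expression goes in the argument list as a named temporary, and the placeholder stays a single identifier with an optional format spec after the colon.

```rust
// The expression `1 + 2` is bound to the name `sum` in the argument list.
fn sum_line() -> String {
    format!("1 + 2 = {sum}", sum = 1 + 2)
}

// The `:.2` spec attaches to a plain identifier, so there is no ambiguity
// about where the expression ends and the spec begins.
fn ratio_line(total: f64, part: f64) -> String {
    format!("ratio = {ratio:.2}", ratio = part / total)
}

fn main() {
    println!("{}", sum_line());
    println!("{}", ratio_line(4.0, 1.0));
}
```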

2 Likes

That's not true; supporting full expressions is relatively easy. The lexer doesn't need to call the parser: it can just count braces, which it can certainly do because it is a lexer. In Kotlin, string template support in the compiler's lexer needs about 15 lines of code: https://github.com/JetBrains/kotlin/blob/ba6da7c40a6cc502508faf6e04fa105b96bc7777/compiler/psi/src/org/jetbrains/kotlin/lexer/Kotlin.flex#L159
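As a rough illustration of the brace-counting idea (this is not Kotlin's lexer, and the names `Segment` and `split_template` are hypothetical), the sketch below splits a template body into literal text and `{...}` expressions using only a depth counter. It assumes `{` opens an expression, as in format strings, and it ignores braces inside nested string or char literals.

```rust
#[derive(Debug, PartialEq)]
enum Segment {
    Literal(String),
    Expr(String),
}

// Splits a template body into literal and expression segments by counting
// braces. Simplification: quotes and comments inside expressions are ignored.
fn split_template(body: &str) -> Result<Vec<Segment>, String> {
    let mut segments = Vec::new();
    let mut current = String::new();
    let mut depth = 0usize;
    for c in body.chars() {
        match (c, depth) {
            // Entering an expression: flush any pending literal text.
            ('{', 0) => {
                if !current.is_empty() {
                    segments.push(Segment::Literal(std::mem::take(&mut current)));
                }
                depth = 1;
            }
            // A nested brace inside an expression is part of the expression.
            ('{', _) => {
                depth += 1;
                current.push(c);
            }
            ('}', 0) => return Err("unmatched `}`".to_string()),
            // The brace that closes the expression.
            ('}', 1) => {
                segments.push(Segment::Expr(std::mem::take(&mut current)));
                depth = 0;
            }
            ('}', _) => {
                depth -= 1;
                current.push(c);
            }
            _ => current.push(c),
        }
    }
    if depth != 0 {
        return Err("unclosed `{`".to_string());
    }
    if !current.is_empty() {
        segments.push(Segment::Literal(current));
    }
    Ok(segments)
}

fn main() {
    println!("{:?}", split_template("a {x} b {if c { 1 } else { 2 }}"));
}
```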

2 Likes

I.... don't know how I missed that possibility. So long as the lex is unambiguous and mode switching is fully localized in the lexer, it works. (That said, it would change parsing behavior of Rust, so it's still not possible in Rust.) It does technically remove regularity from the lexer, however. (Formally: It becomes parsed with a PDA keeping track of bracket depth on the stack (stack requires two symbols), and just the NFA otherwise.)

Fun fact: the Kotlin playground doesn't count brackets properly for matching/highlighting purposes (put your cursor next to one).

So long as the parent language has matched bracket tokens everywhere, the lexer can keep track of this. Though it does require state (minimally: Vec<count>) and means a (&str) -> (token_kind, length) API doesn't work anymore.
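Here is a minimal sketch of the state I have in mind, under heavy simplifying assumptions: the function name `string_end` is hypothetical, `{` opens an interpolation, and char literals and comments are ignored. The `Vec<usize>` holds one open-brace count per interpolation level, which is what lets strings nest inside expressions inside strings.

```rust
// Given source text starting at an opening `"`, returns the byte index of the
// quote that closes that outermost string, allowing `{expr}` interpolations
// that may themselves contain string literals.
fn string_end(src: &str) -> Option<usize> {
    let mut chars = src.char_indices();
    if chars.next()?.1 != '"' {
        return None;
    }
    // One entry per active interpolation: braces opened inside it so far.
    let mut counts: Vec<usize> = Vec::new();
    let mut in_string = true;
    while let Some((i, c)) = chars.next() {
        if in_string {
            match c {
                // Skip the escaped character.
                '\\' => {
                    chars.next();
                }
                // No enclosing interpolation: this closes the outermost string.
                '"' if counts.is_empty() => return Some(i),
                // Closes a string nested inside an interpolation.
                '"' => in_string = false,
                '{' => {
                    counts.push(0);
                    in_string = false;
                }
                _ => {}
            }
        } else {
            match c {
                '"' => in_string = true,
                '{' => *counts.last_mut()? += 1,
                '}' => {
                    let top = counts.last_mut()?;
                    if *top == 0 {
                        // End of this interpolation: back to the enclosing string.
                        counts.pop();
                        in_string = true;
                    } else {
                        *top -= 1;
                    }
                }
                _ => {}
            }
        }
    }
    None
}

fn main() {
    println!("{:?}", string_end(r#""a{"+"}b" rest"#));
}
```

Because the result depends on the stack contents, a stateless function from a string slice to a token kind and length cannot express this.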

I've been working on the assumption that recursive embedding of the main language in the string language in the main language required a parser-driven modal lexer. Time to reconsider how I'm designing my parser stack I guess /shrug

1 Like

Or, you can do what I assume Scala does, which is to re-run the parse, inside the string, after escape resolution... though that's because its version of this feature is a userland macro. It works, but it's somewhere north of insane.

While we are at it, another common misconception is that to do language composition, à la JS in HTML, one needs to come up with a single grammar for the union of JS and HTML.

In practice (IntelliJ) a simpler solution is sufficient: we lex the outer language such that the inner language is just a single token. Then we build a completely separate syntax layer for this token.

This actually is more powerful than the composition of the grammars, because you can use name resolution to decide what the inner language is. For example, in Kotlin, if a function has a String arg with a @Language("Rust") annotation, String literals at the call site will be highlighted with Rust syntax.
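The two-layer approach can be sketched roughly as follows. This is a toy illustration, not IntelliJ's implementation; `Token`, `lex_outer`, and `placeholder_names` are hypothetical names, and escape sequences are not handled. The outer lexer keeps each string literal as one opaque token, and a separate pass interprets the contents.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Word(String),
    Str(String), // a whole string literal; contents are opaque to this layer
    Punct(char),
}

// Outer layer: knows only where a string literal starts and ends.
fn lex_outer(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        if c.is_whitespace() {
            continue;
        } else if c == '"' {
            // Collect up to the closing quote, which take_while consumes.
            let body: String = chars.by_ref().take_while(|&ch| ch != '"').collect();
            tokens.push(Token::Str(body));
        } else if c.is_alphanumeric() || c == '_' {
            let mut word = String::from(c);
            while let Some(&next) = chars.peek() {
                if next.is_alphanumeric() || next == '_' {
                    word.push(next);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Word(word));
        } else {
            tokens.push(Token::Punct(c));
        }
    }
    tokens
}

// Inner layer: a separate pass that only ever sees one string token's contents.
fn placeholder_names(body: &str) -> Vec<String> {
    body.split('{')
        .skip(1)
        .filter_map(|rest| rest.split_once('}'))
        .map(|(name, _)| name.trim().to_string())
        .collect()
}

fn main() {
    for token in lex_outer("print(\"hi {name}\", x)") {
        if let Token::Str(body) = &token {
            println!("placeholders: {:?}", placeholder_names(body));
        }
    }
}
```

Since the inner pass runs after the outer tokens exist, the choice of which inner parser to run can depend on information gathered later, such as the annotation on the called function.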

4 Likes

Just for the sake of testing a little bit the hygiene interaction, I am working on a Proof-of-Concept crate: ::fstrings.

  • hygiene does indeed give problems when trying to generate expressions, such as with format_args!;

    • this happens with feature(proc_macro_hygiene) as well as with ::proc_macro_hack;
  • On the other hand, it works fine when used as a statement, although it does require feature(proc_macro_hygiene):

    let x = 42;
    printlnf!(f"{x}"); // prints 42
    
  • the template parsing code is currently at a bare minimum :sweat_smile:

4 Likes

See also ifmt which goes the whole mile with interpolation.

This actually is more powerful than the composition of the grammars, because you can use name resolution to decide what the inner language is.

I'd refine that slightly to say that when it's possible (e.g. the outer language either controls when the inner language ends or knows enough of the inner language to skip over it), it makes it much easier to do data-dependent parsing.

Something like shell, however, is so tangled that it basically needs interleaved parsing to know when to switch sub-languages.

3 Likes

Isn't such a counting PDA also required to parse Rust's raw string literals (r#"…"#, r##"…"##, etc.)? If so, how is this different?

2 Likes

There isn't formally, but the difference practically is that a raw string is one token, and as such can be emitted in one chunk. You could do similar for a string interpolation, but it would require re-lexing it for work you've already done to skip over it for the root token.

And in that specific case I was talking more generally than just Rust anyway. (If you want to be super pedantic, Rust only supports a finite number of hashes, so you could make the grammar regular by just listing all the cases. This doesn't apply to the bracket counting.)
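To show what "one token, emitted in one chunk" looks like, here is a small sketch of measuring a raw string literal by counting its hashes. The function name `raw_string_len` is hypothetical and this is not rustc's lexer; it assumes the input starts at the `r`.

```rust
// Returns the byte length of the raw string literal at the start of `src`,
// or None if `src` does not start with a terminated raw string.
fn raw_string_len(src: &str) -> Option<usize> {
    let rest = src.strip_prefix('r')?;
    // Count the hashes between `r` and the opening quote.
    let hashes = rest.chars().take_while(|&c| c == '#').count();
    let body = rest[hashes..].strip_prefix('"')?;
    // The literal ends at the first quote followed by the same number of hashes.
    let terminator = format!("\"{}", "#".repeat(hashes));
    let end = body.find(&terminator)?;
    // 'r' + hashes + opening quote + body + terminator
    Some(1 + hashes + 1 + end + terminator.len())
}

fn main() {
    let src = r##"r#"a"b"# tail"##;
    println!("{:?}", raw_string_len(src));
}
```

Nothing inside the body needs to be lexed again afterwards, which is the practical difference from an interpolated string.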

1 Like

What's the upper bound? After all, there are a finite number of atoms in the universe; it just happens to be an extremely large, unknown number.

1 Like

I thought I recalled seeing it stored as a u8 but I found that to be incorrect with a playground test. I'd guess it's a u16, but don't know where to find it in the code off the top of my head.

Not something that expanding into an NFA-style computation would have any sort of practicality for, of course.

u16 would be a reasonable choice, giving a range larger than anyone could conceivably need except for a lexer range-overflow test. That same range would suffice for parsing nested braces. In fact, for human error recovery when coding, even u8 would be overkill in either case. Thanks.

I find these technical arguments important, because I'd rather we not complicate the already-complex parser any more – please, please consider not making the mental model for, and the life of, proc-macro authors harder! It's also ugly and just doesn't feel right to intertwingle something so simple and fundamental from the core language as a string literal with something much higher-level from library-land as arbitrary expressions formatting. Incidentally, I wonder if it would require making Display and the other half dozen formatting traits into lang items, which would be highly undesirable.

However, I'll still jump in here with a non-technical argument against allowing arbitrary expressions: it's plain old harder to read. People do this all the time in Swift and it gives me the shivers to read a "string literal" that in turn contains 10 nested subexpressions, 20 unexpected side effects, 30 decrypted nuclear launch codes and 40 kitten corpses.

Moreover, my perception is that it's not only those kinds of overly complex expressions (which could reasonably be linted away); even simple stuff gets ugly quickly when e.g. it contains embedded parentheses for function calls or array indexing.

I think allowing the single-identifier case only hits a sweet spot between convenience of writing and reading. Furthermore, we have precedent in Rust for inferring a single variable name from context but disallowing an unnamed expression: namely, in struct initializers. There we can also leave off the redundant foo: foo part if the field and the initializing variable have identical names, but otherwise the field must be named explicitly. I do realize that's a somewhat different situation and there's a different reason, but it's still a parallel.
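For reference, the struct initializer parallel looks like this. The `Point` type and the `make_points` function are hypothetical, used only to show the three spellings.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn make_points(x: i32, y: i32) -> (Point, Point, Point) {
    // Longhand: field name and variable name are both written out.
    let longhand = Point { x: x, y: y };
    // Shorthand: the field name is inferred from the variable name.
    let shorthand = Point { x, y };
    // An expression has no name to infer, so its field must be named explicitly.
    let computed = Point { x: x + 1, y };
    (longhand, shorthand, computed)
}

fn main() {
    let (a, b, c) = make_points(1, 2);
    println!("{:?} {:?} {:?}", a, b, c);
}
```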

14 Likes

::fstrings

It turns out that #[proc_macro_hack(fake_call_site)] not only solves all the hygiene problems but also lets the crate be used in stable Rust:

#[macro_use]
extern crate fstrings;

fn main ()
{
    let name = "World";

    // Usage is simple: just append `_f` to the name of any formatting macro
    println_f!("Hello, {name}!");

    assert_eq!(
        f!("Hello, {name}!"), // shorthand for String creation (Python-like)
        String::from("Hello, World!"),
    );

    // advanced_cases
    {
        // it remains compatible with classic formatting parameters
        assert_eq!(
            f!("{hi}, {name}!", hi = "Hello"),
            "Hello, World!",
        );

        // you can override / shadow the named arguments
        assert_eq!(
            f!("Hello, {name}!", name = "Earth"),
            "Hello, Earth!",
        );
    }
}
14 Likes

A macro definitely seems like a good compromise.

I'm pretty interested in the original proposal here – not arbitrary expressions, just "single unadorned identifier". I feel like it would make a lot of common usages simpler to write, and – more importantly – simpler to read.

I'd be interested in more discussion on the pros and cons of that approach!

4 Likes

Wow, I hadn't expected to open up such comprehensive discussion on the arbitrary expressions from such a short question! :wink:

It seems to me that we all more or less agree that it would encourage bad style if we were to permit arbitrary expressions, so I agree let's refocus the discussion on the original proposal. Thanks to @dhm for implementing the fstrings proof-of-concept, that's awesome (and I will almost certainly be reaching for it next time I need println!) :slight_smile:

To organise the discussion, I'm going to update my original post shortly with my current list of pros and cons. Please post your own thoughts and I will add them to it.

1 Like