How to allow arbitrary expressions in format strings

On nightly, format! and format_args! support referencing variable names from the surrounding scope:

let x = 42;
format!("the answer is {x}")

There are also proposals to add syntactic sugar like in Python:

let x = 42;
f"the answer is {x}"

The only missing piece to make string interpolation as ergonomic as in Python, JavaScript, Kotlin, Scala, C#, Ruby, PHP, Swift etc. is allowing arbitrary expressions:

f"{foo + bar.baz()}"

So why isn't this currently planned?

There are a few problems with this:

  • Curly braces in format strings are escaped with curly braces: format!("{{foo}}") prints {foo}. If arbitrary expressions were supported, parsing this would become ambiguous.
  • It's ambiguous when type ascription is enabled: format!("{foo:X}") could mean either type ascription or that the UpperHex trait should be used.
  • The ? operator could be easily confused with Debug formatting: "{foo?}" and "{foo:?}" look very similar.

There might be more problems; these are the ones I could identify myself.
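
For reference, here is a small sketch of how the existing syntax behaves today, which is what any expression syntax would have to coexist with. The function name `render` is made up for illustration; it assumes a compiler new enough to support implicit captures.

```rust
/// Returns three renderings of `foo`: escaped braces, UpperHex, and Debug.
/// `render` is a hypothetical helper name used only for this illustration.
fn render(foo: i32) -> (String, String, String) {
    (
        // Doubled braces are an escape: no interpolation happens here.
        format!("{{foo}}"),
        // `:X` after the name selects the UpperHex trait.
        format!("{foo:X}"),
        // `:?` after the name selects the Debug trait.
        format!("{foo:?}"),
    )
}

fn main() {
    let (escaped, hex, debug) = render(255);
    println!("{escaped} {hex} {debug}"); // prints "{foo} FF 255"
}
```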

A possible solution is to require parentheses:

format!("{(foo + bar.baz()):?}")

The syntax for this would be "{" "(" expression ")" format_spec? "}"
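
To illustrate why the parentheses help, here is a rough sketch of how a placeholder could be split into expression and format spec under that grammar. This is not how format_args! is implemented; `split_placeholder` is a hypothetical helper, and it deliberately ignores string literals, char literals and comments inside the expression.

```rust
/// Splits the text between `{` and `}` into (expression, format_spec),
/// assuming the proposed "(" expression ")" format_spec? shape.
/// Hypothetical helper: literals containing parentheses are not handled.
fn split_placeholder(inner: &str) -> Option<(&str, &str)> {
    if !inner.starts_with('(') {
        return None;
    }
    let mut depth = 0usize;
    for (i, c) in inner.char_indices() {
        match c {
            '(' => depth += 1,
            ')' => {
                depth -= 1;
                if depth == 0 {
                    // Everything inside the outer parens is the expression;
                    // everything after them is the format spec.
                    return Some((&inner[1..i], &inner[i + 1..]));
                }
            }
            _ => {}
        }
    }
    None // unbalanced parentheses
}

fn main() {
    // The `?` operator and Debug formatting no longer look alike:
    println!("{:?}", split_placeholder("(foo?)"));
    println!("{:?}", split_placeholder("(foo):?"));
    println!("{:?}", split_placeholder("(foo + bar.baz()):?"));
}
```

With the parentheses as a delimiter, anything after the closing `)` can only be a format spec, so `:X` and `:?` are never read as part of the expression.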

This avoids all ambiguities. However, it introduces an inconsistency, because variables from the surrounding scope currently don't require parentheses. So our options are

  1. Allow arbitrary expressions only with parentheses

    a) deprecate referencing variable names without parentheses (e.g. format!("{x}"))

    b) accept the inconsistency

  2. Allow arbitrary expressions, don't require parentheses (the ambiguities can be resolved backwards compatibly, but it might confuse people reading the code)

  3. Do nothing

What are your opinions?

https://rust-lang.github.io/rfcs/2795-format-args-implicit-identifiers.html#future-possibilities talks about this as future possibility and also lists the same parenthesis problem.

I don't see how there's an inconsistency in requiring parens? Kotlin does the same thing with the $ interpolation, and obviously you sometimes need parens to clarify precedence or allow parsing.

It's inconsistent if parentheses are required for every expression except variable names.

Kotlin does the same thing with the $ interpolation

That is an inconsistency, too.

and obviously you sometimes need parens to clarify precedence or allow parsing

Rust only requires parens to change the precedence that is assumed when parens are omitted: When omitting the parens in (3 + 4) * 5, it would be parsed differently, but it would still compile.

EDIT: So the difference is that in the expression "{(3 + 4)}", the parens are mandatory, even though omitting them wouldn't be ambiguous.

Agreed. {bar} could be thought of / documented as being simply sugar for {(bar)}, usable when bar is a simple identifier.

Another complexity is tokenization: What happens if you put a string literal in the expression inside the string literal?

format!("...{var}...") is syntactic sugar for format!("...{var}...", var=var). I don't think we should provide a syntactic sugar for format!("...{var}...", var=foo+bar.baz()); I don't think we should support nesting arbitrary expressions into the format string. Simple braced identifiers are easy to mentally parse without ambiguity; arbitrary expressions are not.
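
A minimal sketch of that desugaring, using stand-in definitions (`Bar`, `baz` and the three function names are invented here so the snippet compiles on its own):

```rust
// Stand-in type so that `bar.baz()` from the discussion compiles.
struct Bar;

impl Bar {
    fn baz(&self) -> i32 {
        2
    }
}

// Shorthand: the identifier is captured from the surrounding scope.
fn implicit(var: i32) -> String {
    format!("...{var}...")
}

// What the shorthand stands for: an explicit named argument.
fn explicit(var: i32) -> String {
    format!("...{var}...", var = var)
}

// Named arguments may already bind arbitrary expressions,
// but the expression lives outside the string literal.
fn with_expression(foo: i32, bar: &Bar) -> String {
    format!("...{var}...", var = foo + bar.baz())
}

fn main() {
    assert_eq!(implicit(42), explicit(42));
    println!("{}", with_expression(40, &Bar)); // prints "...42..."
}
```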

If we consider allowing arbitrary expressions, I think parentheses would be a good minimum requirement.

While I don't think we should do this at all, if we did, I think the answer would be to use raw strings if you want expressions containing string literals.

Even if you don’t allow it, the implementation probably still has to check for it to avoid a cascade of nonsense errors if someone tries.

Sure, but that doesn't require parsing arbitrary expressions, just detecting things that don't look like identifiers and offering helpful error messages.

Ah, I meant specifically the case of allowing expressions but disallowing nested string literals. You’re right that it applies generally though!

My opinion is that we should not add such a feature. It is one of those that inevitably leads to unreadable and incomprehensible code. It is very convenient, I admit, to just put expressions into format strings instead of declaring additional variables, but even one such expression greatly reduces readability, let alone more than one. If one is worried about polluting the local namespace with extra definitions, a block can be used to isolate them, along with the formatting, from the rest of the code. And what is foo + bar.baz() anyway? Surely, it has a meaning in the context of the format string, and this meaning can be described as an identifier. More importantly, formatting and computation are separate concerns, and the code should keep them separate.

To conclude my argument, the following is what I would do to avoid being hunted down by peers later:

... // do something
let message = {
    let running_out = foo + bar.baz();
    let dummy_names = bazooka(rabbit);
    format!("{running_out} of {dummy_names}")
};
... // do something else

No, this is not ergonomic. This is terrible. Please don't do this.

It comes up every once in a while, and it seriously hinders readability. Even though it might make writing code marginally faster, it's spectacularly bad for maintainability – I have known this feature from Python and Swift, and it definitely does more harm than good. It's very easy to abuse, and people do abuse it all the time.

For prior art and further elaboration, see this previous topic.

It literally is more ergonomic and readable when used well. Just because a feature is abused doesn't mean that it isn't useful in limited circumstances. Very recently I was writing some code that would have really benefited from this as I had a lot of very small formatting calls which were about 50% straight identifiers that could use the new shorthand, and 50% single-operation expressions that couldn't.

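Example comparison (the first four match arms use the current syntax; the last four show the same arms with inline expressions):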
rustdoc_types::Type::Primitive(name) => {
    write!(f, "{name}")?;
}
rustdoc_types::Type::Slice(ty) => {
    write!(f, "[{}]", self.wrap(ty))?;
}
rustdoc_types::Type::Array { type_: ty, len } => {
    write!(f, "[{}; {len}]", self.wrap(ty))?;
}
rustdoc_types::Type::ImplTrait(bounds) => {
    write!(f, "impl {}", self.wrap(bounds))?;
}
rustdoc_types::Type::Primitive(name) => {
    write!(f, "{name}")?;
}
rustdoc_types::Type::Slice(ty) => {
    write!(f, "[{self.wrap(ty)}]")?;
}
rustdoc_types::Type::Array { type_: ty, len } => {
    write!(f, "[{self.wrap(ty)}; {len}]")?;
}
rustdoc_types::Type::ImplTrait(bounds) => {
    write!(f, "impl {self.wrap(bounds)}")?;
}

I do know how easy it is to abuse such a feature (I myself have definitely abused JS template strings); but I believe the positive usecases outweigh the potential for abuse.

Personally, I would be much more worried about:

/// relevant Xkcd: <https://www.smbc-comics.com/comic/2012-02-20>
fn how_is_bobby_formed(db: &mut DbConn, firstname: &str, lastname: &str) {
    db.run_sql(format!("
        INSERT INTO students (lastname, firstname)
        VALUES ('{lastname}', '{firstname}');
    "));
}

which is one of the reasons why I think some kind of general-purpose interpolation syntax should be incorporated into the language; because it would allow providing an API just as expedient to the user as the above, but much more safe, consistent (in terms of how interpolation points are parsed and handled by the API provider) and amenable to static analysis (e.g. finding all references to a given variable).

So if we are to take inspiration from Python, I would rather have it be PEP 501 than PEP 498. (Standardising the more dangerous API first while stalling the more universal/safer one was a major blunder on part of the designers of Python. But that’s an aside.)
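
As a rough illustration of the difference, here is a sketch of what an API could receive if interpolation handed over the literal pieces and the values separately, in the spirit of PEP 501, instead of one pre-joined string. Everything here is hypothetical: `Template`, `to_sql`, `insert_student` and the `?1`/`?2` placeholder style are assumptions for the example, not an existing Rust feature or database API.

```rust
/// Hypothetical result of a structured interpolation:
/// literal text pieces and interpolated values are kept apart.
struct Template<'a> {
    literals: Vec<&'a str>,
    values: Vec<&'a str>,
}

impl<'a> Template<'a> {
    /// Builds a parameterised statement: values never become part of the SQL text.
    fn to_sql(&self) -> (String, Vec<String>) {
        let mut sql = String::new();
        let mut params = Vec::new();
        for (i, literal) in self.literals.iter().enumerate() {
            sql.push_str(literal);
            if let Some(value) = self.values.get(i) {
                params.push(String::from(*value));
                // Emit a numbered placeholder instead of the value itself.
                sql.push_str(&format!("?{}", params.len()));
            }
        }
        (sql, params)
    }
}

/// What `"... VALUES ({lastname}, {firstname});"` could lower to under this assumption.
fn insert_student(lastname: &str, firstname: &str) -> (String, Vec<String>) {
    Template {
        literals: vec![
            "INSERT INTO students (lastname, firstname) VALUES (",
            ", ",
            ");",
        ],
        values: vec![lastname, firstname],
    }
    .to_sql()
}

fn main() {
    let (sql, params) = insert_student("Robert'); DROP TABLE students;--", "Bobby");
    println!("{sql}");
    println!("{params:?}");
}
```

The hostile input ends up in the parameter list, while the statement text stays exactly what the programmer wrote.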

Doing so "properly" requires running recursive token tree lexing to determine the proper extent of a string, since the embedded expression can itself contain strings. Even if we require the outer string to lex correctly on its own (and thus forbid inner strings unless the string parts are raw), which keeps the lexical grammar regular (modulo raw strings), it's almost certain that we'd still want to do the recursive lexing for error reporting.

And even handling that, determining where the embedded expression ends requires token tree matching, because the inner expression can contain {}.
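
A small sketch of the brace-matching problem: finding the end of a placeholder by searching for the first `}` gives the wrong answer as soon as the expression contains a block. `closing_brace` is a hypothetical helper, and it still ignores string literals, char literals and comments, which is exactly the part that needs real lexing.

```rust
/// Returns the byte index of the `}` that closes the `{` at byte index `open`,
/// counting nested braces. Hypothetical helper; literals and comments that
/// contain braces are not handled.
fn closing_brace(s: &str, open: usize) -> Option<usize> {
    let rest = s.get(open..)?;
    if !rest.starts_with('{') {
        return None;
    }
    let mut depth = 0usize;
    for (i, c) in rest.char_indices() {
        match c {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(open + i);
                }
            }
            _ => {}
        }
    }
    None // ran out of input before the braces balanced
}

fn main() {
    let s = "impl {if x { 1 } else { 2 }} done";
    let open = s.find('{').unwrap();

    // A regex-style "first closing brace" cuts the expression short.
    let naive_end = s[open..].find('}').map(|i| open + i).unwrap();
    println!("naive:   {}", &s[open..=naive_end]);

    // Counting nesting depth finds the real end of the placeholder.
    let end = closing_brace(s, open).unwrap();
    println!("matched: {}", &s[open..=end]);
}
```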

Allowing arbitrary expression interpolation inside strings effectively breaks any syntax highlighting which tries to just rely on regular expressions, and quite reasonable looking cases will highlight incorrectly.

The format! macro already allows binding names for interpolation, so you should probably be using that, not external names, for the interpolation.

format!(
    "{running_out} of {dummy_names}",
    running_out = foo + bar.baz(),
    dummy_names = spam(eggs),
)

Again, I like to separate formatting from computation, that's why I don't use this feature. Though, I would use it in case of a long format string with many parameters.

As the original RFC author for the incoming variable name shorthand, I'd personally be ok to never see arbitrary expressions allowed in format! and friends.

I'm a very frequent Python user, and make heavy use of f-string formatting. Only very rarely do I see something other than a simple binding that doesn't look like an eyesore.

Consider something like this:

items = { "x": 1, "y": 2 }
print(f"Trace: {items['x']=}, {items['y']=}")

The interpolated expressions are "just" lookups (plus magic trailing = to print out the expression). There's quite a heavy dose of symbols and strings-within-strings; there could even be hidden side effects in an exotic __getitem__ implementation.

With the limited interpolation we gain from RFC 2795, the result the programmer is forced to write is (I think) much easier to read and review:

items = { "x": 1, "y": 2 }
x = items['x']
y = items['y']
print(f"Trace: items['x']={x}, items['y']={y}")

The only expression I personally would be interested in as follow up from RFC 2795 is support for dotted.names, which I believe are a simple backwards-compatible extension to the current macros.

However, not even dotted.names are side-effect free (in the case of misbehaving Deref implementations). Also, I think everyone has their own subset of expressions which they would want to have supported. As supporting any subset smaller than all general expressions is a subjective topic, and I definitely don't want to see general expressions, I'd be quite happy to settle for not allowing any expressions at all. It's not that bad to write format!("Name = {name}", name = self.name).
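
For completeness, a minimal sketch of that workaround in context. The `User` type and `describe` method are invented for the example.

```rust
// Hypothetical type used only to have a `self.name` to format.
struct User {
    name: String,
}

impl User {
    fn describe(&self) -> String {
        // `{self.name}` is not accepted inside the format string,
        // so the field is bound to a named argument instead.
        format!("Name = {name}", name = self.name)
    }
}

fn main() {
    let user = User { name: String::from("Ferris") };
    println!("{}", user.describe()); // prints "Name = Ferris"
}
```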

I think dotted.names would be a highly reasonable and readable extension.

I also personally would love to see the = extension, where {var=} is a shorthand for var={var}; that would be quite handy for debugging.
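
Until something like that exists, a similar effect can be approximated with a small macro. This is only a sketch: `show!` and `demo` are made-up names, and Debug formatting is used on the assumption that it is the closest match to what Python's `=` prints.

```rust
/// Hypothetical stand-in for `{var=}`: renders `expr=value` using Debug.
macro_rules! show {
    ($e:expr) => {
        format!("{}={:?}", stringify!($e), $e)
    };
}

fn demo() -> String {
    let var = 42;
    show!(var)
}

fn main() {
    println!("{}", demo()); // prints "var=42"
}
```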

If it's for debugging only, the dbg macro already covers a lot of cases. E.g. for the equivalent of the python example above, something like

use std::collections::HashMap;
fn main() {
    let items = HashMap::<_, _>::from_iter([("x", 1), ("y", 2)]);
    dbg!(items["x"], items["y"]);
}

prints

[src/main.rs:4] items["x"] = 1
[src/main.rs:4] items["y"] = 2