How to allow arbitrary expressions in format strings

If the format! macro interpolates only identifiers, then in my opinion that feature is worse than useless and shouldn't be stabilized in the language. It's really not a big deal to write format!("{foo}", foo=foo); it's not even a big deal to omit those names entirely and rely on automatic indexing. Since Rust isn't Python and requires string literals instead of runtime strings, I simply can't see a situation where one would dump into format! a string complex enough that named variables and string interpolation are required.

At the very least, for the feature to be useful, I would expect it to support constants, paths and fields. It would also be very jarring if a trivial modification to those cases wouldn't be supported, e.g. if I could write format!("{foo.bar}"), but not format!("{&foo.bar}") or format!("{foo.bar[0]}").

Once you start supporting those simple enough expressions, it would be more work to special-case the "simple" part than to just allow any expression in interpolation. Ambiguities are very simple to resolve with mandatory whitespace or brackets. For example, if the inner braces denote a block expression

format!("{{foo}}")

then it could be required to be written in one of the forms

format!("{ {foo} }")
format!("{({foo})}")
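For context, a small sketch of why the doubled braces are ambiguous in the first place: format! already treats {{ and }} as escapes for literal braces, so that spelling cannot also mean a block expression without changing existing behaviour. The function name is mine, purely for illustration.

```rust
// `{{` and `}}` are the existing escapes for literal braces in format strings.
fn escaped_braces() -> String {
    format!("{{foo}}")
}

fn main() {
    // No interpolation happens here; the output is the literal text.
    println!("{}", escaped_braces());
}
```

Requiring whitespace or parentheses, as in the two forms above, keeps this escape working unchanged.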

Regarding readability. Yes, complex expressions in string interpolation can easily be unreadable, however the bar for "complex" is very subjective. I seriously doubt that anyone would find this expression unreadable:

format!("{BUF_LEN - 1}")
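For comparison, with the format! syntax as it exists, the closest equivalent is to bind the expression to a named argument. A minimal sketch, assuming a constant BUF_LEN of 64 (the value and the function name are made up for illustration):

```rust
const BUF_LEN: usize = 64; // assumed value, only for illustration

fn last_index() -> String {
    // The expression lives in the argument list; the string only holds the name.
    format!("{last}", last = BUF_LEN - 1)
}

fn main() {
    println!("{}", last_index());
}
```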

Anyway, Rust has plenty of ways to make code unreadable. Macros easily make the code impossible to read. Complex nested expressions are also unreadable, and Rust's expression-oriented syntax allows true monstrosities of nested if's, match'es, loop's and tuple constructors.

The way to combat unreadable code doesn't change: don't let it through code review, and implement Clippy lints.

Expressions in format macros are a particularly easy case to fix: an IDE can easily extract the formatted expression into a variable, or even convert an entire format! call into a form with explicit bindings. This is a trivial job for the proper tooling and it is supported in every language with string interpolation.

As it stands today, the format! interpolation syntax brings quite a lot of complexity for the most trivial of cases. It also introduces new arbitrary limitations within the language, and an inconsistency with every other language's implementation of that feature. If only variable interpolation is stabilized, then there will be an endless stream of people coming from JS, Python and Kotlin asking how to interpolate more complex expressions, because that's the implementation that everyone expects.

It would also introduce an unnecessary footgun into format! where there was none. If I make a named formatting argument, then it will automatically pick up the value from the closest available scope, even if that was not what I intended and I have just forgotten to set the value in the macro arguments. Currently this case is easily detected as an error.
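To make the footgun concrete, here is a sketch that assumes identifier capture behaves like an implicit name = name, as proposed (it needs a compiler that supports captured identifiers). The function and variable names are hypothetical.

```rust
fn greet(name: &str) -> String {
    let user = "root"; // unrelated local that happens to be in scope
    let _ = name; // the value the author meant to pass as `user = name`
    // With explicit arguments only, a missing `user = ...` is a compile error.
    // With implicit capture, `{user}` silently resolves to the local above.
    format!("hello {user}")
}

fn main() {
    println!("{}", greet("alice"));
}
```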

How? The example you cited doesn't demonstrate that at all. It's not significantly shorter, and it is actually harder to read, because all the structure of the expressions is now buried in a string literal, losing syntax highlighting.

(And it wouldn't be better if IDEs/text editors applied syntax highlighting inside string literals, because the double nesting is confusing: are we inside a string literal, or outside of it, or a weird mix between the two?)

For example because you don’t need to jump in code between the interpolation point and the expression spliced at it; the expression appears in code in the same place where the result appears in the output, lowering cognitive overhead when reading the code. Not having that may seem bearable in a one-liner, but please do tell me how you would like, say, this code translated into the current syntax of format!.

Your worries about applying syntax highlighting to interpolation points making things worse seem rather unfounded. This kind of highlighting is already applied to f-strings by Python IDEs, and I am yet to see anyone be confused by it. In fact, I have seen even more advanced highlighting than that, like highlighting SQL syntax embedded in Java string literals passed to a SQL query method; highlighting CSS and JavaScript embedded in HTML according to their syntaxes, as opposed to using generic ‘HTML character data’ styling, is also pretty commonplace. I cannot recall anyone complaining about that either.

I believe that the maximum complexity of Python f-string expressions is the same as allowed in lambda expressions (basically "one liners"). Rust expressions can be way more complicated:

// A single expression
{
    #[derive(Serialize)]
    struct JsonWrapper { field1: bool, field2: u32 }
    let wrap = JsonWrapper { field1: some_local_bool, field2: some_local_int };
    // to_string takes its argument by reference
    serde_json::to_string(&wrap)
}

Is this going to be allowed in a string literal? If not, what rule do we have to hang such restrictions on? Do macros need to do anything different here?

Sure, maybe "no one" will do this, but what kinds of rules can syntax highlighting tools rely upon to not have to consider cases such as the above? How about the compiler?

This isn't quite true. The rule to simplify foo=foo already exists: struct literals. Instead of writing Value { inner: inner }, you can write just Value { inner }. This only works for just a single identifier. Extending the same logic to allow simplifying format_args!("value = {inner}", inner=inner) to format_args!("value = {inner}") makes sense to me. Yes, that specific example could also be written format_args!("value = {}", inner), since it's just the one interpolation value, but much more involved cases exist (especially for e.g. CLI tables and such) even if you don't work with them often.
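A small sketch of the parallel, using the long-hand named argument that the proposed {inner} capture would abbreviate. The Value struct mirrors the one named above; its field type and the describe function are assumptions for illustration.

```rust
struct Value {
    inner: u32, // field type assumed for the example
}

fn describe(inner: u32) -> String {
    // Field init shorthand: `Value { inner }` means `Value { inner: inner }`.
    let value = Value { inner };
    let inner = value.inner;
    // The explicit form; the proposal would let the `inner = inner` part be dropped.
    format!("value = {inner}", inner = inner)
}

fn main() {
    println!("{}", describe(7));
}
```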

Tracing's macros accept a very similar kind of expression simplification for structured logging, so you don't have to write value=value. They also accept dotted paths, for value.inner=value.inner. It's probably worth at least referencing what is used for structured logging, since it's a very similar application of shorthand, and taking what we can from what they've learned as well.

(Hey @matklad, could you remind us of Kotlin's exact rules for this, again?)

I am aware, and I am very irritated by it.

Now we are getting into territory which is a consequence of a conceptually wrong non-solution to a problem caused by insufficiently expressive programming models. Having to write SQL (i.e. pretty much a complete program in a completely different language) as raw string literals within a procedural language is just about the worst mainstream and accepted practice in programming. It is the result of databases not being prepared to work with the type systems of external languages, or maybe programming languages not being prepared to work with databases. I don't think we should cite this unfortunate situation as an example and strive to perpetuate such mistakes even further.

Yeah, strong plus one here! To expand on this, it's important that we allow an API like the following

use std::process::Command;

fn build_cmd(interpolated: InterpolatedArgs<???>) -> Command {
    ....
}

let branch = "my branch";
let cmd = build_cmd(i"git switch {branch}");
// get_args() excludes the program name: "switch" and "my branch"
assert_eq!(cmd.get_args().count(), 2);

It should be possible to implement build_cmd in a way that is not susceptible to shell injection. The problem is, I don't think we currently can spell out a type for InterpolatedArgs.
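To show what build_cmd would need from the language, here is a rough sketch that assumes a hypothetical desugaring of i"git switch {branch}" into separate literal pieces and values. It sidesteps the typing problem by only accepting string values, and it does not handle a value glued to literal text (such as "--flag={x}").

```rust
use std::process::Command;

// Hypothetical desugared form: literal pieces and interpolated values, kept apart.
fn build_cmd(pieces: &[&str], args: &[&str]) -> Command {
    let mut words: Vec<String> = Vec::new();
    for (i, piece) in pieces.iter().enumerate() {
        // Only literal text is split on whitespace.
        words.extend(piece.split_whitespace().map(String::from));
        // An interpolated value becomes exactly one argument, spaces and all.
        if let Some(arg) = args.get(i) {
            words.push(arg.to_string());
        }
    }
    let mut iter = words.into_iter();
    let mut cmd = Command::new(iter.next().expect("empty command"));
    cmd.args(iter);
    cmd
}

fn main() {
    let branch = "my branch";
    let cmd = build_cmd(&["git switch ", ""], &[branch]);
    println!("{:?}", cmd);
}
```

Because the value never passes through a shell or a whitespace split, a branch name containing spaces stays a single argument.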

It's useful to take a look at JS here. They did the right thing here with tagged templates. When you write, in JS,

i`select from ${foo()} where id = ${bar};` 

The user-defined i function gets called roughly as i(["select from ", " where id = ", ";"], [foo(), bar]). That is, the user gets to apply custom escaping rules when processing the string.

The problem is, the second argument here is a heterogeneous list :frowning: To support this in Rust, we need a language machinery to define InterpolatedArgs<Ts...> and to define where-clauses like where Ts...: SqlEscape.

Today's format args work because they are specialized to one particular trait fmt::Display and use dynamic dispatch.
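As an illustration of that trick (not how std actually implements format_args!, which uses its own internal argument type), a heterogeneous argument list can be flattened into a slice of trait objects once everything is funnelled through a single trait. The interpolate function is a made-up name for this sketch.

```rust
use std::fmt::{Display, Write};

// Interleave literal pieces with values, each behind a `&dyn Display`.
fn interpolate(pieces: &[&str], args: &[&dyn Display]) -> String {
    let mut out = String::new();
    for (i, piece) in pieces.iter().enumerate() {
        out.push_str(piece);
        if let Some(arg) = args.get(i) {
            // Dynamic dispatch: the concrete type is erased at this point.
            write!(out, "{}", arg).expect("writing to a String cannot fail");
        }
    }
    out
}

fn main() {
    // A u32 and a &str share one slice because both are viewed as `dyn Display`.
    println!("{}", interpolate(&["x = ", ", y = ", ""], &[&1u32, &"two"]));
}
```

The cost is that the callee only ever sees Display, so there is no place to hang a different, format-specific escaping rule.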

I sadly don't see how we can solve a very practical problem of allowing injection-proof APIs without advanced type machinery :frowning: And yeah, there is a danger that, by making string interpolation convenient enough without providing a more general mechanism as well, we'll end up with people using interpolation for things like SQL or HTML building.

Surface syntax works like this: "$identifier ${anything.more.complex}". Implementation-wise, the lexer counts the nesting depth of {. That is, you can embed arbitrary expressions, including other interpolated strings, but this doesn't require recursively calling the parser from the lexer.

fun main() {
  val x = 92
  val s = "${"$x" + " " + "${x + x}"}"
  println(s)
}
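The depth-counting idea can be sketched in a few lines of Rust. This is a simplification of my own, not Kotlin's actual lexer: it only tracks braces and would be fooled by an unbalanced brace inside a nested string or character literal.

```rust
/// Returns the expression text of the first `${...}` in `src`, if any.
fn first_interpolation(src: &str) -> Option<&str> {
    let start = src.find("${")? + 2;
    let mut depth = 1usize;
    for (offset, ch) in src[start..].char_indices() {
        match ch {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    // The matching close brace: everything before it is the expression.
                    return Some(&src[start..start + offset]);
                }
            }
            _ => {}
        }
    }
    None // the interpolation was never closed
}

fn main() {
    println!("{:?}", first_interpolation("a ${x + ${y}} b"));
}
```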
Are string literals allowed? If so:

r##"{(string.add(r#"))}"#))}"##

would be something to consider. So the string parser needs to be active at least (IIUC).

To be honest I can't for the life of me guess what either of the two last samples means :slight_smile:

Isn't that exactly format_args!()?

I had the same initial reaction when first starting to read this, but it becomes more clear later. Did you carefully read the remainder of @matklad's post? It even explicitly contrasts with format_args!:

^^^^ mentions "format args" :wink:

If I am to agree with the premise (and to an extent I actually do), I think a general-purpose interpolation facility could be a pretty good opportunity to bridge the gap, without abandoning familiar syntax. Below I briefly sketch how that could be done.

More generally, though, embedded DSLs are here to stay, and so is the necessity to mix them with dynamically-computed values. Whether it’s SQL, XML, HTML or regular expressions, it’s not going away. Might as well find a way to accommodate this use case instead of sticking our heads in the sand.

Oh, solving that one is trivial: don’t. Do everything at the macro expansion level.

fn how_is_bobby_formed(db: &mut DbConn, firstname: &str, lastname: &str) {
    db.exec_sql!(i"
        INSERT INTO students (lastname, firstname)
        VALUES (\{lastname}, \{firstname});
    ");
}

Now the database client crate can at compile time do things like add type ascriptions to query parameters inferred from the database schema (held somewhere in the build system), or compile the SQL into bytecode, or into the database’s native, non-SQL API. That’s right, the entire SQL language could be implemented solely in the client, at compile time.

I use postfix macros here, but they are not strictly necessary; everything should work just as well with regular procedural macros. Without a macro to interpret the contents, an i"…" string should be a syntax error. This would serve as syntactic salt against naïve concatenation in inappropriate contexts (thus avoiding the one major design blunder of JS template strings).

I think it’s slightly more nuanced than that :slight_smile: There are two reasons why, while it is possible, it’s not necessarily trivial.

First, to a first approximation, anything can be implemented as a macro: you don’t strictly need an embedded DSL if you have an external DSL which compiles down to Rust. But that comes at a cost: security and logistical concerns with actually running the macro code, a worse IDE experience, and the inability to easily “goto definition” and understand how the underlying library works. Anecdotally, one of the most annoying papercuts I encounter when coding Rust is that I can’t always refactor stuff I put inside format, as refactoring inputs to the macro is somewhat ill-defined.

Second, with today’s proc-macro API doing this is annoying, as you’d have to parse and interpret the string contents inside the proc macro and, e.g., re-do the compiler’s string-escaping logic. That is, a string literal is a complex thing, and the compiler has special rules for how it is interpreted, but those rules are not exposed to macros. To be clear, this is purely an elbow-grease/aesthetics problem of the “I need to copy-paste this bit of the compiler” kind, rather than a fundamental limitation.
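To give a flavour of that elbow grease, here is a deliberately incomplete sketch of the kind of unescaping a macro would have to redo on the raw literal text. It covers only four escapes; the real rules also include byte and Unicode escapes and line continuations, none of which are handled here. The function name is mine.

```rust
// Turns the source text of a string literal body into its value.
// Returns None for any escape this sketch does not know about.
fn unescape(raw: &str) -> Option<String> {
    let mut out = String::new();
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A backslash must be followed by a recognised escape character.
        match chars.next()? {
            'n' => out.push('\n'),
            't' => out.push('\t'),
            '\\' => out.push('\\'),
            '"' => out.push('"'),
            _ => return None,
        }
    }
    Some(out)
}

fn main() {
    println!("{:?}", unescape(r#"a\tb\"c"#));
}
```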

Why would it not work by emulating format_args? i.e. adding SqlInjectable and HtmlInjectable traits to use the same trick with dynamic dispatch?
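Roughly what that emulation could look like in user code, as a sketch only: SqlEscape and build_query are hypothetical names, and the quoting rule is a toy (real code should prefer bound parameters over textual escaping).

```rust
// Hypothetical trait, playing the role fmt::Display plays for format_args!.
trait SqlEscape {
    fn escape(&self) -> String;
}

impl SqlEscape for i64 {
    fn escape(&self) -> String {
        self.to_string()
    }
}

impl SqlEscape for &str {
    fn escape(&self) -> String {
        // Quote the value and double any embedded single quotes.
        format!("'{}'", self.replace('\'', "''"))
    }
}

// Literal pieces are trusted; every value goes through SqlEscape via dynamic dispatch.
fn build_query(pieces: &[&str], args: &[&dyn SqlEscape]) -> String {
    let mut out = String::new();
    for (i, piece) in pieces.iter().enumerate() {
        out.push_str(piece);
        if let Some(arg) = args.get(i) {
            out.push_str(&arg.escape());
        }
    }
    out
}

fn main() {
    let q = build_query(
        &["SELECT * FROM t WHERE name = ", " AND id = ", ";"],
        &[&"O'Brien", &42i64],
    );
    println!("{}", q);
}
```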

That doesn't generalize to arbitrary text formats. The Rust stdlib can't have a trait for each and every programming language/format.

Fair point. Though the problem of analysability and safety of macros will have to be dealt with somehow anyway; perhaps through const fn macros or something like WebAssembly-based sandboxing. With sandboxing in place, it would be more affordable to simply run the macro from the IDE plugin; it might even be possible to use some kind of origin tagging (which is probably already necessary and already done to an extent for the sake of hygiene) to keep track of how tokens appearing inside interpolation points end up used in macro output, and use that information to match them up to binding scopes, etc. Admittedly somewhat ambitious, but I don’t believe insurmountable.
