Idea: Dedented string literals

Compile-time dedented string literals.

Right now this is just an idea. If it is liked, I'll make an RFC.

Problem

Embedding formatted string literals requires making a choice:

  • sacrifice readability of the source code
  • sacrifice readability of the output

fn main() {
    println!("
        create table student(
            id int primary key,
            name text
        )
    ");
}

This outputs (using ^ to mark the beginning of a line and · to mark a leading space):

^
^········create table student(
^············id int primary key,
^············name text
^········)
^····

For the output to look sensible, we have to sacrifice on readability of the code:

fn main() {
    println!("create table student(
    id int primary key,
    name text
)");
}

This produces the expected:

create table student(
    id int primary key,
    name text
)

Why can we not have the best of both worlds?

Solution: Dedented string literals

The new string modifier d"my_string" (similar to b"str", br"str", etc.) un-indents the string literal at compile time so that the leftmost non-space character is in the first column.

Our problems above would be fixed by using a dedented string literal.

fn main() {
    println!(d"
        create table student(
            id int primary key,
            name text
        )
    ");
}

The above will output:

create table student(
    id int primary key,
    name text
)

More Examples

fn main() {
    let testing = d"
        def hello():
            print('Hello, world!')

        hello()
    ";
    let expected = "def hello():\n    print('Hello, world!')\n\nhello()";
    assert_eq!(testing, expected);
}

Works with raw string literals:

fn main() {
    let testing = dr#"
        def hello():
            print("Hello, world!")

        hello()
    "#;
    let expected = "def hello():\n    print(\"Hello, world!\")\n\nhello()";
    assert_eq!(testing, expected);
}

Works with byte string literals:

fn main() {
    let testing = db"
        def hello():
            print('Hello, world!')

        hello()
    ";
    let expected = b"def hello():\n    print('Hello, world!')\n\nhello()";
    assert_eq!(testing[..], expected[..]);
}

Exact behaviour

  • The opening line (everything immediately right of the opening ") must contain only a literal newline character.
  • The opening line's literal newline is removed.
  • The closing line (everything immediately to the left of the closing ") may contain whitespace, but the whitespace is removed.
  • A single literal newline character before the closing line is removed if it exists.
  • The common indentation of all lines (other than the opening or closing line) that do not fully consist of whitespace is calculated.
  • That common indentation is removed from the start of every line.

Creating strings that have an indentation on every line is not supported.
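To make the rules above concrete, here is a minimal runtime sketch of them. The name dedent is hypothetical, the real feature would run at compile time inside the compiler, and the sketch assumes that only spaces and tabs count as indentation and that lines are separated by \n. Whitespace-only lines are emitted as empty lines, which is a simplification.

```rust
/// Hypothetical runtime model of a `d"..."` literal (not a real API).
/// Returns `None` if the opening line contains anything besides a newline.
fn dedent(literal: &str) -> Option<String> {
    // Assumption: only spaces and tabs count as indentation.
    let is_indent = |c: char| c == ' ' || c == '\t';

    // The opening line must be a lone newline, which is removed.
    let body = literal.strip_prefix('\n')?;

    // A whitespace-only closing line is removed, together with the single
    // newline in front of it.
    let body = match body.rfind('\n') {
        Some(idx) if body[idx + 1..].trim().is_empty() => &body[..idx],
        None if body.trim().is_empty() => "",
        _ => body,
    };

    // Common indentation of all lines that are not whitespace-only.
    let indent = body
        .split('\n')
        .filter(|line| !line.trim().is_empty())
        .map(|line| line.len() - line.trim_start_matches(is_indent).len())
        .min()
        .unwrap_or(0);

    // Remove that indentation; whitespace-only lines become empty lines.
    let lines: Vec<&str> = body
        .split('\n')
        .map(|line| if line.trim().is_empty() { "" } else { &line[indent..] })
        .collect();
    Some(lines.join("\n"))
}
```

Feeding it the contents of the Python example from "More Examples" should give back the expected string from that example.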


This is similar to the indoc! macro from the indoc crate, but built into the language.

Why I believe this should become a language feature:

  • It is widely used: indoc has 110 million downloads on crates.io.
  • It avoids a dependency for a feature that is commonly useful.
  • It increases discoverability of the feature. Code samples that would not previously have depended on a crate can use it.
  • Dedented string literals make code more legible. I assume people are not always going to add a crate for this feature, so they end up sacrificing code legibility.
  • It fits with the current "string modifiers" that Rust has, stacking with them.
  • Dedented strings can be formatted by rustfmt to have one more level of indentation than the surrounding code.

Drawbacks:

  • Increases language complexity. I believe it should not be a large increase, and it is worth it.

Prior art:

I have taken a lot from a very similar JavaScript proposal.

22 Likes

Another prior art: Raw string literals - """ - C# reference | Microsoft Learn

Previous Rust conversation: Propose code string literals by Diggsey · Pull Request #3450 · rust-lang/rfcs · GitHub

Note that this makes it hard to produce a string that is itself indented, if that is what you want.

FWIW that C# version allows

Foo(1, 2, 3, """
        Hello
            World
    """);

for a string where both lines are indented (the first by 4, the second by 8).
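A rough sketch of that closing-line rule as I read it from the example: the indentation of the line holding the closing delimiter decides how much is stripped from every content line. The function dedent_to_closing is hypothetical, it only handles spaces, and returning None for a line that is indented less than the closing line is a choice made for this sketch.

```rust
/// Hypothetical model of "strip as much as the closing line is indented".
/// Returns `None` if the literal does not start with a newline, if the closing
/// line is not made of spaces only, or if a content line is under-indented.
fn dedent_to_closing(literal: &str) -> Option<String> {
    // Drop the newline right after the opening delimiter.
    let body = literal.strip_prefix('\n')?;

    // Split off the closing line; its spaces are the prefix to remove.
    let (content, closing) = body.rsplit_once('\n')?;
    if !closing.chars().all(|c| c == ' ') {
        return None;
    }

    // Strip the closing line's indentation from every content line.
    let lines: Option<Vec<&str>> = content
        .split('\n')
        .map(|line| {
            if line.trim().is_empty() {
                Some("") // blank lines carry no indentation to check
            } else {
                line.strip_prefix(closing)
            }
        })
        .collect();
    lines.map(|lines| lines.join("\n"))
}
```

With the literal from the example above, Hello keeps 4 spaces and World keeps 8.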

7 Likes

I'll apply Cunningham's Law here. How about the opening line be able to apply a uniform level of indentation, so you'd have:

Foo(1, 2, 3, d"    \
        Hello
            World
    ");

Where the \ applies 4 spaces to the entire block.

This is exactly the kind of bikeshedding that led to the previous proposal stalling out.

I think it might be OK if this mechanism doesn't support strings where the entire string is indented. Hopefully it won't be that hard to write helper functions for const-eval-time indentation.

14 Likes

In and of itself, I'd love a mechanism that could automatically dedent string contents.

However, a concern I see here is its interaction with raw and C-style strings: if we wanted e.g. a dedented raw string, we'd need something like

rd"hello
world"

That is, the combinatorics of adding more string types are unfortunately not working in our favor here. I'm not saying it couldn't be done (the number of combinations is at this time still fairly low) but it's something that seriously needs to be considered for each string type added.

Yes, though isn't each prefix just a modifier that is applied right-to-left (at least in concept)?

  • '...': Start with a single character (a char)
  • "...": Start with multiple characters
  • r, r#, r##, ...: Extend the literal unless it is followed by this many #
  • d: Remove common indentation and leading/trailing newlines
  • b: Interpret the characters as bytes and return a [u8; _]/u8 instead of a str/char
  • c: Forbid 0x00 and append 0x00

As long as these are seen/parsed as individual modifiers, I don't see a big issue with combinatorial explosion (not sure about the impact on parsing difficulty). The number of possible output types does increase of course:

  • o: Make the output owned (heap allocated): Vec<u8>, String, CString (syntactic sugar)
  • f: Treat the characters like a format string (see the format! macro). Quite possibly not a good idea, because the result isn't a literal anymore (and thus is forced to be heap allocated)
  • l: Make all characters lowercase
  • u: Make all characters uppercase
  • s: Strip all whitespace

That way you (for example) wouldn't need to write "something".to_owned():

  • ocr#"something"# == c"something".to_owned()
  • obudr#"\n something\n else\n "# == b"SOMETHING\nELSE".to_owned() - Read: Owned Bytes Uppercased Dedented Raw String [1]

Note that I'm not proposing to add those, (except perhaps for o) they're mainly for showing my point of them being modifiers. Especially u and l are probably not useful enough to be worth it, but o [2] and f could be really useful.

In theory even b and c could be combined: bc"test": Take the bytes of the cstring c"test", returning the same as b"test\x00", though since both impact the return type that might be more difficult/problematic to implement.
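The "modifiers applied right-to-left" idea can be sketched as a fold over the prefix letters. This is only an illustration: apply_modifiers is a hypothetical name, it works at runtime on an already-lexed string, and it covers just the u, l and s letters from the list above, since r changes lexing and b, c and o change the output type.

```rust
/// Hypothetical helper: applies prefix letters to `text` from right to left.
/// Returns `None` for any letter this sketch does not model.
fn apply_modifiers(prefix: &str, text: &str) -> Option<String> {
    let mut out = text.to_owned();
    // The letter closest to the opening quote is applied first.
    for modifier in prefix.chars().rev() {
        out = match modifier {
            'u' => out.to_uppercase(),
            'l' => out.to_lowercase(),
            's' => out.chars().filter(|c| !c.is_whitespace()).collect(),
            _ => return None,
        };
    }
    Some(out)
}
```

Because each letter is handled on its own, the order of letters matters but the number of cases grows linearly with the number of modifiers, not with the number of combinations.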


  1. I've placed them after b here, as they can modify the characters independent of whether they are a string, cstring or bytes. ↩︎

  2. o (Owned) is one that'd also make sense as a postfix, but for that we also have the more verbose .to_owned() ↩︎

6 Likes

How about using multiline strings like MoonBit:

let str =
  #| def hello():
  #|     print('hello world')
  #|
  #| hello()
  ;

5 Likes

I think this needs clarification. The behavior I would expect would be "The common indentation of all non-empty lines", rather than caring about lines consisting entirely of whitespace.

I think this is by far the neatest of the alternatives.

Some points I see in favor:

  • It solves the problem of retaining some indentation on the lines:
    let my_python_code =
        #|     # ...snip...
        #|     def python_method(self):
        #|         do_stuff()
        ;
    
  • It also solves the problem of stripping a possible extra newline from the very beginning of the string.
  • As for the problem of deciding whether to have a trailing newline, an empty #| line can just be added, e.g.:
    let my_python_code =
        #|     # ...snip...
        #|     def python_method(self):
        #|         do_stuff()
        #| 
        ;
    
  • Solves having double quotes in the literal.
  • You could even add comments in between the lines of the literal that are not part of the final string, if you want to:
    static HTML_HEADER: &str =
      // my-code-editor-plugin: syntax-hl(html)
      #| <head>
      // TODO: add our code here
      #|     <script>
      // my-code-editor-plugin: syntax-hl(js)
      #|         console.log("hello");
      #|     </script>
      #| </head>
      // let's just have the endline character here for now
      #| 
      ;
    
    

Matter of opinion, but I happen to like:

  • Rust users are already used to each docstring line beginning with a /// or //!. Beginning each line of the string with a #| (or with a #| followed by one space that is not included in the string, which is even better IMO) has a similar ring to it.
  • I find this alternative really readable.
    • No one-letter prefix to remember/look up.
    • No delimiters.

2 Likes

Yeah, I think this is a great option, except I would still like some way to compose it with raw strings, C strings, byte strings, and possibly format strings in the future. So I think it should still have quotes and a prefix:

let my_python_code = d"
    #|     # ...snip...
    #|     def python_method(self):
    #|         do_stuff()
    #| 
    ";

// raw version
let my_powershell_command = dr"
    #|     Do-Something .\some\file\path
    #|     Then-Another-Thing
    #| 
    ";

Finally, with the quotes, the leading # is unnecessary, so use just a pipe:

let my_python_code = d"
    |     # ...snip...
    |     def python_method(self):
    |         do_stuff()
    | 
    ";

3 Likes
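For comparison, the pipe-margin variant is simple to model at runtime. A minimal sketch, assuming the margin is a | that may be followed by one space which is not part of the string, and that lines without a margin (the opening and closing lines) contribute nothing. The name strip_margin is hypothetical here and is not meant to match any existing API exactly.

```rust
/// Hypothetical helper: keeps only the text after each line's `|` margin.
fn strip_margin(literal: &str) -> String {
    literal
        .lines()
        // Lines without a `|` margin (opening/closing lines) are skipped.
        .filter_map(|line| line.trim_start().strip_prefix('|'))
        // One space directly after the margin is not part of the string.
        .map(|rest| rest.strip_prefix(' ').unwrap_or(rest))
        .collect::<Vec<_>>()
        .join("\n")
}
```

An empty margin line at the end yields the trailing newline, and any indentation after the margin survives, so a fully indented string is expressible.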

There is nothing wrong per se with having the whole string indented. Java allows it and it works just fine IMO. Regarding the precise details, the Java Language Specification provides them too.

I'd personally prefer that over the explicit line-start marker suggested above, because it is easier to just copy and paste the values. On the other hand, I agree that an explicit start of each text block line is, well, explicit and more readable.

Perhaps, it might be valuable to have the option to choose the line-starting character in a way similar to the option to choose raw string terminating sequence. Ideally, it should be possible to apply it to all possible string literals, as wished above too.

1 Like

You actually gave a solution to your format dilemma. It would not go to the heap, and thus only be allowed to include statically known elements:

const STR: &str = "STR";
let str = "str";
let str_str: &'static str = f"{STR} {str}";

No such restriction would apply to the owned variant:

let string = str.to_string();
let string_str: String = of"{string} {str}";

1 Like

Another link for prior art:

That also uses the common leading whitespace of nonempty lines to determine how much to remove.

1 Like

Probably not needed, as it can just be the first non-whitespace character (regardless of what it is + enforcement that they are the same) on the next line (unless not using a line-starting character should be possible, too).

More prior art from Nix.

              installPhaseCommand = ''
                mkdir -p $out/bin
                cp target/release/${name} $out/bin/
                cp -r target/site $out/bin/
                wrapProgram $out/bin/${name} \
                  --set LEPTOS_SITE_ROOT $out/bin/site \
              '';

Nix is not a language designed for everything, but it was very much designed for gluing together and interpolating bits of bash. This behavior makes me jealous when using languages that don't have it. It is a beloved feature.

2 Likes

Scala has a function stripMargin to get rid of extra indentation.

One question I have is whether better const eval support can happen faster than this particular bikeshed can be painted. Then flavors of string dedentation can be prototyped in libraries (and possibly eventually promoted to std). Something like

const fn dedented(s: &str) -> &'static str {
    let s = s.to_string();
    // ... dedent it
    s.leak()
}
use strs::dedented as ds;

fn main() {
    println!(
        ds(####"
            create table student(
                id int primary key,
                name text
            )
        "####)
    );
}

4 Likes

Might be cool if string literals had a const-only coercion to &mut str. Then you wouldn’t even need const alloc support as dedent can only shrink a string.
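The "can only shrink" point can be shown with an in-place pass over a mutable byte buffer. This is a sketch under assumptions, not the proposed feature: dedent_in_place is a hypothetical name, it is an ordinary runtime function, it treats only spaces as indentation, and it does not handle the opening and closing lines.

```rust
/// Hypothetical helper: removes the common leading-space indentation in place
/// and returns the new length. The write cursor never passes the read cursor,
/// so no extra allocation is needed.
fn dedent_in_place(buf: &mut [u8]) -> usize {
    // Pass 1: smallest run of leading spaces among lines that contain
    // at least one non-space byte.
    let mut indent = usize::MAX;
    for line in buf.split(|&b| b == b'\n') {
        if let Some(first) = line.iter().position(|&b| b != b' ') {
            indent = indent.min(first);
        }
    }
    if indent == usize::MAX {
        return buf.len(); // only blank lines: leave the buffer untouched
    }

    // Pass 2: copy bytes towards the front, skipping up to `indent` spaces
    // at the start of every line.
    let (mut read, mut write) = (0, 0);
    let mut to_skip = indent;
    while read < buf.len() {
        let b = buf[read];
        read += 1;
        if b == b' ' && to_skip > 0 {
            to_skip -= 1;
            continue;
        }
        // A newline starts a fresh line; any other byte ends the skipping.
        to_skip = if b == b'\n' { indent } else { 0 };
        buf[write] = b;
        write += 1;
    }
    write
}
```

The caller would then use only the first returned-length bytes of the buffer.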

A bit more prior art in the Rust crate ecosystem: textwrap_macros::dedent! and dedent::dedent!.

1 Like

Both seem to depend on syn, unfortunately. Naively, it shouldn't be too difficult to write a proc macro that takes and returns a single string literal without requiring syn, but I don't know much about proc macros.