Compile-time dedented string literals.
Right now this is just an idea. If it is liked, I'll make an RFC.
Problem
Embedding formatted string literals requires making a choice:
- sacrifice readability of the source code
- sacrifice readability of the output
fn main() {
    println!("
        create table student(
            id int primary key,
            name text
        )
    ");
}
This outputs (using ^ to mark the beginning of a line and · to mark a leading space):
^
^········create table student(
^············id int primary key,
^············name text
^········)
^····
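To reproduce the marked-up output above, here is a minimal sketch. The helper name mark_leading_spaces is made up for this illustration and is not part of the proposal. It prefixes every line with ^ and replaces each leading space with ·.

```rust
/// Hypothetical helper: prefix each line with '^' and show every
/// leading space as '·' so stray indentation becomes visible.
fn mark_leading_spaces(text: &str) -> String {
    text.split('\n')
        .map(|line| {
            let rest = line.trim_start_matches(' ');
            // Spaces are one byte each, so the byte difference is the count.
            let spaces = line.len() - rest.len();
            format!("^{}{}", "·".repeat(spaces), rest)
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    // The same literal as in the example above, indentation included.
    let sql = "
        create table student(
            id int primary key,
            name text
        )
    ";
    println!("{}", mark_leading_spaces(sql));
}
```

The indentation of the source code ends up inside the string value, including the empty first line and the four spaces before the closing quote.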
For the output to look sensible, we have to sacrifice the readability of the code:
fn main() {
    println!("create table student(
    id int primary key,
    name text
)");
}
This produces the expected:
create table student(
    id int primary key,
    name text
)
Why can we not have the best of both worlds?
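For comparison, one workaround that already works in stable Rust is to build the string from one literal per line with concat!. This is only a sketch, and the function name student_schema is made up for illustration.

```rust
/// Hypothetical example: each line is its own literal, so the
/// indentation of the source code never leaks into the value.
fn student_schema() -> &'static str {
    concat!(
        "create table student(\n",
        "    id int primary key,\n",
        "    name text\n",
        ")"
    )
}

fn main() {
    println!("{}", student_schema());
}
```

The source and the output both stay tidy, but every line needs its own quotes and an explicit \n, which is exactly the kind of noise a dedicated literal would remove.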
Solution: Dedented string literals
The new string modifier d"my_string" (similar to b"str", br"str", etc.) un-indents the string literal at compile time so that the leftmost non-space character is in the first column.
Our problems above would be fixed by using a dedented string literal.
fn main() {
    println!(d"
        create table student(
            id int primary key,
            name text
        )
    ");
}
The above will output:
create table student(
    id int primary key,
    name text
)
More Examples
fn main() {
    let testing = d"
        def hello():
            print('Hello, world!')

        hello()
    ";
    let expected = "def hello():\n    print('Hello, world!')\n\nhello()";
    assert_eq!(testing, expected);
}
Works with raw string literals:
fn main() {
    let testing = dr#"
        def hello():
            print("Hello, world!")

        hello()
    "#;
    let expected = "def hello():\n    print(\"Hello, world!\")\n\nhello()";
    assert_eq!(testing, expected);
}
Works with byte string literals:
fn main() {
    let testing = db"
        def hello():
            print('Hello, world!')

        hello()
    ";
    let expected = b"def hello():\n    print('Hello, world!')\n\nhello()";
    assert_eq!(testing[..], expected[..]);
}
Exact behaviour
- The opening line (everything immediately to the right of the opening ") must contain only a literal newline character.
- The opening line's literal newline is removed.
- The closing line (everything immediately to the left of the closing ") may contain whitespace, but the whitespace is removed.
- A single literal newline character before the closing line is removed if it exists.
- The common indentation of all lines (other than the opening or closing line) that do not fully consist of whitespace is calculated.
- That common indentation is removed from the start of every line.
Creating strings that have an indentation on every line is not supported.
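The rules above can be sketched as an ordinary runtime function. This is only an illustration under assumptions, not how a compiler would implement the d modifier: the name dedent is hypothetical, only spaces count as indentation, and a closing line that contains non-whitespace is treated as ordinary content because the rules above do not say what should happen in that case.

```rust
/// Runtime sketch of the proposed rules (hypothetical function).
/// Returns None when the opening line is not empty.
fn dedent(literal: &str) -> Option<String> {
    // The opening line must be just a newline, and that newline is dropped.
    let body = literal.strip_prefix('\n')?;

    let mut lines: Vec<&str> = body.split('\n').collect();

    // A whitespace-only closing line is dropped, which also drops
    // the single newline in front of it.
    if lines.last().map_or(false, |l| l.trim().is_empty()) {
        lines.pop();
    }

    // Common indentation of all lines that are not whitespace-only.
    // Assumption: indentation is made of spaces only.
    let indent = lines
        .iter()
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.len() - l.trim_start_matches(' ').len())
        .min()
        .unwrap_or(0);

    // Remove that indentation from every line. Lines shorter than
    // the indentation (for example blank lines) become empty.
    let dedented: Vec<&str> = lines
        .iter()
        .map(|l| l.get(indent..).unwrap_or(""))
        .collect();

    Some(dedented.join("\n"))
}

fn main() {
    let sql = dedent("
        create table student(
            id int primary key,
            name text
        )
    ");
    assert_eq!(
        sql.as_deref(),
        Some("create table student(\n    id int primary key,\n    name text\n)")
    );
    // A literal whose opening line is not empty is rejected.
    assert_eq!(dedent("create table student("), None);
}
```

The relative indentation between lines survives because only the smallest indentation is removed. The function cannot return a string that is indented on every line, which matches the limitation noted above.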
This is similar to the indoc! macro from the indoc crate, but included in the language.
Why I believe this should become a language feature:
- It is widely used: 110 million downloads on crates.io.
- It avoids a dependency for a feature that is commonly useful.
- It increases discoverability of the feature. Code samples that may not have previously depended on a crate can utilise it.
- Dedented string literals make code more legible. I assume people are not going to always add a crate for this feature, so they're going to have to sacrifice code legibility.
- It fits with the current "string modifiers" that Rust has, stacking with them.
- Dedented strings can be formatted by rustfmt to have 1 more level of indentation than the surrounding code.
Drawbacks:
- Increases language complexity. I believe it should not be a large increase, and it is worth it.
Prior art:
- Java - text blocks using triple-quotes.
- Kotlin - raw strings using triple-quotes and .trimIndent().
- Scala - multiline strings using triple-quotes and .stripMargin.
- C# - raw string literals.
- Python - multiline strings using triple-quotes to avoid escaping and textwrap.dedent.
- Jsonnet - text blocks with ||| as a delimiter.
- Bash - <<- Heredocs.
- Ruby - <<~ Heredocs.
- Swift - multiline string literals using triple-quotes - strips margin based on whitespace before closing delimiter.
I have taken a lot from a very similar JavaScript proposal.