Allow escaping space in strings


#1

We have multiline strings. It would be nice to be able to indent them nicely. More specifically:

let s = "hello\
        \ world!";

Would become “hello world!”.

This would be useful for e.g.:

const NO_XTE: &str = "`xte` not found. make sure it's correctly installed:\n\
                     \    pacman -S xautomation\n\
                     \    apt-get install xautomation\n\
                     \    <etc you get the idea>";

Currently I’m using:

const NO_XTE: &str = "`xte` not found. make sure it's correctly installed:\n\
                   \x20   pacman -S xautomation\n\
                   \x20   apt-get install xautomation\n\
                   \x20   <etc you get the idea>";

This gives a visual indication of the target indentation, where the “2” is immediately under the " and the “0” stands in for the resulting space. This is suboptimal.
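
For reference, a minimal sketch that checks the `\x20` workaround really produces the indented text. `EXPECTED` is a name introduced here only for comparison; it is not part of the original snippet.

```rust
// The workaround from above: `\` + newline skips the newline and the
// leading whitespace of the next line, and `\x20` supplies the first space.
const NO_XTE: &str = "`xte` not found. make sure it's correctly installed:\n\
                   \x20   pacman -S xautomation\n\
                   \x20   apt-get install xautomation\n\
                   \x20   <etc you get the idea>";

// The same text written as a single-line literal (illustrative only).
const EXPECTED: &str = "`xte` not found. make sure it's correctly installed:\n    pacman -S xautomation\n    apt-get install xautomation\n    <etc you get the idea>";

fn main() {
    // Each continuation line ends up with exactly four leading spaces.
    assert_eq!(NO_XTE, EXPECTED);
    println!("{NO_XTE}");
}
```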


#2

What I tend to do is unindent the string:

    const NO_XTE: &str = "\
`xte` not found....
    pacman -S ...
    <etc you get the idea>";

Not perfect but reflects the literalness of the string.

I believe the current behavior is that any whitespace after a backslash means to ignore all whitespace until the next non-whitespace character. This would therefore be a more complicated change than you might think.


#3

Yes, the next non-whitespace, aka the \ that escapes the whitespace.

backslash-space would be an escape sequence for space.


#4

OK, I was incorrect. I was under the impression that \<space> currently meant the same thing as \<newline>, but \<space> is currently an error.

In that case, I think that this would be an unopposed change given that you’ve provided a use case.


#6

One issue is that if you happen to have a trailing space this would silently change the meaning:

let s = "hello\
        \␣world!";

vs.

let s = "hello\␣
        \␣world!";

BTW, is there a reason why Rust does not auto-concatenate consecutive string literals like C? I always found this quite handy.

const NO_XTE: &str = "`xte` not found. make sure it's correctly installed:\n"
                     "    pacman -S xautomation\n"
                     "    apt-get install xautomation\n"
                     "    <etc you get the idea>";

#7

Because it is a literal footgun: one missing comma (here after "bbbbbbbb") silently merges two elements.

const DATA: &[&str] = &[
    "aaaaaaaaaaa",
    "bbbbbbbb"
    "cccccccccc",
    "dddddddddddddddd",
    "eeeeeeeee"
];
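
As an aside, explicit concatenation of literals is already available through the standard `concat!` macro, so the C-style layout can be approximated without implicit joining. A small sketch, reusing the message from the earlier posts:

```rust
// `concat!` joins literals at compile time into a single `&'static str`.
// Concatenation is opt-in, so array literals like DATA are not affected.
const NO_XTE: &str = concat!(
    "`xte` not found. make sure it's correctly installed:\n",
    "    pacman -S xautomation\n",
    "    apt-get install xautomation\n",
    "    <etc you get the idea>",
);

fn main() {
    println!("{NO_XTE}");
}
```

The indentation lives inside each literal here, so nothing depends on whitespace skipping.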

#8

Did you look into using indoc by any chance?


#9

I’m not gonna back off from this.

For now, I’ll keep using \x20 everywhere. Fix it or deal with it.


#10

I also like to indent my strings the way you do, for what it’s worth =)

I’m a bit confused though – are you asking for language changes here? Or rustfmt changes?


#11

(<space> means a literal space (U+20) and <newline> means a literal newline (\n or \r\n).)

This is a summary of the points so far.

The requested language change is to let \<space> mean the same thing as a literal space. It is currently an error.

This would mean that you could terminate the skip-all-whitespace behavior of \<newline> and then include leading whitespace by escaping the first space with \<space> instead of \x20. Currently, the only way to combine \n\<newline> continuation with leading whitespace in the resulting string is to start that whitespace with a character-code escape.
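
A minimal sketch of the current behavior described above, using made-up constant names (`GLUED`, `SPACED`, `SPACED_U`) purely for illustration:

```rust
// `\` + newline skips the newline and all leading whitespace on the next line.
const GLUED: &str = "hello\
                     world!";

// A character-code escape is not literal whitespace, so it ends the skip
// and contributes one space to the string.
const SPACED: &str = "hello\
                      \x20world!";

// The Unicode escape form behaves the same way.
const SPACED_U: &str = "hello\
                        \u{20}world!";

fn main() {
    assert_eq!(GLUED, "helloworld!");
    assert_eq!(SPACED, "hello world!");
    assert_eq!(SPACED_U, "hello world!");
}
```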

A strong point against the proposed change is that \<newline> and \<space><newline> look the same but would behave drastically differently. To avoid that visual ambiguity, \<space> would have to be accepted only where it terminates the whitespace gobbling started by \<newline>, and remain an error everywhere else. That complicates the feature and makes it less intuitive (“you can escape a space” vs special cases).


Personal opinion.

I think that un-indenting the string and just including the literal representation that you want against the left margin is the best choice currently. This is likely due in part to my dislike for visual indenting/alignment; I prefer block indenting with tabs (but don’t care enough to argue the point with an established default style).


#12

Thanks for the summary! =)


#13

Seems like you could unambiguously have \| (or \>…) be an empty character (or only valid as a whitespace-stripping terminator). So you could have

const NO_XTE: &str = "`xte` not found. make sure it's correctly installed:\n\
                    \|    pacman -S xautomation\n\
                    \|    apt-get install xautomation\n\
                    \|    <etc you get the idea>";

#14

Raw strings to the rescue!
You don’t even have to write \ns at line ends.

const NO_XTE: &str =
r"`xte` not found. make sure it's correctly installed:
    pacman -S xautomation
    apt-get install xautomation
    <etc you get the idea>";

#15

You don’t even need it to be a raw string.

    const NO_XTE: &str = "\
`xte` not found....
    pacman -S ...
    <etc you get the idea>";
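
To make the comparison concrete, here is a small sketch showing that the raw-string form and the unindented escaped form yield the same text. The names `RAW` and `PLAIN` are made up for this example.

```rust
// Raw string: no escapes are processed, newlines and indentation are literal.
const RAW: &str =
r"`xte` not found. make sure it's correctly installed:
    pacman -S xautomation
    apt-get install xautomation
    <etc you get the idea>";

// Ordinary string: the leading `\` + newline is skipped; the later newlines
// are not escaped, so they and the indentation after them are kept.
const PLAIN: &str = "\
`xte` not found. make sure it's correctly installed:
    pacman -S xautomation
    apt-get install xautomation
    <etc you get the idea>";

fn main() {
    assert_eq!(RAW, PLAIN);
    println!("{PLAIN}");
}
```

The ordinary string still allows escapes such as \n or \x20 elsewhere in the text, which the raw string does not.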



#16

Could \<space><newline> then become equivalent to \<newline>? E.g.

\ followed by a sequence of non-escaped whitespace characters until the end of the line behaves in the same way as \ followed by a newline immediately; in other words those non-escaped whitespace characters are discarded

So both of these

let s1 = "hello\␣
         \␣world!";
let s2 = "hello\
         \␣\
         world!";

would mean "hello world!"?


#17

We could also drop the whole newline stuff entirely (sorry everyone who currently uses it) and go for Lua: \z.


#18

I think the point was to avoid having to outdent the string.

How is this different from how it currently works? The whole problem is that \ skips the subsequent span of whitespace (just like \z) and they want to split that into a span that is skipped followed immediately by a span of whitespace which is not skipped. The two ways to do this are to have a delimiter (like \| in my previous comment), or to use the trailing quote to indicate the indentation level, like with Swift multiline strings:

const NO_XTE: &str = """`xte` not found. make sure it's correctly installed:
                            pacman -S xautomation
                            apt-get install xautomation
                            <etc you get the idea>
                        """;

or

const NO_XTE: &str = """
    `xte` not found. make sure it's correctly installed:
        pacman -S xautomation
        apt-get install xautomation
        <etc you get the idea>
    """;

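The indentation-stripping idea can be approximated today at run time. Below is a rough sketch, assuming a hypothetical helper named `dedent` (not a standard library function and not the `indoc` API) that removes the indentation shared by all non-blank lines. It is a simplification: it only counts spaces and uses the least-indented line rather than the closing quote's column.

```rust
/// Hypothetical helper: strips the indentation common to all non-blank lines.
fn dedent(text: &str) -> String {
    // Skip the newline that directly follows the opening quote, if any.
    let text = text.strip_prefix('\n').unwrap_or(text);

    // Smallest number of leading spaces among lines that have content.
    let indent = text
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| line.len() - line.trim_start_matches(' ').len())
        .min()
        .unwrap_or(0);

    // Remove that many bytes from every line; shorter (blank) lines become empty.
    let mut out: Vec<&str> = text
        .lines()
        .map(|line| line.get(indent..).unwrap_or(""))
        .collect();

    // Drop the whitespace-only line that holds the closing quote.
    if matches!(out.last(), Some(last) if last.trim().is_empty()) {
        out.pop();
    }
    out.join("\n")
}

fn main() {
    let no_xte = dedent(
        "
        `xte` not found. make sure it's correctly installed:
            pacman -S xautomation
            apt-get install xautomation
            <etc you get the idea>
        ",
    );
    println!("{no_xte}");
}
```

Unlike a language-level feature, this returns a `String` and cannot initialize a `const`.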
#19

It only makes unescaped newlines an error.