[1st April joke] [pre-RFC] Improving the ergonomics of creating owned string objects


#1

Summary

This RFC proposes a syntax for creating owned strings from string literals, focusing on eliminating visual noise and reusing familiar concepts.

Motivation

The difference between a string literal (&'static str) and an owned String object is a common source of confusion for new users of the language, and one that experienced users of the language are mostly resigned to. For code like "string literal".to_string(), the most important detail is the string’s content rather than how it’s stored, yet the visual noise of the method invocation obscures this.

Detailed design

The syntax of string literals is common across the majority of programming languages - a quote character ("), followed by the contents of the string, followed by a terminating quote character ("). We can build on this familiarity while taking advantage of advances in text editing capabilities by introducing the notion of a quoted string - an open quote character (“), followed by the string contents, followed by the closing quote character (”). The type of such a quoted string expression is String.

To avoid confusion in terminology, strings surrounded by the old style of quote characters should be referred to as quasi-quoted strings.

It is an error to use two open quote characters or two closing quote characters for a quoted string, but there will be no need to include specific diagnostics for this case. Any such error will be reported by the compiler as an unterminated string.
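As a rough illustration of the rule above, here is a minimal sketch in today's Rust of how a lexer might read a quoted string. The names `parse_quoted` and `QuoteError` are hypothetical and invented for this example; nothing like them exists in the compiler. The sketch also assumes the open and closing quote characters are U+201C and U+201D.

```rust
/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum QuoteError {
    /// The input does not begin with an open quote character.
    MissingOpenQuote,
    /// No closing quote character was found after the open quote.
    Unterminated,
}

/// Reads a quoted string from the start of `src` and returns its
/// contents as an owned `String`. Hypothetical helper, not a real API.
fn parse_quoted(src: &str) -> Result<String, QuoteError> {
    // The string must start with the open quote character.
    let body = src.strip_prefix('“').ok_or(QuoteError::MissingOpenQuote)?;
    // The contents run up to the first closing quote character.
    let end = body.find('”').ok_or(QuoteError::Unterminated)?;
    Ok(body[..end].to_string())
}
```

Under these assumptions a doubled open quote such as “abc“ needs no special handling: the search for a closing quote simply fails and the result is `Unterminated`.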

How we teach this

We can encourage new Rustaceans to write their code using their favourite word processing software (eg. Microsoft Word, LibreOffice, etc.), since they come with full quoting abilities already enabled by default. As users gain proficiency in the language, they can be encouraged to copy and paste quoted strings from their word processor into their preferred text editor.

Drawbacks

For those that refuse to use a modern text editing environment, there may be some pushback until we can provide a list of recommended packages that will make producing quoted strings as straightforward as quasi-quoted strings.

There may be some fonts which do not distinguish between opening and closing quote characters. This would fall under the “Doctor, it hurts when I do this” category of drawbacks.

Alternatives

Previous pre-RFCs have suggested ergonomics improvements through prefixes; one could also envision suffixes, or even midfixes for this task - for example, _ is allowed as an arbitrary separator inside numeric literals (1_000). We could re-purpose it as a familiar yet meaningful separator for owned string literals ("string "_"literal").

We could also stick with the status quo, filling our programs with a forest of .into(), .to_string(), and .to_owned() in an attempt to forget the future that could have been.
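For reference, the status quo looks roughly like the sketch below. The function name `owned_variants` is made up for this example; the conversions themselves are the standard ones available on `&str` today, and all of them produce an equal String.

```rust
/// Builds the same owned String four ways from one string literal.
/// `owned_variants` is a hypothetical name used only for illustration.
fn owned_variants() -> [String; 4] {
    let literal: &'static str = "string literal";
    [
        literal.to_string(),   // via the ToString trait
        literal.to_owned(),    // via the ToOwned trait
        String::from(literal), // via From<&str> for String
        literal.into(),        // via Into<String>, inferred from the array type
    ]
}
```

Which spelling to use is largely a matter of taste; the point of the forest is that none of them tells the reader anything about the string's content.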

Unresolved questions

None at present.


[pre-RFC] Allowing string literals to be either &'static str or String, similar to numeric literals
#2

Can’t the compiler infer where a string starts/ends without quotes?

Or perhaps, we could switch from UTF8 to docx encoding for all Rust files, then we could define some nice format template to tell the compiler something is an owned string or even some other builtin type – something with nice color perhaps? I think that would really distinguish Rust from other languages. We could call it Visual Rust (if that’s ok for Microsoft).


#3

I’m sure in Perl5 we can find many other similar nice ideas. The possibilities are א.


#4

Using docx could solve grammatical ambiguities too! e.g. No more contextual keywords – just apply the Keyword or Identifier style to disambiguate it. Of course, for ergonomics the editor should infer the style for you most of the time.


#5

Please also support „German quotation marks“ as I like to write translated error messages to make my otherwise English code easier to read!


#6

“” is not visually distinct enough. I propose to use Japanese quotes instead 「ストリング・リテラル」。


#7

Note that Japanese uses 「」『』《》 and so on depending on how stylish the author is, so we can even throw Box<[u8]> into this proposal.


#8

Having two very similar-looking quote styles is very bad for ergonomics and will lead to much frustration for new users. This is made worse by the fact that the compiler is enabled to infer which type of string is required. Yet it has been shown that explicitness is better than implicit magic, so we need something distinguishable at first glance.

Therefore I want to propose explicitly marking Strings with zero-width spaces (​). This solution allows writing code that is visually compact and yet unique. Additionally, it helps rustfmt and similar tools to find the right place for breaking your code. In the future, we could further improve this by using combining characters, e.g. `̋hello world༏.


#10

Is there a joke tag we can apply now that the day has passed?


#11

If one can’t tell it’s a joke, does one deserve to know?


#12

Computer science people are sometimes smart, but not very good at spotting jokes. And this can lead to troubles for them :slight_smile:


#13

I have added a simple tag in the thread title…


#14

There’s actually something a bit awesome about this, since it allows nesting like we do for /* */ comments:

let x =  “‘“A Traveller” had visited the monastery in Snagov in 1605. He had talked a good
          deal with the monks there […] The epitaph, which I copied down with care—out of
          what instinct I didn't know—was in Latin.’ Hugh dropped his voice, glanced behind
          him, and stubbed out his cigarette in the ashtray on our table.”

No need for r#"…"# :laughing:
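A minimal sketch of how such nesting could be read, by counting depth the same way a lexer does for nested /* */ comments. The function `parse_nested` is a hypothetical name for this example, and the sketch tracks only the double quote characters “ and ” (assumed to be U+201C and U+201D), not the single ones.

```rust
/// Returns the contents of the first balanced “…” span in `src`,
/// allowing quoted strings to nest. Hypothetical helper for illustration.
fn parse_nested(src: &str) -> Option<&str> {
    let mut depth = 0usize;
    let mut start = None;
    for (i, c) in src.char_indices() {
        match c {
            '“' => {
                if depth == 0 {
                    // Contents begin just after the outermost open quote.
                    start = Some(i + c.len_utf8());
                }
                depth += 1;
            }
            '”' => {
                if depth == 0 {
                    // A closing quote with no matching open quote.
                    return None;
                }
                depth -= 1;
                if depth == 0 {
                    // The outermost quote has just been closed.
                    return start.map(|s| &src[s..i]);
                }
            }
            _ => {}
        }
    }
    // Ran out of input before the quotes balanced: unterminated.
    None
}
```

With this approach an inner “…” pair passes through as part of the contents, and an unbalanced open quote is reported the same way as any other unterminated string.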


#15

No need for r#"…"#

Unless you want your string to include a ” without the matching “ :wink:

Even though this is a joke, I actually don’t think that using non-ASCII characters for syntax is a terrible idea, as long as there is an equivalent way to express it using ASCII characters, and especially if there is a straightforward way for editors to convert an ASCII sequence to the special character. But I think it would make more sense for “” to be used as special syntax for raw strings (and editors could do something like expand r#" to “” with the cursor inside the quote when it is typed).


#16

If you use Microsoft Windows, you don’t need Microsoft Word to conveniently use smart quotes. Instead you can use the US - Brian Keyboard:

AltGr+( inserts ‘
AltGr+) inserts ’
AltGr+[ inserts “
AltGr+] inserts ”
AltGr+’ inserts ʻ, the ʻokina (a letter of the Hawaiian language that appears in words such as Hawaiʻi and Oʻahu).
AltGr+2 inserts the degree symbol °, which is very useful if you are a resident of Hawaiʻi, as you will be talking about temperature a lot with non-residents.

I literally do all my Rust programming using the US - Brian keyboard.

Enjoy.


#17

Colemak also has this.