Pre-RFC: `String` literals through prefixes

dbrgn · November 21, 2015, 9:45pm

I’ve been following the discussion in https://users.rust-lang.org/t/ergonomics-of-creating-string-s/850 about the ergonomics of creating String objects.

Current Situation

The current situation is that there are a few main ways to create String instances:

"foobar".to_string()
"foobar".to_owned()
"foobar".into()

…and some others by using constructors. Of these the last version probably has the best performance, but might need type annotations in some situations. .to_string() is the most obvious / easy to remember approach, but it invokes the formatting machinery and offers less performance. (Also, it’s not meant for creating String instances from 'static strings, which is confusing to newbies. OTOH it’s the way shown in most of the book and the tutorials.)
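For reference, here is a minimal sketch of the three forms side by side. The helper name `make_strings` is made up for illustration; the sketch only shows that all three yield equal owned strings and says nothing about their relative performance.

```rust
// Hypothetical helper: builds the same String in the three ways listed above.
fn make_strings() -> [String; 3] {
    let a = "foobar".to_string();
    let b = "foobar".to_owned();
    // `.into()` needs the target type spelled out somewhere.
    let c: String = "foobar".into();
    [a, b, c]
}

fn main() {
    let all = make_strings();
    // Every variant produces an owned String with the same contents.
    assert!(all.iter().all(|s| s.as_str() == "foobar"));
}
```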

Another way to solve the issue is to create interfaces that take T: Into<String> as parameters, but that quickly results in hard to read interfaces, as Strings are usually used all over the place.
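A small sketch of what such an interface might look like. The `User` type and its fields are hypothetical and exist only to show how the generic bounds pile up once several parameters are strings.

```rust
// Hypothetical type used only for illustration.
struct User {
    name: String,
    email: String,
}

impl User {
    // Each string parameter needs its own type parameter and bound.
    fn new<N: Into<String>, E: Into<String>>(name: N, email: E) -> User {
        User {
            name: name.into(),
            email: email.into(),
        }
    }
}

fn main() {
    // Callers can pass a literal or an already owned String.
    let u = User::new("alice", String::from("alice@example.com"));
    assert_eq!(u.name, "alice");
    assert_eq!(u.email, "alice@example.com");
}
```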

Proposal

Why not introduce a prefix to create String literals? In order to match byte literal syntax (b"foobar") a s"foobar" (for string) or a u"foobar" (for unicode, like in Python) prefix could be used.

Downsides

It would probably hide the cost of allocating a new String.
The byte literal creates a static reference, while the string literal would create a non-static owned object. (see also)
Newer code using the literals would probably be incompatible with previous Rust versions.

I would be happy about any comments. Maybe it’s a terrible idea (I only started doing Rust in January, so I don’t know much about the internals), but I hope the current situation can be improved somehow.

dns2utf8 · November 21, 2015, 9:55pm

Plus side

The compiler can choose the best way to create the new String. This should result in something like this: s"New fancy String" == format!("New fancy String")

target_san · November 21, 2015, 10:56pm

The real issue is that strings are usually not thought of as composite values. They’re kind of like numbers, but with a complex allocation story. Sure, I’m talking mostly about the usage story. In most cases, we use either a static literal or a produced, owned string, which is in fact nicely covered by Cow<'static, str>. Based on this, I’d rather have two simple things:

Type constraint alias, which resolves to something like IntoCow<'static, str>
Alias for Cow<'static, str>

And then promote using them for common string handling as good practice. This way, in many cases we won’t care whether a particular string is a static literal or an owned string. And we’ll be able to pass both &str and String as function arguments, without syntactic clutter. This will not change existing APIs, but will influence future ones.
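A rough sketch of that idea, assuming the constraint is spelled `Into<Cow<'static, str>>`. The alias `Text` and the function `kind` are made-up names, not existing standard library items.

```rust
use std::borrow::Cow;

// Hypothetical alias for the "literal or owned" string type.
type Text = Cow<'static, str>;

// Accepts both &'static str and String without extra syntax at the call site.
fn kind<S: Into<Text>>(s: S) -> &'static str {
    match s.into() {
        Cow::Borrowed(_) => "borrowed",
        Cow::Owned(_) => "owned",
    }
}

fn main() {
    // A literal stays borrowed, so no allocation is needed for it.
    assert_eq!(kind("literal"), "borrowed");
    // An owned String is moved in as-is.
    assert_eq!(kind(String::from("built at runtime")), "owned");
}
```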

steveklabnik · November 21, 2015, 11:41pm

It's shown because it is considered idiomatic, though I'll admit that String::from on literals is growing on me.

steveklabnik · November 21, 2015, 11:43pm

Long ago, we had this in Rust: ~"string". It caused a ton of people to overallocate when they didn't need to because they'd just toss ~ onto things until things worked, rather than understanding what was going on. I know, I was one of them.

Heap allocation is explicit in Rust, and it's really nice. Making it syntactically more lightweight is a mis-step imho.

nielsle · November 23, 2015, 8:51am

The following notation would help a lot, but I am not sure if the type system allows this.

let strings: Vec<String> = ["aa","bb","cc"].into();

wthrowe · November 24, 2015, 1:41am

I don’t think there would be any conflicts with current impls from an impl<T, U> From<Vec<U>> for Vec<T> where T: From<U>, and that does seem like a reasonable thing to have. You’d need to use vec!["aa", "bb"].into(), but that’s still not bad and adding impls involving arrays is annoying at the moment.

Nashenas88 · November 28, 2015, 4:35pm

It does not: http://is.gd/NuWPXD
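For comparison, the conversion can already be written with iterators and no new From impl. A minimal sketch, with `owned_strings` as a hypothetical helper name:

```rust
// Hypothetical helper: converts an array of literals into owned Strings.
fn owned_strings() -> Vec<String> {
    ["aa", "bb", "cc"]
        .iter()
        .map(|&s| String::from(s)) // one allocation per element
        .collect()
}

fn main() {
    let strings = owned_strings();
    assert_eq!(strings, vec!["aa", "bb", "cc"]);
}
```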

mbrubeck · November 28, 2015, 10:41pm

When "overloaded box" is implemented, could it be possible to do let s: String = box "hello";? If so, this would make it clear that an allocation is taking place. Combined with box patterns, this could also allow very ergonomic pattern matching on String values.

This might look weird at first to modern Rust programmers, but remember that ~5 became box 5, so it seems consistent that ~"hello" would become box "hello". Vec and String are smart pointers to [T] and str respectively (for example, they implement Deref), so constructing them should be consistent with constructing other smart pointers like Box<T> and Rc<T>.
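To illustrate the Deref point, a short sketch showing that String and Vec coerce to their borrowed views the same way Box<T> coerces to T. The function names `shout` and `total` are made up for the example.

```rust
// Takes the borrowed view of a string.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

// Takes the borrowed view of a vector.
fn total(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let s = String::from("hello");
    let v = vec![1, 2, 3];
    let b = Box::new(5);

    // &String derefs to &str, &Vec<i32> to &[i32], &Box<i32> to &i32.
    assert_eq!(shout(&s), "HELLO");
    assert_eq!(total(&v), 6);
    let n: &i32 = &b;
    assert_eq!(*n, 5);
}
```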

leonardo · November 29, 2015, 12:25pm

Making heap allocations invisible is a bad idea in Rust. But what matters is that the allocation is visible. Having a heavy syntax for something frequently used is not a good idea.

That syntax looks acceptable.

withoutboats · November 29, 2015, 6:30pm

I think the issue with a syntax like s"..." isn’t that it’s lightweight, but that it’s unique to strings and non-obvious that it means an allocation is being performed. Everyone would have to learn what that s meant.

On the other hand box is / will be more generally applicable and is clearly associated with allocations, making box "..." easier to pick up and easier to guess at if you don’t already know.

If box "..." is a string, though, box [...] should be a Vec<T>.

rkjnsn · November 29, 2015, 10:33pm

box is intended to infer the target type. E.g., the following would all work if the idea were accepted:

let s: String = box "string";
let b: Box<&'static str> = box "string";
let r: Rc<&'static str> = box "string";

Boxing an array literal into a vector could also make sense.
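The `box` expression form is not available on stable Rust, so the following is only an approximation of the "infer the target type" idea using the existing From impls. The helper `alloc` is hypothetical, and it produces Box<str> and Rc<str> rather than the Box<&'static str> and Rc<&'static str> from the example above.

```rust
use std::rc::Rc;

// Hypothetical stand-in for `box "..."`: the target type is inferred
// from the annotation at the call site.
fn alloc<T: From<&'static str>>(lit: &'static str) -> T {
    T::from(lit)
}

fn main() {
    let s: String = alloc("string");
    let b: Box<str> = alloc("string");
    let r: Rc<str> = alloc("string");

    assert_eq!(s, "string");
    assert_eq!(&*b, "string");
    assert_eq!(&*r, "string");
}
```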

dns2utf8 · November 30, 2015, 10:15am

I agree on this, because now the code written gets filled up with .to_string() until it compiles.

Since Strings are very common, I think the s"Hello" syntax would make the code more readable. As far as I know there is no conflict with other types like in C++ (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf page 686, 21.7)

Another possible solution would be to extend the grammar with user-defined literals (see C++14 2.13.8). To avoid conflicts, these should only be allowed after a statement, like this: "UserDefined IsoString"iso1 or "UserDefined Big5String"big5

[edit] Would one use "use box as s" to alias box to s, giving s "will alloc String"? Grammar

dbrgn · November 30, 2015, 10:43am

Yep, that's a good point.

I'm not a big fan of that syntax, mostly for aesthetic reasons, but also because it adds more complexity when reading code that contains non-standard user-defined literals.

The box syntax is very elegant indeed. It's obvious that there will be an allocation, but the syntax is much nicer than having to call a method on a string literal.

oli-obk · November 30, 2015, 12:05pm

No, use declarations cannot rename keywords. What you want is a macro: s!("will be allocated as String"):

macro_rules! s(
    ($e:expr) => {{
        let s: &'static str = $e;
        String::from(s)
    }}
);
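A usage sketch for a macro of this shape. The macro is repeated so the example compiles on its own, and `greeting` is a made-up function name.

```rust
// Same idea as the macro above: only accepts &'static str and allocates a String.
macro_rules! s {
    ($e:expr) => {{
        let s: &'static str = $e;
        String::from(s)
    }};
}

// Hypothetical caller: starts from a literal and then mutates the owned String.
fn greeting() -> String {
    let mut g = s!("hello");
    g.push_str(", world");
    g
}

fn main() {
    assert_eq!(greeting(), "hello, world");
}
```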

dns2utf8 · November 30, 2015, 8:47pm

I had a macro like this, but it messed up the code. So I changed it to format!("becomes real String") or "String".into()

I prefer the “overloaded box”

yigal100 · December 2, 2015, 3:51pm

I question the necessity of a syntax sugar for heap allocated literals. To me this sounds like a major code smell.

Well-designed code should avoid magic numbers (or any other literals). Rust has distinct types that denote a view into a container (slices for contiguous memory containers, str for Strings, and general ranges and iterators), and these are usually used in APIs to abstract over the container, so I don’t see a compelling use case to sweeten the syntax so much.

What’s so bad about "foo".into() or the other variants?

Edit: To clarify, I’d expect literals to usually be stack allocated and as such

let s: &'static str = "foo";

should be enough in most cases.

steveklabnik · December 2, 2015, 7:31pm

They're &'static, which means they can't be stack allocated. They're actually put in rodata, and the pointer to them is stack allocated.
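A small sketch of what this means in practice: the reference is a pointer plus a length, and because the bytes do not live in any stack frame, a function can hand a literal back to its caller. The function name `literal` is made up.

```rust
use std::mem::size_of;

// The returned reference stays valid after the function returns,
// because the string data is not stored in this function's stack frame.
fn literal() -> &'static str {
    "foo"
}

fn main() {
    let s = literal();
    assert_eq!(s, "foo");
    // &str is a fat pointer: a data address and a length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
}
```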

yigal100 · December 3, 2015, 8:28pm

Yeah, true. My phrasing should have been a little more careful

My point still stands though.

system · March 25, 2019, 8:25am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

