Pre-RFC: `String` literals through prefixes

I’ve been following the discussion in https://users.rust-lang.org/t/ergonomics-of-creating-string-s/850 about the ergonomics of creating String objects.

Current Situation

The current situation is that there are a few main ways to create String instances:

  • "foobar".to_string()
  • "foobar".to_owned()
  • "foobar".into()

…and some others by using constructors. Of these, the last version probably has the best performance, but it might need type annotations in some situations. .to_string() is the most obvious and easiest to remember, but it invokes the formatting machinery and is slower. (Also, it’s not meant for creating String instances from 'static strings, which is confusing to newbies. OTOH, it’s the way shown in most of the book and the tutorials.)
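For reference, here is a small sketch of the conversions listed above, plus String::from, which comes up later in the thread. The function names are made up for illustration; all four return an equal String.

```rust
// Hypothetical helper names; each returns an owned String built from a literal.
fn via_to_string() -> String {
    "foobar".to_string()
}

fn via_to_owned() -> String {
    "foobar".to_owned()
}

fn via_into() -> String {
    // The return type drives inference here; in other positions
    // `.into()` may need an explicit type annotation.
    "foobar".into()
}

fn via_from() -> String {
    String::from("foobar")
}
```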

Another way to solve the issue is to create interfaces that take T: Into<String> as parameters, but that quickly results in hard to read interfaces, as Strings are usually used all over the place.
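A minimal sketch of what such an interface might look like. The User type and its constructor are hypothetical and only illustrate the Into<String> bound:

```rust
// Hypothetical type, used only to illustrate the `Into<String>` pattern.
struct User {
    name: String,
}

impl User {
    // Accepts `&str`, `String`, or anything else that converts into a `String`.
    fn new<T: Into<String>>(name: T) -> User {
        User { name: name.into() }
    }
}
```

Callers can then write User::new("alice") or pass an existing String, at the cost of a generic parameter on every such function.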

Proposal

Why not introduce a prefix to create String literals? In order to match byte literal syntax (b"foobar") a s"foobar" (for string) or a u"foobar" (for unicode, like in Python) prefix could be used.

Downsides

  • It would probably hide the cost of allocating a new String.
  • The byte literal creates a static reference, while the string literal would create a non-static owned object. (see also)
  • Newer code using the literals would probably be incompatible with previous Rust versions.
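To make the second point concrete, here is a sketch contrasting the two kinds of value. The function names are assumptions for illustration:

```rust
// A byte string literal is a reference to static data; nothing is allocated.
fn static_bytes() -> &'static [u8; 6] {
    b"foobar"
}

// An owned String copies the literal's bytes into a fresh heap buffer.
fn owned_string() -> String {
    String::from("foobar")
}
```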

I would be happy about any comments :smiley: Maybe it’s a terrible idea (I only started doing Rust in January, so I don’t know much about the internals), but I hope the current situation can be improved somehow.

Plus side

  • The compiler can choose the best way for the new String. This should result in something like this s"New fancy String" == format!("New fancy String")

The real issue is that strings are usually not thought of as composite values. They’re treated somewhat like numbers, but with a complex allocation story. I’m talking mostly about how they’re used: in most cases, we use either a static literal or a produced, owned string, which is in fact covered nicely by Cow<'static, str>. Based on this, I’d rather have two simple things:

  1. Type constraint alias, which resolves to something like IntoCow<'static, str>
  2. Alias for Cow<'static, str>

And then promote their use for common string handling as good practice. This way, in many cases we won’t care whether a particular string is a static literal or an owned string, and we’ll be able to pass both &str and String as function arguments without syntactic clutter. This will not change existing APIs, but will influence future ones.
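A rough sketch of the idea, using Into<Cow<'static, str>> as the constraint. The Text alias and both functions are hypothetical names, not anything in the standard library:

```rust
use std::borrow::Cow;

// Hypothetical alias, standing in for the "alias for Cow<'static, str>" idea.
type Text = Cow<'static, str>;

// Hypothetical function: accepts a literal or an owned String
// without extra syntax at the call site.
fn describe<T: Into<Text>>(text: T) -> Text {
    text.into()
}

// Reports whether the value holds an allocated String.
fn is_owned(text: &Text) -> bool {
    matches!(text, Cow::Owned(_))
}
```

A literal stays borrowed and costs no allocation, while a String that was already built is moved in as-is.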

It's shown because it is considered idiomatic, though I'll admit that String::from on literals is growing on me.

Long ago, we had this in Rust: ~"string". It caused a ton of people to overallocate when they didn't need to, because they'd just toss ~ onto things until things worked, rather than understanding what was going on. I know, I was one of them. :smile:

Heap allocation is explicit in Rust, and it's really nice. Making it syntactically more lightweight is a mis-step imho.

The following notation would help a lot, but I am not sure if the type system allows this.

let strings: Vec<String> = ["aa","bb","cc"].into();

I don’t think there would be any conflicts with current impls from an impl<T, U> From<Vec<U>> for Vec<T> where T: From<U>, and that does seem like a reasonable thing to have. You’d need to use vec!["aa", "bb"].into(), but that’s still not bad and adding impls involving arrays is annoying at the moment.

It does not: http://is.gd/NuWPXD
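In the meantime, the conversion can be spelled out with an iterator. This is only a sketch, and to_strings is a made-up helper name:

```rust
// Hypothetical helper: converts each borrowed &str into an owned String.
fn to_strings(items: &[&str]) -> Vec<String> {
    items.iter().map(|s| String::from(*s)).collect()
}
```

With that, let strings = to_strings(&["aa", "bb", "cc"]); gives a Vec<String>, with one allocation per element plus one for the vector.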

When "overloaded box" is implemented, could it be possible to do let s: String = box "hello";? If so, this would make it clear that an allocation is taking place. Combined with box patterns, this could also allow very ergonomic pattern matching on String values.

This might look weird at first to modern Rust programmers, but remember that ~5 became box 5, so it seems consistent that ~"hello" would become box "hello". Vec and String are smart pointers to [T] and str respectively (for example, they implement Deref), so constructing them should be consistent with constructing other smart pointers like Box<T> and Rc<T>.
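As a small illustration of the Deref point, functions written against str and [T] accept references to String and Vec<T> directly. The function names below are hypothetical:

```rust
// Takes the borrowed view, so it works for literals and owned Strings alike.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

// Takes a slice, so a Vec<i32> can be passed by reference.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}
```

Passing &owned_string to shout, or &vec to total, works through deref coercion with no explicit conversion.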

Making heap allocations invisible is a bad idea in Rust. But what matters is that the allocation is visible. Having a heavy syntax for something frequently used is not a good idea.

That syntax looks acceptable.

I think the issue with a syntax like s"..." isn’t that it’s lightweight, but that it’s unique to strings and non-obvious that it means an allocation is being performed. Everyone would have to learn what that s meant.

On the other hand box is / will be more generally applicable and is clearly associated with allocations, making box "..." easier to pick up and easier to guess at if you don’t already know.

If box "..." is a string, though, box [...] should be a Vec<T>.

box is intended to infer the target type. E.g., the following would all work if the idea were accepted:

let s: String = box "string";
let b: Box<&'static str> = box "string";
let r: Rc<&'static str> = box "string";
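For comparison, here is how the same three bindings are written with explicit constructors, which is what works without the proposed box overloading. The wrapping function is only there to make the sketch self-contained:

```rust
use std::rc::Rc;

// Hypothetical wrapper; each line names its target type's constructor explicitly.
fn explicit_constructors() -> (String, Box<&'static str>, Rc<&'static str>) {
    let s: String = String::from("string");
    let b: Box<&'static str> = Box::new("string");
    let r: Rc<&'static str> = Rc::new("string");
    (s, b, r)
}
```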

Boxing an array literal into a vector could also make sense.

I agree with this, because right now code gets filled up with .to_string() until it compiles.

Since Strings are very common, I think the s"Hello" syntax would make the code more readable. As far as I know, there is no conflict with other types like in C++ (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf Page 686 - 21.7)

Another possible solution would be to extend the grammar with user-defined literals (see C++14 2.13.8). To avoid conflicts, these should only be allowed after a statement, like this: "UserDefined IsoString"iso1 or "UserDefined Big5String"big5

[edit] Could one write use box as s to alias box to s, as in s "will alloc String"? Grammar

Yep, that's a good point.

I'm not a big fan of that syntax, mostly for aesthetic reasons, but also because it adds more complexity when reading code that contains non-standard user-defined literals.

The box syntax is very elegant indeed. It's obvious that there will be an allocation, but the syntax is much nicer than having to call a method on a string literal.

No, use declarations cannot rename keywords. What you want is a macro: s!("will be allocated as String"):

macro_rules! s(
    ($e:expr) => {{
        let s: &'static str = $e;
        String::from(s)
    }}
);

I had a macro like this, but it messed up the code, so I changed it to format!("becomes real String") or "String".into().

I prefer the “overloaded box”.

I question the necessity of a syntax sugar for heap allocated literals. To me this sounds like a major code smell.

Well-designed code should avoid magic numbers (or any other literals). Rust has distinct types that denote a view into a container (slices for contiguous memory containers, str for Strings, and general ranges and iterators), and these are usually used in APIs to abstract over the container, so I don’t see a compelling use case to sweeten the syntax so much.

What’s so bad about "foo".into() or the other variants?

Edit: To clarify, I’d expect literals to usually be stack allocated and as such

let s: &'static str = "foo";

should be enough in most cases.

They're &'static, which means they can't be stack allocated. They're actually put in rodata, and the pointer to them is stack allocated.
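A short sketch of what that means in practice. Both function names are hypothetical:

```rust
// The literals live in the binary's read-only data, so the returned
// references stay valid after the function returns.
fn status_label(code: u8) -> &'static str {
    match code {
        0 => "ok",
        _ => "error",
    }
}

// What sits on the stack is only the reference itself: a pointer plus a length.
fn reference_size() -> usize {
    std::mem::size_of::<&'static str>()
}
```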

Yeah, true. My phrasing should have been a little more careful :slight_smile:

My point still stands though.
