Getting rid of String slices for better ergonomics

I believe this is the PR in question: [WIP] `str` is dead, long live `str([u8])`! by japaric · Pull Request #19612 · rust-lang/rust · GitHub

2 Likes

Could something like Cow<'a,str> be the default string type?

What would be the problems of making something like Cow<'a,str> the only string type? Aside from backwards compatibility concerns, I mean.

1 Like

How would it work with no_std where no heap allocator is available for creating a copy?

1 Like

It would not work; I had not thought about no_std.

There doesn’t seem to be a way to have all of these at the same time then:

  1. A string type accessible in no_std contexts.

  2. A view into another string.

  3. A mutable string type.

  4. Having a single string type.

I do not think sacrificing 1 or 2 for 4 would be a good idea.

I do think that having a unique string type, something like type Str = Cow<'a, str>; would be nice, even if it’s unavailable in no_std.

The problem is that this doesn't work as written: it would have to be Str<'a>, and therefore every struct that currently holds an owned String would gain a lifetime parameter.
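To make that concrete, here is a minimal sketch. The alias `Str<'a>` is the hypothetical one from the post above, and `User`, `UserCow`, and `make_users` are made-up names for illustration only.

```rust
use std::borrow::Cow;

// Hypothetical alias from the discussion: the lifetime cannot be left off.
type Str<'a> = Cow<'a, str>;

// Today: the struct owns its name and needs no lifetime parameter.
struct User {
    name: String,
}

// With the alias: the struct itself now has to carry the lifetime.
struct UserCow<'a> {
    name: Str<'a>,
}

// Builds one of each from the same input; the second borrows from `input`.
fn make_users(input: &str) -> (User, UserCow<'_>) {
    let owned = User { name: input.to_string() };
    let borrowed = UserCow { name: Cow::Borrowed(input) };
    (owned, borrowed)
}
```

Every type that embeds `UserCow<'a>` then needs a lifetime parameter as well, which is how the lifetime spreads through a codebase.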

3 Likes

For some historical context: In very early versions of Rust, the equivalent of String was ~str, the equivalent of Vec<T> was ~[T], and for non-slice types, the equivalent of Box<T> was ~T.

| Value | Borrowed pointer | Owned pointer |
|-------|------------------|---------------|
| u8    | &u8              | ~u8           |
| str   | &str             | ~str          |
| [T]   | &[T]             | ~[T]          |

This made the type names more consistent, but their behavior less consistent. For example ~str and ~[T] were growable, just like modern String and Vec, so they had special built-in methods that were not available on their borrowed equivalents. They were also 3 pointer-widths in size, which was inconsistent with other ~T types. And there could be no equivalent of modern Box<str> or Box<[T]>.

Today we have reduced the number of special built-in pointers, and added library types in their place:

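| Slice | Borrowed pointer | Owned pointer | Growable buffer |
|-------|------------------|---------------|-----------------|
| str   | &str             | Box<str>      | String          |
| [T]   | &[T]             | Box<[T]>      | Vec<T>          |
| Path  | &Path            | Box<Path>     | PathBuf         |
| OsStr | &OsStr           | Box<OsStr>    | OsString        |

…etc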

This reduces language complexity but moves the same complexity into libraries, and it means fewer sigils to remember but more names.

These library types may become more understandable if you consider them all “smart pointer” types. For example String implements Deref<Target=str>, so when you access its contents the relevant type is still str.
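A small sketch of what that looks like in practice. The function names `shout` and `first_word` are invented for this example; `to_uppercase` and `split_whitespace` are methods defined on str, not on String.

```rust
use std::ops::Deref;

// Accepts anything that derefs to `str`: String, Box<str>, &str, and so on.
fn shout<S: Deref<Target = str>>(s: S) -> String {
    // `to_uppercase` lives on `str`; it is reached through Deref.
    s.to_uppercase()
}

// Takes `&String` on purpose to show that `str` methods are found on it.
fn first_word(s: &String) -> &str {
    // `split_whitespace` is a `str` method; auto-deref resolves it here.
    s.split_whitespace().next().unwrap_or("")
}
```

Calling `shout(String::from("hi"))`, `shout("hi")`, and `shout(Box::<str>::from("hi"))` all go through the same str method.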

11 Likes

If we wanted to make the naming more consistent, perhaps there could be a generic “growable buffer” type, such that String would be renamed to Buf<str> and Vec<T> to Buf<[T]>, etc.:

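| Slice | Borrowed pointer | Owned pointer | Growable buffer |
|-------|------------------|---------------|-----------------|
| str   | &str             | Box<str>      | Buf<str>        |
| [T]   | &[T]             | Box<[T]>      | Buf<[T]>        |
| Path  | &Path            | Box<Path>     | Buf<Path>       |
| OsStr | &OsStr           | Box<OsStr>    | Buf<OsStr>      |

…etc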

But note that a great many of the methods and traits on Buf<T> would vary based on T, so programmers would still need to learn the specifics of individual types like Buf<str> – just not a new name.

7 Likes

Nowadays we also have Rc<str> and Arc<str> as well as Rc<[T]> and Arc<[T]>.
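For anyone who has not used these, a quick sketch using the standard `From` conversions. The function name `shared_strings` is just for illustration.

```rust
use std::rc::Rc;
use std::sync::Arc;

// Builds reference-counted string and slice values from ordinary ones.
fn shared_strings() -> (Rc<str>, Rc<str>, Arc<str>, Rc<[i32]>) {
    // From<&str>: copies the text once into a reference-counted allocation.
    let first: Rc<str> = Rc::from("hello");
    // Cloning bumps the reference count; the text itself is not copied.
    let second = Rc::clone(&first);
    // From<String>: the thread-safe counterpart.
    let across_threads: Arc<str> = Arc::from(String::from("hello"));
    // From<Vec<T>>: the same idea for slices.
    let numbers: Rc<[i32]> = Rc::from(vec![1, 2, 3]);
    (first, second, across_threads, numbers)
}
```

Like Box<str>, these are not growable; they fill the "owned pointer" column of the table rather than the "growable buffer" one.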

1 Like

For another point of comparison, modern C++ has std::string (similar to Rust String) and std::string_view (similar to Rust &str), as well as various other specialized string types.

2 Likes

I’m actually a bit skeptical of the claim of better ergonomics here. Yes, the current String vs str disparity has a clear learning curve to it, but I don’t actually find it very hard in practice. (Even if there are still some coercion cliffs in places, those can just be ironed out over time.)

6 Likes

If you know it's owned, you can use Str<'static>.

I don’t see anything more than a very marginal ergonomic/learning benefit from any of the proposed alternatives. Sure, they’re a bit more streamlined and consistent, but “perfect is the enemy of good.” Beginners will still have to learn what the different terms mean. And they’ll have to learn the reasons that multiple string types exist. (And why that’s the case in almost every language.) That’s unavoidable. Whether you have a convention that “buf” means “growable” or not is largely immaterial. You’ll still have to teach people when to use StrBuf over OsStrBuf or VecBuf, etc.

And at the end of the day, nobody’s code will even run better. We’d just have a different color bikeshed.

4 Likes

That's quite a claim. I can think of a lot of people that would take the opposite position. Different things should be different.

1 Like

I don’t think it’s totally clear. Someone might be wondering what the relationship between StringBuf and BufReader is, for instance.

If an engineering team decides to build a bridge with steel trusses, it is a given that they need to be painted. And there are lots of different reasons to paint it in different colors. Some colors are better looking, some are safer, some are traditional, some are more memorable. And there’s a lot of worthwhile thought that can go into making that sort of decision. Where it goes off the rails is when the paint-selection committee goes to the engineering team and suggests the steel beams be substituted for wooden ones.

In that case, we’re talking about breaking Rust’s stability guarantees or performance focus. And I’d argue adding superfluous types to std is similarly questionable.

Cow is probably not as nice as you expect.

It’s a necessary evil for optionally returning owned data, but it has all the downsides of both types it represents. It has a lifetime attached causing borrow checker woes as bad as &str, and it’s larger than String and causes double indirection if you pass it by reference.
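Here is a rough sketch of both sides of that trade-off. `strip_tabs` and `sizes` are names made up for this example. The sizes are computed rather than hard-coded because the exact numbers depend on the compiler version and target; the only thing that can be relied on is that a Cow is at least as large as either of the types it can hold.

```rust
use std::borrow::Cow;
use std::mem::size_of;

// The intended use: borrow when nothing changes, allocate only when needed.
fn strip_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        Cow::Owned(input.replace('\t', " "))
    } else {
        Cow::Borrowed(input)
    }
}

// Sizes in bytes of &str, String, and Cow<'static, str> on this target.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<&str>(),
        size_of::<String>(),
        size_of::<Cow<'static, str>>(),
    )
}
```

Note that the return type of `strip_tabs` is tied to the lifetime of `input`, so the caller has the same borrow-checker constraints as with a plain &str, even on the path where the result is owned.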

Cow<'static, str> is not any better either. Every single string in the program pays the extra space just so that string literals can be shared, and it still shares them much less efficiently than &str.

There was a proposal to hack 0-capacity String to mean it’s a string literal and the data pointer shouldn’t be freed.

5 Likes

I have nearly a decade of experience, and that experience spans as many languages as years. That includes C, C++, PLCs, C#, Java, F#, Elixir, and more. I understand programming on a deep level. The explicit ownership model of Rust does not baffle me. I have ergonomic issues with strings in Rust.

Please do not dismiss users telling you there are problems with the lack of ergonomics here. We’re not inexperienced folks who just need to learn the concepts. We’re veteran software engineers who are providing feedback about the current APIs.

4 Likes

I was not trying to dismiss anything here (and I’m a bit surprised that you would read that into my earlier comment). I’m merely trying to puzzle apart whether there is a difference between learnability and usability here, mostly because, as I was saying, I did not find the String vs str disparity very hard relative to other parts of the language.

I’m curious if those who find String vs str confusing (other than for just the names) also have trouble with Vec vs slice. To me there’s such an obvious parallel here that I find it quite straightforward to use in practice (modulo maybe rough edges like Index into str being on the underlying u8 storage).
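A short sketch of both the parallel and the rough edge. The function names are invented for this example, and the text uses an explicit `\u{e9}` escape so that the "é" is unambiguously a single two-byte character.

```rust
// The parallel: both growable types hand out a borrowed view the same way.
fn views<'a>(text: &'a String, items: &'a Vec<u32>) -> (&'a str, &'a [u32]) {
    (&text[..2], &items[..2])
}

// The rough edge: `str` ranges count bytes of UTF-8 storage, not characters.
fn byte_vs_char() -> (usize, usize, Option<&'static str>, Option<&'static str>) {
    let s = "h\u{e9}llo"; // "héllo": the 'é' takes two bytes
    (
        s.len(),           // length in bytes
        s.chars().count(), // length in characters
        s.get(0..1),       // ends on a character boundary
        s.get(0..2),       // ends in the middle of 'é'
    )
}
```

Here `get` is used instead of `&s[0..2]` because indexing with a range that ends inside a character panics, while `get` returns None.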

3 Likes

My apologies, but it’s not the first time I’ve heard someone from the community be dismissive about the ergonomic issues with Strings. The Rust-Lang Twitter admin has even done it to me personally. Anyway, moving on.

I don’t find the types themselves confusing. It makes perfect sense once you’ve got the idea of what slices are. Except, maybe, why there’s a special type for string slices.

I find there’s an actual usability issue with strings in Rust. For a high level language, I spend 2-4x the amount of effort doing any string processing than I would in other languages, with the exception of C and maybe C++. I know that the language is designed to be very explicit about when you’re allocating a String from a str, but the reverse should be implicit IMO. I find myself consistently surprised, although I shouldn’t be at this point, when the type checker throws a tantrum when trying to pass a String into a function that takes a str.

Can you elaborate? str is a DST, so functions cannot take a raw str. Do you mean passing a String to something that takes a &str? Because that's a different thing than anything about strings -- you can't pass an i32 to something that takes &i32 either.
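To spell out the comparison, a minimal sketch. `greet`, `double`, and `calls` are made-up functions for illustration.

```rust
// Takes a borrowed string slice.
fn greet(name: &str) -> String {
    format!("hello, {name}")
}

// Takes a borrowed integer.
fn double(n: &i32) -> i32 {
    *n * 2
}

fn calls() -> (String, i32) {
    let owned = String::from("ferris");
    let number = 21;
    // greet(owned);   // rejected: expected `&str`, found `String`
    // double(number); // rejected: expected `&i32`, found `i32`
    // Borrowing fixes both; `&String` additionally coerces to `&str`.
    (greet(&owned), double(&number))
}
```

In both cases the missing piece is the `&`, not a string conversion; the String case only needs the extra deref coercion from &String to &str, which the compiler applies on its own.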

2 Likes