Implement Add for String + String

Also, I’m +1 for better error messages for "str1" + "str2" (fails with an impl Add is missing for &'static str) and for "str1".to_owned()+"str2".to_owned() (fails with mistmatched types) with some more user-friendly messages to avoid newcommers shock about concatenation.

And I’m -1 for adding String + String and &str + &str impls.

++ has a lot of appeal as a concatenation operator.

6 Likes

The Add operator is already used for some sequences. For example ndarray, rulinalg, nalgebra use them for array/vector/vector addition. For the dynamic length cases it does make sense for them to define distinct concat and addition operators.

I guess the conversation is going more towards new operators for concatenation. But I wanted to take a second look at a StringBuilder as I think it might be more appropriate in Rust than simple string concatenation. I whipped together a simple (and possibly buggy) microbenchmark for people to think about. Here it is.

test format_string_building_lorum_noloop           ... bench:         280 ns/iter (+/- 35)
test format_string_building_short_noloop           ... bench:         214 ns/iter (+/- 53)
test naive_string_building_lorum_loop              ... bench:         140 ns/iter (+/- 19)
test naive_string_building_lorum_loop_many         ... bench:     655,227 ns/iter (+/- 184,016)
test naive_string_building_lorum_noloop            ... bench:         132 ns/iter (+/- 29)
test naive_string_building_short_loop              ... bench:         132 ns/iter (+/- 78)
test naive_string_building_short_loop_many         ... bench:      28,471 ns/iter (+/- 9,805)
test naive_string_building_short_noloop            ... bench:         117 ns/iter (+/- 50)
test stringbuilder_string_building_lorum_loop      ... bench:          82 ns/iter (+/- 19)
test stringbuilder_string_building_lorum_loop_many ... bench:     427,720 ns/iter (+/- 40,376)
test stringbuilder_string_building_short_loop      ... bench:          57 ns/iter (+/- 13)
test stringbuilder_string_building_short_loop_many ... bench:      37,625 ns/iter (+/- 3,937)

An explicit string builder seems to be better except in the case where there are lots of small strings. Where there are lots of small strings, the capacity extending of String easily accommodates the concatenations without requiring new copies.

In any event, I think having StringBuilder in std could:

  • Guide people to use it (Q: How do I build a larger string from smaller strings!? A: oh, maybe this “StringBuilder” is the place to look…").

  • Be familiar to java and .net people.

  • In some cases, it’s faster.

  • Offers some nice opportunities for working with iterators. e.g. collection_of_strings.into_iter().filter(no_russian).collect::<StringBuilder>().to_string(); And that’s kind of what @iopq wanted to do in the first place.

I wanted to take a second look at a StringBuilder as I think it might be more appropriate in Rust than simple string concatenation

So I agree with the instinct that something StringBuilder-ish would be more appropriate in Rust, but there's already concat! and the concat and join methods on slices. Maybe we just need to make it easier to find those and know how to use them? For instance,

collection_of_strings.into_iter().filter(no_russian)
    .collect::<StringBuilder>().to_string()

translates directly to

collection_of_strings.into_iter().filter(no_russian)
    .collect::<Vec(String)>().concat()

and we could easily add some shorthand for that:

let foo : String = collection_of_strings.into_iter().filter(no_russian).collect();

and maybe even

let foo = collection_of_strings.into_iter().filter(no_russian).join(", ");

I think that would usually be more ergonomic than StringBuilder.

(Semi-tangential rant: The SliceConcatExt documentation makes it sound like everything on that page, including the methods, is unstable, but if you read issue #27747 you discover that the methods are stable, but the existence of the trait is considered to be an inappropriately exposed implementation detail and instability has been used to discourage people from referring to it directly. This needs to be fixed. Giving the wrong impression unless you already know what's going on is textbook Bad Documentation.)

4 Likes

Yes! :thumbsup: Maybe even just aliasing them as StringConcatExt ans re-exporting it under std::string::...

[quote]let foo = collection_of_strings.into_iter().filter(no_russian).join(", "); [/quote] That helps in the case of collecting a bunch of strings using iterators. But if someone is looking to build a string they might not think to call iter and then look for various extension traits for what they want. When I search for string in the docs, I don't see anything about SliceConcatExt. I had originally meant that a StringBuilder could also, as a nice to have feature, work with iterators. But that's gravy after it's been discovered.

I wrote:

I thought I'd look into making this work, and I discovered that it already works. There are FromIterator<T> for String impls for the obvious string classes.

fn main() {
  let v = vec!["foo", "bar", "baz", "quux"];
  let s : String = v.into_iter().filter(|s| s.starts_with("b")).collect();
  println!("{}", s);
}

compiles and runs and prints "barbaz", as expected. So we just need join-with-separator now.

1 Like

@bluss has this in his rust-itertools crate.

https://docs.rs/itertools/0.5.9/itertools/fn.join.html

use itertools::join;

assert_eq!(join(&[1, 2, 3], ", "), "1, 2, 3");

But it’s not in std.

1 Like

Does it have a chance to go into libstd? It’s a bit fuzzier than std’s usual since it accepts anything that implements Display. (And Iterator is in libcore, so it can’t be an iterator method).

What is the conclusion here ? AFAIU String + &str is going to be deprecated in the future and other concatenation options such as String + String or &str + String will be never implemented. Is it correct ?

I see no indication or intent of String + &str being deprecated.

4 Likes

And the other options will not be implemented , right ? Personally I don’t see any issue with String + String, &str + String. This would definitely make newcomers happier and there is no performance degradation here unless you write explicitly clone() or anything like that.

My opinion now, in contrast to two and a half years ago, is that we should add this impl to make things easier for people.

11 Likes

@withoutboats I think we already have our cake (have obvious allocations around String/&str) and eat it too (be friendly to newcomers):

error[E0369]: binary operation `+` cannot be applied to type `&str`
 --> src/main.rs:2:13
  |
2 |     let _ = "" + "";
  |             ^^^^^^^ `+` can't be used to concatenate two `&str` strings
help: `to_owned()` can be used to create an owned `String` from a string reference. String concatenation appends the string on the right to the string on the left and may require reallocation. This requires ownership of the string on the left
  |
2 |     let _ = "".to_owned() + "";
  |             ^^^^^^^^^^^^^

error[E0308]: mismatched types
 --> src/main.rs:3:29
  |
3 |     let _ = String::new() + String::new();
  |                             ^^^^^^^^^^^^^
  |                             |
  |                             expected &str, found struct `std::string::String`
  |                             help: consider borrowing here: `&String::new()`
  |
  = note: expected type `&str`
             found type `std::string::String`

error[E0369]: binary operation `+` cannot be applied to type `&str`
 --> src/main.rs:4:13
  |
4 |     let _ = "" + String::new();
  |             ^^^^^^^^^^^^^^^^^^ `+` can't be used to concatenate a `&str` with a `String`
help: `to_owned()` can be used to create an owned `String` from a string reference. String concatenation appends the string on the right to the string on the left and may require reallocation. This requires ownership of the string on the left
  |
4 |     let _ = "".to_owned() + &String::new();
  |             ^^^^^^^^^^^^^   ^^^^^^^^^^^^^^

It explains the reason for this not being supported while giving the user a clear avenue to fix it. It simultaneously helps get over the bump and expands the understanding of the underlying language. I think changing the ergonomics of the language by adding Add<String> to the language would be worth it only if we could simultaneously introduce lints to detect unnecessary allocations.

4 Likes

I'm not convinced. The a.to_owned() + b recommendation is already an unnecessary allocation (compared to, say, [a, b].join("")).

Supporting a + b can even reduce allocations, because if it's changed to a + &b and b actually had enough capacity for the combined string, it's allocating for no reason.

4 Likes

We could have impl Add<String> for &'a str or similar to handle that case. The other two cases (&str + &str and maybe String + String) I think it is reasonable to raise rustc's hands in the air and say "you deal with it, what did you really want?".

FWIW, C++ has all four overloads of operator + (analogous to &str + &str/&str + String/String + &str/String + String) to manage allocations optimally in each case.

1 Like

String + String would not allocate. There would still be an obvious allocation if a user cloned the second string instead of referencing it.

Doesn’t a.to_owned() + b reuse the buffer of the temporary a.to_owned() by reusing its buffer? In that case, how is it an unnecessary allocation?

Because a.to_owned() doesn't have any excess capacity, so it's one allocation for the to_owned, and then another when reserving enough space for the addition.

Whereas [a, b].join("") is one allocation, since it pre-computes the capacity.

3 Likes