Implement Add for String + String

A lint is certainly a possibility, though I have trouble imagining what rules it could apply. Part of the problem is that the by-value and by-reference versions will generally call different functions or use different impls, all of which are user-defined, so I suspect there will either be false positives or a non-exhaustive list of special cases (which could be challenging to create). Curiously, there seem to be no lints along these lines in clippy (the closest is replacing .map(|x| x.clone()) with .cloned(), which is just a style issue), despite numerous lints targeting other kinds of “works, but is not as fast as it could be”.

I would love to see such lints regardless of whether String + String happens. As I said, I’m not even sure this concern outweighs the ergonomic gain, and there are plenty of other places where unnecessary clones can happen.

I’d be in favour of introducing a new operator for concatenation rather than extending +, and deprecating String + &str.

+ only makes sense as concatenation on types where Add doesn’t have any mathematical meaning. If we allow those meanings to overlap, it’s bound to become a source of confusion. For some types, such as numeric vectors, Add and Concatenate might both be valid but different operations. I also think Concatenate could interact well with type-level integers when we implement them.

As for the operator, good candidates would be

  • ++ (Haskell, Erlang, Elixir, …)
  • <> (strings in Elixir)
  • ~ (D) (pro: single character; con: usual meaning is approximation)
  • are there others?

To be very exact, strings form a well-defined mathematical object called a free monoid, so it is just a different mathematical sense :wink: Also,

Could it? Type-level integers form a ring, just like ordinary integers. I’d argue that similar-looking types with similar meanings should have the same set of operators and operations.
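For reference, the free-monoid structure on strings (the set Σ* of finite strings over an alphabet Σ, with concatenation as the operation and the empty string ε as identity) boils down to the laws below; the last line simply illustrates that, unlike integer addition, concatenation is not commutative.

```latex
\begin{aligned}
(a \cdot b) \cdot c &= a \cdot (b \cdot c) && \text{(associativity)} \\
\varepsilon \cdot a &= a = a \cdot \varepsilon && \text{(identity)} \\
\texttt{a} \cdot \texttt{b} = \texttt{ab} &\neq \texttt{ba} = \texttt{b} \cdot \texttt{a} && \text{(not commutative)}
\end{aligned}
```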

There’s a complication with adding lexical tokens. As far as I know, introducing a new multi-character token that doesn’t contain any previously invalid character breaks backward compatibility with macros by example (i.e. macro_rules!-style macros):

macro_rules! t {
    (+ $x:tt) => ()
}

// used to compile, but no longer compiles with the addition of ++
t!(++);

Note that we already have similar problems with the treatment of >>, && and ||: they have to be split as the parser requires, and macros cannot directly request the split. In that regard, a single-character token is a much better candidate than the others.

This may help a lot. There don’t seem to be many alternatives there (except that ~ is also used by Perl 6).

My opinion is that (a) an operator is convenient but not super important and (b) String + &str should be deprecated in Rust 2.0.

This is Rust, not Haskell ;). We don’t have Monoid, Semigroup, Functor or Monad traits, and so far we haven’t stuck too close to category theory. Rust has always taken a low-level, practical approach to things. I see no reason not to continue.

As for backward compatibility with macros, I’m not convinced the breakage would be extensive. We could measure that first. Plus, there are steps we can take to reduce or prevent it. And macros are being redesigned anyway.

I know :smiley: What I meant is that concatenation is quite different from addition, and your comparison (e.g. with type-level integers) seemed misaligned.

I agree that the fallout may be insignificant (breaking brainfuck macros would be unfortunate, though). That said, not having to deal with the potential breakage (and decide whether to ignore it) is a clear advantage. Also, removing deprecated features counts as breakage too…

So I still want this. Here’s why:

I have code that is generic between String and Cow<str> for the purpose of benchmarking the performance difference between them. I can run this code with each Rust version to see if either one had a performance regression (and in fact, there was one on Windows when Rust on Windows switched allocators).

The code in question looks like this:

// Assumed shape of the custom Monoid trait: `op` combines two values.
trait Monoid {
    fn op(self, other: Self) -> Self;
}

// Does the monoid operation on the strings whose closure evaluates to true for `i`.
fn accumulate<'a, T: Monoid>(tuples: &[(&'a str, &dyn Fn(i32) -> bool)], i: i32) -> T
where
    T: From<&'a str> + From<String>,
{
    tuples
        .iter()
        .filter(|(_, pred)| pred(i))
        .map(|&(s, _)| T::from(s))
        .reduce(T::op) // std's reduce; itertools' fold1 behaves the same
        .unwrap_or_else(|| i.to_string().into())
    // op just concatenates, but String does not satisfy Add<String>
}

See, I actually had to make my own Monoid trait in order to call .fold1() with T::op.
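A minimal sketch of what such a Monoid trait might look like for both types. The trait and the signature of op are assumptions, not the poster's actual code. Note that std already implements Add<Cow<'a, str>> for Cow<'a, str>, so only String needs the by-reference workaround:

```rust
use std::borrow::Cow;

/// Hypothetical Monoid trait; the signature of `op` is an assumption,
/// chosen so that `T::op` can be passed straight to a fold.
trait Monoid {
    fn op(self, other: Self) -> Self;
}

impl Monoid for String {
    // String only implements Add<&str>, so the right-hand side is borrowed
    // and then dropped at the end of the call.
    fn op(self, other: String) -> String {
        self + &other
    }
}

impl<'a> Monoid for Cow<'a, str> {
    // std implements Add<Cow<'a, str>> for Cow<'a, str>; when either side is
    // empty it returns the other side as-is, so a borrow stays borrowed.
    fn op(self, other: Cow<'a, str>) -> Cow<'a, str> {
        self + other
    }
}
```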

Another idea was to do this:

trait MaybeCollect: Iterator {
    fn maybe_collect<B>(self) -> Option<B> where B: FromIterator<Self::Item>;
}

impl<T> MaybeCollect for T where T: Iterator {
    fn maybe_collect<B>(self) -> Option<B> where B: FromIterator<Self::Item> {
        let mut iter = self.peekable();
        if iter.peek().is_none() {
            None
        } else {
            Some(iter.collect())
        }
    }
}

This works for String, but my code is generic over String and Cow<str>.

Using .collect() to build a Cow<str> allocates even in cases where the Cow<str> could just hold a borrowed reference! That eliminates the performance benefit of using that type.
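A small sketch of that allocation difference using only std (the helper names via_collect and via_add are made up for illustration): collecting through FromIterator always yields Cow::Owned, while folding with Cow's own Add impl can hand back the original borrow.

```rust
use std::borrow::Cow;

/// Collecting into Cow<str> goes through FromIterator, which always builds
/// an owned String, even for a single borrowed piece.
fn via_collect<'a>(pieces: &[&'a str]) -> Cow<'a, str> {
    pieces.iter().copied().collect()
}

/// Folding with Cow's Add<Cow<str>> impl starts from an empty borrow and only
/// allocates once two non-empty pieces actually have to be joined.
fn via_add<'a>(pieces: &[&'a str]) -> Cow<'a, str> {
    pieces
        .iter()
        .map(|&s| Cow::Borrowed(s))
        .fold(Cow::Borrowed(""), |acc, s| acc + s)
}
```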

I have to jump through extra hoops (making a new trait and using it to concatenate Strings) just because Add<String> is not already implemented for String.

Another concatenation operator: .. (Lua).

Not that I would generally suggest Lua as a language for Rust to take design principles from :slight_smile:.

If we’re talking about a concatenation operator instead of +, what about &? It has some precedent (Ada and Visual Basic), the symbol is fairly intuitive, and we don’t need a new token.

Of course it’s still overloading, but there is no conceptual overlap between types that can be bitwise-ANDed and types that can be concatenated.

Drawback: a & &b looks funny.

It makes it easy to write code with astonishingly bad performance; cf. Java’s string1 + string2 + string3 vs. StringBuilder.

Nudging people to build their strings in linear time and constant space where possible is best. Maybe this means nudging them toward format! where possible, and toward some lower-level API when they really need to squeeze out cycles.

FWIW, I think format! looks just as nice, or nicer than +.
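One possible "lower-level API" route, sketched under the assumption that all pieces are known up front (build is a hypothetical helper): sum the lengths, allocate once, and copy each piece exactly once. std's concat() on a slice of &str gives the same result, as does format!.

```rust
/// Builds one String from many pieces with a single up-front allocation:
/// the total length is computed first, then every piece is copied once.
fn build(parts: &[&str]) -> String {
    let total: usize = parts.iter().map(|p| p.len()).sum();
    let mut out = String::with_capacity(total);
    for p in parts {
        out.push_str(p);
    }
    out
}
```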

Looks like you want a string builder structure here and .collect() at the end. But if you’re trying to exercise the allocator, then it doesn’t make sense to choose a good algorithm, I guess; like when people benchmark things with recursive Fibonacci generators.

String in Rust is the equivalent of Java’s StringBuilder; a Java String is sort of like a &'static str.

Specifically, impl Add<String> for String can reuse the buffer allocated for the self parameter, whereas format! will always allocate a new String and may therefore perform worse.

The only way this is less performant than the existing impl Add<&str> for String is that you’re discarding the right-hand-side String. In cases where you’d use this instead of just writing str1 + &str2, you’re likely not going to reuse that string anyway, so that doesn’t matter. As mentioned earlier, this could encourage more .clone() abuse from newcomers who don’t realise they can add a reference to a string instead of cloning the RHS, but that’s something a lint could catch (hopefully a generic lint that also catches other libraries’ types that follow a similar pattern).
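To illustrate the buffer-reuse point, here is a sketch using a hypothetical newtype OwnedStr, since the orphan rule prevents code outside std from writing impl Add<String> for String itself:

```rust
use std::ops::Add;

/// Hypothetical newtype standing in for String: coherence rules stop crates
/// outside std from writing `impl Add<String> for String` directly.
#[derive(Debug, PartialEq)]
struct OwnedStr(String);

impl Add for OwnedStr {
    type Output = OwnedStr;

    /// Appends into the left operand's existing buffer (reallocating only if
    /// it lacks capacity); the right operand's buffer is freed when `rhs` drops.
    fn add(mut self, rhs: OwnedStr) -> OwnedStr {
        self.0.push_str(&rhs.0);
        self
    }
}
```

Chaining a + b + c with this impl keeps appending into a's buffer, which is the behaviour the post describes.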


EDIT: While writing this, I realised there’s one more non-allocating variant that hasn’t been mentioned (as far as I can remember): impl Add<String> for &str. It can be implemented by prepending the bytes of the &str to the String buffer, reallocating if necessary, and then returning the String. I’m not sure there are any realistic use cases, but it would allow something like

fn prefix(msg: String) -> String { "it says: " + msg }
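A sketch of the prepending variant as a free function (prepend is a hypothetical name; only std could add the actual impl Add<String> for &str). String::insert_str does the shifting and any needed reallocation:

```rust
/// Hypothetical free function standing in for the proposed
/// `impl Add<String> for &str` (which only std could add).
fn prepend(prefix: &str, mut s: String) -> String {
    // insert_str shifts the existing bytes right and copies `prefix` in front,
    // reallocating only if `s` lacks spare capacity.
    s.insert_str(0, prefix);
    s
}

fn prefix(msg: String) -> String {
    prepend("it says: ", msg)
}
```

Each call still shifts the existing contents, so it costs O(len) and repeated prepending is quadratic; what it avoids is the extra allocation.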

Upon inspecting the OpenJDK code for StringBuilder, I see that you’re absolutely right: Java’s StringBuilder is basically like Rust’s std::String. I thought it was more like .NET 4.0’s StringBuilder, which uses a linked list to store the individual strings and only builds the resulting string when ToString is called. This reduces the number of memory copies.

You mean like leftpad? :laughing:

+1 to a concatenation operator.

But I also support the one new operator to end all custom operators: the infix call operator \.

No new operators needed; a library can just provide binary functions.

extern crate monoid;
use monoid::append;

let v = vec![1, 2, 3] \append [1, 2, 3];
let s = "abc" \append "def" \append "ghi";
  • Functions already support namespacing, importing and renaming.
  • Functions can also use traits they define for the operands, so we get a lot of flexibility by reusing an abstraction we already have (functions are great!).
  • Free functions can put a conversion trait bound on both arguments, whereas methods cannot do that for the self argument.
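The \append syntax above is hypothetical; with today's Rust the same kind of free function can simply be called normally. A sketch of such an append (not from any real monoid crate), showing conversion bounds on both arguments:

```rust
/// Hypothetical `append` (not from any real crate): the first operand only
/// has to convert into a String and the second only has to view as a &str,
/// so both &str and String are accepted in either position.
fn append(a: impl Into<String>, b: impl AsRef<str>) -> String {
    let mut s: String = a.into(); // no copy when `a` is already a String
    s.push_str(b.as_ref());
    s
}
```

With ordinary call syntax, "abc" \append "def" \append "ghi" becomes append(append("abc", "def"), "ghi").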

As someone relatively new to rust:

I personally think the ship has already sailed; + is already in use and part of the language.

Not only that, Rust’s 2017 roadmap focuses on Rust’s learning curve, and given that C#, C++, CoffeeScript, E, Eiffel, F#, Java, JavaScript, merd, MSH, Pascal, Pike, Pliant, Python, Ruby and YCP already use ‘+’, as @lifthrasiir pointed out earlier, I think a compelling argument can be made in favour of consistency and familiarity.

Learning Rust has been a thousand paper cuts for me, and this would be one fewer paper cut.

As someone else also relatively new to Rust:

It seems to me that there are two issues here that can, perhaps, be logically separated and arbitrated independently:

  1. Use of a concat-specific operator.
  2. What to do about resource management in concatenation (and other such) operations (cloning, referencing, etc.).

For my part, I really like the idea of a concat-specific operator that is not +. In particular, I think ++ seems ideal. As has already been stated, a concat-specific operator distinct from + (Add) would generalize to other types (e.g. collections) and would be nice for any types where there is a logical case for, and distinction between, both + and ++. In this context, I would support deprecating String + &str, but there’s a whole language-cruft-fix vs. non-breaking-change discussion to be had there. That one is a bear, and there are good arguments on both sides; it probably also depends on slippery questions of timing and magnitude.

Well, that's as may be, but, of the ones I'm familiar with, every single one of the languages you listed also allows mutable aliasing. They also allow you to treat strings as linear arrays of characters. If Rust only did what was familiar and expected, it wouldn't have a reason to exist in the first place. Also, consider that using + for string concat in at least C#, C++, and Java (and probably others) is bad practice.

I'm not sure what you mean in this context; in Python, strings are immutable. Can you explain?

But we're not talking about the general case; we're talking about a very specific case. I'd prefer to avoid over-generalization.

That's subjective and dependent on context.

I wanted to quote this in full with a big :+1:

String + String can actually perform better than String + &str, for example if the right-hand side already has a big enough buffer allocated. I really don’t like the “We shouldn’t add the impl because people might .clone() unnecessarily” argument. They could write a + &b.to_owned() too. But that’s a very visible thing, so I don’t see an issue; that’s why Rust requires explicit copying in the first place. If anything, I think it’s clearer, and gives LLVM more information, to have the RHS moved into a function/operator when I’m done with it. (I sometimes wish I could move a T into something wanting a &T, since I’m done with it and don’t care whether it’s fully moved or just borrowed, as long as I get “use of moved value” if I try to use it again.)
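A sketch of the capacity point above, written as a hypothetical free function concat_owned rather than an actual Add impl (the selection rule is an assumption, not a proposed std implementation): it reuses whichever operand's buffer can already hold the full result, and otherwise appends into the left one.

```rust
/// Hypothetical String + String that reuses whichever operand's buffer can
/// already hold the whole result.
fn concat_owned(mut lhs: String, mut rhs: String) -> String {
    let total = lhs.len() + rhs.len();
    if lhs.capacity() >= total || rhs.capacity() < total {
        // Append into the left buffer; this reallocates only when neither
        // buffer was big enough.
        lhs.push_str(&rhs);
        lhs
    } else {
        // Only the right buffer has room: shift its bytes and copy lhs in front.
        rhs.insert_str(0, &lhs);
        rhs
    }
}
```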

Such lints would also be great for things other than Strings. It sounds lovely to finish refactoring some code and have a lint pop up with "You can just borrow that BigInteger in the multiply now" or "You no longer need to clone this Arc when calling this function, since it's now the last use of the variable".

Actually, some Java compilers have additional optimizations for such cases: s = string1 + string2 + string3; would usually compile to something like

StringBuilder sb = new StringBuilder(); 
s = sb.append(string1).append(string2).append(string3).toString();

Unfortunately, this doesn’t work across loops, so an explicit StringBuilder is often useful.

The + operator could also be the operator for two traits:

  • Add, just like right now
  • Concatenate offering all kinds of concatenation

The advantage is that functions bounded by Add don’t accept sequences, since Add is semantically different from Concatenate. Just because both use the same operator doesn’t mean they are the same action.

I think we can all agree that “adding” strings mathematically makes no sense, but concatenating strings makes total sense. At the same time, there is no agreed-upon meaning for concatenating integers, whereas adding integers is uncontroversial, to the best of my knowledge.
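A minimal sketch of the separate Concatenate trait idea. The trait, its method name concatenate and the helper concat_all are all hypothetical; since user code cannot bind + to a new trait, the operation is shown as an ordinary method:

```rust
/// Hypothetical trait (not in std) for the proposed Concatenate operation,
/// kept separate from Add so that Add-bounded code never receives sequences.
trait Concatenate<Rhs = Self> {
    type Output;
    fn concatenate(self, rhs: Rhs) -> Self::Output;
}

impl<'a> Concatenate<&'a str> for String {
    type Output = String;
    fn concatenate(mut self, rhs: &'a str) -> String {
        self.push_str(rhs);
        self
    }
}

impl<'a, T: Clone> Concatenate<&'a [T]> for Vec<T> {
    type Output = Vec<T>;
    fn concatenate(mut self, rhs: &'a [T]) -> Vec<T> {
        self.extend_from_slice(rhs);
        self
    }
}

/// Generic code bounded on Concatenate accepts String and Vec<T>,
/// but not i32, which only implements Add.
fn concat_all<T, U>(init: T, parts: impl IntoIterator<Item = U>) -> T
where
    T: Concatenate<U, Output = T>,
{
    parts.into_iter().fold(init, |acc, part| acc.concatenate(part))
}
```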
