A little proposal for string concatenation

Recently I have been working heavily on text processing, and in many cases I need to concatenate strings like this:

let mut my_string = String::new();
my_string.push_str("aaa");
my_string.push_str("bbb");
my_string.push_str("ccc");
// ...

But this way seems kind of stupid, so a more convenient method could be added, something like:

fn push_strs(&mut self, strs: &[&str])

With that, the case above can be simplified to:

my_string.push_strs(&["aaa", "bbb", "ccc", /* ... */]);
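For concreteness, here is a sketch of what such a method could look like today as a user-side extension trait; PushStrs and push_strs are illustrative names of mine, not anything in std:

// Hypothetical sketch: an extension trait providing the proposed method.
trait PushStrs {
    fn push_strs(&mut self, strs: &[&str]);
}

impl PushStrs for String {
    fn push_strs(&mut self, strs: &[&str]) {
        // Reserve the total length once, then append each piece.
        self.reserve(strs.iter().map(|s| s.len()).sum());
        for s in strs {
            self.push_str(s);
        }
    }
}

fn main() {
    let mut my_string = String::new();
    my_string.push_strs(&["aaa", "bbb", "ccc"]);
    assert_eq!(my_string, "aaabbbccc");
}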

I don’t know if this is the right place to post the topic, but I’d like to get some positive response.

Why add what already exists?

fn main() {
    let my_string = ["aaa", "bbb", "ccc"].concat();
    assert_eq!(my_string, "aaabbbccc");
    
    let my_string = ["aaa", "bbb", "ccc"].into_iter().cloned().collect::<String>();
    assert_eq!(my_string, "aaabbbccc");
    
    let mut my_string = String::new();
    my_string.extend(["aaa", "bbb", "ccc"].into_iter().cloned());
    assert_eq!(my_string, "aaabbbccc");
}

What’s more, the last two work for everything that’s iterable, not just slices.
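For example (an illustration of mine, not from the original post), the same approaches work when the pieces come from another iterator rather than an array of literals:

fn main() {
    let parts = vec![String::from("aaa"), String::from("bbb"), String::from("ccc")];

    // collect accepts any iterator yielding &str
    let joined: String = parts.iter().map(|s| s.as_str()).collect();
    assert_eq!(joined, "aaabbbccc");

    // extend works the same way on an existing String
    let mut buf = String::new();
    buf.extend(parts.iter().map(|s| s.as_str()));
    assert_eq!(buf, "aaabbbccc");
}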

Firstly, sorry, I was being a little misleading here: what I actually mean is &str variables, not literals that already exist at compile time.

Secondly, I know the ways you mentioned, BUT all of them require intermediate memory allocation, which dramatically impacts performance, particularly for large amounts of text. I think avoiding that allocation would be very helpful for this kind of work.

Every one of those should work for &str variables just as much as string literals. None of them involve memory allocation aside from the resulting String.

I’m afraid the cloned() method does result in memory allocation. I tested these methods:

fn main() {
    let t1 = std::time::Instant::now();
    let mut my_string = String::new();
    let a = "aaa";
    let b = "bbb";
    let c = "ccc";

    for _ in 0..1000000 {
        my_string.push_str(a);
        my_string.push_str(b);
        my_string.push_str(c);
        my_string.clear();
    }
    println!("{:?}", t1.elapsed());
}

Three runs:

Duration { secs: 0, nanos: 3670486 }
Duration { secs: 0, nanos: 3342101 }
Duration { secs: 0, nanos: 3224247 }

fn main() {
    let t1 = std::time::Instant::now();
    let mut my_string = String::new();
    let a = "aaa";
    let b = "bbb";
    let c = "ccc";

    for _ in 0..1000000 {
        my_string = [a, b, c].concat();
    }
    println!("{:?}", t1.elapsed());
}

Three runs:

Duration { secs: 0, nanos: 81928727 }
Duration { secs: 0, nanos: 77676702 }
Duration { secs: 0, nanos: 78448896 }

fn main() {
    let t1 = std::time::Instant::now();
    let mut my_string = String::new();
    let a = "aaa";
    let b = "bbb";
    let c = "ccc";

    for _ in 0..1000000 {
        my_string = [a, b, c].into_iter().cloned().collect::<String>();
    }
    println!("{:?}", t1.elapsed());
}

Three runs:

Duration { secs: 0, nanos: 353488365 }
Duration { secs: 0, nanos: 260228488 }
Duration { secs: 0, nanos: 323822175 }

fn main() {
    let t1 = std::time::Instant::now();
    let mut my_string = String::new();
    let a = "aaa";
    let b = "bbb";
    let c = "ccc";

    for _ in 0..1000000 {
        my_string.extend([a, b, c].into_iter().cloned());
        my_string.clear();
    }
    println!("{:?}", t1.elapsed());
}

Three runs:

Duration { secs: 0, nanos: 42792508 }
Duration { secs: 0, nanos: 39719744 }
Duration { secs: 0, nanos: 46006185 }

As you can see, push_str() is more than ten times faster than the others.

.cloned() clones &str (which is copy-able); that is, it turns &&str into &str. It shouldn’t be measurable.
Did you compile with optimizations enabled?

There is no allocation aside from the result. As eddyb said, that’s not what cloned does. Also, your benchmarks are unfair, because they aren’t all testing the same thing. Here’s an actually fair benchmark using the built-in benchmarking support:

#![feature(test)]
extern crate test;
use test::Bencher;

const STR_A: &'static str = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
const STR_B: &'static str = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb";
const STR_C: &'static str = "cccccccccccccccccccccccccccccccccccccccc";

#[bench]
fn bench_push_str(b: &mut Bencher) {
    b.iter(|| {
        let mut s = String::new();
        s.push_str(STR_A);
        s.push_str(STR_B);
        s.push_str(STR_C);
        s
    });
}

#[bench]
fn bench_concat(b: &mut Bencher) {
    b.iter(|| {
        [STR_A, STR_B, STR_C].concat()
    });
}

#[bench]
fn bench_collect(b: &mut Bencher) {
    b.iter(|| {
        [STR_A, STR_B, STR_C].into_iter().cloned().collect::<String>()
    });
}

#[bench]
fn bench_extend(b: &mut Bencher) {
    b.iter(|| {
        let mut s = String::new();
        s.extend([STR_A, STR_B, STR_C].into_iter().cloned());
        s
    });
}

And here are the results for i686-pc-windows-gnu:

test bench_collect  ... bench:         364 ns/iter (+/- 83)
test bench_concat   ... bench:         158 ns/iter (+/- 88)
test bench_extend   ... bench:         349 ns/iter (+/- 88)
test bench_push_str ... bench:         331 ns/iter (+/- 199)

concat is the fastest because unlike the others, it actually pre-allocates the target String.
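If pre-allocation is the deciding factor, you can do the same thing by hand; this is a sketch of the idea, not a benchmarked claim:

fn concat_by_hand(a: &str, b: &str, c: &str) -> String {
    // Reserve the full length up front, then append; this is roughly what
    // concat does internally for a slice of string slices.
    let mut s = String::with_capacity(a.len() + b.len() + c.len());
    s.push_str(a);
    s.push_str(b);
    s.push_str(c);
    s
}

fn main() {
    assert_eq!(concat_by_hand("aaa", "bbb", "ccc"), "aaabbbccc");
}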

Like I said: why add what already exists?

Edit: one other note: your benchmarks are also bad because you don’t actually use the constructed string, so the compiler is free to just not run any of that code in the first place. That’s why each of the closures makes sure to return the constructed string, so it can’t be optimised away.
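Another way to keep the compiler from discarding the work (just an illustration, added to the benchmark file above) is to route the result through test::black_box:

#[bench]
fn bench_push_str_black_box(b: &mut Bencher) {
    b.iter(|| {
        let mut s = String::new();
        s.push_str(STR_A);
        s.push_str(STR_B);
        s.push_str(STR_C);
        // black_box forces the optimiser to treat the value as used
        test::black_box(s)
    });
}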

Also, this is not to say these benchmarks are good, or even all that meaningful, just that they’re less bad. :)

Edit 2: And just for good measure, x86_64-pc-windows-msvc results:

test bench_collect  ... bench:         273 ns/iter (+/- 14)
test bench_concat   ... bench:          71 ns/iter (+/- 9)
test bench_extend   ... bench:         272 ns/iter (+/- 9)
test bench_push_str ... bench:         159 ns/iter (+/- 24)

Well, yes, I realize my test was kind of unfair since I put the string declaration outside the loop, so I changed it by moving it inside. But I still get a faster result for push_str(), though now it’s much closer, ~1.5 times. I don’t know what’s going on there. Is there any other difference?

And as for “why add what already exists?”: what I actually need is to process big files line by line and collect some parts from every four lines into a string, which is what a, b, c (and d) stand for.

As shown, you can do this in one line with my_string.extend([a, b, c, d].into_iter().cloned()).
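As a rough sketch of that use case (the file name, the grouping into fours, and the field handling are placeholders of mine), reusing one buffer and extending it for each group of four lines could look like this:

use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    // "input.txt" is a placeholder path for illustration.
    let reader = BufReader::new(File::open("input.txt")?);
    let mut lines = reader.lines();
    let mut record = String::new();

    // Take lines in groups of four and gather the pieces into one reused buffer.
    while let (Some(a), Some(b), Some(c), Some(d)) =
        (lines.next(), lines.next(), lines.next(), lines.next())
    {
        let (a, b, c, d) = (a?, b?, c?, d?);
        record.clear();
        record.extend([a.as_str(), b.as_str(), c.as_str(), d.as_str()].iter().cloned());
        // ... do something with `record` here ...
    }
    Ok(())
}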

Like I said before, are you compiling with or without optimizations? If the answer is “without” or you’re not sure, that’s a bigger problem than what’s written in the code.

Yes, I compiled it with “-O”.

Behold the implementation of Extend<&str> for String:

impl<'a> Extend<&'a str> for String {
    fn extend<I: IntoIterator<Item = &'a str>>(&mut self, iter: I) {
        for s in iter {
            self.push_str(s)
        }
    }
}

That is almost exactly what your proposed push_strs would be! If there’s a performance difference, it’s down to the foibles of the optimiser, which you can’t really depend on behaving in exactly the same way in an actual program (as opposed to a microbenchmark).

Also, if you’re going to continue looking at perf numbers, make sure you’re using a recent nightly, and using the built-in benchmarking which re-runs the benchmark as many times as necessary until the variance has stabilised. Three samples is not really enough to draw any meaningful conclusions (not that microbenchmarks are particularly meaningful in the first place).

Oh, yes, Extend<&str> is exactly what I want, the same purpose and the same approach, thank you DanielKeep. Now I’m just curious about the performance difference.
