A little proposal for string concatenation

ruster · August 22, 2016, 3:57am

Recently I have been doing a lot of text processing, and in many cases I need to concatenate strings like this:

let mut my_string = String::new();
my_string.push_str("aaa");
my_string.push_str("bbb");
my_string.push_str("ccc");
…

This feels clumsy, so a more convenient method could be added, such as:

fn push_strs(&mut self, strs: &[&str])

The case above could then be simplified to:

my_string.push_strs(&["aaa", "bbb", "ccc", …]);

I don’t know if this is the right place to post the topic, but I’d like to get some positive response.

DanielKeep · August 22, 2016, 4:06am

Why add what already exists?

fn main() {
    let my_string = ["aaa", "bbb", "ccc"].concat();
    assert_eq!(my_string, "aaabbbccc");
    
    let my_string = ["aaa", "bbb", "ccc"].iter().cloned().collect::<String>();
    assert_eq!(my_string, "aaabbbccc");
    
    let mut my_string = String::new();
    my_string.extend(["aaa", "bbb", "ccc"].iter().cloned());
    assert_eq!(my_string, "aaabbbccc");
}

What’s more, the last two work for everything that’s iterable, not just slices.
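To illustrate the point about iterables, here is a minimal sketch that feeds collect and extend from iterators that are not slices at all. The helper names join_words and append_nonempty_lines are made up for this example and are not part of the standard library.

```rust
// Sketch: building a String from iterators that are not slices.
// `join_words` and `append_nonempty_lines` are hypothetical helper names.

/// Collects every whitespace-separated word of `text` into one String.
fn join_words(text: &str) -> String {
    // `split_whitespace` yields `&str` items lazily; no intermediate Vec is built.
    text.split_whitespace().collect::<String>()
}

/// Appends every non-empty line of `text` to `out`.
fn append_nonempty_lines(out: &mut String, text: &str) {
    // Any iterator of `&str` can be passed to `extend`.
    out.extend(text.lines().filter(|line| !line.is_empty()));
}

fn main() {
    assert_eq!(join_words("aaa bbb\tccc"), "aaabbbccc");

    let mut out = String::from(">");
    append_nonempty_lines(&mut out, "aaa\n\nbbb\nccc\n");
    assert_eq!(out, ">aaabbbccc");
}
```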

ruster · August 22, 2016, 4:22am

First, sorry, my example was a little misleading: what I actually mean is &str variables, not literals that already exist at compile time.

Second, I know the approaches you mentioned, BUT all of them require intermediate memory allocation, which dramatically impacts performance, particularly for large amounts of text. I think avoiding that allocation is very helpful for this kind of work.

DanielKeep · August 22, 2016, 4:24am

Every one of those should work for &str variables just as much as string literals. None of them involve memory allocation aside from the resulting String.
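As a small sketch of that claim, the following concatenates &str values that are only known at run time (sub-slices borrowed from an input record). The function name concat_fields and the comma-separated input format are assumptions made for the example.

```rust
// Sketch: `concat` on `&str` variables computed at run time.
// `concat_fields` and the comma-separated format are hypothetical.

/// Concatenates the first three comma-separated fields of `record`.
fn concat_fields(record: &str) -> String {
    // Each field is an ordinary `&str` borrowed from `record`, not a literal.
    let mut fields = record.split(',');
    let a = fields.next().unwrap_or("");
    let b = fields.next().unwrap_or("");
    let c = fields.next().unwrap_or("");
    // The only String created here is the result itself.
    [a, b, c].concat()
}

fn main() {
    let record = String::from("aaa,bbb,ccc");
    assert_eq!(concat_fields(&record), "aaabbbccc");
}
```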

ruster · August 22, 2016, 4:43am

I’m afraid the cloned() method does result in memory allocation. I tested these methods:

fn main() {
    let t1 = std::time::Instant::now();
    let mut my_string = String::new();
    let a = "aaa";
    let b = "bbb";
    let c = "ccc";

    for _ in 0..1000000 {
        my_string.push_str(a);
        my_string.push_str(b);
        my_string.push_str(c);
        my_string.clear();
    }
    println!("{:?}", t1.elapsed());
}

Three runs:

Duration { secs: 0, nanos: 3670486 }
Duration { secs: 0, nanos: 3342101 }
Duration { secs: 0, nanos: 3224247 }

fn main() {
    let t1 = std::time::Instant::now();
    let mut my_string = String::new();
    let a = "aaa";
    let b = "bbb";
    let c = "ccc";

    for _ in 0..1000000 {
        my_string = [a, b, c].concat();
    }
    println!("{:?}", t1.elapsed());
}

Three runs:

Duration { secs: 0, nanos: 81928727 }
Duration { secs: 0, nanos: 77676702 }
Duration { secs: 0, nanos: 78448896 }

fn main() {
    let t1 = std::time::Instant::now();
    let mut my_string = String::new();
    let a = "aaa";
    let b = "bbb";
    let c = "ccc";

    for _ in 0..1000000 {
        my_string = [a, b, c].iter().cloned().collect::<String>();
    }
    println!("{:?}", t1.elapsed());
}

Three runs:

Duration { secs: 0, nanos: 353488365 }
Duration { secs: 0, nanos: 260228488 }
Duration { secs: 0, nanos: 323822175 }

fn main() {
    let t1 = std::time::Instant::now();
    let mut my_string = String::new();
    let a = "aaa";
    let b = "bbb";
    let c = "ccc";

    for _ in 0..1000000 {
        my_string.extend([a, b, c].iter().cloned());
        my_string.clear();
    }
    println!("{:?}", t1.elapsed());
}

Three runs:

Duration { secs: 0, nanos: 42792508 }
Duration { secs: 0, nanos: 39719744 }
Duration { secs: 0, nanos: 46006185 }

As you can see, push_str() is more than ten times faster than the others.

eddyb · August 22, 2016, 4:52am

.cloned() clones &str (which is copy-able), that is, it turns &&str into &str, it shouldn’t be measurable.
Did you compile with optimizations enabled?
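One way to convince yourself of this is to compare pointers: a sketch, assuming a made-up helper name cloned_points_to_same_bytes, that checks whether each reference produced by cloned() still refers to the very same bytes as the original, which would not hold if the text had been copied into a new allocation.

```rust
// Sketch: `.cloned()` on an iterator of `&&str` copies the reference, not the text.
// `cloned_points_to_same_bytes` is a hypothetical helper name.

/// Returns true if every `&str` produced by `.cloned()` points at the same
/// bytes (same address, same length) as the element it was cloned from.
fn cloned_points_to_same_bytes(parts: &[&str]) -> bool {
    parts
        .iter() // yields `&&str`
        .cloned() // copies each reference, yielding `&str`
        .zip(parts.iter())
        .all(|(copy, original)| {
            copy.as_ptr() == original.as_ptr() && copy.len() == original.len()
        })
}

fn main() {
    // Mix a heap-backed string with literals to show it holds for both.
    let owned = String::from("aaa");
    let parts = [owned.as_str(), "bbb", "ccc"];
    assert!(cloned_points_to_same_bytes(&parts));
}
```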

DanielKeep · August 22, 2016, 4:57am

There is no allocation, aside from the result. As eddyb said, that’s not what cloned does. Also, your benchmarks are bad since you aren’t testing the same thing. It’s unfair. Here’s an actually fair benchmark using the built-in benchmarking support:

#![feature(test)]
extern crate test;
use test::Bencher;

const STR_A: &'static str = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
const STR_B: &'static str = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb";
const STR_C: &'static str = "cccccccccccccccccccccccccccccccccccccccc";

#[bench]
fn bench_push_str(b: &mut Bencher) {
    b.iter(|| {
        let mut s = String::new();
        s.push_str(STR_A);
        s.push_str(STR_B);
        s.push_str(STR_C);
        s
    });
}

#[bench]
fn bench_concat(b: &mut Bencher) {
    b.iter(|| {
        [STR_A, STR_B, STR_C].concat()
    });
}

#[bench]
fn bench_collect(b: &mut Bencher) {
    b.iter(|| {
        [STR_A, STR_B, STR_C].iter().cloned().collect::<String>()
    });
}

#[bench]
fn bench_extend(b: &mut Bencher) {
    b.iter(|| {
        let mut s = String::new();
        s.extend([STR_A, STR_B, STR_C].iter().cloned());
        s
    });
}

And here are the results for i686-pc-windows-gnu:

test bench_collect  ... bench:         364 ns/iter (+/- 83)
test bench_concat   ... bench:         158 ns/iter (+/- 88)
test bench_extend   ... bench:         349 ns/iter (+/- 88)
test bench_push_str ... bench:         331 ns/iter (+/- 199)

concat is the fastest because unlike the others, it actually pre-allocates the target String.
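The pre-allocation idea can be sketched by hand as follows. This is only an illustration of the technique, not the standard library's actual concat implementation, and the function name concat_preallocated is invented for the example.

```rust
// Sketch: reserve the full length up front, then append.
// `concat_preallocated` is a hypothetical name; the real `concat` may differ in detail.

/// Concatenates `parts` into a String whose buffer is sized once, in advance.
fn concat_preallocated(parts: &[&str]) -> String {
    // Sum the byte lengths so a single allocation can hold everything.
    let total: usize = parts.iter().map(|s| s.len()).sum();
    let mut out = String::with_capacity(total);
    for part in parts {
        // No reallocation is needed here because the capacity already suffices.
        out.push_str(part);
    }
    out
}

fn main() {
    let s = concat_preallocated(&["aaa", "bbb", "ccc"]);
    assert_eq!(s, "aaabbbccc");
    assert!(s.capacity() >= 9);
}
```

The same effect is available on an existing String by calling reserve with the summed length before the push_str calls.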

Like I said: why add what already exists?

Edit: one other note: your benchmarks are also bad because you don’t actually use the constructed string, so the compiler is free to just not run any of that code in the first place. That’s why each of the closures makes sure to return the constructed string, so it can’t be optimised away.
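For hand-rolled timing loops on stable Rust, a comparable safeguard is std::hint::black_box, which asks the optimiser to treat a value as used (on a best-effort basis). The sketch below assumes a made-up function name time_push_str and is no more rigorous than the other microbenchmarks in this thread.

```rust
// Sketch: keeping the optimiser from discarding the work being timed.
// `time_push_str` is a hypothetical name; timings are illustrative only.
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Builds a String from `parts` `iterations` times and returns the elapsed
/// time together with the total number of bytes built.
fn time_push_str(iterations: usize, parts: &[&str]) -> (Duration, usize) {
    let start = Instant::now();
    let mut total_len = 0;
    for _ in 0..iterations {
        let mut s = String::new();
        // Hide the input from the optimiser so it cannot precompute the result.
        for part in black_box(parts) {
            s.push_str(part);
        }
        // Mark the result as used so the loop body is not removed.
        total_len += black_box(s).len();
    }
    (start.elapsed(), total_len)
}

fn main() {
    let (elapsed, total_len) = time_push_str(1_000_000, &["aaa", "bbb", "ccc"]);
    println!("{:?} for {} bytes built", elapsed, total_len);
}
```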

Also, this is not to say these benchmarks are good, or even all that meaningful, just that they’re less bad.

Edit 2: And just for good measure, x86_64-pc-windows-msvc results:

test bench_collect  ... bench:         273 ns/iter (+/- 14)
test bench_concat   ... bench:          71 ns/iter (+/- 9)
test bench_extend   ... bench:         272 ns/iter (+/- 9)
test bench_push_str ... bench:         159 ns/iter (+/- 24)

ruster · August 22, 2016, 5:23am

Well, yes, I realize my test was somewhat unfair, since I put the string declaration outside the loop, so I changed it to declare the string inside. But push_str() is still faster, though the gap is much smaller, about 1.5 times. I don’t know what’s going on there. Is there any other difference?

As for “why add what already exists?”: what I actually need is to process big files line by line and collect some parts of every four lines into one string; a, b, c (and d) stand for those parts.

withoutboats · August 22, 2016, 5:31am

As shown, you can do this in one line with my_string.extend([a, b, c, d].iter().cloned()).
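Applied to the four-lines-at-a-time use case, that could look roughly like the sketch below. It assumes the text is already in memory, that each record is simply the concatenation of up to four consecutive lines, and it uses an invented function name for_each_record; picking "some parts" of each line is left out.

```rust
// Sketch: reuse one String buffer for every group of four lines.
// `for_each_record` is a hypothetical helper; whole lines stand in for "parts".

/// Calls `handle` once per group of up to four lines, passing their concatenation.
fn for_each_record<F: FnMut(&str)>(text: &str, mut handle: F) {
    let mut buf = String::new();
    let mut lines = text.lines().peekable();
    while lines.peek().is_some() {
        // `clear` keeps the allocation, so later records reuse the same buffer.
        buf.clear();
        // Pull at most four lines from the shared iterator into the buffer.
        buf.extend(lines.by_ref().take(4));
        handle(&buf);
    }
}

fn main() {
    let mut seen = Vec::new();
    for_each_record("a\nb\nc\nd\ne\nf\n", |record| seen.push(record.to_string()));
    assert_eq!(seen, vec!["abcd".to_string(), "ef".to_string()]);
}
```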

eddyb · August 22, 2016, 5:50am

Like I said before, are you compiling with or without optimizations? If the answer is “without” or you’re not sure, that’s a bigger problem than what’s written in the code.

ruster · August 22, 2016, 5:55am

Yes, I compiled it with “-O”.

DanielKeep · August 22, 2016, 6:11am

Behold the implementation of Extend<&str> for String:

impl<'a> Extend<&'a str> for String {
    fn extend<I: IntoIterator<Item = &'a str>>(&mut self, iter: I) {
        for s in iter {
            self.push_str(s)
        }
    }
}

That is almost exactly what your proposed push_strs would be! If there’s a performance difference, it’s down to the foibles of the optimiser, which you can’t really depend on behaving in exactly the same way in an actual program (as opposed to a microbenchmark).
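If the push_strs spelling is still wanted, it can be layered on top of extend in user code. The following is a sketch using an extension trait; the trait name PushStrs is made up here, and nothing like it exists in the standard library.

```rust
// Sketch: the proposed `push_strs` as a user-side extension trait.
// `PushStrs` is a hypothetical trait, not a standard library item.

trait PushStrs {
    /// Appends every string in `strs`, in order.
    fn push_strs(&mut self, strs: &[&str]);
}

impl PushStrs for String {
    fn push_strs(&mut self, strs: &[&str]) {
        // Delegates to `Extend<&str>`, which calls `push_str` for each item.
        self.extend(strs.iter().copied());
    }
}

fn main() {
    let mut my_string = String::new();
    my_string.push_strs(&["aaa", "bbb", "ccc"]);
    assert_eq!(my_string, "aaabbbccc");
}
```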

Also, if you’re going to continue looking at perf numbers, make sure you’re using a recent nightly, and using the built-in benchmarking which re-runs the benchmark as many times as necessary until the variance has stabilised. Three samples is not really enough to draw any meaningful conclusions (not that microbenchmarks are particularly meaningful in the first place).

ruster · August 22, 2016, 6:18am

Oh, yes, Extend<&str> is exactly what I want: the same purpose and the same approach. Thank you, DanielKeep. Now I’m just curious about the difference in performance.

system · March 25, 2019, 8:26am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
