To_upper speed

Whatever we do for the general case, I don't think there's anything that's going to be faster than a less than check on the byte being under 128. Given that spaces, line feeds and control chars are in that list is there any reason not to go ahead with the initial PR while we try and iterate on the general case. (It's a pretty significant speedup for one additional instruction)

1 Like

I don't see why not. It can always be changed again later.

1 Like

The only challenge is to have varied enough benchmarks or use cases to justify the special case. (Meta-comment, but that's why win-win PRs are so cool - if you can show you improve every use case, then your PR becomes irresistible, and there are not so many hard decisions about tradeoffs.)

For this particular PR, there are only benchmarks for the happy case - what about other text? :slightly_smiling_face:

3 Likes

I've updated the PR with two more benchmarks (strictly ASCII / non-ASCII characters) and added a comment with the before / after results: Add a check for ASCII characters in to_upper and to_lower by mcastorina · Pull Request #81358 · rust-lang/rust · GitHub

As expected, the non-ASCII case performs the same. The all ASCII case gets a speedup of ~200,000 ns/iter.

3 Likes

Excellent. The replacement char \u{FFFD} is going to appear a fair bit if people do String::utf8_from_lossy(). I'm assuming that the replacement char isn't going to change case at all...?

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.