Do we need Unicode whitespace?

According to TR14 (Unicode Standard Annex #14, the line breaking algorithm), a line break is mandatory after:

  • FORM FEED (U+000C)
  • VERTICAL TABULATION (U+000B)
  • LINE SEPARATOR (U+2028)
  • PARAGRAPH SEPARATOR (U+2029)
  • NL (NEXT LINE, U+0085), LF, CR, CR LF (but not between CR and LF)
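
To make the list above concrete, here is a minimal Rust sketch that splits text at exactly these mandatory breaks, treating CR LF as a single break. It assumes "NL" means NEXT LINE (U+0085). It covers only the mandatory-break rule, not the rest of the TR14 algorithm. The function names are hypothetical.

```rust
/// Characters after which TR14 requires a line break (the list above).
/// CR is included here; the CR LF pair is merged in `split_mandatory_lines`.
fn is_mandatory_break(c: char) -> bool {
    matches!(
        c,
        '\u{000A}' // LF
            | '\u{000B}' // VERTICAL TABULATION
            | '\u{000C}' // FORM FEED
            | '\u{000D}' // CR
            | '\u{0085}' // NL (NEXT LINE)
            | '\u{2028}' // LINE SEPARATOR
            | '\u{2029}' // PARAGRAPH SEPARATOR
    )
}

/// Split `text` at every mandatory break, never breaking between CR and LF.
fn split_mandatory_lines(text: &str) -> Vec<&str> {
    let mut lines = Vec::new();
    let mut start = 0;
    let mut chars = text.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if !is_mandatory_break(c) {
            continue;
        }
        lines.push(&text[start..i]);
        let mut end = i + c.len_utf8();
        // A CR immediately followed by LF counts as one break.
        if c == '\r' {
            if let Some(&(j, '\n')) = chars.peek() {
                chars.next();
                end = j + 1;
            }
        }
        start = end;
    }
    // Whatever follows the last break (possibly empty) is the final line.
    lines.push(&text[start..]);
    lines
}

fn main() {
    let src = "a\r\nb\u{2028}c\u{000C}d";
    println!("{:?}", split_mandatory_lines(src)); // ["a", "b", "c", "d"]
}
```

A tool that only splits on LF would see "b\u{2028}c\u{000C}d" as one line here, which is the kind of mismatch discussed below.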

It seems wrong to me to withhold proper support for pattern whitespace just because other tools don't handle it properly. That is their problem, not ours. It seems reasonable to trust programmers to use whatever whitespace the tools they work with can handle.

The problem I see with that is breakage across the ecosystem. Currently I can be sure that all Rust code is UTF-8 and that my editor will handle it nicely, regardless of the crate. You could say that this guarantee then also covers all those whitespace types, but apparently they are so niche that we don't expect them to work everywhere, or possibly even in half the cases.
So we end up with Rust files that break in multiple editors and tools because we allowed something that is as much of an edge case as it can be (which is why it took so long for anything to break).
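
One way a crate author could check for this today is to scan source for whitespace that an editor counting only space, tab, LF and CR might mishandle. The sketch below assumes the Pattern_White_Space set that the Rust Reference lists as lexer whitespace; which characters count as "unusual", and the function names, are my own hypothetical choices. Lines are split on LF only, mimicking such an editor.

```rust
/// Unicode Pattern_White_Space, which the Rust Reference lists as lexer whitespace.
fn is_pattern_white_space(c: char) -> bool {
    matches!(
        c,
        '\u{0009}' | '\u{000A}' | '\u{000B}' | '\u{000C}' | '\u{000D}' | '\u{0020}'
            | '\u{0085}' | '\u{200E}' | '\u{200F}' | '\u{2028}' | '\u{2029}'
    )
}

/// Report (line, column, char) for every Pattern_White_Space character other
/// than space, tab, LF or CR. Lines are split on LF only and columns count
/// chars (both 1-based); CR is skipped so CR LF files are not flagged.
fn find_unusual_whitespace(src: &str) -> Vec<(usize, usize, char)> {
    let mut hits = Vec::new();
    for (line_idx, line) in src.split('\n').enumerate() {
        for (col_idx, c) in line.chars().enumerate() {
            if is_pattern_white_space(c) && !matches!(c, ' ' | '\t' | '\r') {
                hits.push((line_idx + 1, col_idx + 1, c));
            }
        }
    }
    hits
}

fn main() {
    // A LINE SEPARATOR hidden inside an otherwise ordinary statement.
    let src = "fn main() {\n    let x\u{2028}= 1;\n}\n";
    for (line, col, c) in find_unusual_whitespace(src) {
        println!("line {}, column {}: U+{:04X}", line, col, c as u32);
    }
}
```

Running it on the sample should report the LINE SEPARATOR at line 2, column 10: code that looks like one line to such an editor but contains a mandatory break per TR14.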

Moderator note: Please keep comments constructive and charitable. If you have questions about our moderation, please email us.

Having obscure features that aren’t actually supported by tooling seems like a recipe for disaster to me. In general, I think it is important to keep language syntax as simple as possible to maximize compatibility in the ecosystem.

I think this should have been feature-gated before 1.0, just like non_ascii_idents. But it’s too late now.
