'Display' trait description is ambiguous

A string holding the locale name is a poor API, since it leaves every single implementation to look up the locale and know locale-specific behavior.

A real API should be using some kind of Locale object that has already been loaded and configured with the appropriate locale-specific behavior, and can be queried for information like the right decimal separator, or even just implement common features like number formatting.
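
To make the contrast concrete, here is a minimal sketch of what such a preloaded object could look like. `Locale`, `decimal_separator`, and `format_decimal` are hypothetical names invented for illustration, not an existing API, and the two hard-coded constants stand in for locale data that a real library would load.

```rust
/// Hypothetical preloaded locale object; all names here are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Locale {
    decimal_separator: char,
}

impl Locale {
    /// Stand-ins for locale data that a real library would load once.
    pub const EN_US: Locale = Locale { decimal_separator: '.' };
    pub const DE_DE: Locale = Locale { decimal_separator: ',' };

    /// Callers query the object instead of interpreting a locale string.
    pub fn decimal_separator(&self) -> char {
        self.decimal_separator
    }

    /// Formats `value` with two fractional digits and this locale's separator.
    pub fn format_decimal(&self, value: f64) -> String {
        format!("{:.2}", value).replace('.', &self.decimal_separator.to_string())
    }
}
```

A `Display` implementation handed a `&Locale` like this never has to parse or look up a locale name itself.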

Fully agreed. For desktop app use cases, however, there needs to be a conversion layer that takes the locale settings from the OS. On Linux at least, that involves various environment variables containing locale strings. (I have no idea how this works on Windows or OS X.)
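
As a rough sketch of that conversion layer on Linux, the lookup below follows the POSIX precedence of `LC_ALL`, then the category variable, then `LANG`. The function names are hypothetical, and falling back to `"C"` when nothing is set is an assumption of this sketch.

```rust
use std::env;

/// Picks the locale name for one category (for example "LC_NUMERIC")
/// following the POSIX precedence: LC_ALL, then the category, then LANG.
/// `lookup` is injected so the logic can be tested without touching the
/// process environment; empty values count as unset.
pub fn locale_name_for(category: &str, lookup: impl Fn(&str) -> Option<String>) -> String {
    ["LC_ALL", category, "LANG"]
        .iter()
        .filter_map(|name| lookup(*name))
        .find(|value| !value.is_empty())
        // Assumption of this sketch: fall back to the "C" locale.
        .unwrap_or_else(|| "C".to_string())
}

/// Convenience wrapper that reads the real process environment.
pub fn numeric_locale_name() -> String {
    locale_name_for("LC_NUMERIC", |name| env::var(name).ok())
}
```

The result is still only a string; turning it into loaded locale data is a separate step.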

Even for the web you need to convert from string locales; iirc the browser sends an HTTP header with language preferences.

Browsers have an API for this that is much smarter than just locale preferences in HTTP header: Intl - JavaScript | MDN

It only uses strings for describing the locales. The locale data is loaded, either globally with setlocale, or to an opaque locale_t object with newlocale, which GNU libc added to support the C++ std::locale, and later was adopted into POSIX.1-2008.

There is a bit of a problem that while POSIX had those since the dawn of time, the BCP47, on which localization on the web is based, doesn't have that concept.

A while back I made a BCP47-compatible extension syntax for this. Perhaps I should return to it some day.

I'd argue that for localization, most of the time the arguments should not come from the template anyway, in which case you can use a wrapper that takes the arguments and implements Display (or whatever other trait). Because a) you don't want to bother the translator with having to copy the weird format specifications to the translations, and b) you want the format to be consistent across the translations.
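
A minimal sketch of that wrapper idea, assuming a hypothetical `Percent` type: the precision is fixed in code, so each translated template only carries a plain `{}` placeholder. The two `progress_*` functions stand in for translated templates.

```rust
use std::fmt;

/// Hypothetical wrapper: the precision lives in code, so every translated
/// template only needs a plain `{}` placeholder.
pub struct Percent(pub f64);

impl fmt::Display for Percent {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Always one fractional digit, regardless of the template.
        write!(f, "{:.1}%", self.0 * 100.0)
    }
}

/// Stand-in for the English template.
pub fn progress_en(done: f64) -> String {
    format!("Progress: {}", Percent(done))
}

/// Stand-in for the German template; same placeholder, same number format.
pub fn progress_de(done: f64) -> String {
    format!("Fortschritt: {}", Percent(done))
}
```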

Any approach that doesn't support mixed locales is inherently flawed. I know many people who use it to get text in English (because often translations are terrible) but still want things like 24h time, weeks starting on Monday, date formats they are used to, etc.

I do not think this is a generally true / required property of those traits. It happens to be true for a set of primitives because those often have only one canonical mapping, but this breaks down once you get to more complex structs.

E.g. SocketAddrV6 has a flowinfo field which is not represented in its display output and can't be parsed from a string. So the struct's domain can't be round-tripped through its string representation.
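
The flowinfo case can be checked with a short sketch. The helper names and the flowinfo value 7 are arbitrary choices for illustration.

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

/// Formats `addr` with Display and parses the text back with FromStr.
pub fn through_string(addr: SocketAddrV6) -> SocketAddrV6 {
    addr.to_string()
        .parse()
        .expect("Display output of SocketAddrV6 should parse")
}

/// Builds an address whose flowinfo is non-zero (the value 7 is arbitrary).
pub fn sample_with_flowinfo() -> SocketAddrV6 {
    SocketAddrV6::new(Ipv6Addr::LOCALHOST, 8080, 7, 0)
}
```

Passing `sample_with_flowinfo()` through `through_string` yields an address that parses fine but has flowinfo 0, so it no longer compares equal to the original.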

I think TokenStream also has an issue where implicit groupings aren't represented in its string form?

The difference between

  • each impl Raven for T that has been observed so far has been black
  • one of the defining aspects of trait Raven is black plumage


Yes, almost exactly my point: If a type implements both Display and FromStr, there is no requirement/advice to have them match up or document if not. The question's not whether it is, it's whether it's nice.

Currently, if there is a struct that implements both Display and FromStr, there is no guarantee (except maybe in the specific crate docs) that the struct is indeed parsable with the FromStr trait from its Display. I'm not saying: "All structs should be parsable from their display", I'm saying: "Structs that implement Display and FromStr should have these implementations match (because ToString is derived from a no-argument Display)." Actually, the better approach would be to require ToString to match up with Display, but this requires specialization, since it has a blanket implementation.

But now it has become a documentation issue: should the ToString docs say that it should match up with FromStr (currently via the no-argument Display), unless specified otherwise in the specific crate documentation?

As long as there are officially-sanctioned exceptions, this is impossible to use correctly in a generic context, i.e. T:FromStr+Display doesn't guarantee that T can be round-tripped through a string representation.
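
To illustrate the generic-context problem, here is a sketch of a helper (the name `round_trips` is hypothetical) that compiles for any `T: Display + FromStr + PartialEq` yet can only test the property at runtime, because the bounds themselves promise nothing.

```rust
use std::fmt::Display;
use std::str::FromStr;

/// Returns true when `value` survives Display followed by FromStr.
/// Nothing in the trait bounds promises this; the check is purely empirical.
pub fn round_trips<T>(value: &T) -> bool
where
    T: Display + FromStr + PartialEq,
{
    match value.to_string().parse::<T>() {
        Ok(parsed) => parsed == *value,
        Err(_) => false,
    }
}
```

Integers pass, while a `SocketAddrV6` with non-zero flowinfo or an `f64` NaN does not, even though all three satisfy the same bounds.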

I believe that it's less error-prone when the more specialized documentation adds capabilities instead of taking them away, so I would rather put a warning in the generic documentation that they might not match. That leaves it up to the specific types to document that they do in fact support round-trips.

I was just pointing out that what we have for web is inherently flawed and unfortunately we are stuck with it. Because the value a browser sets in accept-language header, and in navigator.languages, can have alternatives with decreasing order of preference, but it has no standard way of saying sv-SE-for-formatting-but-not-text.
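
For reference, a sketch of how a server might order those alternatives. The function name is hypothetical, and treating a missing or malformed `q` as 1.0 is a simplification chosen for this sketch rather than full header parsing.

```rust
/// Parses an Accept-Language value into language tags ordered by
/// descending quality. A missing or malformed `q` is treated as 1.0,
/// which is a simplification made for this sketch.
pub fn preferred_languages(header: &str) -> Vec<String> {
    let mut entries: Vec<(String, f32)> = header
        .split(',')
        .filter_map(|item| {
            let mut parts = item.trim().split(';');
            let tag = parts.next()?.trim();
            if tag.is_empty() {
                return None;
            }
            let quality = parts
                .find_map(|p| p.trim().strip_prefix("q="))
                .and_then(|q| q.parse::<f32>().ok())
                .unwrap_or(1.0);
            Some((tag.to_string(), quality))
        })
        .collect();
    // The sort is stable, so header order is kept among equal qualities.
    entries.sort_by(|a, b| b.1.total_cmp(&a.1));
    entries.into_iter().map(|(tag, _)| tag).collect()
}
```

Each entry is a single tag with a single weight, which is why there is no slot for "formatting from one locale, text from another".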

Well, actually it should be done by specifying en-SE, but most browsers won't be willing to set it, and most apps won't know what to do with it either.

I used to use mixed locales myself, for mostly the same reason. Lately I gave up and just called en_GB.utf-8 good enough now that everything converged to utf-8.

FYI: I filed a docs PR to improve the situation somewhat:

There I went with

Because a type only has one Display implementation, it is often preferable to only implement Display when there is a single most "obvious" way that values can be formatted as text. This could mean formatting according to the "invariant" culture and "undefined" locale, or it could mean that the type display is designed for a specific culture/locale, such as developer logs.

If not all values have a justifiably canonical textual format or if you want to support alternative formats not covered by the standard set of possible formatting traits, the most flexible approach is display adapters: methods like str::escape_default or Path::display which create a wrapper implementing Display to output the specific display format.

Suggestions on wording improvements are welcome.
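
For readers unfamiliar with the two adapters named in the quoted text, a small sketch of how they are used. The helper function names are hypothetical; `str::escape_default` and `Path::display` are the standard library methods.

```rust
use std::path::Path;

/// `str::escape_default` returns an adapter that implements Display.
pub fn escaped(text: &str) -> String {
    text.escape_default().to_string()
}

/// `Path::display` returns an adapter, since `Path` itself does not
/// implement Display.
pub fn shown(path: &Path) -> String {
    path.display().to_string()
}
```

In both cases the adapter is usually passed straight to `format!` or `println!` rather than converted to a `String` first.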

To update the thread, the PR merged and nightly now has this new section in the documentation for the Display trait. While "make an informed decision" might not be the most useful resolution, I do think this resolves the ambiguity described in the OP.

For the specific case of fractions, while ICU locales define formatting rules for decimal numbers, I believe they do not have any facet for fraction format. As such, it is on the type author to (document and) justify a specific textual encoding as most canonical. This would lean towards what OP calls "semantic" encoding, although care should still be taken that this doesn't lead to actively misleading results in reasonable environments. E.g. if the value of 1+1/2 renders as 1▯1▯2, I know there's a font issue. If it renders as 1⁤1⁄2 because U+2064 INVISIBLE PLUS and U+2044 FRACTION SLASH have glyph mappings but the frac ligature is not enabled, then there's a problem.

Personally, I would lean towards a canonical Display rendering an "improper" fraction (3⁄2) with U+2044. (After all, the representation is typically p/q, not mixed, and formatting shouldn't cause simplification; display the fraction as it exists.) I'd then offer adapters which allow specifying whether display should be mixed or improper as well as what characters to use to separate the parts, with convenient shortcuts for the common ones (1+1/2, 1⁤1⁄2, 3/2, 3⁄2). Some days I might decline to pick one as canonical and require using an adapter method.
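
A sketch of what that could look like, under stated assumptions: `Fraction`, `Improper`, `Mixed`, and their methods are hypothetical names, not the API of any existing crate, and the fallback for a zero denominator is a choice made only for this example.

```rust
use std::fmt;

/// Hypothetical fraction type; `num` and `den` are stored as given, unreduced.
pub struct Fraction {
    pub num: u32,
    pub den: u32,
}

/// Adapter returned by `Fraction::improper`.
pub struct Improper<'a> {
    frac: &'a Fraction,
    slash: char,
}

/// Adapter returned by `Fraction::mixed`.
pub struct Mixed<'a> {
    frac: &'a Fraction,
    plus: &'a str,
    slash: char,
}

impl Fraction {
    /// Improper form with a caller-chosen slash character.
    pub fn improper(&self, slash: char) -> Improper<'_> {
        Improper { frac: self, slash }
    }

    /// Mixed form with caller-chosen separators.
    pub fn mixed<'a>(&'a self, plus: &'a str, slash: char) -> Mixed<'a> {
        Mixed { frac: self, plus, slash }
    }
}

/// Canonical rendering: improper, with U+2044 FRACTION SLASH.
impl fmt::Display for Fraction {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.improper('\u{2044}'))
    }
}

impl fmt::Display for Improper<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}{}{}", self.frac.num, self.slash, self.frac.den)
    }
}

impl fmt::Display for Mixed<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (num, den) = (self.frac.num, self.frac.den);
        // Sketch-only choice: no whole part (or den == 0) prints as improper.
        if den == 0 || num < den {
            return write!(f, "{}{}{}", num, self.slash, den);
        }
        let (whole, rest) = (num / den, num % den);
        if rest == 0 {
            write!(f, "{}", whole)
        } else {
            // The remainder is not reduced: 6/4 prints as 1+2/4.
            write!(f, "{}{}{}{}{}", whole, self.plus, rest, self.slash, den)
        }
    }
}
```

With `Fraction { num: 3, den: 2 }`, `improper('/')` gives the ASCII form and `mixed("+", '/')` gives the mixed ASCII form, while plain `Display` stays on the U+2044 rendering.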

Maybe also add to the description:

If your type also implements FromStr, users may expect round-trips to strings to be supported. Adding this to the type's documentation for Display and FromStr is advised.

For the specific case of fractions, I made a PR, now merged, for alternative displays that resolves it well enough. (It still uses /, i.e. ASCII SOLIDUS, for backwards compatibility, and advanced parsing is behind a separate function, which gives a small speed improvement for restrictive parsing.)