'Display' trait description is ambiguous

I'm working on something that builds on top of the fraction crate. Fraction has nicely implemented Display for fractional and decimal types.

Now, the delimiter in a fraction is defined as 0x2F '/' (SOLIDUS) instead of 0x2044 '⁄' (FRACTION SLASH), for good reason: '/' is far easier to type than '⁄' and probably has much better font support. However, FRACTION SLASH carries the proper Unicode semantics, and fonts that support it render fractions better with '⁄', e.g. 1⁄2. Thus, provided you have a supporting font, ⁄ displays much better than /. If you don't have such a font, it ends up as an ugly square-thingy, defeating the purpose of Display. It becomes even worse if I try to write 1½ the proper Unicode way, 1⁤1⁄2: it renders properly with font support, but a user without a suitable font installed will not understand the program's output. I would say this is a design decision that bubbles up all the way to the end user: which character sets are supported?

I would propose the following options for unambiguous Display or choice of symbols for Displaying data:

  1. ASCII: using only ASCII characters. A structure implementing ASCII trait is sure to be backward-compatible with any legacy C/C++ code that doesn't handle unicode.
  2. Multilingual: Only use letters from languages and not special Unicode blocks.
  3. Semantic: Using the best possible semantic meaning of Unicode values.
  4. Visual: Use any combination of Unicode to make your output look as visually appealing as possible.

The problem is then not necessarily in the traits, but rather in the aliases needed.

Mm. I can’t get that excited about this because it’s still not going to be localized, and there are plenty of formatting decisions where “the most visually appealing” or even “the best semantic meaning” would be a locale-dependent decision.

Is there a timeline for localization? I thought Rust's Display would not get localized?

What are guidelines on the use of unicode in Display? <- I think that's the better question to ask here. The options I shared indicate different directions so to speak.

Not to mention that rarely does an I/O device know the encoding that its data is being interpreted as. File doesn't know about encoding (or have a reliable place to store it on the other end), stdio would need to grow ncurses-style queries (assuming the output is even a TTY), HTML output has its encoding specified in-band, etc. You could have methods that return impl Display for each. display_semantic vs. display_ascii or something. Or .display(FractionDisplayMode::…) could work too.
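To make the `.display(FractionDisplayMode::…)` idea concrete, here is a minimal sketch. `Ratio`, `RatioDisplay` and the two enum variants are names invented for this illustration; none of this is the fraction crate's actual API.

```rust
use std::fmt;

/// Hypothetical stand-in for a fraction type; not the `fraction` crate's API.
struct Ratio {
    num: i64,
    den: i64,
}

/// Hypothetical rendering modes, loosely following the options listed earlier.
#[derive(Clone, Copy)]
enum FractionDisplayMode {
    /// U+002F SOLIDUS
    Ascii,
    /// U+2044 FRACTION SLASH
    Semantic,
}

/// Adapter that borrows the value and remembers the chosen mode.
struct RatioDisplay<'a> {
    ratio: &'a Ratio,
    mode: FractionDisplayMode,
}

impl Ratio {
    /// Returns something that implements `Display` for the requested mode.
    fn display(&self, mode: FractionDisplayMode) -> impl fmt::Display + '_ {
        RatioDisplay { ratio: self, mode }
    }
}

impl fmt::Display for RatioDisplay<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let slash = match self.mode {
            FractionDisplayMode::Ascii => '/',
            FractionDisplayMode::Semantic => '\u{2044}',
        };
        write!(f, "{}{}{}", self.ratio.num, slash, self.ratio.den)
    }
}
```

The caller, who is the only one who might know what the output device can render, picks the mode: `println!("{}", half.display(FractionDisplayMode::Ascii))`.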

It seems pretty obvious to me that Display is Unicode. The String impl just outputs the string verbatim, and Rust strings are UTF-8. Nobody would expect <String as Display> to lossily convert to ASCII or whatever.

Display is meant for user-facing text. So to me, your Visual option sounds the most fitting.

I don't think that's actually the case exactly. There are two kinds of users:

  • Operators --- humans who run your software on their server, who are programmers or system administrators
  • End users --- non-programmers who interact with a web or mobile app

For various CLI tools, the user is both an operator and an end user.

Now, for end-users you absolutely must support internationalization, and there's no support for i18n in Display.

For operators, keeping it English-only is an okay (and some, including me, would even argue the best) choice.

So it seems to me that it's best to treat Display as directed solely towards operators, and to avoid the trap of leaking Display into the actual user-visible text.


Coming back to the original question: I wouldn't worry about end-users here. Something that is displayed in user-facing context should be going through completely different machinery.

For operators, we already have a precedent of using unicode in Duration's debug, where µ is used. It is also specifically U+00B5, micro sign, and not U+03BC, Greek Mu.
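If you want to check which code point your toolchain actually emits, a small sketch like the following lists the code points of the Debug output. The helper name is made up for this example.

```rust
use std::time::Duration;

/// Returns the Unicode code points of the Debug rendering of a 5-microsecond
/// duration, so the micro prefix can be inspected directly.
fn duration_debug_code_points() -> Vec<u32> {
    format!("{:?}", Duration::from_micros(5))
        .chars()
        .map(|c| c as u32)
        .collect()
}
```

Looking for 0xB5 (MICRO SIGN) versus 0x3BC (GREEK SMALL LETTER MU) in the result tells you which one is in use.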

(If anyone finds the "operator vs. end user" distinction illuminating, consider giving "Creating Domain Specific Error Helpers in Go With errors.As" on The Ethically-Trained Programmer a read.)

I would simply expect i18n to occur upstream of Display.

There was already an answer in "Are changes to fmt::Display considered breaking?":

I think the only really solid and well-intuition-portable Display implementations are

  • Display + Error: display is how you display the error to "the appropriate audience" for that error
  • Display + FromStr: both use the same canonical (locale invariant) string representation
  • .method(...) -> impl Display: display adapter for a complex type, read the method docs
  • An actual text or string type
  • (any other type whose sole/primary purpose is to be stuck into a Write output stream)

If your type is Display and doesn't fit one of these categories, you're running a risk of expectation mismatch, especially if you don't document what your Display implementation is. And if you do, changing the documented contract is clearly breaking.

Reading your article, I assumed your use-case was Display + Error. This is different from my Display + FromStr use-case.

So, I want to make a program with a friendly CLI interface, implementing Display + FromStr

Ok, so. fraction implements Display + FromStr. Now, I am making a composite type that also implements Display + FromStr. That is, I do not only want to show my output nicely to the user, but I also want them to be able to easily write the output as input. As the linked discussion shows, this is a common-enough use-case. What I do inside my parser (and fraction does in theirs), is to select the part of the string that is a fraction or a number and offload it to the delegate FromStr method.

Here is a list of requirements for the different Display types:

  • Display + Error: Localization
  • Display + FromStr: ?Localization? + stability + writeability + Invertibility + Catchability
  • .method(...) -> impl Display: crate-defined
  • text: already localized
  • other: crate-defined

Also, the types of programs that would need these:

  • Display + Error: any program -- programmer-facing
  • Display + FromStr: CLI programs -- learner-facing

Digging into Display + FromStr requirements a bit deeper:

  • ?Localization?: may need localization, e.g. a thousands separator. I would say Polish notation. Then again, this opens up a bit of a minefield for crate maintainers if both the parser and the printer now have to accommodate all these languages. Of course, fundamental crates should do this, but it'd also be reasonable for someone to not want to do Polish notation for their strings.
  • Stability: breaking the string representation is a breaking change, since other crates or end-users may depend on it.
  • Replicability: like serialization. ∀t: T holds assert_eq!(t, T::from_str(&format!("{}", t)).unwrap())
  • writeability: Display output is an easy way to write your data type for a person with a keyboard. This also means that the parser is lenient towards visually similar characters.
  • documented: For downstream crates, there should be an easy, Documented way to isolate your display from the rest of the string to pass it on to your FromStr method.
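As a rough sketch of what these promises look like in code: `Ratio` below is a made-up type, not the fraction crate's. Its Display always writes the canonical ASCII form, and its parser is lenient about the visually similar FRACTION SLASH.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical fraction type used only for illustration.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Ratio {
    num: i64,
    den: i64,
}

impl fmt::Display for Ratio {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Canonical, locale-invariant form: always the ASCII solidus.
        write!(f, "{}/{}", self.num, self.den)
    }
}

impl FromStr for Ratio {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Writeability: accept both '/' and U+2044 FRACTION SLASH as delimiter.
        let (num, den) = s
            .split_once(|c: char| c == '/' || c == '\u{2044}')
            .ok_or_else(|| format!("no fraction delimiter in {s:?}"))?;
        let num = num.trim().parse::<i64>().map_err(|e| e.to_string())?;
        let den = den.trim().parse::<i64>().map_err(|e| e.to_string())?;
        Ok(Ratio { num, den })
    }
}

/// Replicability check: displaying and re-parsing yields the same value.
fn roundtrips(r: Ratio) -> bool {
    r.to_string().parse::<Ratio>() == Ok(r)
}
```

Note that the sketch does not reduce fractions or reject a zero denominator; a type that normalizes on parse would still satisfy the round trip from value to string to value, but not from string to value to string.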

I use this a lot. I have a recursive datatype which is really a hassle to construct manually, but I have a nice parser and a nice display, so that all my (handwritten, non-io-related) unit tests use strings as input.

Now, if Display then serves as a template for how to write input for your program, you'd want it to be highly customizable. This is outside the scope of this question. I would say maybe there should be another trait Parsable or something that is the same as Display, but keeps these promises.

This can be further subdivided into "parsability", the ability to parse a specific string, and "displayability", the ability to write into a specific output. Invertibility then means that the parser and display are linked, e.g.

  • weak:
  • strong: "Everything that can be parsed can be displayed in the same way, except for redundancy", e.g. "2+0" -> "2", but everything else

∀s: T::from_str(s).is_ok() ∃<opt> such that assert_eq!(s, format!("{<opt>}", T::from_str(s).unwrap()))

  • specialized: ∀t: T ∀<opt>!=":?" holds assert_eq!(t, T::from_str(&format!("{<opt>}", t)).unwrap())

As @matklad said above, Display is the wrong place to do i18n. For a start you would need a different function to be able to pass an extra parameter with the required context of locale settings anyway (imagine a web server backend for many different users, you can't have a global locale there).

I18n and L10n are better handled in crates outside std at this point in time. There seem to be many different approaches to this (gettext style, project fluent, icu4x etc) so at this point it is not suitable for inclusion in the standard library.

How would that even work? Only the downstream user knows what exact locale the current user or request needs. Unless you are suggesting to not support the server use case at all?

Let's not confuse i18n and L10n here: I think Display is exactly the place to do i18n, i.e. to make it easy to add L10n later, based on the architecture. Then it is up to a crate to specify whether its Display does L10n.

But yeah, I didn't want a specific localization feature, I just wanted to be able to pass arguments to the fmt function for fine-grained control of how it is displayed. I mean, if the options are there for floats, why not allow formatting options for non-std inter-crate displaying?

I was thinking of being able to pass multiple arguments to Display, but maybe then it becomes too complex, because it would become a macro? Something like:

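// Hypothetical syntax: extra named arguments like these are not valid Rust today.
println!("{lc$,f$}", get_locale(), '/');

/// docstrings that say that I accept lc and f as inputs
impl Display for MyCoolStruct {
    fn fmt(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        // `formatter.args` is part of the proposal; it does not exist in std.
        match formatter.args["lc"] {
            "am_ET" => write!(formatter, "Hello in Amharic! {}", formatter.args["f"]),
            _ => write!(formatter, "Hello default! {}", formatter.args["f"]),
        }
    }
}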

where write! produces an Error on a name collision? Actually it'd be better if the direct descendant is forced to consume the arguments, optionally passing them on. Also, it'd be totally fine for formatter.args to be very restrictive in the datatypes it'd accept. Another way would be to pass it directly to the function (macro-like):

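// Hypothetical syntax: extra named arguments like these are not valid Rust today.
println!("{lc$,f$}", get_locale(), '/');

/// docstrings that say that I accept lc and f as inputs
impl Display for MyCoolStruct {
    // Proposed signature; the real Display::fmt takes only the formatter.
    fn fmt(&self, formatter: &mut fmt::Formatter, lc: &str, f: char) -> fmt::Result {
        match lc {
            "am_ET" => write!(formatter, "Hello in Amharic! {}", f),
            _ => write!(formatter, "Hello default! {}", f),
        }
    }
}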

Then, if I do want to do locale stuff, I can put in my documentation that I accept an lc argument to my display with the locale.

Ah, it turns out the documentation encourages abusing different traits for formatting alternatives! Here they give an example of implementing the Binary trait for displaying the magnitude of a vector, great!

To summarize:

  • Display + FromStr includes promises on stability that Display + Error doesn't have. Plz make this a separate trait to separate the two use-cases or standardize the associated promises of being writeable, invertible and "catchable"
  • Abusing Binary trait is horrible!, pls allow us to pass custom arguments. or just have a single argument whose behaviour is crate-dependent. It would make writing Display sooo much nicer! Even 32 or 64 bits of information: [u8; 4] would make everything easier.
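For what it's worth, Formatter already carries a few bits from the format string (the `#` alternate flag, width, precision), so a crate can sometimes avoid borrowing an unrelated trait like Binary. A minimal sketch follows; using `{:#}` to mean "use FRACTION SLASH" is an assumption of this example, not an established convention, and `Ratio` is a made-up type.

```rust
use std::fmt;

/// Hypothetical fraction type used only for illustration.
struct Ratio {
    num: i64,
    den: i64,
}

impl fmt::Display for Ratio {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `{:#}` sets the alternate flag; this sketch reads it as
        // "prefer U+2044 FRACTION SLASH over the ASCII solidus".
        let slash = if f.alternate() { '\u{2044}' } else { '/' };
        write!(f, "{}{}{}", self.num, slash, self.den)
    }
}
```

With this, `format!("{}", r)` gives the ASCII form and `format!("{:#}", r)` the Unicode one. It is only one bit of information, and its meaning has to be documented per type, which is exactly the limitation complained about above.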

If you're interested in making distinctions that the std formatting traits don't, you might like my library manyfmt, which lets you invent as many new types of formatting as you want (defined as structs rather than additional traits) and pass additional arguments to them (in the struct). It doesn't have any integration with format-string syntax, but it could be a foundation for experimentation.

This probably falls under the domain of "implicits" / "ambient resources". If you absolutely must have it available in specifically Display::fmt, you'd use something like scoped thread locals (not an endorsement of this specific implementation; I couldn't quickly locate the solution applied by tokio, if it even exists publicly).
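A bare-bones version of the thread-local approach, using only std, might look like this. `CURRENT_LOCALE`, `with_locale` and `Greeting` are hypothetical names, and the sketch does not restore the previous value if the closure panics.

```rust
use std::cell::Cell;
use std::fmt;

thread_local! {
    /// Ambient locale tag read by `Display::fmt`; hypothetical name.
    static CURRENT_LOCALE: Cell<&'static str> = Cell::new("C");
}

/// Runs `body` with the ambient locale set to `locale`, then restores the
/// previous value. Not panic-safe: an unwinding `body` skips the restore.
fn with_locale<R>(locale: &'static str, body: impl FnOnce() -> R) -> R {
    let previous = CURRENT_LOCALE.with(|c| c.replace(locale));
    let result = body();
    CURRENT_LOCALE.with(|c| c.set(previous));
    result
}

struct Greeting;

impl fmt::Display for Greeting {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The locale arrives implicitly, not through the format string.
        match CURRENT_LOCALE.with(|c| c.get()) {
            "sv_SE" => write!(f, "Hej!"),
            _ => write!(f, "Hello!"),
        }
    }
}
```

The obvious downside is that the output of `to_string()` now depends on invisible state, and the state does not follow work that hops between threads.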

There's already a standard way to do this: adapter methods, fn method(&self, args) -> impl Display, e.g.

println!("{}", cool.display_with(get_locale(), '/'));

impl MyCoolStruct {
    fn display_with<'a>(&'a self, lc: &'a str, fc: char) -> impl 'a + Debug + Display {
        // not a std function, but imo should be
        fmt::from_fn(move |f| match lc {
            "am_ET" => write!(f, "Hello in Amharic! {}", fc),
            _ => write!(f, "Hello default! {}", fc),
        })
    }
}

When doing this, if you still implement Display, it would be with the most widely acceptable default / invariant C.UTF-8 locale display. manyfmt looks like a fairly cool generalization for the pattern.

In no small part because Formatter is defined in core, and thus must know the full set of potential arguments in order to carry them (no access to allocation). And there's no consensus on what extra configuration to provide to formatting. While &str for locale is what C uses, it's also a global setting there and not a great way to actually record locale information if there are other sufficiently future-compatible options.

In addition, what about mixed locales? For example, on my computer I have set LC_MESSAGES to en_GB.UTF-8, but LC_TIME and a few others are set to sv_SE.UTF-8.

Why? Swedish translations of software often suck, and it is more useful for googling to have the English terms anyway. But I still want correct local time, currency, paper format etc. And of course I want the comma as decimal separator (LC_NUMERIC) with space as thousands separator.

But there are many programs that don't properly handle such mixed locale settings. And how should it be handled by your lc when formatting a composite value with both a message and a time in it?
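One way to picture the problem: a single `lc: &str` cannot express this setup, so an adapter would need something like per-category settings. The sketch below is purely illustrative; `LocaleSettings`, `Price` and the two hard-coded locale tags are assumptions of the example, and real code would delegate to a localization crate.

```rust
use std::fmt;

/// Hypothetical per-category settings, loosely mirroring LC_MESSAGES and
/// LC_NUMERIC.
#[derive(Clone, Copy)]
struct LocaleSettings<'a> {
    messages: &'a str,
    numeric: &'a str,
}

/// Composite value: a translated label plus a number.
struct Price {
    whole: u32,
    cents: u8,
}

/// Display adapter carrying the settings alongside the value.
struct PriceDisplay<'a> {
    price: &'a Price,
    locale: LocaleSettings<'a>,
}

impl Price {
    fn display_with<'a>(&'a self, locale: LocaleSettings<'a>) -> PriceDisplay<'a> {
        PriceDisplay { price: self, locale }
    }
}

impl fmt::Display for PriceDisplay<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The label follows the message locale...
        let label = match self.locale.messages {
            "sv_SE" => "Pris",
            _ => "Price",
        };
        // ...while the decimal separator follows the numeric locale.
        let decimal = match self.locale.numeric {
            "sv_SE" => ',',
            _ => '.',
        };
        write!(f, "{}: {}{}{:02}", label, self.price.whole, decimal, self.price.cents)
    }
}
```

With `messages: "en_GB"` and `numeric: "sv_SE"`, the label stays English while the number uses a decimal comma.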

Customization of display output is definitely valuable. For some prior art, Julia has a somewhat interesting display methodology: a type T defines show(io::IO, mime::MIME, x::T) for each MIME output type it supports, and then display(x) uses the richest output supported by the current system, e.g., HTML for a notebook but plain text for a terminal.

The display functions ultimately call show in order to write an object x as a given mime type to a given I/O stream io (usually a memory buffer), if possible. In order to provide a rich multimedia representation of a user-defined type T, it is only necessary to define a new show method for T, via: show(io, ::MIME"mime", x::T) = ..., where mime is a MIME-type string and the function body calls write (or similar) to write that representation of x to io.

For example, if you define a MyImage type and know how to write it to a PNG file, you could define a function show(io, ::MIME"image/png", x::MyImage) = ... to allow your images to be displayed on any PNG-capable AbstractDisplay (such as IJulia).

In addition, the IO carries contextual information that can drive formatting:

For a more verbose human-readable text output for objects of type T, define show(io::IO, ::MIME"text/plain", ::T) in addition. Checking the :compact IOContext key (often checked as get(io, :compact, false)::Bool) of io in such methods is recommended, since some containers show their elements by calling this method with :compact => true.

It would be interesting if Rust could allow for such customization, maybe via improved Formatters that carried information like display richness/capabilities and desired localization.
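A very rough Rust analogue of the Julia design, written as a user-level trait and not as a change to Formatter, could look like the following. `Mime`, `Show`, `display_richest` and `Ratio` are all names invented for this sketch.

```rust
use std::fmt::{self, Write};

/// Hypothetical set of output types a display surface may support.
#[derive(Clone, Copy)]
enum Mime {
    TextPlain,
    TextHtml,
}

/// Loose analogue of Julia's `show(io, mime, x)`.
trait Show {
    /// Whether this type can render itself as `mime`.
    fn supports(&self, mime: Mime) -> bool;
    /// Writes the `mime` representation of `self` to `out`.
    fn show(&self, out: &mut dyn Write, mime: Mime) -> fmt::Result;
}

/// Renders `x` using the first entry of `preferred` (richest first) that the
/// type supports; yields an empty string if none is supported.
fn display_richest<T: Show>(x: &T, preferred: &[Mime]) -> Result<String, fmt::Error> {
    let mut out = String::new();
    if let Some(&mime) = preferred.iter().find(|&&m| x.supports(m)) {
        x.show(&mut out, mime)?;
    }
    Ok(out)
}

/// Hypothetical fraction type used only for illustration.
struct Ratio {
    num: i64,
    den: i64,
}

impl Show for Ratio {
    fn supports(&self, _mime: Mime) -> bool {
        true
    }

    fn show(&self, out: &mut dyn Write, mime: Mime) -> fmt::Result {
        match mime {
            Mime::TextPlain => write!(out, "{}/{}", self.num, self.den),
            Mime::TextHtml => write!(
                out,
                "<sup>{}</sup>&frasl;<sub>{}</sub>",
                self.num, self.den
            ),
        }
    }
}
```

Here the environment (notebook, terminal, ...) supplies the `preferred` list, which is the piece of information a plain Display impl has no access to.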

The argument part is solved by @kpreid and @CAD97's comments, thanks a lot!

Now, for the part on the ambiguity of Display: I talked about exactly Julia's printing methods with a friend yesterday! In particular, the distinction Julia makes between summary (type information), show (parsable Julia code OR MIME'd visual output), dump (full debug description) and showerror (descriptive error) is not made in Rust's Display (show or print) vs. Debug (dump).

I think the design goals of Rust (focus on small program footprint) and Julia (focus on fast, scientific coding that feels like scripting) make a difference here. If a crate has interesting data structures, then rendering them nicely as HTML, text/plain or something else is outside the scope of that crate. Similarly, printing compilable Rust code for your type doesn't make much sense, I'd say. What does make sense, however, is to specify whether or not your Display is parsable by the type it displays. By my previous post, I'd argue that this calls for standardization, either by adding traits DisplayErr and DisplayParsable (plz better names) or something else. Then the current Display would be Julia's show(::MIME"text/plain"). Other MIME types such as png/json/... should be handled by serde, I think.
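To sketch what such a trait could look like in user code today: a marker trait that only documents the promise, plus a generic check that tests can call. `DisplayParsable` here is the hypothetical name from the paragraph above, implemented for a few std types purely as a demonstration.

```rust
use std::fmt::Display;
use std::net::Ipv4Addr;
use std::str::FromStr;

/// Hypothetical marker trait: implementors promise that their `Display`
/// output parses back to an equal value via `FromStr`.
trait DisplayParsable: Display + FromStr + PartialEq {}

// Opting in a few std types whose Display and FromStr are paired this way.
impl DisplayParsable for i32 {}
impl DisplayParsable for bool {}
impl DisplayParsable for Ipv4Addr {}

/// Checks the promise for one value; handy in unit tests.
fn roundtrips<T: DisplayParsable>(value: &T) -> bool {
    match value.to_string().parse::<T>() {
        Ok(parsed) => parsed == *value,
        Err(_) => false,
    }
}
```

The trait itself enforces nothing; the compiler cannot verify the round trip, so the value lies in the documented contract and in having one place to test it.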

Then again, localization. I do like the idea of putting this as information in the formatter, but the formatter should remain lean, and whether adding locale information for all different locales for all different settings fits in the current formatter is a question to be raised. At that point, I would almost call for another level of indirection, to create a string/format that can be easily processed by some translation library.

I disagree with the operator/end user distinction. It's not so much a truth as a design decision that influences what software will look like. If we assume that there are two types of people, we will create software that reinforces this as a sort of self-fulfilling prophecy. I think software should encourage users to contribute at all levels of programmer-ness.

Interestingly, the Unicode standard (http://www.unicode.org/reports/tr25/ section 2.5) considers U+03BC to be preferred, and U+00B5 to be for legacy usage only.
