Show, to_string, and guidelines


#1

There has been a long-running debate about Show which we need to resolve in settling basic API guidelines.

On the one hand, to be able to use #[deriving(Show)] (a very convenient feature) it is crucial that essentially all types implement the Show trait.

On the other hand, anything implementing Show automatically gets a to_string method. In certain cases, this makes it problematic to implement Show. For example, Path is not necessarily Unicode, and so can only be converted to a string in a lossy fashion. For this reason, Path has not implemented Show directly, but instead has a display adapter (which does implement Show), forcing users to acknowledge the lossiness.

Unfortunately, this means that structures with Path fields cannot use #[deriving(Show)], which is quite painful.
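To make the pain concrete, here is a minimal sketch written against current stable Rust rather than the Rust of this thread. It assumes that `fmt::Display` stands in for the user-facing side of `Show`; the `Job` struct and `describe_job` helper are hypothetical names used only for illustration. Because the path type has no direct `Display` impl, the formatting code has to go through the `display()` adapter by hand:

```rust
use std::fmt;
use std::path::PathBuf;

// Hypothetical struct with a path field, used only for illustration.
struct Job {
    name: String,
    input: PathBuf,
}

impl fmt::Display for Job {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `PathBuf` has no `Display` impl of its own; `display()` returns an
        // adapter that does, and that adapter may replace non-Unicode data.
        write!(f, "{} <- {}", self.name, self.input.display())
    }
}

// Hypothetical helper so the example can be exercised directly.
fn describe_job() -> String {
    let job = Job {
        name: String::from("resize"),
        input: PathBuf::from("img/cat.png"),
    };
    job.to_string()
}
```

Every struct that embeds a path needs the same boilerplate, which is the cost being weighed here.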

So the questions are:

  • What expectations should we have for a Show implementation? When is lossiness acceptable, if ever?
  • Should to_string be coupled with Show?

In particular, you could imagine having multiple Show-like traits, some for debugging (where lossyness is allowed) and some for more “formal” string conversions. Anything implementing the more “formal” string conversion could automatically implement the debugging version (via a blanket impl), and both could support deriving.
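A rough sketch of the two-trait idea, assuming hypothetical trait names (`FormalString` and `DebugString` do not exist in any standard library) and written in current Rust. The blanket impl gives every "formal" type the debugging conversion for free, while a lossy type can still opt into the debugging trait alone:

```rust
// Hypothetical trait: implementors promise a faithful string conversion.
trait FormalString {
    fn formal_string(&self) -> String;
}

// Hypothetical trait: a debugging conversion that is allowed to be lossy.
trait DebugString {
    fn debug_string(&self) -> String;
}

// Blanket impl: anything with a formal conversion also gets the debugging one.
impl<T: FormalString> DebugString for T {
    fn debug_string(&self) -> String {
        self.formal_string()
    }
}

// A type with a faithful representation implements only the formal trait.
struct Celsius(i32);

impl FormalString for Celsius {
    fn formal_string(&self) -> String {
        format!("{}C", self.0)
    }
}

// A type that may hold non-Unicode bytes implements only the debugging trait.
struct RawBytes(Vec<u8>);

impl DebugString for RawBytes {
    fn debug_string(&self) -> String {
        // Invalid UTF-8 sequences are replaced with U+FFFD.
        String::from_utf8_lossy(&self.0).into_owned()
    }
}
```

Deriving support for both traits would be the extra engineering mentioned next; this sketch covers only the trait relationship.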

But this seems like a fair amount of engineering around a very core concept. A simpler alternative would be to say that Show is intended for lightweight string conversions, primarily for debugging, and may be lossy.

Some of the tradeoffs are discussed in this GitHub thread.

I was intending to write up an RFC on this topic, but before I do so, I wanted to solicit general feedback from the community, to see which way people lean, and perhaps to discover some new alternative designs.

(Note that Java has similar issues, since every object implements toString there. See this page for some discussion/guidelines.)


#2

There’s also another problem with Show – for debugging, strings should probably be printed like in Python’s repr. The same goes for the cases where there’s a container containing strings:

[foo, bar]

for a vector is ugly (and ambiguous), I’d much rather have

["foo", "bar"]

as the default (as seen in Python).
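The ambiguity can be shown with a small sketch in current Rust, where (as an assumption relative to this thread, which predates it) the `{}` formatter prints strings bare and the `{:?}` formatter quotes and escapes them. The function names are hypothetical:

```rust
// Joins the items using their bare, unquoted form.
fn plain_list(items: &[&str]) -> String {
    format!("[{}]", items.join(", "))
}

// Uses the debugging form, which quotes and escapes each string.
fn quoted_list(items: &[&str]) -> String {
    format!("{:?}", items)
}
```

With the plain form, a two-element list `["foo", "bar"]` and a one-element list `["foo, bar"]` both print as `[foo, bar]`; the quoted form keeps them distinct.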


#3

#12128 is related.


#4

https://github.com/rust-lang/rust/pull/16544 is also loosely related.


#5

I (and most people) expect to_string to return a string representation of the value, the same as Python’s str() or C#’s and Java’s toString(). It’s possible and likely that this string representation doesn’t perfectly represent the contents, so lossiness is acceptable.

I’m alright with it being coupled with Show, it makes sense to me.

As for the Path example, it’s alright in my opinion; that’s why we use Path and not a String to pass paths around, right?

Indeed, this is important to distinguish between strings and enum-ish types.


#6

I think Show should be the one that will print something even if it has to drop down to bit patterns. It’s meant for the programmer, and is not really useful for anyone else. Its output is centered around the structure of the data, the primary sink is testing and the logging facilities.

Later when we have real Unicode, locale, etc. support, we can add a Printable or something that deals with things meant for end users. UTF-8 doesn’t mean much if we’re going to ignore the other 99% of the Unicode standard.

I have no idea what to_string should do because I have no idea what a String is conceptually.


#7

I would really, really like to be able to support multiple Show-like traits for different kinds of output in the future, so I’d like to not block that off.

Wish list thing - I’d like to be able to implement to_string and then derive Show (i.e., the reverse of today’s system) - in many cases, implementing to_string is conceptually simpler than implementing Show and this is such a fundamental engineering task that I would like it to be as easy as possible.


#8

In Haskell and Rust I have only ever used Show for debugging. So as long as I can understand its representation, it’s good for me.

I’ve never used the “to_string” function in Rust. With most languages I’ve used, when I need an exact representation in string form I expect to use a specialized function for the task. Although there may be times where I would use “to_string” to build up a larger string, but again, just for debugging.


#9

I assume you’re talking about #[deriving(Show)] here?

The {} formatter is using Show in things like let number_of_cats = 10u; println!("there are {} cats", number_of_cats);.


#10

It’s not entirely obvious to me that to_string is meant to produce a representation tailored for debugging.

For formal representations one should probably look to the serialize module.


#11

What if we had deriving(ToString, Show), where ToString is just an alias to Show unless explicitly defined.

Kind of like Eq/PartialEq. Eq just means that the eq() function defines an equivalence relation. Similarly, we can have Show be something that doesn’t leave anything out. #[deriving(ToString)] will alias to_string() to show(), but we can define our own version of to_string() manually without breaking the Show hierarchy.
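One way to picture the "alias unless explicitly defined" behaviour is a trait with a default method. This is only a sketch in current Rust: `ToText` is a hypothetical trait (named to avoid clashing with the real `ToString`), and the derived `Debug` output stands in for the derived `Show` output.

```rust
use std::fmt;

// Hypothetical trait: the default method falls back to the derived output.
trait ToText: fmt::Debug {
    fn to_text(&self) -> String {
        format!("{:?}", self)
    }
}

#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

// Takes the default: `to_text` is just an alias for the derived output.
impl ToText for Point {}

#[derive(Debug)]
struct UserId(u32);

// Overrides the default without touching the derived impl.
impl ToText for UserId {
    fn to_text(&self) -> String {
        format!("user-{}", self.0)
    }
}
```

Here `Point { x: 1, y: 2 }.to_text()` matches the derived output, while `UserId(7).to_text()` yields `user-7` even though its derived output is still `UserId(7)`.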


#12

I like @Manishearth’s suggestion, in fact came to the thread to suggest the same thing.


#13

I like the suggestion but I don’t think show() is a proper name for a “detailed string representation”.


#14

The real concern here is “reversibility” or the assumption of it, right? We don’t want people assuming that converting a std::Path to a string and then back to a std::Path again will produce an identical path object. I think that is a valid concern, but at the same time I don’t think crippling the std API is the right thing to do.

No one should ever assume that objects can be converted to and from strings without some kind of data loss, whether that object is a std::Path or anything else. The documentation should be up front about that: Show, to_string and from_str are not a serialization framework; they are lossy, best-effort methods for representing complex objects as strings and for generating complex objects from string data. If we want Show and to_string to be lossless, then shouldn’t the same be true for FromStr? Do we want from_str::<f32> to return None if the string passed in is not precisely representable as a 32-bit floating point value on the system? Probably not.
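The floating-point point can be checked with a short sketch in current Rust. Note the assumptions: today the conversion is spelled `str::parse` and returns a `Result` rather than the `Option` discussed here, and both helper names are hypothetical.

```rust
// Parses a decimal string as a 32-bit float. `.ok()` turns the `Result`
// of `str::parse` into an `Option`, mirroring the old `from_str` shape.
fn parse_f32(s: &str) -> Option<f32> {
    s.parse::<f32>().ok()
}

// Reports whether the f32 result equals the more precise f64 result,
// i.e. whether narrowing to 32 bits left the value unchanged.
fn f32_is_as_precise(s: &str) -> Option<bool> {
    let narrow = s.parse::<f32>().ok()?;
    let wide = s.parse::<f64>().ok()?;
    Some(f64::from(narrow) == wide)
}
```

Parsing "0.1" succeeds even though the resulting f32 differs from the f64 reading of the same text, whereas "0.5" survives intact. The conversion is best-effort, not exact.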

I do think to_string and Show should be decoupled. Ideally I could implement to_string and get Show automatically as a side-effect, but I should also be able to implement fmt::Show and do my own formatting based on the formatter. Not sure what the current state is, but I want to be able to implement either or both and have them behave differently as needed.


#15

cc @SimonSapin @eridius, I believe you both have strong opinions on this question and would value your input on this thread.


#16

There should be a difference between a to_string and to_repr. Just from scanning through code, it’s hard to tell if deriving(Show) was meant for debugging, or to actually turn it into a String.

I’m quite a fan of making things implement Show, and using a Formatter, instead of grouping strings together, as it saves on allocations.
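A small sketch of the allocation argument, in current Rust with `fmt::Display` assumed as the stand-in for `Show`. The `Rgb` type and `glue` function are hypothetical. The formatter-based impl writes each piece straight into the caller's sink, while the string-gluing version allocates a temporary `String` per component:

```rust
use std::fmt;

// Hypothetical colour type, used only for illustration.
struct Rgb(u8, u8, u8);

impl fmt::Display for Rgb {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Writes directly into the formatter; no intermediate strings
        // are built for the individual components.
        write!(f, "rgb({}, {}, {})", self.0, self.1, self.2)
    }
}

// The string-gluing alternative: one temporary `String` per component,
// plus the joined string, before the final result is produced.
fn glue(c: &Rgb) -> String {
    let parts = [c.0.to_string(), c.1.to_string(), c.2.to_string()];
    format!("rgb({})", parts.join(", "))
}
```

Both produce the same text; the difference is only in how many intermediate buffers are created along the way.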


#17

I don’t either, just wanted an example name and I’m terrible with naming things 😜


#18

Path does not implement Show for reasons that I think are rather unique to Path. Specifically, most programming languages/standard libraries have historically used strings to represent file paths. Many languages/libraries provide path-centric APIs that provide a better user experience, but still typically allow (and very commonly use) strings to represent paths in cases where complicated path manipulation is not necessary.

Programming languages that do this fall in one of two camps: 1) languages that can represent arbitrary binary data (at least, binary data without NULs) in strings, in which case representing paths as strings is workable, if not ideal, and 2) languages that require strings to conform to some encoding, in which case representing paths as strings is definitively broken.

The end result of all this is that a great many programmers are used to using strings to represent file paths. To these programmers, converting a Path to a String is highly likely to be assumed to be a reversible operation. This is made even more likely by the fact that the vast majority of file paths people see are in fact representable as a valid UTF-8 sequence. Because of this, developers that do represent paths as strings are unlikely to ever notice an issue, until such time as their software mysteriously breaks on someone else’s system (sometimes with catastrophic results, if the breakage results in a truncated path and not a hard error, and the attempted operation is destructive).


All that said, my expectation is that pretty much every non-primitive data structure besides Path does not carry an implicit assumption that round-tripping through String will work (I say non-primitive because things like numeric types are expected to round-trip through String, and that’s fine).

I am undecided as to whether it makes sense to allow Path to conform to Show if .to_string() is divorced from Show. On the one hand, not being able to say path.to_string() would be good, but on the other hand, I worry that people will still make the assumption that format!("{}", path) is appropriate.

Perhaps a compromise would be to split off .to_string() from Show, and then to implement Show on Path, but implement it to return something like Path("lossy_representation"). This can’t be round-tripped, because it can’t be passed back into Path's constructor, and it also can’t be trivially used in places where strings are expected for data representation, such as being used as a string value in a JSON blob.
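A minimal sketch of that compromise, in current Rust and under stated assumptions: `Shown` is a hypothetical wrapper (the real `Path` does not format this way), `show_path` is a hypothetical helper, and `fmt::Display` stands in for `Show`.

```rust
use std::fmt;
use std::path::Path;

// Hypothetical wrapper that formats a path as `Path("...")`.
struct Shown<'a>(&'a Path);

impl<'a> fmt::Display for Shown<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `to_string_lossy` substitutes U+FFFD for non-Unicode data, so the
        // output is explicitly a lossy rendering and not a usable path.
        write!(f, "Path(\"{}\")", self.0.to_string_lossy())
    }
}

// Hypothetical helper so the example can be exercised directly.
fn show_path(p: &str) -> String {
    Shown(Path::new(p)).to_string()
}
```

The surrounding `Path("...")` is what discourages pasting the result back into a path constructor or a JSON value.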


In order to ease transition, we could remove the implicit implementation of ToString, and modify #[deriving(Show)] to also derive a ToString impl. This would obviously not work for classes that manually implement Show, but we could also support #[deriving(ToString)] there. Then Path can implement Show without getting ToString.


#19

Regarding #12128, I’ve long thought that {} should support the # modifier for precisely this purpose. In the context of Path, using {:#} could then print something like Path(b"lossless/path/repr").
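As a sketch of how a `#` modifier could select between the two renderings: current Rust exposes the flag as `Formatter::alternate()`. Everything else here is an assumption for illustration, including the hypothetical `RawPath` type that stores a path as raw bytes.

```rust
use std::ascii;
use std::fmt;

// Hypothetical stand-in for a path stored as raw bytes.
struct RawPath(Vec<u8>);

impl fmt::Display for RawPath {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if f.alternate() {
            // `{:#}`: escape every byte so that nothing is lost.
            write!(f, "Path(b\"")?;
            for &byte in &self.0 {
                for escaped in ascii::escape_default(byte) {
                    write!(f, "{}", escaped as char)?;
                }
            }
            write!(f, "\")")
        } else {
            // `{}`: lossy rendering, invalid UTF-8 becomes U+FFFD.
            write!(f, "{}", String::from_utf8_lossy(&self.0))
        }
    }
}
```

For the bytes `a`, `/`, `0xff`, the plain form prints a replacement character for the last byte, while the `{:#}` form prints it as a `\xff` escape.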


#20

Maybe we could mimic Ruby’s handling of conversion to string.

In Ruby there are three “to_string” methods:

  1. #to_s: returns a string for display purposes. This method is meant to be seen by a human, and it is what you would pass to Ruby’s puts or Rust’s println!.
  2. #to_str: returns a string from which the object can be recreated. This method is implemented by classes that are basically wrappers around strings and that can be converted losslessly back and forth.
  3. #inspect: returns a string for debug purposes. This is what is used in debug prints (i.e. p). Sometimes the value returned by #inspect can be used to convert losslessly back and forth, but this is not a requirement.

Examples:

A string:

x = "abc"
puts x.to_s #=> abc
puts x.to_str #=> abc
puts x.inspect #=> "abc"

An array:

x = [1, 2, 3]
puts x.to_s #=> [1, 2, 3]
puts x.to_str #=> NoMethodError: undefined method `to_str' for [1, 2, 3]:Array
puts x.inspect #=> [1, 2, 3]

A file:

x = File.open('/tmp/x')
puts x.to_s #=> #<File:0x000000023dc898>
puts x.to_str #=> NoMethodError: undefined method `to_str' for #<File:/tmp/x>
puts x.inspect #=> #<File:/tmp/x>
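A rough Rust analogue of the three Ruby methods, written in current Rust and resting on assumptions: `fmt::Display` loosely plays the role of #to_s, `fmt::Debug` the role of #inspect, and `ToExactStr` is a hypothetical trait standing in for #to_str. The `Label` type is likewise hypothetical.

```rust
use std::fmt;

// Hypothetical counterpart of `to_str`: implemented only by types that are
// essentially strings and convert back and forth without loss.
trait ToExactStr {
    fn to_exact_str(&self) -> &str;
}

// Hypothetical string wrapper, used only for illustration.
struct Label(String);

// Display form, comparable to `to_s`: the bare text.
impl fmt::Display for Label {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

// Debug form, comparable to `inspect`: quoted and escaped.
impl fmt::Debug for Label {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}", self.0)
    }
}

// Lossless form, comparable to `to_str`: hands back the underlying text.
impl ToExactStr for Label {
    fn to_exact_str(&self) -> &str {
        &self.0
    }
}
```

A type such as an array or a file handle would implement the first two but not `ToExactStr`, mirroring the NoMethodError cases above.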