Byte-string formatting


#1

I don’t know whether this is a problem anyone else has experienced, but it has become apparent to me that formatting byte strings (i.e. Vec<u8> or &[u8]) is a bit more cumbersome than formatting UTF-8 strings, because the normal string formatting functions cannot be used when the inputs may not be valid UTF-8.

I think it could be solved by extending format_args! and friends to accept a byte string literal (b"...") containing the same syntax as the existing string formatting functionality, but with fields of unspecified type ({}) being formatted using, perhaps, a Bytes trait within std::fmt. Existing formatter types and types that do not implement the hypothetical Bytes trait could be supported by formatting to a UTF-8 string, then passing the bytes along to the formatted byte string.
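
As a rough illustration of that idea, here is a minimal sketch. The Bytes trait, its fmt_bytes method, and the Text wrapper are hypothetical names invented for this example; nothing like them exists in std::fmt, and the sketch appends to a Vec<u8> by hand instead of going through format_args!.

```rust
use std::fmt::Display;

/// Hypothetical trait (not part of std): appends a value's raw bytes to `out`.
pub trait Bytes {
    fn fmt_bytes(&self, out: &mut Vec<u8>);
}

impl Bytes for u8 {
    fn fmt_bytes(&self, out: &mut Vec<u8>) {
        out.push(*self);
    }
}

impl Bytes for [u8] {
    fn fmt_bytes(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(self);
    }
}

/// Hypothetical fallback for types that only implement Display:
/// format to a UTF-8 string first, then pass its bytes along.
pub struct Text<T>(pub T);

impl<T: Display> Bytes for Text<T> {
    fn fmt_bytes(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(self.0.to_string().as_bytes());
    }
}

/// Hand-expanded stand-in for what a byte format string such as
/// b"n={} {} {}" might do with these three arguments.
pub fn demo() -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(b"n=");
    Text(42).fmt_bytes(&mut out);
    out.push(b' ');
    0xAAu8.fmt_bytes(&mut out);
    out.push(b' ');
    vec![0xBBu8, 0xCC].fmt_bytes(&mut out);
    out
}
```

The Text wrapper is only one way to express the UTF-8 fallback; a real design would more likely pick the fallback automatically from the format specifier.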


#2

I thought Formatter was designed to support this too? Can’t you use the formatting functionality through write!()?

Example: http://is.gd/plsbZQ
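
A minimal sketch of the write!() route, assuming the output is collected in a Vec<u8> (which implements std::io::Write). The build_message function and its arguments are made up for illustration and are not taken from the linked example.

```rust
use std::io::Write;

/// Illustrative helper: mixes formatted text with raw, possibly non-UTF-8 bytes.
pub fn build_message(name: &[u8], code: u32) -> Vec<u8> {
    let mut out: Vec<u8> = Vec::new();
    // write! renders Display values as UTF-8 text into any io::Write sink.
    write!(out, "code={} name=", code).expect("writing to a Vec should not fail");
    // Raw bytes bypass the formatter, so they do not have to be valid UTF-8.
    out.write_all(name).expect("writing to a Vec should not fail");
    out
}
```

With this approach the format string itself is still UTF-8; only the pieces written with write_all can carry arbitrary bytes.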


#3

Yeah, that’s not too bad. It requires a wrapper for byte slices, too, else they’re formatted as "[0, 1, ...]". (Updated example: http://is.gd/VPXFQJ) Still, it feels like shoe-horning rather than true support. For instance, the format string is still UTF-8.


#4

FYI, I started a thread about pretty-printing byte slices: http://discuss.rust-lang.org/t/byte-string-formatting/1148


#5

To clarify, what I’m proposing would be an extension to format!, which would work as follows:

format!(b"foo {}", b"bar") == b"foo bar"
format!(b"foo {} {}", 0xAAu8, vec![0xBBu8, 0xCC]) == b"foo \xAA \xBB\xCC"

There isn’t currently a straightforward way to do this because format!("{}", vec![0xFFu8]) produces the string "[255]".
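
To make the contrast concrete, here is a small sketch assuming current Rust, where Vec<u8> implements Debug but not Display, so the list rendering is reached through {:?}. The function names are illustrative only, and the manual concatenation is just a stand-in for the proposed byte-string format!.

```rust
/// What the standard formatter gives for a byte vector: a textual list.
pub fn debug_rendering() -> String {
    format!("{:?}", vec![0xFFu8])
}

/// The bytes the proposed format!(b"foo {} {}", ...) call would produce,
/// assembled by hand from slices.
pub fn raw_concat() -> Vec<u8> {
    let parts: [&[u8]; 4] = [b"foo ", &[0xAAu8], b" ", &[0xBBu8, 0xCC]];
    parts.concat()
}
```

Here debug_rendering() yields the text "[255]", while raw_concat() yields the six bytes of "foo ", then 0xAA, a space, 0xBB and 0xCC.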


#6

This has since been patched: formatting now explicitly does not support producing non-strings. I guess acricho saw this topic, given the timing(?)


#7

Nah, just a coincidence. The RFC for that change went in the day before I made this post.

I’m looking into implementing support for byte-string formats in format_args! now. Implementing a formatter that accepts non-UTF-8 bytes will be a relatively small obstacle.