Pre-RFC: `std::fmt::FormattingOptions.indentation`

Summary

This is a pre-RFC for an indentation field on the FormattingOptions struct in std::fmt, which is used for giving directives to Formatter objects.

A brief GitHub search for write!(f, "\t returns 7.6k results, which suggests that the use case I had in mind, writing printers for recursive structures, is not that uncommon.

When printing recursive structures, the typical mechanism is to pass a depth that determines the indentation. If FormattingOptions.indentation existed, the user could set the indentation to fo.indentation + N by adjusting the internal state during recursive calls, achieving the same result without passing an explicit depth.
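
For reference, a minimal sketch of the explicit-depth approach described above. The Tree type and the fmt_at helper are made up for illustration; the point is only that depth has to be threaded through every recursive call by hand:

```rust
use std::fmt;

// Hypothetical tree type, used only for illustration.
enum Tree {
    Node(Box<Tree>, i32, Box<Tree>),
    Leaf,
}

impl Tree {
    // The depth is passed explicitly to every recursive call.
    fn fmt_at(&self, f: &mut fmt::Formatter<'_>, depth: usize) -> fmt::Result {
        // Formatting an empty string with a width pads with that many spaces.
        write!(f, "{:width$}", "", width = depth * 4)?;
        match self {
            Tree::Leaf => writeln!(f, "[_]"),
            Tree::Node(l, k, r) => {
                writeln!(f, "<{}>", k)?;
                l.fmt_at(f, depth + 1)?;
                r.fmt_at(f, depth + 1)
            }
        }
    }
}

impl fmt::Display for Tree {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.fmt_at(f, 0)
    }
}
```

Note that the Display impl itself cannot receive a depth, so nesting this type inside another indented printer would not work without the outer printer calling fmt_at directly.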

I think setting indentation is common enough that it deserves a place in the std formatter. I searched for similar proposals but didn't find one, hence this post. I would love to hear feedback and comments, thank you.

I would love to see this improved.

There are multiple desired ways to print structures, and people often want both a more compact form and a more indentation-based long form. See this playground for an example of the two formats.

Given that, and more generally the user desire to customize this, I think it would make sense to expose this using some helpers rather than directly as a field; an individual routine using a formatter doesn't necessarily know which format the user wants.

We already have some of this, if you use the debug printing helpers in Formatter. Those helpers work for anything that looks roughly like a list (with square brackets), a set (with curly braces), a map (key-value pairs), or a struct (with fields). If there are cases those don't cover, or if they're awkward to use for some kinds of structures, perhaps we could generalize and expose some of the internals of those?
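
As a quick illustration of the existing helpers, here is a small sketch using debug_struct and debug_list. The Point and Path types are invented for the example; the helper methods themselves are the stable ones on Formatter:

```rust
use std::fmt;

// Hypothetical types for the example.
struct Point {
    x: i32,
    y: i32,
}

struct Path(Vec<Point>);

impl fmt::Debug for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Prints `Point { x: .., y: .. }`, or one field per line with {:#?}.
        f.debug_struct("Point")
            .field("x", &self.x)
            .field("y", &self.y)
            .finish()
    }
}

impl fmt::Debug for Path {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Prints `[a, b, ..]`, or one entry per line with {:#?}.
        f.debug_list().entries(self.0.iter()).finish()
    }
}
```

With these impls, {:?} gives the compact form and {:#?} gives the indented form, with nested values indented one level further, without either impl mentioning indentation.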

Those helpers operate internally by creating a wrapped Write implementation that indents the text passed through it. For debugging purposes, that seems sufficiently efficient. Those helpers also key off of the alternate flag to decide whether to newline-and-indent.
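
To make the wrapped-Write idea concrete, here is a rough sketch of such an adapter. The Indented type and the demo function are hypothetical and only loosely follow the idea described above; this is not the standard library's internal type:

```rust
use std::fmt::{self, Write};

// Hypothetical adapter: indents every line written through it by four spaces.
struct Indented<'a> {
    inner: &'a mut dyn Write,
    at_line_start: bool,
}

impl<'a> Indented<'a> {
    fn new(inner: &'a mut dyn Write) -> Self {
        Indented { inner, at_line_start: true }
    }
}

impl Write for Indented<'_> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // split_inclusive keeps the '\n' at the end of each piece,
        // so every piece is at most one line.
        for piece in s.split_inclusive('\n') {
            if self.at_line_start {
                self.inner.write_str("    ")?;
            }
            self.inner.write_str(piece)?;
            self.at_line_start = piece.ends_with('\n');
        }
        Ok(())
    }
}

// Small demonstration: the body of a struct-like value goes through the adapter.
fn demo() -> String {
    let mut out = String::new();
    out.push_str("Foo {\n");
    {
        let mut w = Indented::new(&mut out);
        write!(w, "a: {},\nb: {},\n", 1, 2).unwrap();
    }
    out.push('}');
    out
}
```

No intermediate buffer is built: each write_str call is forwarded piece by piece, with the indentation emitted whenever a new line begins.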

We could, potentially, generalize some kind of wrapper looking like fmt.something(|fmt| /* format the body to fmt */):

  • Assume you've already written the opener (e.g. foo {), and will write the closer after the something wrapper returns.
  • In non-pretty-printing mode, just print a space before and after calling the provided formatting function.
  • In pretty-printing mode, print a newline, call the provided formatting function using a wrapped writer that does one more level of indentation, then print another newline if the last thing printed wasn't one.
  • Possibly do something to handle the "empty" case, such as not printing the newlines or spaces unless something is formatted to the inner formatter. That might be easier than telling users to special-case the empty case themselves.
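
A rough sketch of what such a wrapper could look like as a free function, under some loud assumptions: the names nested, Indented, and Pair are made up, and the closure receives a &mut dyn Write rather than a Formatter because, as far as I know, stable Rust has no way to build a Formatter around a custom writer. The empty case from the last bullet is not handled here:

```rust
use std::fmt::{self, Write};

// Hypothetical adapter that indents each line by four spaces.
struct Indented<'a> {
    inner: &'a mut dyn Write,
    at_line_start: bool,
}

impl Write for Indented<'_> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        for piece in s.split_inclusive('\n') {
            if self.at_line_start {
                self.inner.write_str("    ")?;
            }
            self.inner.write_str(piece)?;
            self.at_line_start = piece.ends_with('\n');
        }
        Ok(())
    }
}

// Hypothetical helper: the caller writes the opener before and the closer after.
fn nested(
    f: &mut fmt::Formatter<'_>,
    body: impl FnOnce(&mut dyn Write) -> fmt::Result,
) -> fmt::Result {
    if f.alternate() {
        // Pretty mode: newline, indented body, newline unless one was just written.
        f.write_str("\n")?;
        let mut w = Indented { inner: &mut *f, at_line_start: true };
        body(&mut w)?;
        let ended_with_newline = w.at_line_start;
        if !ended_with_newline {
            f.write_str("\n")?;
        }
        Ok(())
    } else {
        // Compact mode: a space before and after the body.
        f.write_str(" ")?;
        body(&mut *f)?;
        f.write_str(" ")
    }
}

// Example type using the helper with its own delimiters.
struct Pair(i32, i32);

impl fmt::Debug for Pair {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Pair {")?;
        nested(f, |w| write!(w, "{}, {}", self.0, self.1))?;
        f.write_str("}")
    }
}
```

The helper itself knows nothing about brackets or separators; it only chooses between spaces and newline-and-indent based on the alternate flag.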

Currently, it seems like some of this has been factored out into an internal DebugInner, but there's a lot of duplicated logic there, and more duplicated logic in DebugMap and DebugStruct and DebugTuple. It might be possible to deduplicate all the common elements of that, and expose some of the common helpers.

Thanks! I think the current helper set (tuple, list, struct, set, map) is surprisingly restrictive, in the sense that each helper is very specific about how it prints, as far as I can tell.

For instance, I am not sure how I would implement something like:

[[1, 2],
 [3, 4,
  5, 6]]

for a Foo(Vec<Vec<i32>>).

Similarly they all have specific begin-end markers (<name> { ... } for struct, [..] for list, (..) for tuple, {..} for set and map). So a similar problem arises for:

enum Tree {
  T(Box<Tree>, i32, Box<Tree>),
  E
}

I think the predefined markers mean one cannot print T(T(E, 1, E), 2, T(T(E, -1, E), 3, E)) as:

<1>
    <2>
    <3>
        [_]
        <-1>

Even in DebugInner it seems to me that the item separator is fixed to be , as far as I can see, but I might be mistaken.

I like the idea of having something similar to fmt.debug_level that inserts a space in the default case and a newline plus indent in the pretty case, without assuming begin-end markers or separators. Perhaps that's a good middle ground between totally unrestricted internal indentation state and overly restricted printing.

Even here of course, there are cases (albeit contrived) that are still not possible to implement without having access to the indentation.

For instance in racket/trace, after a certain indentation level the printer defaults to writing the number instead of padding even more:

> (define (f x) (if (zero? x) 0 (add1 (f (sub1 x)))))
> (trace f)
> (f 10)
>(f 10)
> (f 9)
> >(f 8)
> > (f 7)
> > >(f 6)
> > > (f 5)
> > > >(f 4)
> > > > (f 3)
> > > > >(f 2)
> > > > > (f 1)
> > > >[10] (f 0)
< < < <[10] 0
< < < < < 1
< < < < <2
< < < < 3
< < < <4
< < < 5
< < <6
< < 7
< <8
< 9
<10
10

Of course one can always implement such options manually without the Formatter, but I thought it should still be a part of the discussion.

The current builder-style helpers all borrow the Formatter as &mut, making them exclusive, so you can't print anything while you have one of those builders. That means we could potentially change their internals to put much more of the work into a single generalized helper such as the one I described, making use of a wrapped Write to handle indentation in the pretty-printing case. The builders could print the begin and end markers, and whatever internal punctuation is desired, and delegate everything else to the helper. The underlying helper would assume you'll handle all that yourself, and just provide either spaces or newline-and-indent.

That should solve the problem you're describing for printing a tree, which could use the helper for optional-indentation. (Though, in the absence of separators and bracketing, I'm not sure how the tree formatting would work in non-pretty-printing mode...)

That said, the style you're describing there for Vec<Vec<i32>> would be a completely different pretty-printing style, and not one that I'm sure could be commoned up with a helper.

This is something that's in extreme danger of falling into a bikeshed of overgeneralization of capabilities, and I think the only way it's likely to happen is keeping the scope reasonably narrow.

By way of the kind of thing that probably shouldn't be commoned up into exactly the same helpers, consider the problem of printing trees with lines in addition to indentation (e.g. the problem termtree solves).

That said, one advantage of trying to turn this into a common set of helpers would be the ability to introduce additional modes with changes in Formatter that automatically work with every Debug impl using these helpers. We just have to be cautious to avoid introducing too much generality in the process.

I see, keeping the scope narrow makes a lot of sense. I think a reasonable design goal is to produce something that could be used to express all the existing debug_X helpers, as well as the general tree case I had in mind at the beginning, and stopping roughly at that point?

It seems to me that a begin marker, an end marker, and a separator would suffice for defining all the existing helpers. For instance, I can imagine a direct mapping from debug_struct to debug_level as follows.

debug_struct("Foo").field("a", &a).field("b", &b).finish()
=> debug_level().begin("Foo {")
                .end("}")
                .separator(",")
                .inner(|fmt| write!(fmt, "a: {}", &a))
                .inner(|fmt| write!(fmt, "b: {}", &b))
                .finish()

I would implement my tree printer as

match t {
   T(l, k, r) => debug_level().begin(format!("<{}>", k))
                              .inner(|fmt| write!(fmt, "{}", l))
                              .inner(|fmt| write!(fmt, "{}", r))
                              .finish(),
   E => debug_level().begin("[_]").finish(),
}

If we’re adding a general indentation helper, then it should not take individual items like debug_struct() and friends do, or have starting and ending strings explicitly provided. The indenter must automatically insert indentation after all newlines, as is currently done; otherwise, children not already using this system would be incorrectly unindented. Therefore, the indenter doesn’t need to know where an item starts or ends either.

To expose indentation functionality, only one new API function is needed: providing a wrapper that adds indentation, and the rest can be left to the caller. For example, this would emulate debug_struct() (without adapting to alternate mode):

fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
    write!(f, "Example {{")?;
    // f.indented(4) is the proposed wrapper, not an existing method.
    write!(f.indented(4), "\nfoo: 1\nbar: 2")?;
    write!(f, "\n}}")
}

Of course, it would also be useful to have something that emulates debug_struct() and friends’ choice of single-line vs multi-line printing while allowing alternative delimiters, but this shouldn’t be baked into the API for doing indentation at all (unless it's easy to make that part do nothing).

The part that isn't clear to me with this API is how nested f.indented(_) calls stack on top of each other. If I implemented the following printer:

fn fmt(f: &mut fmt::Formatter<'_>, t: Tree) -> fmt::Result {
    match t {
        T(l, k, r) => {
            write!(f, "{}", k)?;
            write!(f.indented(4), "{}", l)?;
            // The last write is the arm's value, so both arms return fmt::Result.
            write!(f.indented(4), "{}", r)
        }
        E => write!(f, "[_]"),
    }
}

Do the writers of l and r first compute their own buffers, and then add the indent at the beginning? Wouldn't that be inefficient? This might also be due to my lack of knowledge of the internals of fmt; I apologize if that's the case.

I'm not sure if begin/end/separator are required for the generalized version.

You could handle the separator by having the individual line displays print it (or not, if they want to do something different).

You could handle the beginning and end by printing them before and after the overall invocation.

f.write_str("Foo {")?;
f.debug_bikeshed()
    .line(|f| {
        thing.fmt(f)?;
        // write_char takes a char, not a &str.
        f.write_char(',')?;
        Ok(())
    })
    .finish()?;
f.write_str("}")?;

@kpreid I agree that we could have an indentation wrapper that doesn't need to handle individual items, but if we also wanted a wrapper that can handle space-separated non-pretty-printed output vs newline-and-indented pretty-printed output, I think that would need to be fed each item.

No buffers are getting pre-created here. f.indented(4) returns a new implementation of Write that, before writing a line, prints 4 spaces of indentation. If l and r themselves do indentation, they stack another Write wrapper to do another level of indentation. The result turns into a series of Write calls. The indirection may produce a bit of overhead, but it's debugging code; a little overhead is okay as long as it doesn't do a bunch of memory allocations, which this doesn't.

And if indentation support were built into Formatter, then additional levels could be just increasing a number instead of adding more adapters, thus executed without extra layers of dynamic dispatch.
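
To illustrate the "just increase a number" idea outside of std, here is a hedged sketch of a single writer that tracks the level as a counter. LevelWriter, its indented method, print_tree, and render are all hypothetical names; a built-in version would presumably live inside Formatter instead:

```rust
use std::fmt::{self, Write};

// Hypothetical single-layer indenter: depth is a counter, not a stack of adapters.
struct LevelWriter<W: Write> {
    inner: W,
    level: usize,
    at_line_start: bool,
}

impl<W: Write> LevelWriter<W> {
    fn new(inner: W) -> Self {
        LevelWriter { inner, level: 0, at_line_start: true }
    }

    // Runs `body` one level deeper and restores the level afterwards.
    fn indented(&mut self, body: impl FnOnce(&mut Self) -> fmt::Result) -> fmt::Result {
        self.level += 1;
        let result = body(self);
        self.level -= 1;
        result
    }
}

impl<W: Write> Write for LevelWriter<W> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        for piece in s.split_inclusive('\n') {
            if self.at_line_start {
                // Emit one indentation unit per level at the start of each line.
                for _ in 0..self.level {
                    self.inner.write_str("    ")?;
                }
            }
            self.inner.write_str(piece)?;
            self.at_line_start = piece.ends_with('\n');
        }
        Ok(())
    }
}

// Hypothetical tree type and printer for the demonstration.
enum Tree {
    Node(Box<Tree>, i32, Box<Tree>),
    Leaf,
}

fn print_tree<W: Write>(w: &mut LevelWriter<W>, t: &Tree) -> fmt::Result {
    match t {
        Tree::Leaf => writeln!(w, "[_]"),
        Tree::Node(l, k, r) => {
            writeln!(w, "<{}>", k)?;
            w.indented(|w| {
                print_tree(w, l)?;
                print_tree(w, r)
            })
        }
    }
}

fn render(t: &Tree) -> String {
    let mut w = LevelWriter::new(String::new());
    print_tree(&mut w, t).unwrap();
    w.inner
}
```

Each string is scanned for newlines once and passes through one write_str layer regardless of depth, but the number of characters written still grows with the level.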

(From a computational complexity perspective, note that indentation is O(N·M), where N is the number of original characters and M is the number of levels of indentation, simply due to the number of characters output, so the implementation can’t do better than that.)

I recently looked at the implementation of debug_struct + co. to figure out how it's done there. Ended up being surprised when it turned out that there is no such functionality and the stdlib instead wraps Formatter.buf in a struct that searches for newlines in write_str.

It isn't even implemented in a way that can be easily optimized and leads to a bunch of pointer chasing (even in release mode) and most likely iterating over the same data, searching for newlines multiple times.

pub struct Foo<T> {
    pub value: T,
}

impl<T: std::fmt::Debug> std::fmt::Debug for Foo<T> {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("Foo").field("value", &self.value).finish()
    }
}

fn wrap<T>(inner: T) -> Foo<T> {
    Foo { value: inner }
}

pub fn main() {
    let value = wrap(wrap(wrap(wrap(wrap(wrap(wrap(wrap(5))))))));
    println!("{value:#?}");
}

Excerpt from gdb (breakpoint + x/32 f.buf.pointer):

debug_indent::{impl#0}::fmt<debug_indent::Foo<debug_indent::Foo<debug_indent::Foo<debug_indent::Foo<i32>>>>> (self=0x7fffffffe15c, f=0x7fffffffdd10)
0x7fffffffde30:	0xffffdf10	0x00007fff	0x555ad230	0x00005555
debug_indent::{impl#0}::fmt<debug_indent::Foo<debug_indent::Foo<debug_indent::Foo<i32>>>> (self=0x7fffffffe15c, f=0x7fffffffdc30)
0x7fffffffdd50:	0xffffde30	0x00007fff	0x555ad230	0x00005555
debug_indent::{impl#0}::fmt<debug_indent::Foo<debug_indent::Foo<i32>>> (self=0x7fffffffe15c, f=0x7fffffffdb50)
0x7fffffffdc70:	0xffffdd50	0x00007fff	0x555ad230	0x00005555
debug_indent::{impl#0}::fmt<debug_indent::Foo<i32>> (self=0x7fffffffe15c, f=0x7fffffffda70)
0x7fffffffdb90:	0xffffdc70	0x00007fff	0x555ad230	0x00005555

It clearly shows the pointer chasing through a bunch of &mut dyn Write.

Yes, this is unlikely to end up in a hot path, but I think this shows that there really should be a better way to do indentation when formatting.

I think that's also what you mean here.

I see, I had the wrong mental model. Which way should we lean then, the indentation wrapper or making indentation a first class citizen?

The main question would be whether "making indentation a first-class citizen" causes any overhead in the formatting of cases that aren't doing debug pretty-print indentation. If it could be done with zero overhead in those cases, sure; if it adds any overhead to other formatting then it needs to be kept contained, which the wrapper approach does.

Wouldn't the overhead be a single comparison against 0 to see if it needs to search for newlines?

If we use the indentation wrapper it should probably be in a way that only requires one wrapper and not one wrapper per indentation level (not sure if that's possible without a breaking change).

I'm wondering whether the correct option here involves a "write newline and indent" method on the formatters – that has zero overhead in most cases, and might be sufficient to handle both standard and custom indentations.

I think that might be possible with some careful internal API design: when about to set up a wrapper, check whether one is already in place, and if so, modify it in a scoped fashion rather than making a new one.

It will most likely increase code size. And that has been a sore point in embedded Rust, since the panic machinery easily pulls in formatting even if you don't want it.

So I'm highly sceptical of adding more features to formatting before we have build-std and the ability to select features such as opting for a much more slimmed formatting system.

Has there been any progress on that (internal changes or on how those features would be selected)?