On DebugTuple and {:#?}

I'm sure everyone who's used {:#?} or dbg!() has noticed...

Goto(
    Address(
        30016,
    ),
),
Label(
    Address(
        29990,
    ),
),
Expr(
    Expr(
        Expr(
            [
                Var(
                    0,
                ),
                Const(
                    0,
                ),
                Op(
                    Ne,
                ),
            ],
        ),
    ),
    Address(
        30016,
    ),
),

...It's not very pleasant to read. There's a lot of wasted vertical space.

Until a few years ago, I often wrote manual impls that print the same thing the derived impl would, just without DebugTuple, to avoid the wasted space. But that's a lot of boilerplate, and it doesn't work for foreign types like Option. After that, I developed the second most cursed crate I know of (after cve-rs): compact-debug, which literally patches the stdlib in memory.[1] With it enabled, the above data prints as:

Goto(Address(30016)),
Label(Address(29990)),
Expr(Expr(Expr([
    Var(0),
    Const(0),
    Op(Ne),
])), Address(30016)),

which, in my opinion, is way more digestible. Despite its cursedness, I've found myself using compact-debug in almost every project I've done since; it's just so much better.
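For reference, the kind of manual impl described above can be sketched with a small helper. This is a minimal sketch, not compact-debug's implementation: `compact_tuple`, `Address`, and `Insn` are hypothetical names modeled on the output above. The helper never breaks lines between tuple fields, but it forwards the formatter (including the `#` flag) to each field, so a list nested inside still prints one item per line.

```rust
use std::fmt;

/// Hypothetical helper: writes `Name(a, b, ...)` without DebugTuple, so no
/// line breaks are inserted between tuple fields, even under `{:#?}`.
/// The formatter, `#` flag included, is passed through to every field.
fn compact_tuple(f: &mut fmt::Formatter<'_>, name: &str, fields: &[&dyn fmt::Debug]) -> fmt::Result {
    f.write_str(name)?;
    if fields.is_empty() {
        return Ok(());
    }
    f.write_str("(")?;
    for (i, field) in fields.iter().enumerate() {
        if i > 0 {
            f.write_str(", ")?;
        }
        fmt::Debug::fmt(*field, f)?;
    }
    f.write_str(")")
}

// Hypothetical types shaped like the example output.
struct Address(u32);

enum Insn {
    Goto(Address),
    Expr(Vec<u32>, Address),
}

impl fmt::Debug for Address {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        compact_tuple(f, "Address", &[&self.0])
    }
}

impl fmt::Debug for Insn {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Insn::Goto(addr) => compact_tuple(f, "Goto", &[addr]),
            Insn::Expr(ops, addr) => compact_tuple(f, "Expr", &[ops, addr]),
        }
    }
}

fn main() {
    let insns = vec![Insn::Goto(Address(30016)), Insn::Expr(vec![0, 1], Address(30016))];
    println!("{:#?}", insns);
}
```

Every type still needs its own impl, which is exactly the boilerplate problem, and a foreign type such as Option can't be given one.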


So, with the background out of the way: is there any possibility of changing this in a future version? And have there been any past discussions on similar topics?

Code-wise, the changes are trivial: just remove the is_pretty checks from DebugTuple and it works out fine.[2] I would be happy to make a PR for this, should people be in favor. No idea how a change like this would work with stability attributes and the like, though.

The bigger problems are the ecosystem effects. First of all, would this change be desirable at all? I think it very much is; I find dbg!() almost unusable without it, but opinions may differ.

Second, does the ecosystem depend on the formatting? Debug does explicitly state that the exact output format is not stable, but I would be surprised if there aren't a few test suites that check the exact formatting.


  1. It's very much platform-specific and a pain to maintain, since neither the internal APIs nor the compiled machine code have any stability guarantees. But that's beside the point.

  2. In my experience, DebugList/DebugStruct/DebugMap already provide enough vertical spacing to keep the output readable for almost all workloads, with the exception of a few cases that resemble linked lists.

For what it's worth, on nightly it is now possible to opt out of pretty-printing (and otherwise manipulate the formatting options):

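#![feature(formatting_options)]

use std::fmt::{Debug, Formatter, Result};

struct Foo(i32, f32);

impl Debug for Foo {
    fn fmt(&self, fmt: &mut Formatter) -> Result {
        // Copy the current options with the `#` (alternate) flag turned off.
        let opt = *fmt.options()
            .alternate(false);

        // Everything written through this formatter, fields included,
        // uses the non-alternate (single-line) layout.
        fmt.with_options(opt)
            .debug_tuple("Foo")
            .field(&self.0)
            .field(&self.1)
            .finish()
    }
}

fn main() {
    // Prints on one line despite `{:#?}`.
    println!("{:#?}", Foo(1, 2.5));
}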

Personally, with my libs-api hat on, I'd love to see this fixed in the standard library. Concretely, I think at the very least, tuple structs with just one field (e.g. Goto(Address(30016))) don't need to be spread across three lines.

@jdahlstrom That is a nice capability, but it doesn't really help here: it once again requires a lot of boilerplate, and from what I understand it would disable pretty-printing for the whole subtree, whereas keeping structs/lists on multiple lines is pretty nice IMO.

@josh I agree; that was my first thought too. But I don't think doing it only for single-item tuples is possible without changing the DebugTuple API: fields are passed in one at a time, so you'd either need a size hint of some sort, or you'd need to keep the first tuple item around so you can choose the layout based on whether there's a second. (Btw, Goto(Address(30016)) is currently printed on five lines, not three.)
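To make the second option concrete, a builder that holds back the first field until it knows whether a second one exists might look like this. `CompactTuple` is a hypothetical name, not a std or proposed API; it hands everything over to the real `DebugTuple` as soon as a second field arrives, so only the one-field case changes.

```rust
use std::fmt;

/// Hypothetical builder (not a std API). It keeps the first field back so
/// the layout can depend on whether a second field ever shows up.
enum CompactTuple<'a, 'b: 'a> {
    /// Zero fields seen, or exactly one field held back in `first`.
    Pending {
        f: &'a mut fmt::Formatter<'b>,
        name: &'a str,
        first: Option<&'a dyn fmt::Debug>,
    },
    /// Two or more fields: behave exactly like std's DebugTuple.
    Many(fmt::DebugTuple<'a, 'b>),
}

impl<'a, 'b: 'a> CompactTuple<'a, 'b> {
    fn new(f: &'a mut fmt::Formatter<'b>, name: &'a str) -> Self {
        CompactTuple::Pending { f, name, first: None }
    }

    fn field(self, value: &'a dyn fmt::Debug) -> Self {
        match self {
            CompactTuple::Pending { f, name, first: None } => {
                CompactTuple::Pending { f, name, first: Some(value) }
            }
            CompactTuple::Pending { f, name, first: Some(first) } => {
                // A second field arrived: replay the held-back one into DebugTuple.
                let mut t = f.debug_tuple(name);
                t.field(first);
                t.field(value);
                CompactTuple::Many(t)
            }
            CompactTuple::Many(mut t) => {
                t.field(value);
                CompactTuple::Many(t)
            }
        }
    }

    fn finish(self) -> fmt::Result {
        match self {
            CompactTuple::Pending { f, name, first: None } => f.write_str(name),
            CompactTuple::Pending { f, name, first: Some(first) } => {
                // Exactly one field: no line breaks, but the field still sees `#`.
                f.write_str(name)?;
                f.write_str("(")?;
                fmt::Debug::fmt(first, f)?;
                f.write_str(")")
            }
            CompactTuple::Many(mut t) => t.finish(),
        }
    }
}

struct Address(u32);
struct Pair(i32, i32);

impl fmt::Debug for Address {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        CompactTuple::new(f, "Address").field(&self.0).finish()
    }
}

impl fmt::Debug for Pair {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        CompactTuple::new(f, "Pair").field(&self.0).field(&self.1).finish()
    }
}

fn main() {
    println!("{:#?}", Address(30016)); // one field: stays on one line
    println!("{:#?}", Pair(1, 2)); // two fields: std's multi-line layout
}
```

Because the fallback replays the held-back field into a regular DebugTuple, tuples with two or more fields print exactly as they do today.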

That's two levels of tuple tho.

I think we could add a new single_field method on DebugTuple that acts like .field(...).finish(). The Debug derive macro could be changed to use it, which would cover most cases. We could even add a clippy lint if we want the manual impls changed to use it.

Really, what I wish happened is that we'd first try to fit the output into, say, a [u8; 60] without wrapping, and if that ran out of space, redo it in the indented form. But we'd have to be super careful that that wouldn't go exponential.

(And people who write bad Debugs would complain about it getting called twice, but I don't care about them.)

There's already an internal function for the 1-field case:

(It improved compile-time.)

I don't think dbg! should line break at all, at least by default. Typically I want to debug values in nested loops or functions, and there I want the output to be aligned 100% of the time so I can notice patterns; e.g.

[src/main.rs:12:14] foo = Foo { a: 7, b: 10, order: Some(Less) }
[src/main.rs:12:14] foo = Foo { a: 7, b: 1, order: Some(Greater) }
[src/main.rs:12:14] foo = Foo { a: 7, b: 2, order: Some(Greater) }
[src/main.rs:12:14] foo = Foo { a: 7, b: 7, order: Some(Equal) }

I'm exaggerating about 100% of the time, but I do think it's at least 95% preferable over this (especially when you're debugging more than just one value):

[src/main.rs:12:14] foo = Foo {
    a: 7,
    b: 10,
    order: Some(
        Less,
    ),
}
[src/main.rs:12:14] foo = Foo {
    a: 7,
    b: 1,
    order: Some(
        Greater,
    ),
}
[src/main.rs:12:14] foo = Foo {
    a: 7,
    b: 2,
    order: Some(
        Greater,
    ),
}
[src/main.rs:12:14] foo = Foo {
    a: 7,
    b: 7,
    order: Some(
        Equal,
    ),
}
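On stable, a one-line variant of dbg! can be defined locally. This is a sketch, not a std macro: `dbg_line!` is a hypothetical name, and it mirrors the shape of dbg!'s output (file, line, column, expression) but formats with {:?} instead of {:#?}. The loop is an illustrative guess at the kind of code that produces the aligned output above.

```rust
use std::cmp::Ordering;

/// Hypothetical one-line variant of `dbg!`: prints `[file:line:col] expr = value`
/// to stderr using `{:?}` instead of `{:#?}`, and returns the value like `dbg!` does.
macro_rules! dbg_line {
    ($val:expr $(,)?) => {
        // `match` keeps temporaries in `$val` alive for the whole print, as `dbg!` does.
        match $val {
            tmp => {
                eprintln!(
                    "[{}:{}:{}] {} = {:?}",
                    file!(),
                    line!(),
                    column!(),
                    stringify!($val),
                    &tmp
                );
                tmp
            }
        }
    };
}

#[derive(Debug)]
struct Foo {
    a: i32,
    b: i32,
    order: Option<Ordering>,
}

fn main() {
    let a = 7;
    for b in [10, 1, 2, 7] {
        // Each value is printed on a single line, so repeated prints line up.
        let foo = Foo { a, b, order: a.partial_cmp(&b) };
        dbg_line!(&foo);
    }
}
```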

I actually implemented exactly this behavior for an unrelated project a couple of years ago, and was tickled to discover that it doesn't go exponential, because (1) you can reasonably assume that the formatting code only has a constant cost per character written, and (2) each "try it as a one-liner" attempt can only write a constant number of characters. (I did it by making a LimitedLengthString type that implements Write in a way that always errors if it would go over the limit. You sure wouldn't want the approach of "format the entire subtree and only fail at the end if it was too long", which would make it exponential. But the Write API made the fail-early approach easy.)
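A minimal sketch of the fail-early writer described above. Only the name LimitedLengthString comes from the post; the body is an illustrative guess. `compact_or_pretty` is a hypothetical helper that applies the try-one-line-then-fall-back idea once, at the top level, rather than at every nesting level as discussed in the thread.

```rust
use std::fmt::{self, Debug, Write};

/// A String that rejects any write that would push it past `limit` bytes.
struct LimitedLengthString {
    buf: String,
    limit: usize,
}

impl Write for LimitedLengthString {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        if self.buf.len() + s.len() > self.limit {
            // Fail as soon as the budget is exceeded; std's Debug impls and
            // builders stop writing after the first error, so the attempt
            // costs at most about `limit` bytes of output.
            return Err(fmt::Error);
        }
        self.buf.push_str(s);
        Ok(())
    }
}

/// Hypothetical helper: use the one-line `{:?}` form if it fits in `limit`
/// bytes, otherwise fall back to the indented `{:#?}` form.
fn compact_or_pretty<T: Debug + ?Sized>(value: &T, limit: usize) -> String {
    let mut attempt = LimitedLengthString { buf: String::new(), limit };
    match write!(attempt, "{:?}", value) {
        Ok(()) => attempt.buf,
        Err(_) => format!("{:#?}", value),
    }
}

fn main() {
    println!("{}", compact_or_pretty(&Some(1), 60));
    println!("{}", compact_or_pretty(&vec![vec![1, 2, 3]; 10], 60));
}
```

The early exit relies on Debug impls propagating the writer's error; an impl that swallowed it and returned Ok would make this sketch return truncated output.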

Doing width-based pretty-printing sounds like overkill: it would require embedding tables for unicode-width, calling Debug impls multiple times (bad for itertools::format, which panics if formatted more than once), and it would be difficult to implement efficiently, especially in no_std. I don't think that would be worth it. I have toyed a little with the idea of having fmt::Debug go through some sort of visitor approach instead of a fmt::Write, though, which would allow things like that and probably other cool, almost reflection-like things. But that'd be a much bigger proposal than what I'm suggesting here.

I agree that dbg!() being multi-line isn't always desirable, but I think the multi-line case is common enough that the current behavior is fine.

I disagree. I often print very large structs that would be completely unreadable on a single line. And these are one-off prints, not multiple looking for a pattern. We seem to have completely opposite use cases.

Maybe a good solution is to give the writer a buffer that it stores output in. If the buffer runs out of space or a newline is written, it switches to multi-line mode: it writes the buffered contents to the wrapped writer with indentation, and sends the rest of the current write and all future writes there directly, also indented.

This avoids needing to format anything twice.

use std::{fmt, mem, str};

enum State<'a> {
    // Still trying to fit on one line; output so far is held in `buf`.
    SingleLine {
        buf_used: usize,
        buf: &'a mut [u8],
    },
    // Gave up on one line; output goes straight to the writer, indented.
    MultiLine {
        needs_indent: bool,
    },
    // A write to the underlying writer failed.
    Failed,
}

struct MaybeSingleLine<'a, W> {
    writer: W,
    state: State<'a>,
}

impl<'a, W: fmt::Write> MaybeSingleLine<'a, W> {
    fn new(writer: W, buf: &'a mut [u8]) -> Self {
        Self { writer, state: State::SingleLine { buf_used: 0, buf } }
    }
    /// Returns Ok(true) if it was able to write everything on a single line.
    fn finish(self) -> Result<bool, fmt::Error> {
        let Self { mut writer, state } = self;
        match state {
            State::SingleLine { buf_used, buf } => {
                // Only whole &strs were copied in, so the bytes are valid UTF-8.
                let buf = str::from_utf8(&buf[..buf_used]).expect("known to be utf-8");
                writer.write_str(buf)?;
                Ok(true)
            }
            State::MultiLine { .. } => Ok(false),
            State::Failed => Err(fmt::Error),
        }
    }
}

impl<W: fmt::Write> fmt::Write for MaybeSingleLine<'_, W> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // Park the state as Failed so any early `?` return leaves it there.
        let state = mem::replace(&mut self.state, State::Failed);
        let (buf, mut needs_indent) = match state {
            // Still fits and has no newline: keep buffering.
            State::SingleLine { buf_used, buf } if s.len() <= buf.len() - buf_used && !s.contains('\n') => {
                buf[buf_used..][..s.len()].copy_from_slice(s.as_bytes());
                self.state = State::SingleLine { buf_used: buf_used + s.len(), buf };
                return Ok(());
            }
            // Out of space or hit a newline: flush the buffer in multi-line mode.
            State::SingleLine { buf_used, buf } => {
                let buf = str::from_utf8(&buf[..buf_used]).expect("known to be utf-8");
                (Some(buf), true)
            }
            State::MultiLine { needs_indent } => (None, needs_indent),
            State::Failed => return Err(fmt::Error),
        };
        for s in buf.into_iter().chain(s.split_inclusive('\n')) {
            if s == "\n" {
                // Don't indent empty lines.
                needs_indent = false;
            }
            if s.is_empty() {
                continue;
            }
            if needs_indent {
                self.writer.write_str("    ")?;
            }
            self.writer.write_str(s)?;
            needs_indent = s.ends_with('\n');
        }
        self.state = State::MultiLine { needs_indent };
        Ok(())
    }
}

But if it does conclude it needs multi-line mode, it still needs to run the fmt function again, no? So I don't see how that would avoid formatting twice.

No, it doesn't need to rerun the fmt function; it just continues running the fmt call it's already part-way through. It was already storing all output in buf, so it writes buf (after indenting) to the underlying writer and then carries on: from that point on, whatever fmt tries to write goes directly to the underlying writer (after indenting) without being stored in buf.

So that function would be changed to never use alternate printing itself, but still propagate the alternate option to its fields, then.

+1 to using multiple lines for structs with more than ~5 fields (or more than 1 field, as discussed upthread; we have to draw the line somewhere).

I definitely want to keep the current multi-line behavior when using a line-based diff tool to analyze differences in nested structs with dozens of fields each.