Completing rustfmt and the Rust style guidelines

A valid position, although I prefer import one-liners. A kind of a minus is that removing/adding imports may move your lines and cause a problem elsewhere when comparing. I think Git is probably smart enough to avoid such cases, but unsure of other VCS.

Regarding the formatting of imports: right now i’m expanding all instances of curly braced blocks. I know that the more common way is to put them in one line, but I hadn’t come to “fix” it yet.

And i must say, i find the expanded version more readable.

In the Ruby world, I’m a big fan of RuboCop. All of the style configurations are tunable, so you can tweak them to your heart’s content. It still enforces a consistent style per-project. This lets the RuboCop project pursue the “one true style”, while opinionated projects using RuboCop can configure stylistic overrides. I think it seems to work well.

This lets the RuboCop project pursue the "one true style", while opinionated projects using RuboCop can configure stylistic overrides. I think it seems to work well.

Do we have opinionated projects that would insist on their own styles?

IMO (!) the more universal a style is the more valuable it is to the community, and allowing customization is a tradeoff to get long existing projects on board.

2 Likes

On the Python projects that I work on, our convention is one name per line with a trailing comma in imports, just as the example linked to, for any imports of more than one name from a module. It makes it much easier to view diffs, as adding or removing an import only affects that one line, and you don’t run into issues with having to reformat the whole thing when adding something into the middle of it, or having to switch between one line and multi-line formats when it gets too long.

What I would generally prefer from a tool, however, is to leave line breaking decisions as-is unless required to wrap to the desired length. If I’m working on a module that has:

use syntax::parse::lexer::{StringReader, TokenAndSpan, Reader};

and I want to run a formatter on it to fix up some other style problems later on, I don’t want it to introduce the change to this format:

use syntax::parse::lexer::{
    StringReader,
    TokenAndSpan,
    Reader,
};

As far as formatting goes, there are basically some rules that I consider mandatory (line length, spaces being necessary in certain places and forbidden in others, etc), and some that are more like suggestions and a matter of judgement, such as precisely whether to split a line that is not over the limit, or where to split it. For example, if you have more than a couple of positional arguments to a function, it can be nice to just split it so each is on a separate line, even if it wouldn’t go over the length limit; I wouldn’t want the formatter to put it back on one line for me. Likewise, in this case, while I prefer the one import per line style, I don’t want the formatter introducing that change (unless the line exceeds the maximum length).

Overall, rustformat is looking like a pretty good start. Noticed one problem when I tried it; it added trailing whitespace after commas when I ran it on itself:

    Modified   src/token_handling.rs
diff --git a/src/token_handling.rs b/src/token_handling.rs
index 97bfa70..409c32f 100644
--- a/src/token_handling.rs
+++ b/src/token_handling.rs
@@ -179,12 +179,12 @@ impl Word {
                     token::Float(c) => c.as_str().to_string(),
                     token::Integer(c) => c.as_str().to_string(),
                     token::Str_(s) => format!("\"{}\"", s.as_str()),
-                    token::StrRaw(s, n) => format!("r{delim}\"{string}\"{delim}",
-                        delim = repeat("#", n),
+                    token::StrRaw(s, n) => format!("r{delim}\"{string}\"{delim}", 
+                        delim = repeat("#", n), 
                         string = s.as_str()),
                     token::Binary(v) => format!("b\"{}\"", v.as_str()),
-                    token::BinaryRaw(s, n) => format!("br{delim}\"{string}\"{delim}",
-                        delim = repeat("#", n),
+                    token::BinaryRaw(s, n) => format!("br{delim}\"{string}\"{delim}", 
+                        delim = repeat("#", n), 
                         string = s.as_str()),
                 };
                 if let Some(s) = suf {

Likewise, in this case, while I prefer the one import per line style, I don't want the formatter introducing that change (unless the line exceeds the maximum length).

For blocks in parens or brackets i made the rule that a newline after the opening brace means that the whole thing should be expanded, and no linebreak after the opening brace means everything should be on one line (when no other boundaries are violated)

For example

let foo = [a, b, c]

stays

let foo =  [a, b, c]

while

let foo = [
a, b, c]

becomes

let foo = [
    a, 
    b, 
    c,
]

This was a little hack to format lists with complex and big elements in a not totally horrible way, and it worked surprisingly well. Maybe this rule or a modification thereof could be used for curly braced blocks. It could strike a good balance between control and external code transformation.

Noticed one problem when I tried it; it added trailing whitespace after commas when I ran it on itself:

I will look into that. Thanks a lot!

In general: I've have the feeling that my little prototype is too simplistic and hacky in the long run. And while I must confess that I don't really "get" the rust AST and all the mechanics around it I have the feeling that @pcwalton has a really good point with his opinion that the compiler AST is not a good fit.

Has anybody links to literature/docs/blogposts to fuzzy parsing for pretty printing? Especially in the context of clang format?

p.s. I just wanted to thank everyone for the kind words. I've anticipated a lot more flak for trying to participate as a newcomer...

1 Like

That does sound like a reasonable rule of thumb for when to use each style, and it would cover most of the cases I can think of.

The other possibility is to just never add newlines unless to wrap to within the length limit, only adjust the indentation and within-line whitespace; I think that's what a lot of formatters do.

As a point of reference, here’s what Microsoft’s C# formatter does with methods:

// 1 unformatted
int Roll() { var i = 4; return i; }
// 1 formatted
int Roll() { var i = 4; return i; }

// 2 unformatted
int Roll() { var i = 4; return i;
}
// 2 formatted
int Roll()
{
    var i = 4; return i;
}

// 3 unformatted
int Roll() { var i = 4;
return i; }
// 3 formatted
int Roll()
{
    var i = 4;
    return i;
}

// 4 unformatted
int Roll() { if(true) { return 3;
} else { return 5; } }
// 4 formatted
int Roll()
{
    if (true)
    {
        return 3;
    }
    else { return 5; }
}

I like how it works, except the second example should have the two statements on separate lines since the braces are. If I want to convert from a single line block to multiple lines, I like that I can just insert a newline like in example 4 and Visual Studio will insert six more newlines for me (when I reformat the method).

1 Like

As a point of reference, here's what Microsoft's C# formatter does with methods

Thanks for the reference. Visual Studio is a battle proven piece of software, and surly worth a look or two.


I would like to start the next iteration of the formating tool. This would be creating an simplified ast that contains the information needed for pretty printing (and see how the notion of a logical line fits into that)

I'm trying to familiarize myself with the rust grammar, but the beast is nearly 2k lines long... i guess rust is not the ideal language to write "my first pretty printer" for :stuck_out_tongue:

2 Likes

I think stylistic debates are inevitable. Should you insist that people always linewrap their code at 80 columns? Wars have been fought over less

Agreed. I would definitely like to use the tool, but there are a couple of points where the style guide differs from my preferences and existing projects to the point that I wouldn't use the tool if it forced those points on me. This would be unfortunate, since it would prevent me from using the tool to check/fix other styling issues, and it'd be nice to have my code be as consistent as possible aside from those points.

I think this is the way to go. This way, consistencies won't pop up because different contributors have different configurations or accidentally specify different command-line options. If each project had its rustfmt options in its repository, it would make it easy for contributors to make sure their code matched the project's style (which could differ from the defaults for a variety of reasons). If such a configuration were to become standardized, other tools could read it, too. For example, the Vim plugin for rust could use it to configure its auto-indenting behavior, and future IDEs could use it to automatically configure their own auto-indent and auto-formatting features for the project. Between this and Cargo, any IDE should be able to open any Rust project and do the right thing, which would be fantastic.

I have also been trying my hand at a rust formatter: https://github.com/sp0/rustfmt

It follows the structure of clang-format pretty closely. Essentially it splits the token stream up into lines, and then uses a penalty based line breaking algorithm to format each line separately (using a variant of dijkstra’s algorithm to find the path with least penalty). It seems to work decently so far (when it doesn’t decide to completely break your code), but it has a long way to go before being usable.

Feedback is welcome.

Very cool. I’ll had a few stressful days, so I didn’t get to test it or take a closer look.

Is it a 1 to 1 translation of clangformat? How do you handle ambigious tokens like <, >, *, - etc.? If i understand it correctly there is no real parsing process involved.

Nah, it’s not a 1 to 1 translation of clang-format. I just followed the general structure, naming, algorithms (splitting tokens into lines, annotating the lines/tokens, and then the line breaking algorithm using split penalty etc.). It disambiguates <, >, * by parsing each line (after the tokens have been divided into lines) and annotating tokens using its context. The token annotation code is pretty terrible atm, I will probably have to rewrite it.

I am currently working on an expression parser, to give binary operators different penalties depending on their precedence, and avoid splitting precedence groups in large expressions. Well see how it goes :smiley:

Are you interested in joining forces? I planned (resp. started) to rewrite my formater anyway, and your codebase seems to be a good base for futher development.

Sure, I sent you a message.

I just wanted to give a quick update.

I’ve abandoned the work on my project completely to concentrate on sp0s version. The work there goes along quite nicely. Currently major pain point is the whole macro shebang and various corner cases.

Since the worst case scenario would be mangled code we test against the code examples from rust-by-example if they still compile after formatting. Everyone is invited to PR valid rust files to /src/test_src/unformatted/ to create a greater coverage :stuck_out_tongue: (or test the tool and bury us in bug reports)

Our goal is to have it in a somewhat usable state by the rust 1.0 release. Lets see how that goes…

The next question than would be how it should be distributed. Clearly it has to be usable as a library so it can be used from IDE plugins, but should it also be bundled as a standalone tool with the rust installation? Or even be integrated into cargo?

1 Like

I would support pulling it up into the main distribution, for sure.

2 Likes

:smile: Thats great.

Hi all, my experiments with rustfmt (using an AST approach) are now useful and I’d love to have some help making it awesome. I wrote up a bit on my blog about it - http://featherweightmusings.blogspot.com.tr/2015/04/rustfmt-call-for-contributions.html Or if you’d prefer, go straight to the repo.

5 Likes