Crate evaluation for 2017-06-13: walkdir

Thanks for leading this @KodrAus.

I added some cookbook suggestions, and evaluated a bit of the checklist, added some discussion points and documentation issues.

This is another crate where serde doesn’t seem particularly appropriate.

I continue to be unclear on how hard we should try to put examples on every method, like std does. This crate seems pretty well-exampled to me, and the methods are pretty clear about what they do. It doesn’t seem vital that every method here has an example. Should we file issues for each of them and try to get examples, or just let it be good enough as is?

1 Like

@KodrAus do you mind listing in the op potential issues to file against walkdir for the guidelines you’ve encountered that it doesn’t fulfill? I worry that it’s hard to translate after the fact the "n"s to issues.

Thanks @brson, @birkenfeld and @omh1280 for getting started on the evaluation!

@brson I also think it’s good enough as it is, it’s a special use-case crate with good examples on its entry-points.

A pain point @burntsushi raised with walkdir is the WalkDirOptions.sorter field, which is essentially a FnMut(&OsString,&OsString) -> Ordering + 'static. This currently prevents WalkDir and Iter from being Send, because that closure isn’t Send.

If we had an assert_send! macro this might’ve been easier to pick up, but since it was raised as an issue while thinking about rayon support post 1.0 I’m guessing it’s a great nice-to-have but hasn’t actively blocked walkdir users. But it’s a bit unfortunate that missing Send on one type cascades through all the types that use it and prevent them from being Send too.

Maybe a good API recommendation is for producers to require Send on their generic closures, unless they’ve got a reason not to.

EDIT: Reflecting on this a bit more I think the issue is that concrete structures are all Send until they’re not (you have a non-Send field), but traits/generic types are not Send until they are (you add a Send bound). And you as the API producer might not be the one that needs to Send your types. I’m sure it’s come up before, but interesting to think about its implications on API design :slight_smile:

walkdir can report files in a sorted order.

WalkDir::new("foo").sort_by(|a,b| a.cmp(b));

This is convenient when you want to compare directories. A nice cookbook entry could be to print a diff between two directories.

1 Like

came here via reddit call for examples - I've got a simple one:

2 Likes

I tried to synthesize some of the cookbook recipe ideas here, adding one new one to the OP, but mostly asking questions about how to show off specific features of walkdir.

1 Like

@burntsushi Does walkdir solve any tricky cross-platform problems that would make a good example, maybe with symlinks or something?

The one I can think of is that if you enable walkdir to follow symlinks, then it will detect loops and return an error if it finds one. (This is a cross platform thing because of symlinks (which is handled by std), but also for determining whether two file paths refer to the same file.)

@brson It looks like there are a few differences with filter_entry vs Iterator.filter_map/Iterator.filter in behaviour and input. I think the discussion point is still valid to investigate if we can make the iterator implementations behave the same way and if that’s worthwhile.

x-posted from https://github.com/brson/rust-cookbook/issues/112 Please note that filter_entry changes the semantics over filter which might be surprising and unwanted by the user (we might want to filter over files but descend to all dirs or filter dirs with separate predicate). These two would coexist. But I kinda find the filter_entry name braking the rule of least surprise but I’m unable to suggest a better name.

@budziq That's a valid point. I think we should reach for filter_entry over filter / filter_map unless we don't want to skip directories, but the case of .filter_map(Result::ok) seems common.

One of the quirks of filter_entry is that it has to be the first adapter. I think this is a good thing, and I think the docs could try tease apart filter / filter_map vs filter_entry a bit more. I'll make a note of this in the cookbook thread too.

@budziq I’ve added a discussion point about filter_entry behaving in a way that isn’t obvious from its name.

The point about filter being better for matching files and still traversing directories is a good one, but I’m still inclined to prefer filter_entry and exclude directories from the predicate. That’s clearer to anyone reading the code that it’s only acting over files. This case could be made more ergonomic.

I’m not really sure about [C-NEWTYPE] and [C-CUSTOM-TYPE] here with depth parameters as usize but it might be another trade-of for ergonomy over type safety

Yeh I think usize is fine too. Using them for the traversal depth here just seems like an extra mental hurdle for someone to jump.

Also it is worth pointing out that https://docs.rs/walkdir/1.0.7/walkdir/struct.DirEntry.html has slightly different api than

https://doc.rust-lang.org/std/fs/struct.DirEntry.html

Especially the file_type for std::fs returns Result while Walkdir’s does not. That particular difference is also not mentioned in the docs.

Should there be an API guideline for types mimicking symbols from std?

1 Like

It is, but it could be clearer, as it's not particularly direct about it:

All recursive directory iterators must inspect the entry's type. Therefore, the value is stored and its access is guaranteed to be cheap and successful.

When you walk a directory recursively, you have to inspect the entry's type to determine whether it's a directory or not. Therefore, since that information is already available, walkdir can provide it to the caller as is without another stat call.

1 Like

Right, thanks! It’s not obvious on first glance.

We’re going to kick off the review of the walkdir evaluation soon. I’m copying the the discussion points into the etherpad as the agenda

1 Like

This was a really productive review. Excited to see the deluge of issues it creates. Thanks @KodrAus.

1 Like

Thanks @brson for your assistance! I’ll start logging issues for walkdir and is_same_file later on today.