Crate evaluation for 2017-06-13: walkdir

KodrAus · May 25, 2017, 10:36pm

Crate evaluation for 2017-06-13: walkdir

For additional contribution opportunities, see the main libz blitz thread.

This post is a wiki. Feel free to edit it.

Links

Needs your help!

Anything that is not checked off still needs your help! There is no need to sign up or ask for permission - follow any of these links and leave your thoughts:

Guidelines checklist

Legend

[y] = guideline is adhered to, no work needed.
[n] = guideline may need work, see comments nearby
[/] = guideline not applicable to this crate

Checklist

Guidelines checklist

This document is collaboratively editable. Pick a few of the guidelines, compare the walkdir crate against them, and fill in the checklist with [y] if the crate conforms to the guideline, [n] if the crate does not conform, and [/] if the guideline does not apply to this crate. Each guideline is explained in more detail here. If [n], please add a brief note on the following line with an explanation. For example:

  - [n] Crate name is a palindrome (C-PALINDROME)
        - walkdir backwards is ridklaw which is not the same as walkdir

Cookbook use case statements

Cookbook example ideas

Come up with ideas for nice introductory examples of using walkdir, possibly in combination with other crates, that would be good to show in the Rust Cookbook. Please leave a comment in that issue with your ideas! You don’t necessarily have to write the example code yourself but PRs are always welcome.

API guideline updates

API guideline ideas

What lessons can we learn from walkdir that will be broadly applicable to other crates? Please leave a comment in that issue with your ideas!

Discussion topics

Already stable

walkdir is already stable, so we should try find non-breaking solutions, but @burntsushi isn’t against a 2.0 release if there’s enough in it.

Follow-up

Publish 2.0

Links to types in `std`

There are a lot of references to types in std::fs. Should these be links to the relevant section in the std API docs? If so, where should they point?

Follow-up

Link std references to https://doc.rust-lang.org/stable/std/
Ping @steveklabnik about the stable/std path for std docs. Will this change?

Custom iterator adapters

WalkDirIterator::filter_entry is not obviously better than into_iter().filter_map.

filter_entry behaves differently to filter in a way that isn’t clear from the function signatures.

Are there expectations on what an Iterator adapter should look like?

Predicate input as Self::Item or &Self::Item: filter_entry takes &DirEntry instead of Result
Chainable with any other adapter: filter_entry is only available on WalkDirIter

Could we solve this with a new name/docs/examples:

Maybe rename to filter_walk? It might be helpful to think of it as a driver for the walk that always comes before filter, operating on an already determined set of entries
Example for use-cases that exploit the file structure (and may want to filter_entry)
Example for use-cases that ignore the file structure (and wouldn’t want to filter_entry)

Or make filter_entry a method on WalkDir instead of an adapter?

Follow-up

Make WalkDirIterator methods inherent on Iter and IterFilterEntry
Deprecate WalkDirIterator trait in favour of inherent methods
Remove WalkDirIterator trait in 2.0
Add doc example for filter_entry vs filter why choose one over the other

Should `WalkDirIterator` be externally implementable?

The WalkDirIterator is implemented by a few types internally, but doesn’t seem to have any value being implemented externally. Should it have a Sealed supertype to signal that it’s not for implementing?

Follow-up

Irrelevant since we’re removing the trait

Ensuring Send/Sync

Determining send/sync still hard. WalkDirOptions is not Send, which prevents WalkDir and Iter from being Send. Maybe we could create assert_send!, assert_sync! macros and apply them to every type.

Should we strengthen recommendation to assert types are send/sync by default?

Follow-up

Make sort_by function Send + Sync
Add assert_send / assert_sync macros and test:
- WalkDir
- Iter
- IterFilterEntry
@KodrAus to open a Pre-RFC about asserting Send / Sync in std

Error conditions for iterators returning results

Where should Iter::next's Error conditions be documented if Iter::Item is a Result? On the impl of next? On the iterator itself?

Follow-up

Add Errors section to Iter and IterFilterEntry type docs
@KodrAus to file guideline issue about Error sections for trait methods returning errors
@KodrAus to file issue about rust-doc calling out overridden trait docs more

Verified unwrapping

walkdir::Iter.next() calls unwrap on a Vec that’s guaranteed not to panic by the block it’s in. Should this be expect with a comment describing why it can’t fail?

Could any of these be refactored to avoid the unwrap altogether? All methods calling unwrap that are validated:

<Iter as Iterator>::next
Iter::get_deferred_dir
Iter::push

Should there be a guideline for this?

Follow-up

Change unwrap to expect and document why they won’t panic
Open guidelines issue about unwrap vs expect

Re-exporting unstable `is_same_file`

Should the is_same_file function be re-exported from the unstable same_file crate? It isn’t mentioned in docs or used in examples. Looks like it was pulled into a new crate post 1.0 and re-exported to prevent a breaking change.

Follow-up

Deprecate re-export in favour of is_same_file crate
Remove re-export in 2.0

Sorting entries

Could sorter support more metadata to compare like size, last modified? It should at least use OsStr instead of &OsString.

Follow-up

Open issue to support sorting by more file metadata (might not be feasible)
At least change &OsString args to OsStr

Platform-specific methods

Raised during the meeting that DirEntry::ino is a unix only method, but that isn’t reflected in the docs.

Follow-up

Add unix::DirEntryExt trait and add ino method there
Remove inherent ino method from DirEntry
Document that DirEntryExt is unix only

Crate issues

[details=Summary]- Add Error documentation sections for

is_same_file
Iter / Iter::next - probably should be documented in both places? See discussion
IterFilterEntry / IterFilterEntry::next
DirEntry::metadata
docs: Indicate that DirEntry::ino is Unix only
docs: remove unwrap from
- examples in root
- examples on WalkDir
- examples on WalkDirIterator.skip_current_dir and filter_entry
docs: clarify differences between walkdir::DirEntry and std::fs::DirEntry more
- add examples for DirEntry methods
- demonstrate differences with std::fs::DirEntry in method examples
docs: add example for contents_first
docs: add links for:
- filter_entry in root skip hidden files and directories efficiently on unix example
- path, file_name and follow_links in DirEntry
- WalkDir::new in DirEntry::path
- follow_links in DirEntry::path_is_symbolic
- follow_links in DirEntry::metadata
- follow_links in DirEntry::file_type
- WalkDir in Iter
- WalkDir::min_depth in IterFilterEntry
- WalkDir::max_depth in IterFilterEntry
- WalkDir::filter_entry in WalkDir
- WalkDir::min_depth in WalkDir
- WalkDirIterator::filter_entry in WalkDirIterator::skip_current_dir
- WalkDirIterator::skip_current_dir in WalkDirIterator::filter_entry
- WalkDir::min_depth in WalkDirIterator::filter_entry
- WalkDir::max_depth in WalkDirIterator::filter_entry
docs: Add a note that IterFilterEntry is created by calling Iter::filter_entry
docs: Correct inaccuracies in WalkDir root docs: if contents_first is true then directories aren’t emitted before their contents. The order isn’t unspecified if a sort_by predicate is given
Add html_root_url attribute
Add Debug to Iter, IterFilterEntry, WalkDir
Add CI for OSX (?) and crate badges for tier 1 platforms
Add categories metadata to Cargo.toml
Add Debug impls for
- Iter
- IterFieldEntry
- WalkDir
Rename Iter to IntoIter
Rename IterFilterEntry to FilterEntry[/details]

Evaluation task assignments

[ ] @KodrAus assert_send / assert_sync Pre-RFC
[ ] @KodrAus follow up stability of /stable/std path to std docs
[ ] @KodrAus follow up rustdoc calling out overridden trait method docs

How are we tracking?

Pre-review checklist

Do these before the libs team review.

[x] Create evaluation thread based on this template
[x] Work with author and issue tracker to identify known blockers
[x] Compare crate to guidelines by filling in checklist
[x] Record other questions and notes about crate
[x] Draft several use case statements to serve as cookbook examples
[x] Record recommendations for updated guidelines

Post-review checklist

Do these after the libs team review.

[x] Publish libs team meeting video
[x] Create new issues and tracking issue on crate’s issue tracker
[ ] Solicit use cases for cookbook examples related to the crate
[ ] File issues to implement cookbook examples
[ ] File issues to update guidelines
[ ] Post all approachable issues to TWiR call for participation thread
[ ] Update links in coordination thread

brson · May 27, 2017, 7:43pm

Thanks for leading this @KodrAus.

I added some cookbook suggestions, and evaluated a bit of the checklist, added some discussion points and documentation issues.

This is another crate where serde doesn’t seem particularly appropriate.

I continue to be unclear on how hard we should try to put examples on every method, like std does. This crate seems pretty well-exampled to me, and the methods are pretty clear about what they do. It doesn’t seem vital that every method here has an example. Should we file issues for each of them and try to get examples, or just let it be good enough as is?

brson · May 27, 2017, 7:46pm

@KodrAus do you mind listing in the op potential issues to file against walkdir for the guidelines you’ve encountered that it doesn’t fulfill? I worry that it’s hard to translate after the fact the "n"s to issues.

KodrAus · May 27, 2017, 11:49pm

Thanks @brson, @birkenfeld and @omh1280 for getting started on the evaluation!

@brson I also think it’s good enough as it is, it’s a special use-case crate with good examples on its entry-points.

A pain point @burntsushi raised with walkdir is the WalkDirOptions.sorter field, which is essentially a FnMut(&OsString,&OsString) -> Ordering + 'static. This currently prevents WalkDir and Iter from being Send, because that closure isn’t Send.

If we had an assert_send! macro this might’ve been easier to pick up, but since it was raised as an issue while thinking about rayon support post 1.0 I’m guessing it’s a great nice-to-have but hasn’t actively blocked walkdir users. But it’s a bit unfortunate that missing Send on one type cascades through all the types that use it and prevent them from being Send too.

Maybe a good API recommendation is for producers to require Send on their generic closures, unless they’ve got a reason not to.

EDIT: Reflecting on this a bit more I think the issue is that concrete structures are all Send until they’re not (you have a non-Send field), but traits/generic types are not Send until they are (you add a Send bound). And you as the API producer might not be the one that needs to Send your types. I’m sure it’s come up before, but interesting to think about its implications on API design

vandenoever · June 1, 2017, 10:58am

walkdir can report files in a sorted order.

WalkDir::new("foo").sort_by(|a,b| a.cmp(b));

This is convenient when you want to compare directories. A nice cookbook entry could be to print a diff between two directories.

imbaczek · June 1, 2017, 8:55pm

came here via reddit call for examples - I've got a simple one:

brson · June 1, 2017, 11:42pm

I tried to synthesize some of the cookbook recipe ideas here, adding one new one to the OP, but mostly asking questions about how to show off specific features of walkdir.

brson · June 1, 2017, 11:45pm

@burntsushi Does walkdir solve any tricky cross-platform problems that would make a good example, maybe with symlinks or something?

burntsushi · June 2, 2017, 12:34am

The one I can think of is that if you enable walkdir to follow symlinks, then it will detect loops and return an error if it finds one. (This is a cross platform thing because of symlinks (which is handled by std), but also for determining whether two file paths refer to the same file.)

KodrAus · June 2, 2017, 12:46am

@brson It looks like there are a few differences with filter_entry vs Iterator.filter_map/Iterator.filter in behaviour and input. I think the discussion point is still valid to investigate if we can make the iterator implementations behave the same way and if that’s worthwhile.

budziq · June 2, 2017, 1:16am

x-posted from https://github.com/brson/rust-cookbook/issues/112 Please note that filter_entry changes the semantics over filter which might be surprising and unwanted by the user (we might want to filter over files but descend to all dirs or filter dirs with separate predicate). These two would coexist. But I kinda find the filter_entry name braking the rule of least surprise but I’m unable to suggest a better name.

KodrAus · June 2, 2017, 1:50am

@budziq That's a valid point. I think we should reach for filter_entry over filter / filter_map unless we don't want to skip directories, but the case of .filter_map(Result::ok) seems common.

One of the quirks of filter_entry is that it has to be the first adapter. I think this is a good thing, and I think the docs could try tease apart filter / filter_map vs filter_entry a bit more. I'll make a note of this in the cookbook thread too.

KodrAus · June 2, 2017, 2:34am

@budziq I’ve added a discussion point about filter_entry behaving in a way that isn’t obvious from its name.

The point about filter being better for matching files and still traversing directories is a good one, but I’m still inclined to prefer filter_entry and exclude directories from the predicate. That’s clearer to anyone reading the code that it’s only acting over files. This case could be made more ergonomic.

budziq · June 2, 2017, 12:46pm

I’m not really sure about [C-NEWTYPE] and [C-CUSTOM-TYPE] here with depth parameters as usize but it might be another trade-of for ergonomy over type safety

KodrAus · June 2, 2017, 9:54pm

Yeh I think usize is fine too. Using them for the traversal depth here just seems like an extra mental hurdle for someone to jump.

budziq · June 3, 2017, 11:29am

Also it is worth pointing out that https://docs.rs/walkdir/1.0.7/walkdir/struct.DirEntry.html has slightly different api than

https://doc.rust-lang.org/std/fs/struct.DirEntry.html

Especially the file_type for std::fs returns Result while Walkdir’s does not. That particular difference is also not mentioned in the docs.

Should there be an API guideline for types mimicking symbols from std?

burntsushi · June 3, 2017, 12:04pm

It is, but it could be clearer, as it's not particularly direct about it:

All recursive directory iterators must inspect the entry's type. Therefore, the value is stored and its access is guaranteed to be cheap and successful.

When you walk a directory recursively, you have to inspect the entry's type to determine whether it's a directory or not. Therefore, since that information is already available, walkdir can provide it to the caller as is without another stat call.

budziq · June 3, 2017, 1:13pm

Right, thanks! It’s not obvious on first glance.

KodrAus · June 13, 2017, 7:33pm

We’re going to kick off the review of the walkdir evaluation soon. I’m copying the the discussion points into the etherpad as the agenda

brson · June 13, 2017, 9:02pm

This was a really productive review. Excited to see the deluge of issues it creates. Thanks @KodrAus.

Topic		Replies	Views
Crate evaluation for 2017-07-25: rand	85	10918	March 25, 2019
Crate evaluation for 2017-06-27: error-chain libs	35	7289	March 25, 2019
Crate evaluation for 2017-05-16: log libs	52	13246	March 25, 2019
Out of band crate evaluation for 2017-07-06: env_logger libs	19	4222	March 25, 2019
Out of band crate evaluation for 2017-07-06: threadpool libs	20	4944	March 25, 2019

Crate evaluation for 2017-06-13: walkdir

Crate evaluation for 2017-06-13: walkdir

Links

Needs your help!

Guidelines checklist

Cookbook use case statements

API guideline updates

Discussion topics

Already stable

Follow-up

Links to types in std

Follow-up

Custom iterator adapters

Are there expectations on what an Iterator adapter should look like?

Could we solve this with a new name/docs/examples:

Follow-up

Should WalkDirIterator be externally implementable?

Follow-up

Ensuring Send/Sync

Follow-up

Error conditions for iterators returning results

Follow-up

Verified unwrapping

Follow-up

Re-exporting unstable is_same_file

Follow-up

Sorting entries

Follow-up

Platform-specific methods

Follow-up

Crate issues

Evaluation task assignments

How are we tracking?

Pre-review checklist

Post-review checklist

Related topics

Links to types in `std`

Should `WalkDirIterator` be externally implementable?

Re-exporting unstable `is_same_file`