Crate evaluation for 2017-06-13: walkdir

Crate evaluation for 2017-06-13: walkdir

For additional contribution opportunities, see the main libz blitz thread.

This post is a wiki. Feel free to edit it.

Links

Needs your help!

Anything that is not checked off still needs your help! There is no need to sign up or ask for permission - follow any of these links and leave your thoughts:

Guidelines checklist

Legend

  • [y] = guideline is adhered to, no work needed.
  • [n] = guideline may need work, see comments nearby
  • [/] = guideline not applicable to this crate

Checklist

Guidelines checklist

This document is collaboratively editable. Pick a few of the guidelines, compare the walkdir crate against them, and fill in the checklist with [y] if the crate conforms to the guideline, [n] if the crate does not conform, and [/] if the guideline does not apply to this crate. Each guideline is explained in more detail here. If [n], please add a brief note on the following line with an explanation. For example:

  - [n] Crate name is a palindrome (C-PALINDROME)
        - walkdir backwards is ridklaw which is not the same as walkdir

Cookbook use case statements

Cookbook example ideas

Come up with ideas for nice introductory examples of using walkdir, possibly in combination with other crates, that would be good to show in the Rust Cookbook. Please leave a comment in that issue with your ideas! You don’t necessarily have to write the example code yourself but PRs are always welcome.

API guideline updates

API guideline ideas

What lessons can we learn from walkdir that will be broadly applicable to other crates? Please leave a comment in that issue with your ideas!

Discussion topics

Already stable

walkdir is already stable, so we should try find non-breaking solutions, but @burntsushi isn’t against a 2.0 release if there’s enough in it.

Follow-up

  • Publish 2.0

Links to types in std

There are a lot of references to types in std::fs. Should these be links to the relevant section in the std API docs? If so, where should they point?

Follow-up

Custom iterator adapters

WalkDirIterator::filter_entry is not obviously better than into_iter().filter_map.

filter_entry behaves differently to filter in a way that isn’t clear from the function signatures.

Are there expectations on what an Iterator adapter should look like?

  • Predicate input as Self::Item or &Self::Item: filter_entry takes &DirEntry instead of Result
  • Chainable with any other adapter: filter_entry is only available on WalkDirIter

Could we solve this with a new name/docs/examples:

  • Maybe rename to filter_walk? It might be helpful to think of it as a driver for the walk that always comes before filter, operating on an already determined set of entries
  • Example for use-cases that exploit the file structure (and may want to filter_entry)
  • Example for use-cases that ignore the file structure (and wouldn’t want to filter_entry)

Or make filter_entry a method on WalkDir instead of an adapter?

Follow-up

  • Make WalkDirIterator methods inherent on Iter and IterFilterEntry
  • Deprecate WalkDirIterator trait in favour of inherent methods
  • Remove WalkDirIterator trait in 2.0
  • Add doc example for filter_entry vs filter why choose one over the other

Should WalkDirIterator be externally implementable?

The WalkDirIterator is implemented by a few types internally, but doesn’t seem to have any value being implemented externally. Should it have a Sealed supertype to signal that it’s not for implementing?

Follow-up

  • Irrelevant since we’re removing the trait

Ensuring Send/Sync

Determining send/sync still hard. WalkDirOptions is not Send, which prevents WalkDir and Iter from being Send. Maybe we could create assert_send!, assert_sync! macros and apply them to every type.

Should we strengthen recommendation to assert types are send/sync by default?

Follow-up

  • Make sort_by function Send + Sync
  • Add assert_send / assert_sync macros and test:
    • WalkDir
    • Iter
    • IterFilterEntry
  • @KodrAus to open a Pre-RFC about asserting Send / Sync in std

Error conditions for iterators returning results

Where should Iter::next's Error conditions be documented if Iter::Item is a Result? On the impl of next? On the iterator itself?

Follow-up

  • Add Errors section to Iter and IterFilterEntry type docs
  • @KodrAus to file guideline issue about Error sections for trait methods returning errors
  • @KodrAus to file issue about rust-doc calling out overridden trait docs more

Verified unwrapping

walkdir::Iter.next() calls unwrap on a Vec that’s guaranteed not to panic by the block it’s in. Should this be expect with a comment describing why it can’t fail?

Could any of these be refactored to avoid the unwrap altogether? All methods calling unwrap that are validated:

  • <Iter as Iterator>::next
  • Iter::get_deferred_dir
  • Iter::push

Should there be a guideline for this?

Follow-up

  • Change unwrap to expect and document why they won’t panic
  • Open guidelines issue about unwrap vs expect

Re-exporting unstable is_same_file

Should the is_same_file function be re-exported from the unstable same_file crate? It isn’t mentioned in docs or used in examples. Looks like it was pulled into a new crate post 1.0 and re-exported to prevent a breaking change.

Follow-up

  • Deprecate re-export in favour of is_same_file crate
  • Remove re-export in 2.0

Sorting entries

Could sorter support more metadata to compare like size, last modified? It should at least use OsStr instead of &OsString.

Follow-up

  • Open issue to support sorting by more file metadata (might not be feasible)
  • At least change &OsString args to OsStr

Platform-specific methods

Raised during the meeting that DirEntry::ino is a unix only method, but that isn’t reflected in the docs.

Follow-up

  • Add unix::DirEntryExt trait and add ino method there
  • Remove inherent ino method from DirEntry
  • Document that DirEntryExt is unix only

Crate issues

[details=Summary]- Add Error documentation sections for

  • is_same_file
  • Iter / Iter::next - probably should be documented in both places? See discussion
  • IterFilterEntry / IterFilterEntry::next
  • DirEntry::metadata
  • docs: Indicate that DirEntry::ino is Unix only
  • docs: remove unwrap from
    • examples in root
    • examples on WalkDir
    • examples on WalkDirIterator.skip_current_dir and filter_entry
  • docs: clarify differences between walkdir::DirEntry and std::fs::DirEntry more
    • add examples for DirEntry methods
    • demonstrate differences with std::fs::DirEntry in method examples
  • docs: add example for contents_first
  • docs: add links for:
    • filter_entry in root skip hidden files and directories efficiently on unix example
    • path, file_name and follow_links in DirEntry
    • WalkDir::new in DirEntry::path
    • follow_links in DirEntry::path_is_symbolic
    • follow_links in DirEntry::metadata
    • follow_links in DirEntry::file_type
    • WalkDir in Iter
    • WalkDir::min_depth in IterFilterEntry
    • WalkDir::max_depth in IterFilterEntry
    • WalkDir::filter_entry in WalkDir
    • WalkDir::min_depth in WalkDir
    • WalkDirIterator::filter_entry in WalkDirIterator::skip_current_dir
    • WalkDirIterator::skip_current_dir in WalkDirIterator::filter_entry
    • WalkDir::min_depth in WalkDirIterator::filter_entry
    • WalkDir::max_depth in WalkDirIterator::filter_entry
  • docs: Add a note that IterFilterEntry is created by calling Iter::filter_entry
  • docs: Correct inaccuracies in WalkDir root docs: if contents_first is true then directories aren’t emitted before their contents. The order isn’t unspecified if a sort_by predicate is given
  • Add html_root_url attribute
  • Add Debug to Iter, IterFilterEntry, WalkDir
  • Add CI for OSX (?) and crate badges for tier 1 platforms
  • Add categories metadata to Cargo.toml
  • Add Debug impls for
    • Iter
    • IterFieldEntry
    • WalkDir
  • Rename Iter to IntoIter
  • Rename IterFilterEntry to FilterEntry[/details]

Evaluation task assignments

  • [ ] @KodrAus assert_send / assert_sync Pre-RFC
  • [ ] @KodrAus follow up stability of /stable/std path to std docs
  • [ ] @KodrAus follow up rustdoc calling out overridden trait method docs

How are we tracking?

Pre-review checklist

Do these before the libs team review.

  • [x] Create evaluation thread based on this template
  • [x] Work with author and issue tracker to identify known blockers
  • [x] Compare crate to guidelines by filling in checklist
  • [x] Record other questions and notes about crate
  • [x] Draft several use case statements to serve as cookbook examples
  • [x] Record recommendations for updated guidelines

Post-review checklist

Do these after the libs team review.

  • [x] Publish libs team meeting video
  • [x] Create new issues and tracking issue on crate’s issue tracker
  • [ ] Solicit use cases for cookbook examples related to the crate
  • [ ] File issues to implement cookbook examples
  • [ ] File issues to update guidelines
  • [ ] Post all approachable issues to TWiR call for participation thread
  • [ ] Update links in coordination thread
3 Likes

Thanks for leading this @KodrAus.

I added some cookbook suggestions, and evaluated a bit of the checklist, added some discussion points and documentation issues.

This is another crate where serde doesn’t seem particularly appropriate.

I continue to be unclear on how hard we should try to put examples on every method, like std does. This crate seems pretty well-exampled to me, and the methods are pretty clear about what they do. It doesn’t seem vital that every method here has an example. Should we file issues for each of them and try to get examples, or just let it be good enough as is?

1 Like

@KodrAus do you mind listing in the op potential issues to file against walkdir for the guidelines you’ve encountered that it doesn’t fulfill? I worry that it’s hard to translate after the fact the "n"s to issues.

Thanks @brson, @birkenfeld and @omh1280 for getting started on the evaluation!

@brson I also think it’s good enough as it is, it’s a special use-case crate with good examples on its entry-points.

A pain point @burntsushi raised with walkdir is the WalkDirOptions.sorter field, which is essentially a FnMut(&OsString,&OsString) -> Ordering + 'static. This currently prevents WalkDir and Iter from being Send, because that closure isn’t Send.

If we had an assert_send! macro this might’ve been easier to pick up, but since it was raised as an issue while thinking about rayon support post 1.0 I’m guessing it’s a great nice-to-have but hasn’t actively blocked walkdir users. But it’s a bit unfortunate that missing Send on one type cascades through all the types that use it and prevent them from being Send too.

Maybe a good API recommendation is for producers to require Send on their generic closures, unless they’ve got a reason not to.

EDIT: Reflecting on this a bit more I think the issue is that concrete structures are all Send until they’re not (you have a non-Send field), but traits/generic types are not Send until they are (you add a Send bound). And you as the API producer might not be the one that needs to Send your types. I’m sure it’s come up before, but interesting to think about its implications on API design :slight_smile:

walkdir can report files in a sorted order.

WalkDir::new("foo").sort_by(|a,b| a.cmp(b));

This is convenient when you want to compare directories. A nice cookbook entry could be to print a diff between two directories.

1 Like

came here via reddit call for examples - I've got a simple one:

2 Likes

I tried to synthesize some of the cookbook recipe ideas here, adding one new one to the OP, but mostly asking questions about how to show off specific features of walkdir.

1 Like

@burntsushi Does walkdir solve any tricky cross-platform problems that would make a good example, maybe with symlinks or something?

The one I can think of is that if you enable walkdir to follow symlinks, then it will detect loops and return an error if it finds one. (This is a cross platform thing because of symlinks (which is handled by std), but also for determining whether two file paths refer to the same file.)

@brson It looks like there are a few differences with filter_entry vs Iterator.filter_map/Iterator.filter in behaviour and input. I think the discussion point is still valid to investigate if we can make the iterator implementations behave the same way and if that’s worthwhile.

x-posted from https://github.com/brson/rust-cookbook/issues/112 Please note that filter_entry changes the semantics over filter which might be surprising and unwanted by the user (we might want to filter over files but descend to all dirs or filter dirs with separate predicate). These two would coexist. But I kinda find the filter_entry name braking the rule of least surprise but I’m unable to suggest a better name.

@budziq That's a valid point. I think we should reach for filter_entry over filter / filter_map unless we don't want to skip directories, but the case of .filter_map(Result::ok) seems common.

One of the quirks of filter_entry is that it has to be the first adapter. I think this is a good thing, and I think the docs could try tease apart filter / filter_map vs filter_entry a bit more. I'll make a note of this in the cookbook thread too.

@budziq I’ve added a discussion point about filter_entry behaving in a way that isn’t obvious from its name.

The point about filter being better for matching files and still traversing directories is a good one, but I’m still inclined to prefer filter_entry and exclude directories from the predicate. That’s clearer to anyone reading the code that it’s only acting over files. This case could be made more ergonomic.

I’m not really sure about [C-NEWTYPE] and [C-CUSTOM-TYPE] here with depth parameters as usize but it might be another trade-of for ergonomy over type safety

Yeh I think usize is fine too. Using them for the traversal depth here just seems like an extra mental hurdle for someone to jump.

Also it is worth pointing out that https://docs.rs/walkdir/1.0.7/walkdir/struct.DirEntry.html has slightly different api than

https://doc.rust-lang.org/std/fs/struct.DirEntry.html

Especially the file_type for std::fs returns Result while Walkdir’s does not. That particular difference is also not mentioned in the docs.

Should there be an API guideline for types mimicking symbols from std?

1 Like

It is, but it could be clearer, as it's not particularly direct about it:

All recursive directory iterators must inspect the entry's type. Therefore, the value is stored and its access is guaranteed to be cheap and successful.

When you walk a directory recursively, you have to inspect the entry's type to determine whether it's a directory or not. Therefore, since that information is already available, walkdir can provide it to the caller as is without another stat call.

1 Like

Right, thanks! It’s not obvious on first glance.

We’re going to kick off the review of the walkdir evaluation soon. I’m copying the the discussion points into the etherpad as the agenda

1 Like

This was a really productive review. Excited to see the deluge of issues it creates. Thanks @KodrAus.

1 Like