Glob expansion in std?


#1

Rust’s stdlib includes a cross-platform abstraction for CLI arguments (env::args), but there’s a catch: Unix shells perform glob expansion and the Windows shell doesn’t. On Windows, running program file.* doesn’t select files with any extension; it passes one literal argument with "*" in it.

Technically, glob expansion happens at a completely different level, outside the program, but tools written for Unix rely on it and assume it’s always available. I assumed that too, and my program failed to support globs on Windows.

Can this be improved? Should Rust’s stdlib emulate glob expansion on Windows? (e.g. MinGW does that!)


edit: Solved: https://crates.rs/crates/wild


#2

I think this should, at the very least, begin life as a crate. Indeed, this exact bug has been reported to ripgrep: https://github.com/BurntSushi/ripgrep/issues/234

Note that there are two possible solutions here. One is to use the standard Windows API for resolving globs, while the other is to use one of our own globbing libraries (e.g., glob or globset). The former might be more consistent with the behavior users expect, but AIUI, the Windows support for globs is not quite the same as standard Unix globbing.
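To make the second option concrete, here is a rough std-only sketch of "doing the globbing ourselves". It is not how glob or globset work internally; wildcard_match and expand_in_dir are hypothetical names made up for this post, and only * and ? are supported (no character classes, no recursion into subdirectories, no case folding).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Matches `name` against `pattern`, where `*` matches any run of characters
/// (including none) and `?` matches exactly one character.
/// Hypothetical helper for illustration only.
fn wildcard_match(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let n: Vec<char> = name.chars().collect();
    let (mut pi, mut ni) = (0, 0);
    // Pattern index of the last `*` seen, and the name index it was tried at.
    let mut backtrack: Option<(usize, usize)> = None;
    while ni < n.len() {
        if pi < p.len() && p[pi] == '*' {
            backtrack = Some((pi, ni));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == n[ni]) {
            pi += 1;
            ni += 1;
        } else if let Some((star_pi, star_ni)) = backtrack {
            // Let the last `*` absorb one more character and retry.
            pi = star_pi + 1;
            ni = star_ni + 1;
            backtrack = Some((star_pi, star_ni + 1));
        } else {
            return false;
        }
    }
    // Only trailing `*`s may remain in the pattern.
    p[pi..].iter().all(|&c| c == '*')
}

/// Lists the entries of `dir` whose file name matches `pattern`, sorted.
/// Names that are not valid Unicode are skipped in this sketch.
fn expand_in_dir(dir: &Path, pattern: &str) -> io::Result<Vec<PathBuf>> {
    let mut matches = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        if let Some(name) = entry.file_name().to_str() {
            if wildcard_match(pattern, name) {
                matches.push(entry.path());
            }
        }
    }
    matches.sort();
    Ok(matches)
}

fn main() -> io::Result<()> {
    for path in expand_in_dir(Path::new("."), "*.rs")? {
        println!("{}", path.display());
    }
    Ok(())
}
```

Note that this already differs from a Unix shell in at least one way: * here also matches names starting with a dot. That is exactly the kind of behavioral mismatch mentioned above.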


#3

There are 2 situations:

  1. Accepting glob args in an executable (I think this is the OP’s case). For this I think it’s up to the user to perform expansion. There are shells on Windows that will do this. Maybe you could print a warning if you see a ? or * in the path, but iirc these are valid characters in paths on Linux at least.

  2. If you are running a program as a subprocess, I think it’s up to you to do the globbing. This saves any confusion.
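The warning idea from the first point could look roughly like this. It is only a sketch: looks_like_unexpanded_glob is a made-up name, and as noted, * and ? can be legitimate filename characters on Linux, so a match is a hint rather than proof.

```rust
use std::env;

/// Returns true if `arg` contains `*` or `?`, which may mean the shell
/// passed a wildcard through without expanding it.
/// Hypothetical helper for illustration only.
fn looks_like_unexpanded_glob(arg: &str) -> bool {
    arg.chars().any(|c| c == '*' || c == '?')
}

fn main() {
    // Skip argv[0], the program name.
    for arg in env::args().skip(1) {
        if looks_like_unexpanded_glob(&arg) {
            eprintln!(
                "warning: argument {:?} contains a wildcard; \
                 your shell may not have expanded it",
                arg
            );
        }
    }
}
```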


#4

There’s a /link setargv.obj option in MSVC. Could Rust set that?


#5

PowerShell does this already, and it’s much more capable and user friendly than cmd. Wordy, sure, but a much better shell experience.

This is just my own personal opinion, but I really think languages should respect the default platform behaviors as much as possible. Per application, sure, it’s up to you. Even if you did set that though, it probably won’t work with the GNU tool chain so Rust would have to maintain a copy for that, and Windows paths are complicated so mimicking something from MS might be tricky.


#6

But the default behavior on Windows is to have globs (e.g. dir *.* works [edit: dir *, too]). A command that accepts files as arguments, but doesn’t take globs, is broken from the user’s perspective, even on Windows.

Glob characters are not allowed in Windows filenames, so double expansion is not a problem, and AFAIK it wouldn’t affect PowerShell (i.e. PowerShell or any other shell can have expansion as fancy as it wants and pass the result to a command, and the outcome will be exactly the same regardless of whether the command has its own basic expansion or not).


#7

That’s cmd behavior. For PowerShell you only need one *. I agree that in this case it’s pretty harmless: NT paths (\\?\...) can contain *s, but as long as you only apply it to Win32 paths there’s no problem with *. But paths can contain []s, so how do you escape those? I think PowerShell does that expansion too, and you escape with the backtick, so I’d have to know whether the program was written in Rust and, if so, double-escape them, which breaks shell tab expansion for those files. Also, what if I’m using bash or another POSIX shell on Windows 10? And are you giving Rust cmd, PowerShell, bash or some other style of expansion? Personally I think the best solution is just to not use cmd as your shell.


#8

No, you can only assume you can’t possibly know. The problem is not unique to Rust. You’d also need to know whether a program was compiled with MinGW, or compiled with MSVC with setargv on, or is a Go program using filepath.Glob, etc.

So Rust can’t make it any worse for users of other shells: this problem already exists for all Windows programs. But Rust can at least make it less bad for users of the built-in cmd.

That’s not for me to choose (IMHO the best solution is not to use Windows at all). But I get bug reports from users that my program worked as expected in the MinGW-compiled version, and the Rust version does not.


#9

I’d like to point out that it’s impossible for Rust programs to handle globs correctly themselves via env::args(). That’s because the args are already split and unquoted in the form exposed by Rust’s stdlib, and the difference between * and "*" is lost.

So even if we agree that glob expansion does not belong in the stdlib and is each app’s own problem, there still has to be something in the stdlib that exposes the raw, unparsed arguments so that apps can implement globs themselves.
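To illustrate what is lost, here is a deliberately simplified splitter that works on a raw command-line string and remembers whether each argument was quoted. RawArg and split_raw are hypothetical names, and the real Windows rules (backslash escapes, doubled quotes) are more involved than this sketch, which only handles whitespace and plain double quotes.

```rust
/// One argument split out of a raw command line.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct RawArg {
    text: String,
    /// True if any part of the argument was inside double quotes.
    quoted: bool,
}

/// Splits `cmdline` on unquoted whitespace, stripping double quotes but
/// recording that they were there. Backslash escaping is NOT handled.
fn split_raw(cmdline: &str) -> Vec<RawArg> {
    let mut args = Vec::new();
    let mut text = String::new();
    let mut quoted = false;
    let mut in_quotes = false;
    let mut in_arg = false;
    for c in cmdline.chars() {
        match c {
            '"' => {
                in_quotes = !in_quotes;
                quoted = true;
                in_arg = true;
            }
            c if c.is_whitespace() && !in_quotes => {
                if in_arg {
                    // End of the current argument: emit it and reset state.
                    args.push(RawArg { text: std::mem::take(&mut text), quoted });
                    quoted = false;
                    in_arg = false;
                }
            }
            c => {
                text.push(c);
                in_arg = true;
            }
        }
    }
    if in_arg {
        args.push(RawArg { text, quoted });
    }
    args
}

fn main() {
    // Both wildcard arguments have the same text; only the flag differs.
    for arg in split_raw(r#"prog * "*" file.txt"#) {
        println!("{:?} quoted={}", arg.text, arg.quoted);
    }
}
```

With env::args() both wildcard arguments arrive as the same string, so a program can’t tell which one the user meant literally. An app doing its own expansion would only expand the ones where quoted is false.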


#10

You can trivially get the raw unparsed command line on Windows though. Just call GetCommandLineW and then you can parse it and do whatever you want with it. Unless you want something cross platform, but since this is specifically a problem you have with cmd.exe, which is exclusive to Windows, platform specific code isn’t the worst thing in this case.
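A minimal sketch of that, declaring the function by hand instead of pulling in a bindings crate. raw_command_line is a made-up wrapper name; GetCommandLineW itself is the real kernel32 function, which takes no arguments and returns a pointer to a NUL-terminated UTF-16 string. On other platforms the sketch just returns None.

```rust
/// Returns the raw, unparsed command line on Windows.
/// Hypothetical wrapper for illustration only.
#[cfg(windows)]
fn raw_command_line() -> Option<String> {
    // In the 2024 edition this block must be written `unsafe extern`.
    #[link(name = "kernel32")]
    extern "system" {
        fn GetCommandLineW() -> *const u16;
    }
    unsafe {
        let ptr = GetCommandLineW();
        if ptr.is_null() {
            return None;
        }
        // Find the terminating NUL to learn the length of the string.
        let mut len = 0;
        while *ptr.add(len) != 0 {
            len += 1;
        }
        let wide = std::slice::from_raw_parts(ptr, len);
        // Lossy conversion: unpaired surrogates become U+FFFD.
        Some(String::from_utf16_lossy(wide))
    }
}

/// There is no equivalent on other platforms: the shell has already split
/// the arguments, so this sketch returns None.
#[cfg(not(windows))]
fn raw_command_line() -> Option<String> {
    None
}

fn main() {
    match raw_command_line() {
        Some(line) => println!("raw command line: {}", line),
        None => println!("no raw command line on this platform"),
    }
}
```

You still have to split and unquote the returned string yourself before deciding which arguments to expand.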


#11

I’ve implemented this as a crate:

https://crates.io/crates/wild


#12

Wrapping env::args() seems like a bad idea to me, as it will affect even arguments that aren’t supposed to be interpreted as file paths. Unix users might be used to this, but are Windows users?

It also seems like a breeding ground for double-expansion issues, if you were writing a program that takes another as a command (like time).


#13

In my Unix experience it’s never a problem that a non-file arg is interpreted as a path glob (only the other way around is common, but globs don’t affect that). You’d have to write an unquoted arg with glob characters and have a file in the current directory with an unusual arg-like name like --foo= that matches the glob. In theory it can happen, but in practice it doesn’t. There are so many more messed-up things about cmd.exe that this isn’t even high on the list.

Glob metacharacters are forbidden in all filenames on Windows, so double expansion is impossible.