I posted here instead of GitHub for two reasons. Firstly, to verify this is actually a bug and not me using it incorrectly. Secondly, I don’t actually know where the GitHub repository for this is.
Thanks, I linked to here, but not back, since this was intended to be the main post.
There’s some contention as to whether this is a bug, but the core of the issue is that there doesn’t seem to be any way in Rust to call
some.exe "a multi-word argument"
It will only be sent as
some.exe \"a multi-word argument\"
This creates problems with programs that use a different escaping scheme. The problem is hidden (more likely worked around) in Rust-to-Rust programs, since Rust's argument parser invisibly removes the \". Unfortunately, many older CLI programs on Windows aren’t that smart, and we need to be able to send raw unescaped quotes as arguments when calling other programs.
But it only happens on Windows. It may be due to issues caused by the CommandLineToArgvW function of the Windows API.
You’re right that Rust doesn’t support commands with a quoting scheme other than CommandLineToArgvW, but the example you’ve given is incorrect. Command::new("some.exe").arg("a multi-word argument") runs some.exe "a multi-word argument". In fact, Command is incapable of producing your second example. It can produce
some.exe "\"a multi-word argument\""
The outer " is a crucial difference, as the syntax differs between arguments and inside them, and an unescaped " switches between the two modes.
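To make the two modes concrete, here is a minimal sketch of the quoting convention that CommandLineToArgvW-style parsers expect. It is not the standard library's actual code, and quote_arg is a hypothetical name; the rule it assumes is the commonly documented one (wrap the argument in quotes, put a backslash before each embedded quote, and double any backslashes that directly precede a quote).

```rust
use std::iter::repeat;

/// Hypothetical helper: quotes one argument so that a CommandLineToArgvW-style
/// parser reads it back as a single, unchanged argument.
fn quote_arg(arg: &str) -> String {
    let needs_quotes = arg.is_empty()
        || arg.chars().any(|c| matches!(c, ' ' | '\t' | '\n' | '\x0b' | '"'));
    if !needs_quotes {
        return arg.to_string();
    }
    let mut out = String::from("\"");
    let mut backslashes = 0; // pending run of backslashes not yet written
    for c in arg.chars() {
        match c {
            '\\' => backslashes += 1,
            '"' => {
                // Double the pending backslashes, then escape the quote itself.
                out.extend(repeat('\\').take(backslashes * 2 + 1));
                out.push('"');
                backslashes = 0;
            }
            _ => {
                // Backslashes not followed by a quote are literal.
                out.extend(repeat('\\').take(backslashes));
                out.push(c);
                backslashes = 0;
            }
        }
    }
    // Backslashes before the closing quote must be doubled.
    out.extend(repeat('\\').take(backslashes * 2));
    out.push('"');
    out
}

fn main() {
    println!("{}", quote_arg("a multi-word argument")); // "a multi-word argument"
    println!("{}", quote_arg("\"a multi-word argument\"")); // "\"a multi-word argument\""
}
```

The second call shows why the outer quotes always appear: the argument's own quotes become \" inside a quoted region, never bare \" at the top level.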
that takes the whole command line except the command name itself. So:
Command::new("foo").arguments_escaped("\\ args \" bar \" \"\"\" baz")
would execute command + space + arguments_escaped value verbatim, so GetCommandLineW in the executed command would be:
foo \ args " bar " """ baz
I've named it _escaped, meaning already escaped, because the caller is responsible for escaping arguments using some command-specific syntax (the name could be raw_args or anything else). The syntax is arbitrary, as interpreted by the command being executed, so the Rust stdlib can't possibly know the syntax for all commands. The user of Command would have to take that responsibility.
It's not exactly as bad as the Unix equivalent of passing a whole shell command as a string, because by default there's no dangerous shell syntax interpreted. It could be made dangerous if used as Command::new("cmd").arguments_escaped(format!("/c command {}", args)), but then it's the equivalent of Command::new("bash").arg("-c").arg(format!("command {}", args)), which is a bad idea as well.
Linux, and probably other platforms, don't even have a concept of passing a raw unparsed command string from the shell, so this method couldn't be portable. It could be OK if it was an extension trait in a Windows-specific corner of the stdlib.
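A rough sketch of the semantics described above, assuming the "command + space + raw text" rule. command_line and foo_with_raw_arguments are hypothetical helpers, not part of any API. For what it's worth, the standard library later gained a Windows-only method along these lines, std::os::windows::process::CommandExt::raw_arg (stable since Rust 1.62), which the cfg(windows) function uses. How std quotes the program name itself is not shown here and may differ from the plain string.

```rust
/// Hypothetical helper mirroring the proposal: command + space + raw text, verbatim.
fn command_line(program: &str, raw_arguments: &str) -> String {
    format!("{} {}", program, raw_arguments)
}

/// Windows only: `raw_arg` appends the text to the command line without escaping it.
#[cfg(windows)]
fn foo_with_raw_arguments(raw_arguments: &str) -> std::process::Command {
    use std::os::windows::process::CommandExt;
    let mut command = std::process::Command::new("foo");
    command.raw_arg(raw_arguments);
    command
}

fn main() {
    let raw = "\\ args \" bar \" \"\"\" baz";
    // The string the proposal says the child should see from GetCommandLineW.
    println!("{}", command_line("foo", raw));
    #[cfg(windows)]
    println!("{:?}", foo_with_raw_arguments(raw));
}
```

Because the text is passed through untouched, the caller owns the escaping, which is exactly the responsibility trade-off discussed above.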
Alternatives:
Add .raw_arg(str)/.escaped_arg(str) that just appends a string to the command line (perhaps delimited by spaces). The upside is that it looks more like regular usage of Command, but the downside is that .raw_arg(one).raw_arg(two) could set an expectation of passing two arguments, while the actual meaning of it is impossible to define.
Add .arg_with_syntax(str, ArgSyntax::DoesNotSupportNestedQuotes), .arg_with_syntax(str, ArgSyntax::ThisIsTheLastArgAndEverythingFromHereIsVerbatim) with quoting/sanitisation/mangling options for variously broken ad-hoc parsers.
Recognize the command being executed (e.g. dir, notepad, cmd /c, echo), and choose a different argument serialization syntax appropriate for whatever ad-hoc argument parser that command uses. The upside is that .arg() would magically work as intended; the downside is endless whack-a-mole with an unlimited set of broken parsers. Also, recognizing commands is unreliable (e.g. if the executable gets renamed).
Agreed, it's unfortunate that so many legacy Windows CLI programs have arbitrary argument parsing with different implementations, but to live peacefully with them we need some way to send very precise unescaped strings to them.
I do think the current implementation probably still makes sense as a default, since it simplifies the process for those who aren't deeply versed in the (sometimes bizarre) world that is Microsoft systems programming, but we do need a way for me as the programmer to say "I accept the responsibility of understanding how the argument parsing for the program I'm calling works, and I want to send a very precise string exactly as I need".
That could be added to CommandExt. You'd have to figure out the semantics for mixing this with plain Command::arg and Command::args, but if you can come up with a good design, it would be a nice feature.
Can std::os::windows::process::CommandExt actually be extended? I don’t see anything to keep me from implementing it for a custom struct (on Windows only) in a completely-useless-but-would-be-broken manner.
@kornel I thought that the usual way of escaping quotes in Windows is to use two quotes? At least this is what I usually use and what is supported by most commands.
The syntax of it is bizarre. It seems to work for one consecutive quote only by accident. The syntax tries to be everything for everyone, and it gets progressively weirder around edge cases. If you’re expecting straightforward quote doubling to escape safely, you may be surprised:
It might also be possible to pass them through environment variables, if we’re using something that will expand them when calling a process.
Here’s a quick example of it working within cmd.exe. If it’s running through a Windows API, there’s a chance whatever mechanism Rust uses to execute files supports the same.
rem set /p test=type something:
set test="c:\temp\"quote test".txt"
notepad %test%
cmd /c start "" notepad.exe %test%
Notes:
“rem” is cmd’s comment indicator.
cmd /c start "" xyz abc might be a way of calling this if we can’t do it through the API, but that would be super janky.
You can check that it’s actually passing the quotes, not just the %test%, by using Process Explorer from Microsoft (link in one of the posts above).
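From Rust, the same trick might look roughly like the sketch below. cmd_with_raw_args and the RAW_ARGS variable name are made up for illustration. The idea is that none of the arguments Rust passes contain spaces or quotes, so nothing gets escaped, and cmd itself expands %RAW_ARGS% into the raw text. Characters such as & or | inside the value are likely still interpreted by cmd, so this is only a sketch under those assumptions.

```rust
use std::process::Command;

/// Hypothetical helper: builds (but does not spawn) `cmd /c <program> %RAW_ARGS%`,
/// with the raw argument text carried in the RAW_ARGS environment variable.
fn cmd_with_raw_args(program: &str, raw_args: &str) -> Command {
    let mut command = Command::new("cmd");
    command
        .args(&["/c", program, "%RAW_ARGS%"])
        .env("RAW_ARGS", raw_args);
    command
}

fn main() {
    let command = cmd_with_raw_args("notepad", r#""c:\temp\"quote test".txt""#);
    // Inspect what would be passed; call .spawn() on Windows to actually run it.
    for arg in command.get_args() {
        println!("arg: {:?}", arg);
    }
    for (key, value) in command.get_envs() {
        println!("env: {:?} = {:?}", key, value);
    }
}
```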
Here’s a test example using a cmd /c + env workaround. I leave it here for anyone who needs a temporary workaround until we work out a long-term answer.
NOTE: I have reworked this to use PowerShell because cmd complains about network paths. This was more difficult than I had anticipated, so I’ve put the updated version here for people who aren’t as versed in PowerShell’s eccentricities.
//mod windows_runner;
fn main() {
    let quote_test: &str = r#" one" two"" three""" four"""" five""""" "#;
    { // to show you can collect stdout
        let stdout_test: String = windows_runner::run("write-host", quote_test, "");
        println!("{}", stdout_test);
    }
    { // to check it passes stdin correctly, also shows you can call without argument
        let stdin_test: String = windows_runner::run("nslookup", r#""#, "google.com\nexit\n");
        println!("{}", stdin_test);
    }
    { // to check (with procexp) that the arguments actually pass exactly as given (including without surrounding quotes), though it does add one extra space between program and arguments
        windows_runner::run("notepad", quote_test, "");
    }
}
mod windows_runner {
    use std::io::Write;
    use std::process::{Command, Stdio};

    /// Runs `program` through PowerShell, feeds it `stdin`, and returns its stdout.
    pub fn run(program: &str, arguments: &str, stdin: &str) -> String /*(String,String)*/ {
        let launcher = "powershell.exe";
        let build_string: String = if arguments.trim().is_empty() {
            // no arguments (powershell gets confused if you try to execute a program with an empty array as the argument set)
            format!(r#"& '{}'"#, program)
        } else {
            // split on single spaces; note that a ' inside program or arguments would end the powershell string early and is not handled here
            let arguments_reformatted = arguments.split(' ').collect::<Vec<&str>>().join("','");
            // powershell digests: & 'pro gram' @('argument1','argument2') => "pro gram" argument1 argument2
            format!(r#"& '{}' @('{}')"#, program, arguments_reformatted)
        };
        let mut child = Command::new(launcher)
            .arg(build_string)
            .stdout(Stdio::piped())
            .stdin(Stdio::piped()) // disable this if you want the user to be able to speak with the child instead of doing it yourself
            /*.stderr(Stdio::piped())*/ // if you want to collect stderr instead of displaying to user
            .spawn()
            .expect("failed to run child program");
        {
            // send stdin, disable this if you want the user to be able to speak with the child instead of doing it yourself
            // take() moves the handle out so it is closed at the end of this block and the child sees EOF
            let mut stdin_handle = child.stdin.take().expect("Failed to get stdin");
            stdin_handle.write_all(stdin.as_bytes()).expect("Failed to write to stdin");
        }
        // would you kindly wait for the child to finish
        // wait_with_output reads stdout while it waits, so no separate try_wait polling loop is needed (polling first can hang once the child fills the stdout pipe)
        let output = child
            .wait_with_output()
            .expect("failed to wait on child");
        let stdout: String = String::from_utf8_lossy(&output.stdout).to_string();
        /*{ // if you want to collect stderr instead of displaying to user
            let stderr: String = String::from_utf8_lossy(&output.stderr).to_string();
            (stdout,stderr)
        }*/
        stdout
    }
}
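One gap in the workaround above is that a ' inside the program name or an argument would terminate the PowerShell string early. A minimal sketch of the string-building step with that case handled, assuming PowerShell's rule that a single quote inside a single-quoted string is written as two single quotes. ps_single_quote and powershell_invocation are hypothetical names, and this only builds the command text; how PowerShell then hands embedded double quotes to a native program is a separate matter that this does not address.

```rust
/// Hypothetical helper: wraps text in a PowerShell single-quoted string literal,
/// doubling any embedded single quotes.
fn ps_single_quote(text: &str) -> String {
    format!("'{}'", text.replace('\'', "''"))
}

/// Hypothetical helper: builds `& 'program' @('arg1','arg2')`, or just
/// `& 'program'` when there are no arguments.
fn powershell_invocation(program: &str, arguments: &[&str]) -> String {
    if arguments.is_empty() {
        return format!("& {}", ps_single_quote(program));
    }
    let quoted: Vec<String> = arguments.iter().map(|a| ps_single_quote(a)).collect();
    format!("& {} @({})", ps_single_quote(program), quoted.join(","))
}

fn main() {
    println!("{}", powershell_invocation("pro gram", &["argument1", "argument2"]));
    println!("{}", powershell_invocation("notepad", &[]));
    println!("{}", powershell_invocation("echo", &["it's"]));
}
```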
I'm pretty sure the current policy is that they can be arbitrarily extended and that you're never supposed to implement them on your own types. They're only a temporary crutch until we have a better platform specific lint system.