Optimize `std::env::args_os().collect::<Vec<_>>()`

Sometimes I want to cache the command-line arguments,

so I write `args_os().collect::<Vec<_>>()`.

But recently I found that `.collect()` causes a further allocation,

while converting a `vec::IntoIter` directly into a `Vec` requires no allocation.

If `ArgsOs` is no more than a `vec::IntoIter`, converting `ArgsOs` to a `Vec` should be zero-cost.

But a wrapper around `vec::IntoIter` can't trigger the `SpecFromIter<T, vec::IntoIter<T>>` specialization.
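
To make the difference concrete, here is a minimal sketch that compares buffer pointers before and after collecting. `Wrapper` is a hypothetical stand-in for a type shaped like `ArgsOs`, not the real std type, and the buffer reuse in the direct case is an implementation detail of current std rather than a documented guarantee.

```rust
use std::vec;

/// Hypothetical stand-in for a type like `ArgsOs`:
/// a thin wrapper that only forwards to `vec::IntoIter`.
struct Wrapper<T>(vec::IntoIter<T>);

impl<T> Iterator for Wrapper<T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        self.0.next()
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.0.size_hint()
    }
}

/// Returns true if collecting `v.into_iter()` directly hands back
/// the original heap buffer (compared by address).
fn direct_collect_reuses_buffer() -> bool {
    let v: Vec<String> = vec!["a".to_string(), "b".to_string()];
    let before = v.as_ptr();
    let collected: Vec<String> = v.into_iter().collect();
    before == collected.as_ptr()
}

/// Same check, but the iterator is hidden behind `Wrapper`,
/// so `collect()` has to build a new buffer.
fn wrapped_collect_reuses_buffer() -> bool {
    let v: Vec<String> = vec!["a".to_string(), "b".to_string()];
    let before = v.as_ptr();
    let collected: Vec<String> = Wrapper(v.into_iter()).collect();
    before == collected.as_ptr()
}

fn main() {
    println!("direct reuses buffer:  {}", direct_collect_reuses_buffer());
    println!("wrapped reuses buffer: {}", wrapped_collect_reuses_buffer());
}
```

In the wrapped case the old buffer is still alive while the new `Vec` is being filled, so the two addresses cannot coincide.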

Can we add some support to eliminate that overhead?

For example, add a method `ArgsOs::into_inner` to allow `args_os().into_inner().collect::<Vec<_>>()` to be zero-cost,

or add something like `ArgsOs::into_vec`, etc.
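
A rough sketch of what either option could look like, written against a hypothetical stand-in type `ArgsLike` (using `String` instead of `OsString` for brevity). As far as I know, neither `into_inner` nor `into_vec` exists on the real `ArgsOs`; both are purely illustrative here.

```rust
use std::vec;

/// Hypothetical stand-in for `ArgsOs`; NOT the real std type.
struct ArgsLike {
    inner: vec::IntoIter<String>,
}

impl ArgsLike {
    fn new(args: Vec<String>) -> Self {
        Self { inner: args.into_iter() }
    }

    /// Hypothetical accessor: exposes the underlying `vec::IntoIter`
    /// so that a following `collect()` sees the concrete iterator type.
    fn into_inner(self) -> vec::IntoIter<String> {
        self.inner
    }

    /// Hypothetical convenience: collects the inner iterator directly.
    fn into_vec(self) -> Vec<String> {
        self.inner.collect()
    }
}

impl Iterator for ArgsLike {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        self.inner.next()
    }
}

fn main() {
    let args = ArgsLike::new(vec!["prog".to_string(), "--flag".to_string()]);
    let cached: Vec<String> = args.into_inner().collect();
    assert_eq!(cached, ["prog", "--flag"]);

    let args = ArgsLike::new(vec!["prog".to_string()]);
    assert_eq!(args.into_vec(), ["prog"]);
}
```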


I think the implementation of env::args_os() on most OSes could be made to not allocate a Vec at all and instead convert the OS argument representation to OsString elements on the fly.
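
As a sketch of the on-the-fly idea (not how std actually does it), here is a hypothetical `LazyArgs` iterator that borrows a slice of C strings and only builds each `OsString` when it is requested. It is Unix-only because it relies on `OsStringExt::from_vec`, and the borrowed slice stands in for whatever representation the OS hands over.

```rust
use std::ffi::{CStr, OsString};
use std::os::unix::ffi::OsStringExt; // Unix-only extension trait

/// Hypothetical lazy argument iterator: borrows the raw arguments
/// and converts one element per call to `next()`.
struct LazyArgs<'a> {
    raw: std::slice::Iter<'a, &'a CStr>,
}

impl<'a> LazyArgs<'a> {
    fn new(raw: &'a [&'a CStr]) -> Self {
        Self { raw: raw.iter() }
    }
}

impl<'a> Iterator for LazyArgs<'a> {
    type Item = OsString;

    fn next(&mut self) -> Option<OsString> {
        // Each element still allocates its own byte buffer;
        // only the intermediate Vec<OsString> is avoided.
        self.raw
            .next()
            .map(|c| OsString::from_vec(c.to_bytes().to_vec()))
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.raw.size_hint()
    }
}

fn main() {
    // Stand-ins for the NUL-terminated strings an OS would provide.
    let prog = CStr::from_bytes_with_nul(b"prog\0").unwrap();
    let flag = CStr::from_bytes_with_nul(b"--verbose\0").unwrap();
    let raw = [prog, flag];

    for arg in LazyArgs::new(&raw) {
        println!("{:?}", arg);
    }
}
```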


That is potentially problematic, though, because argv can be modified at runtime (some argument parsers do this). Admittedly this is technically also a problem at the moment, but doing the conversion all at once at least makes it less likely.

Note that this isn't true in this case, but good luck figuring that out without a debugger:

Also, I know Rust makes it easy to pay attention and see where all your allocations are, but 90% of programs will have fewer than 40 command line arguments, which means your entire Vec (as opposed to the strings inside it) will be smaller than 1KiB, a size that’s well under 1% of a 3.5" floppy disk from the 80s and won’t even show up in your used memory stats on an operating system from the last ten years. As long as you’re not doing this over and over, you really don’t need to worry about the memory or CPU cost of one extra allocation.
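
For anyone who wants to check the arithmetic on their own target, a small sketch: the size of the `Vec`'s buffer is just the element count times `size_of::<OsString>()`. The helper name `vec_buffer_bytes` is made up for this example. On a typical 64-bit Unix-like target I'd expect an `OsString` to be 24 bytes (pointer, capacity, length), giving 960 bytes for 40 arguments, but the layout is platform-dependent and may be larger elsewhere.

```rust
use std::ffi::OsString;
use std::mem::size_of;

/// Bytes occupied by the elements of a `Vec<OsString>` holding
/// `arg_count` arguments (excluding the strings' own heap data
/// and any spare capacity).
fn vec_buffer_bytes(arg_count: usize) -> usize {
    arg_count * size_of::<OsString>()
}

fn main() {
    let n = 40;
    println!(
        "{} args x {} bytes each = {} bytes",
        n,
        size_of::<OsString>(),
        vec_buffer_bytes(n)
    );
}
```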


There would (edit: hypothetically!) be value in a non-allocating version if it allowed functionality to be moved into core. a_vec.into_iter().collect() being cheap is a pattern that perhaps could be better highlighted, but the status quo is also pretty reasonable from my perspective.

env::args_os() cannot be in core because it retrieves information provided by the operating system. core is the subset of the standard library that does not assume any particular operating system (nor does it assume the lack of one).


The data structures that OS-related operations depend on do not necessarily need to live outside of core, and they can be useful without an operating system to represent analogous data. Admittedly I don't think there's much demand for it.