Pre-RFC: Guarantee that pointers to `OsStr` have the same layout as pointers to `str`/`[u8]`

Rust makes vanishingly few guarantees about the layout of pointers to unsized types. To my knowledge, the only exception is that pointers to str have the same layout as pointers to [u8]. I propose extending this guarantee to OsStr as well as Path. OsStr's three implementations and Path all currently support this as a consequence of being repr(transparent) newtypes of either str or [u8], but there are comments (not doc comments!) above the definitions of OsStr and Path stating that this "is considered an implementation detail and must not be relied upon".

The primary use case for this guarantee is casting/transmutation: it is easy enough to convert between &str or &[u8] and &OsStr or &Path, but the same cannot be said for conversions between &[&str] or &[&[u8]] and &[&OsStr] or &[&Path], which currently require converting each element and collecting into a new allocation. Given that OsStr and Path are by all accounts intended to be newtypes that model strings with constraints somewhere between those of str and [u8], I think it makes sense to provide layout guarantees in turn.
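To make the gap concrete, here is a minimal sketch of what the slice-of-references conversion looks like today without relying on any layout guarantee. The helper names `as_os_strs` and `as_paths` are made up for illustration; they are not std APIs.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical helper: converts each `&str` to `&OsStr` one element at a time.
/// A new `Vec` is needed because `&[&str]` cannot be reinterpreted as
/// `&[&OsStr]` without the layout guarantee proposed above.
fn as_os_strs<'a>(strs: &[&'a str]) -> Vec<&'a OsStr> {
    strs.iter().map(|s| OsStr::new(*s)).collect()
}

/// Hypothetical helper: the same element-wise conversion for `&Path`.
fn as_paths<'a>(strs: &[&'a str]) -> Vec<&'a Path> {
    strs.iter().map(|s| Path::new(*s)).collect()
}
```

With the proposed guarantee, the per-element loop and the allocation could presumably be replaced by a single pointer cast of the outer slice; without it, that cast depends on an implementation detail.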

I don't think this needs an RFC, it should be an ACP. That being said I'm not sure the libs-api team will want it.

May I ask why not? I discussed the proposal with some folks in the RPLC Discord server, and the only plausible issue that came up was if a future implementation wanted to store locale information in the pointer metadata, but given that every str must be an OsStr, I'm not convinced that this would be a worthwhile endeavor.

Not advocating for this, but hypothetically, the first 1-4 bytes of an OsStr-with-locale-info could be invalid UTF-8 to signify the presence of locale information, and valid UTF-8 to signify that no locale information was provided (supporting str: AsRef<OsStr>).

Maybe there would be some advantage of allowing the locale info to contain a pointer (precluding the use of [u8]). Or the locale info could just be encoded into a fancy [u8].

Edit: wait, that's about the backing data, not the pointer to it.

:ferris_clueless:

I suppose that storing locale info in the backing data would have the disadvantage of requiring a &mut OsStr instead of a &mut &OsStr to change the locale.

I could maybe imagine storing a bit or two that says whether it's known to be valid UTF-8 or not (allowing checks to be skipped in some cases). I don't think std will ever do anything with locales; at least, that seems to be against the current philosophy.

I had a PR up specifying that transmuting between &OsStr and &[u8] was valid, but libs-api decided at the time to expose this through explicit functions instead. See rust-lang/rust pull request #109698, "Allow limited access to `OsStr` bytes".
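For anyone who hasn't used them, a short sketch of the explicit-function route, assuming the functions in question are `OsStr::as_encoded_bytes` and `OsStr::from_encoded_bytes_unchecked` (stable since Rust 1.74). The helper `split_once_eq` is a made-up name for illustration.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: splits an `OsStr` at the first ASCII '=' byte,
/// e.g. for `key=value` style arguments.
fn split_once_eq(s: &OsStr) -> Option<(&OsStr, &OsStr)> {
    // Platform-specific encoding; only guaranteed to be a superset of UTF-8.
    let bytes = s.as_encoded_bytes();
    let idx = bytes.iter().position(|&b| b == b'=')?;
    // SAFETY: `bytes` came from `as_encoded_bytes` on a valid `OsStr`, and it
    // is split immediately before and immediately after the ASCII byte '=',
    // which is a non-empty valid UTF-8 substring. The documentation of
    // `from_encoded_bytes_unchecked` permits splitting at such boundaries.
    unsafe {
        Some((
            OsStr::from_encoded_bytes_unchecked(&bytes[..idx]),
            OsStr::from_encoded_bytes_unchecked(&bytes[idx + 1..]),
        ))
    }
}
```

Note that this works on one `&OsStr` at a time; it does not by itself say anything about the layout of `&[&OsStr]`.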

Are the meeting notes available anywhere? Your linked comment doesn't really get into the "why" of it all. My end goal was to create a path! macro that could be used to build truly cross-platform paths (i.e., ones that don't rely on / being a path separator) at compile time, which is unfortunately impossible as of now due to the absence of this guarantee.
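For comparison, a rough sketch of the runtime counterpart of what such a macro would produce. `build_path` is a hypothetical name, and this version allocates a `PathBuf` at run time rather than producing a `&'static Path`.

```rust
use std::path::PathBuf;

/// Hypothetical runtime stand-in for a `path!` macro: joins the components
/// with the platform's separator instead of hard-coding '/'.
fn build_path(components: &[&str]) -> PathBuf {
    components.iter().collect()
}
```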