I think you are mixing up what "encoding" means. UTF-8, WTF-8 and UTF-16 are all encodings. UTF-8 and UTF-16 are ways of representing Unicode code points in a raw byte representation. UTF-8 uses between 1 and 4 bytes for each code point. UTF-16 uses either 2 bytes or 4 bytes for each code point.
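To make the sizes concrete, here is a minimal sketch using Rust's standard library. The helper name encoded_sizes is mine, purely for illustration. Note that a Rust char is a Unicode scalar value, so it can never be a surrogate on its own.

```rust
/// Returns (bytes in UTF-8, bytes in UTF-16) for a single code point.
/// `encoded_sizes` is a hypothetical helper name, not a std API.
fn encoded_sizes(c: char) -> (usize, usize) {
    // len_utf16 counts 16-bit code units, so multiply by 2 to get bytes.
    (c.len_utf8(), c.len_utf16() * 2)
}
```

For example, 'a' takes 1 byte in UTF-8 and 2 in UTF-16, while a code point outside the Basic Multilingual Plane takes 4 bytes in both.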
UTF-8 cannot represent all file paths on UNIX because file paths are an arbitrary sequence of bytes (without interior NUL bytes). Similarly, UTF-16 cannot represent all file paths on Windows because file paths are an arbitrary sequence of 16-bit integers.
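As a rough illustration of both claims, the sketch below checks raw input the way a path might arrive from the OS. The function names are made up for this example; only the std calls inside them are real.

```rust
/// True if `bytes` is well-formed UTF-8. (Hypothetical helper name.)
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

/// True if `units` is well-formed UTF-16, i.e. contains no unpaired
/// surrogates. (Hypothetical helper name.)
fn is_valid_utf16(units: &[u16]) -> bool {
    std::char::decode_utf16(units.iter().copied()).all(|r| r.is_ok())
}
```

A UNIX file name containing the byte 0xFF is a perfectly legal path but fails the first check, and a Windows file name containing a lone 0xD800 is legal but fails the second.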
WTF-8 was invented to bridge the gap so that file paths on Windows can be losslessly roundtripped through Rust's PathBuf types. WTF-8 was chosen so that Rust String types could be used in a zero cost fashion to manipulate file paths. When a PathBuf is used on Windows to interact with the operating system, it is converted to an OsString and then transcoded from its internal WTF-8 encoding (a strict superset of UTF-8) to the sequence of 16-bit integers that Windows expects.
When you have a PathBuf or an OsString in memory in Rust on Windows, its internal representation uses WTF-8. When you actually go to use it for anything outside of Rust, it is first transcoded to 16-bit integers. Since WTF-8 is a strict superset of UTF-8 and can also encode every possible sequence of 16-bit integers, it follows that 1) OsStrings can be manipulated in a zero cost fashion with Rust's guaranteed valid UTF-8 encoded String types and that 2) all possible file paths on Windows can be correctly represented and roundtripped by Rust. The cost is that, at least on Windows, every file path must be transcoded between WTF-8 and Windows' sequence of arbitrary 16-bit integers whenever it crosses the operating system boundary.
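Point 1 can be seen with nothing but std. This is a small sketch; as_path and as_utf8 are names I picked for illustration, and they are only thin wrappers around Path::new and Path::to_str.

```rust
use std::path::Path;

/// Views a UTF-8 string as a path without copying or transcoding.
/// (Hypothetical wrapper around Path::new.)
fn as_path(s: &str) -> &Path {
    Path::new(s)
}

/// Goes the other way. Returns None if the path is not valid UTF-8.
/// (Hypothetical wrapper around Path::to_str.)
fn as_utf8(p: &Path) -> Option<&str> {
    p.to_str()
}
```

Both directions borrow the same underlying bytes, so a &str that went in comes back out pointing at the same memory. The only work on the way back is the UTF-8 validity check.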
If a file path on Windows is valid UTF-16, then its corresponding WTF-8 representation is guaranteed to be valid UTF-8. The WTF-8 representation is only invalid UTF-8 in precisely the case where the file path on Windows is not valid UTF-16, in which case you get a WTF-8 encoded string that cannot be converted to UTF-8 without error, replacement, or omission.
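Here is a simplified sketch of that property. To be clear, this is not the code std uses internally; wtf8_encode is a name I made up, and it only covers the 16-bit-integers-to-WTF-8 direction. Well-formed surrogate pairs become ordinary 4-byte UTF-8, and an unpaired surrogate is written with the 3-byte pattern that strict UTF-8 forbids.

```rust
/// Encodes arbitrary 16-bit units as WTF-8 bytes.
/// Illustrative sketch only, not the standard library's implementation.
fn wtf8_encode(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::new();
    for r in std::char::decode_utf16(units.iter().copied()) {
        match r {
            // A scalar value (including a well-formed surrogate pair):
            // encode exactly as UTF-8 would.
            Ok(c) => {
                let mut buf = [0u8; 4];
                out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            }
            // An unpaired surrogate in 0xD800..=0xDFFF: use the 3-byte
            // layout, which is what makes the result invalid UTF-8.
            Err(e) => {
                let s = e.unpaired_surrogate() as u32;
                out.push(0xE0 | (s >> 12) as u8);
                out.push(0x80 | ((s >> 6) & 0x3F) as u8);
                out.push(0x80 | (s & 0x3F) as u8);
            }
        }
    }
    out
}
```

Feeding it valid UTF-16 yields bytes that std::str::from_utf8 accepts, while feeding it a lone 0xD800 yields the bytes ED A0 80, which std::str::from_utf8 rejects.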