I've noticed that there has been a direct conversion from char to u32 (i.e. impl From<char> for u32) since https://github.com/rust-lang/rust/pull/35755. However, there doesn't seem to be a direct conversion from char to u64, which is surprising given that in general we have conversions From<uNN> for uMM as long as NN <= MM (as well as direct conversions from bool to uNN).
Is char playing a special role here, or could impl From<char> for u64 be added as well?
error[E0277]: the trait bound `u64: From<char>` is not satisfied
--> src/tui.rs:185:42
|
185 | ncurses::waddch(self.window, ' '.into());
| ^^^^^^^^^^ the trait `From<char>` is not implemented for `u64`
|
= help: the following implementations were found:
<u64 as From<NonZeroU64>>
<u64 as From<bool>>
<u64 as From<u16>>
<u64 as From<u32>>
<u64 as From<u8>>
= note: required because of the requirements on the impl of `Into<u64>` for `char`
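For anyone hitting the same error, a minimal workaround sketch is to chain the two conversions that do exist: char to u32, then u32 to u64. The helper name `char_to_u64` is made up for illustration and nothing here touches the ncurses crate.

```rust
/// Hypothetical helper: widen a `char` to `u64` by chaining the existing
/// `From<char> for u32` and `From<u32> for u64` impls.
fn char_to_u64(c: char) -> u64 {
    // Step 1: char -> u32 (the Unicode scalar value).
    let scalar = u32::from(c);
    // Step 2: u32 -> u64 (plain zero-extending widening).
    u64::from(scalar)
}
```

With this, the call in the error message could presumably be written as `char_to_u64(' ')` in place of `' '.into()`, assuming chtype is u64 in that build.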
It looks like ncurses::chtype is "A character, attributes and a colour-pair", so it's wider in order to hold some extra bits of information. It is technically a different type. Maybe the ncurses crate could add appropriate newtypes and conversions?
I agree the specific issue raised here might be better addressed in the crate with the more specific type info, but it still seems reasonable to me to impl From<char> for u64. Am I missing something?
This is just an oversight; a PR would be welcome to fix this shortcoming.
(It's very unfortunate that if someone makes the mistake of posting here with their small std oversight instead of making a PR, someone will almost certainly write a post discouraging them from contributing as if we have a divine plan to have exactly the API std has today. Please stop making posts discouraging contribution unless you have a really good reason to be confident the contribution would be rejected.)
chtype is a u32 or u64 depending on a feature flag. This violates the principle that feature flags are strictly additive, as changing u32 into u64 is a breaking change.
FWIW, that has never surprised me because I don't think of char as being a "numeric" type. To me, the existing From conversion to u32 is a "newtype-unwrapping" conversion, not an "integer-widening" conversion. (char is essentially a struct char(u32); newtype, with some extra magic about having a more restricted value range.)
Notably there also isn't char: TryFrom<u16> even though there's char: TryFrom<u32>, nor is there char: Into<f32> even though a 21-bit USV can also be exactly represented as a single-precision floating point value.
(But as boats said, one can always send a PR and see what happens.)
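To make the "newtype-unwrapping" reading concrete, here is a rough sketch. `ScalarValue` is a made-up stand-in for char, not how std defines it; the range check just follows the definition of a Unicode scalar value (at most 0x10FFFF, excluding the surrogates 0xD800..=0xDFFF). The last helper checks the f32 remark: every scalar value is below 2^24, so it fits the f32 significand exactly.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for `char`: a `u32` restricted to Unicode scalar values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ScalarValue(u32);

impl From<ScalarValue> for u32 {
    // "Newtype-unwrapping": hand back the inner value, which always succeeds.
    fn from(v: ScalarValue) -> u32 {
        v.0
    }
}

impl TryFrom<u32> for ScalarValue {
    type Error = u32;

    // Wrapping can fail, because not every `u32` is a scalar value.
    fn try_from(n: u32) -> Result<Self, Self::Error> {
        let is_surrogate = (0xD800..=0xDFFF).contains(&n);
        if n <= 0x10FFFF && !is_surrogate {
            Ok(ScalarValue(n))
        } else {
            Err(n)
        }
    }
}

/// True if `c` survives a round trip through `f32` unchanged.
fn round_trips_through_f32(c: char) -> bool {
    let n = u32::from(c);
    (n as f32) as u32 == n
}
```

On this view, a u64 conversion would be "unwrap, then widen", which is two steps rather than one; whether std should bundle them is exactly what is being discussed.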
Is that not a very different thing from merely widening? A u64 is a superset of a u32. It's not that a u32 just so happens to be representable as a u64. A value of u32::MAX has the exact same meaning as u32::MAX as u64.
Converting a u32 into an f32 changes that meaning, even if the value is exactly representable.
Yet we have From<u8> for char and TryFrom<u32>. Why not TryFrom<u16> too? Is that not inconsistent? I guess I don't understand the reasoning here.
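For reference, a 16-bit value can already be turned into a char by widening it to u32 first and then using the fallible conversion. This is only a sketch of that route; `char_from_u16` is a hypothetical helper name, and it treats the u16 as a code point, not as a UTF-16 code unit to be paired.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: interpret a 16-bit value as a Unicode scalar value.
/// Widens to `u32` first, then reuses `char: TryFrom<u32>`.
/// Returns `None` for the surrogate range 0xD800..=0xDFFF.
fn char_from_u16(n: u16) -> Option<char> {
    char::try_from(u32::from(n)).ok()
}

/// The existing infallible route for 8-bit values, shown for comparison.
fn char_from_u8(n: u8) -> char {
    char::from(n)
}
```

The u8 conversion is infallible because every value 0..=255 is a valid scalar value, while the 16-bit range contains surrogates, so a u16 conversion would have to be TryFrom.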
I think this is the core of where people can have different perspectives here, since they'll have different definitions for when something "changes its meaning".
An f64 is a superset of a u32 (or an i32) just as "a u64 is a superset of a u32". Does that change its meaning? I don't know, and reasonable people can probably disagree on it. Someone used to JavaScript, where an f64 often serves as an i54, would probably say no.
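The "superset" claim can be checked mechanically. The sketch below uses the `From<u32> for f64` impl that std already has, and contrasts it with u64, which has no such impl because large values get rounded. The function names are made up for illustration.

```rust
/// Widen a `u32` to `f64` and narrow it back. This is lossless for every
/// `u32`, because `f64` has a 53-bit significand.
fn u32_round_trips_through_f64(n: u32) -> bool {
    let widened = f64::from(n); // `From<u32> for f64` exists in std
    widened as u32 == n
}

/// The same check for `u64`, using an `as` cast since there is no `From`
/// impl here: values above 2^53 may be rounded on the way in.
fn u64_round_trips_through_f64(n: u64) -> bool {
    (n as f64) as u64 == n
}
```

So u32 to f64 is error-free in the same sense as u32 to u64; whether it "changes the meaning" is the part that can't be settled by a test.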
Sure but if we're saying a char is an integer (which we are, no?) then I don't understand why an integer of 1 or 4 bytes is the only acceptable representation? Why not 2 or 8 as well? What's the distinction we're making here?
I think that, for want of a real numeric hierarchy in Rust (maybe someday, maybe never), every meaningful error-free widening operation should probably have a From impl, and the missing ones just haven't been hit by someone who was willing to make a std PR. At some point the type system may evolve enough that these impls can be somehow automatic.
Even on LE systems you still have to change the alignment and pad it with zeros; I would have to disagree here. This operation can be done in one instruction (usually), but so can conversion to f64.
That's the point under debate. I think both of these are consistent and reasonable viewpoints:
char is an integer, so it should have conversions to other numeric things, including floats and bigintegers and such, because it's better to have that once in the library than everyone needing to figure them out themselves. Making people call an extra conversion method is annoying and we should just have all the transitive impls -- even if they're only situationally useful -- since it's From and thus not lossy.
char is text encoding, so should only have conversions that are needed in that context, so to/from u32 for unicode codepoints, and u8 for ascii (as that type has methods like u8::to_ascii_lowercase). The library should guide people to handling text correctly, and it's not a big deal for code doing something unusual to just do another conversion, certainly better than having a ton of extra From implementations in the docs that people would have to scroll past to get to the one they should actually be using.
They're opposed, but I don't think either one is wrong.
The thing I do feel strongly about is that it would be wrong to add just u64: From<char>. If that one's reasonable, then at least u128: From<char> is also reasonable. And if those are reasonable, I think it's also clear that i32: From<char> and similar are just as reasonable as well.
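One way to see that u64 isn't special: a single generic helper that goes through u32 already reaches every type that implements From<u32>, which includes u64, u128, i64, and f64. This is a sketch with a made-up name (`widen`), not a proposed std API. Note that i32 is not reachable this way, since i32 does not implement From<u32>.

```rust
/// Hypothetical helper: convert a `char` to any type that can be built
/// losslessly from a `u32`, by unwrapping to `u32` first.
fn widen<T: From<u32>>(c: char) -> T {
    T::from(u32::from(c))
}

/// A few instantiations of the same route.
fn widen_examples(c: char) -> (u64, u128, i64) {
    (widen::<u64>(c), widen::<u128>(c), widen::<i64>(c))
}
```

For example, `widen::<u128>('A')` gives 65, by exactly the same path as the u64 case.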
And for what it's worth the docs only say that a char is a Unicode scalar value, not that it's a UTF-32 text encoding (or any other encoding). Of course this isn't canonical but the fact it's documented as a Unicode value rather than a specific encoding strikes me as an important distinction.
This isn't like str, which is a specific encoding of Unicode text.
And yes, I don't see anything special about u64 in particular. Although, as toc mentioned, the lack of a defined real numeric hierarchy in Rust may make conversions to signed integers arguable even if u64 were accepted.
I'd argue that char is not a number type, because it doesn't implement any arithmetic operations (in contrast to Java, where a char is just a two-byte unsigned integer).
char isn't in a text encoding, it's in a number encoding (big-endian or little-endian depending on the platform). Of course this is an implementation detail, but it's highly unlikely to ever change.
I think it makes sense to implement conversions between char and u16/u64/u128, not because it's the correct thing to do, but because it might prove useful, and I can't see any downsides.
It may be a fairly trivial encoding method, but it's certainly a text encoding. UTF-32 is a Unicode Encoding Form in which Unicode scalar values are encoded as 32-bit numbers, with big-endian and little-endian variations. The [char] type in Rust corresponds precisely to UTF-32BE or UTF-32LE, depending on platform.
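As a small illustration of the UTF-32 correspondence, the sketch below produces native-endian UTF-32 bytes from a &str by viewing each char as a u32. The function name is made up, and it assumes a toolchain where arrays implement IntoIterator by value (Rust 1.53 or later).

```rust
/// Hypothetical helper: encode text as native-endian UTF-32 bytes by
/// taking each `char` as its 32-bit scalar value.
fn to_utf32_ne_bytes(text: &str) -> Vec<u8> {
    text.chars()
        .flat_map(|c| u32::from(c).to_ne_bytes())
        .collect()
}

/// Size of one `char` in bytes, for comparison with the output above.
fn char_size() -> usize {
    std::mem::size_of::<char>()
}
```

Every char contributes exactly four bytes, matching `size_of::<char>()`, which is what you'd expect from a fixed-width 32-bit encoding form.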
What's the rationale for char: TryFrom<u16>? Is it just a glorified version of char: TryFrom<u8>, where the former might work for most non-Chinese languages while the latter works only for slightly-extended ASCII? Philosophically, should Rust standardize support for these opinionated uses, which intrinsically cannot be language-agnostic?
FWIW, I actually have neither viewpoint. I thought From/Into were simply for infallible conversions where there's only one possible (or one obvious default) way of doing the conversion, such that no one could reasonably need to ask why this char value gets turned into that u32 value instead of some other u32 value. On that view, whether a char "is an integer" is simply irrelevant (or at least not directly relevant) to the question of whether these conversions should exist.
But I completely agree with this. In general, I think the only reason we shouldn't simply add every unambiguous From/Into impl we can is to avoid creating overlapping impls that no one can actually use (without UFCS), but these all seem pretty safe.
Admittedly, I'm not familiar enough with Unicode to be 100% confident that every char value fits in the positive i32s, but I think they do, and if they do then that impl's clearly fine.
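The i32 question can be checked directly: the largest char is char::MAX, which is U+10FFFF, far below i32::MAX. A quick sketch, with `char_to_i32` as a hypothetical helper that goes through u32:

```rust
use std::convert::TryFrom;

/// Hypothetical helper: convert a `char` to `i32` by way of `u32`.
/// Returns `None` if the value did not fit, which should never happen
/// because scalar values top out at 0x10FFFF.
fn char_to_i32(c: char) -> Option<i32> {
    i32::try_from(u32::from(c)).ok()
}

/// True if the largest possible `char` is below `i32::MAX`.
fn max_char_fits_in_i32() -> bool {
    u32::from(char::MAX) < i32::MAX as u32
}
```

Since the maximum fits, every smaller scalar value fits too, so an infallible conversion to i32 would never lose information.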