Implement From<char> for u64

gendx · November 26, 2020, 8:04am

I've noticed that there has been a direct conversion from char to u32 (i.e. impl From<char> for u32) since https://github.com/rust-lang/rust/pull/35755. However, there doesn't seem to be a direct conversion from char to u64, which is surprising given that in general we have conversion rules From<uNN> for uMM as long as NN <= MM (as well as direct conversions from bool to uNN).

Is char playing a special role here, or could impl From<char> for u64 be added as well?

This is motivated by the following code: https://github.com/gendx/connect-box/commit/16d2bb41610641a80abdb338c020c6e0d1580a50.

See in particular https://travis-ci.org/github/gendx/connect-box/jobs/745979112.

error[E0277]: the trait bound `u64: From<char>` is not satisfied
   --> src/tui.rs:185:42
    |
185 |             ncurses::waddch(self.window, ' '.into());
    |                                          ^^^^^^^^^^ the trait `From<char>` is not implemented for `u64`
    |
    = help: the following implementations were found:
              <u64 as From<NonZeroU64>>
              <u64 as From<bool>>
              <u64 as From<u16>>
              <u64 as From<u32>>
              <u64 as From<u8>>
    = note: required because of the requirements on the impl of `Into<u64>` for `char`
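
A minimal sketch of a workaround for the error above, assuming the callee really wants a u64: chain the two conversions that do exist. The helper name char_to_u64 is hypothetical and not part of std or ncurses.

```rust
/// Widen a `char` to `u64` in two lossless steps.
/// `char_to_u64` is a hypothetical helper, not a std or ncurses function.
fn char_to_u64(c: char) -> u64 {
    // `u32: From<char>` yields the Unicode scalar value;
    // `u64: From<u32>` then zero-extends it.
    u64::from(u32::from(c))
}
```

At the call site this would replace `' '.into()` with something like `char_to_u64(' ')`.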

kornel · November 26, 2020, 2:46pm

It looks like ncurses::chtype is "A character, attributes and a colour-pair", so it's bigger to have some extra bits of information. It is technically a different type. Maybe the ncurses crate could add appropriate newtypes and conversions?

anp · November 26, 2020, 3:45pm

I agree the specific issue raised here might be better addressed in the crate with the more specific type info, but it still seems reasonable to me to impl From<char> for u64. Am I missing something?

cuviper · November 26, 2020, 5:25pm

Crates can't implement foreign traits for foreign types, so that would have to use a newtype.
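
A rough sketch of what such a newtype could look like. The name Chtype and both impls are assumptions for illustration and are not taken from the ncurses crate.

```rust
/// Hypothetical crate-owned wrapper around the raw `chtype` value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Chtype(u64);

// Allowed by the orphan rules because `Chtype` is a local type.
impl From<char> for Chtype {
    fn from(c: char) -> Self {
        // Go through u32, the conversion std already provides for char.
        Chtype(u64::from(u32::from(c)))
    }
}

// Also allowed: the local type appears as the trait's type parameter.
impl From<Chtype> for u64 {
    fn from(value: Chtype) -> Self {
        value.0
    }
}
```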

withoutboats · November 26, 2020, 5:56pm

This is just an oversight; a PR would be welcome to fix this shortcoming.

(It's very unfortunate that if someone makes the mistake of posting here with their small std oversight instead of making a PR, someone will almost certainly write a post discouraging them from contributing as if we have a divine plan to have exactly the API std has today. Please stop making posts discouraging contribution unless you have a really good reason to be confident the contribution would be rejected.)

bjorn3 · November 26, 2020, 6:08pm

chtype is a u32 or a u64 depending on a feature flag. This violates the principle that feature flags are strictly additive, as changing u32 into u64 is a breaking change.

https://github.com/jeaye/ncurses-rs/blob/328502a100cd71fcea875c686ceed15931b048fd/src/ll.rs#L19-L22

#[cfg(feature="wide_chtype")]
pub type chtype = u64;
#[cfg(not(feature="wide_chtype"))]
pub type chtype = u32;
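
One way calling code might stay independent of which alias is active, as a sketch only: convert through u32 and let From pick the final step. The alias below is a local stand-in for the crate's definition, and to_chtype is a hypothetical helper.

```rust
// Local stand-in for the crate's alias; the real one is chosen by a feature flag.
#[allow(non_camel_case_types)]
type chtype = u64;

/// Hypothetical helper that compiles for either width of `chtype`.
fn to_chtype(c: char) -> chtype {
    // With `chtype = u32` this uses the reflexive `From<T> for T` impl;
    // with `chtype = u64` it uses `u64: From<u32>`.
    chtype::from(u32::from(c))
}
```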

scottmcm · November 27, 2020, 1:40am

FWIW, that has never surprised me because I don't think of char as being a "numeric" type. To me, the existing u32: From<char> conversion is a "newtype-unwrapping" conversion, not an "integer-widening" conversion. (char is essentially a struct char(u32); newtype, with some extra magic about having a more restricted value range.)

Notably there also isn't char: TryFrom<u16> even though there's char: TryFrom<u32>, nor is there char: Into<f32> even though a 21-bit USV can also be exactly represented as a single-precision floating point type.

(But as boats said, one can always send a PR and see what happens.)
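
For reference, a sketch of how the missing u16 case can be handled today by widening to u32 first. The helper name char_from_u16 is hypothetical; char::from_u32 is the std function doing the validity check.

```rust
/// Interpret a `u16` as a Unicode scalar value, if it is one.
/// Hypothetical helper; it does NOT decode UTF-16 surrogate pairs.
fn char_from_u16(unit: u16) -> Option<char> {
    // Surrogates (0xD800..=0xDFFF) are not scalar values, so they yield None.
    char::from_u32(u32::from(unit))
}
```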

chrisd · November 27, 2020, 3:04am

Is that not a very different thing from merely widening? A u64 is a superset of a u32. It's not that a u32 just so happens to be representable as a u64. A value of u32::MAX has the exact same meaning as u32::MAX as u64.

A u32 into f32 changes that meaning, even if it's exactly representable.

Yet we have From<u8> for char and TryFrom<u32>. Why not TryFrom<u16> too? Is that not inconsistent? I guess I don't understand the reasoning here.

scottmcm · November 27, 2020, 4:33am

I think this is the core of where people can have different perspectives here, since they'll have different definitions for when something "changes its meaning".

An f64 is a superset of a u32 (or an i32) just as "a u64 is a superset of a u32". Does that change its meaning? I don't know, and reasonable people can probably disagree on it. Someone used to JavaScript, where an f64 often serves as an i54, would probably say no.

chrisd · November 27, 2020, 4:36am

Sure but if we're saying a char is an integer (which we are, no?) then I don't understand why an integer of 1 or 4 bytes is the only acceptable representation? Why not 2 or 8 as well? What's the distinction we're making here?

toc · November 27, 2020, 4:41am

I think for want of a real numeric hierarchy in rust (maybe someday, maybe never) then every meaningful error-free widening operation should probably have a From impl, and missing ones just haven't been hit by someone who was willing to make a std PR. At some point the type system may evolve enough that these impls can be somehow automatic.

Even on LE systems you still have to change the alignment and pad it with zeros; I would have to disagree here. This operation can be done in one instruction (usually), but so can conversion to f64.

scottmcm · November 27, 2020, 6:03am

That's the point under debate. I think both of these are consistent and reasonable viewpoints:

- char is an integer, so it should have conversions to other numeric things, including floats and bigintegers and such, because it's better to have that once in the library than everyone needing to figure them out themselves. Making people call an extra conversion method is annoying and we should just have all the transitive impls -- even if they're only situationally useful -- since it's From and thus not lossy.
- char is text encoding, so it should only have conversions that are needed in that context: to/from u32 for Unicode code points, and u8 for ASCII (as that type has methods like u8::to_ascii_lowercase). The library should guide people to handling text correctly, and it's not a big deal for code doing something unusual to just do another conversion. That is certainly better than having a ton of extra From implementations in the docs that people would have to scroll past to get to the one they should actually be using.

They're opposed, but I don't think either one is wrong.

The thing I do feel strongly about is that it would be wrong to add just u64: From<char>. If that one's reasonable, then at least u128: From<char> is also reasonable. And if those are reasonable, I think it's also clear that i32: From<char> and similar are just as reasonable as well.
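
As a sketch of the "extra conversion" that code has to write while such impls are absent, both a wider unsigned and a wider signed target are reachable through u32. The helper names are hypothetical.

```rust
/// Hypothetical helper: char -> u128 through the existing `u32: From<char>`.
fn char_to_u128(c: char) -> u128 {
    u128::from(u32::from(c))
}

/// Hypothetical helper: char -> i64, relying on `i64: From<u32>`.
fn char_to_i64(c: char) -> i64 {
    i64::from(u32::from(c))
}
```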

chrisd · November 27, 2020, 6:39am

Well, I mean we already do for u8:

fn main() {
    // Note: `0xff` is not ascii.
    let c: char = 0xff_u8.into();
}

And for what it's worth the docs only say that a char is a Unicode scalar value, not that it's a UTF-32 text encoding (or any other encoding). Of course this isn't canonical but the fact it's documented as a Unicode value rather than a specific encoding strikes me as an important distinction.

This isn't like str, which is a specific encoding of Unicode text.

And yes, I don't see anything special about u64 in particular. Although as toc mentioned, the lack of a defined real numeric hierarchy in Rust may make conversions to signed integer arguable even if u64 was accepted.
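
To illustrate the scalar-value-versus-encoding distinction drawn above, here is a small sketch with a hypothetical helper. It returns the scalar value of a char next to the number of bytes the same character occupies inside a str.

```rust
/// Hypothetical helper: (Unicode scalar value, length in UTF-8 bytes).
fn scalar_and_utf8_len(c: char) -> (u32, usize) {
    (u32::from(c), c.len_utf8())
}

/// `char: From<u8>` maps the byte to the code point with the same number.
fn byte_to_char(b: u8) -> char {
    char::from(b)
}
```

For 0xff the scalar value is still 0xFF, but the character is not ASCII and takes two bytes once encoded as UTF-8.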

kornel · November 27, 2020, 10:22am

Oh, I wasn't aware that From<u8> has also been added despite the earlier PR warning that it may not be the right thing to do from encoding perspective.

So in that case the precedent of "chars are just numbers" has been set, and u64 conversion would fit the existing ones.

tspiteri · November 27, 2020, 8:07pm

And for Unicode code points, f32 is just as adequate as f64: f32 can represent 24-bit integers exactly, and Unicode code points can only go up to 21 bits.
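
A quick sketch of that claim, using hypothetical helper names: the largest char, U+10FFFF, is below 2^24, so casting its scalar value to f32 and back loses nothing.

```rust
/// Hypothetical helper: char -> f32 via its scalar value.
fn char_to_f32(c: char) -> f32 {
    // An `as` cast is used because std has no `f32: From<u32>`.
    u32::from(c) as f32
}

/// True if the f32 value converts back to the same scalar value.
fn f32_round_trips(c: char) -> bool {
    char_to_f32(c) as u32 == u32::from(c)
}
```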

Julius-Beides · November 28, 2020, 12:11pm

I'm working on a PR.

EDIT: Submitted #79502

Aloso · November 28, 2020, 1:04pm

I'd argue that char is not a number type, because it doesn't implement any arithmetic operations (in contrast to Java, where a char is just a two-byte unsigned integer).

char isn't in a text encoding, it's in a number encoding (big-endian or little-endian depending on the platform). Of course this is an implementation detail, but it's highly unlikely to ever change.

I think it makes sense to implement conversions between char and u16/u64/u128, not because it's the correct thing to do, but because it might prove useful, and I can't see any downsides.
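
Since char has no arithmetic operators, stepping to the following code point has to go through an integer and back. A minimal sketch, with next_char as a hypothetical helper:

```rust
/// Hypothetical helper: the char one code point after `c`, if that is a
/// valid Unicode scalar value.
fn next_char(c: char) -> Option<char> {
    // The sum cannot overflow u32 because char::MAX is only 0x10FFFF.
    // `char::from_u32` rejects surrogates and values above U+10FFFF.
    char::from_u32(u32::from(c) + 1)
}
```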

mbrubeck · November 28, 2020, 4:44pm

It may be a fairly trivial encoding method, but it's certainly a text encoding. UTF-32 is a Unicode Encoding Form in which Unicode scalar values are encoded as 32-bit numbers, with big-endian and little-endian variations. The char type in Rust corresponds precisely to UTF-32BE or UTF-32LE, depending on platform.
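
As a sketch of that correspondence, a str can be turned into UTF-32 code units just by converting each char to u32. The function names are hypothetical, and the byte-level version assumes the little-endian variant.

```rust
/// Hypothetical helper: the UTF-32 code units of a string.
fn to_utf32(s: &str) -> Vec<u32> {
    s.chars().map(u32::from).collect()
}

/// Hypothetical helper: the same text serialized as UTF-32LE bytes.
fn to_utf32le_bytes(s: &str) -> Vec<u8> {
    s.chars()
        .flat_map(|c| u32::from(c).to_le_bytes())
        .collect()
}
```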

Tom-Phinney · November 28, 2020, 5:57pm

What's the rationale for char: TryFrom<u16>? Is it just a glorified version of char: TryFrom<u8>, where the former might work for most non-Chinese languages while the latter works only for slightly-extended ASCII? Philosophically, should Rust standardize support for these opinionated uses, which intrinsically cannot be language-agnostic?

Ixrec · November 28, 2020, 9:41pm

FWIW, I actually have neither viewpoint. I thought From/Into were simply for infallible conversions where there's only one possible (or one obvious default) way of doing the conversion, such that no one could reasonably need to ask why this char value gets turned into that u32 value instead of some other u32 value. On that view, whether a char "is an integer" is simply irrelevant (or at least not directly relevant) to the question of whether these conversions should exist.

But I completely agree with this. In general, I think the only reason we shouldn't simply add every unambiguous From/Into impl we can is to avoid creating overlapping impls that no one can actually use (without UFCS), but these all seem pretty safe.

Admittedly, I'm not familiar enough with Unicode to be 100% confident that every char value fits in the positive i32s, but I think they do, and if they do then that impl's clearly fine.
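
A small sketch that checks this, with char_to_i32 as a hypothetical helper: the largest scalar value is U+10FFFF, so a checked conversion to i32 should never fail.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: char -> i32 through a checked u32 -> i32 conversion.
fn char_to_i32(c: char) -> i32 {
    // char::MAX is 0x10FFFF, far below i32::MAX, so the expect is unreachable.
    i32::try_from(u32::from(c)).expect("every Unicode scalar value fits in i32")
}
```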
