Implement From<char> for u64

scottmcm · November 28, 2020, 9:45pm

The char docs link to http://www.unicode.org/glossary/#unicode_scalar_value:

In other words, the ranges of integers 0 to D7FF₁₆ and E000₁₆ to 10FFFF₁₆ inclusive.

So it definitely fits in an i32, since i32::MAX is 7FFFFFFF₁₆.
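
As a quick check of the ranges quoted above, here is a minimal sketch. `char_to_i32` is a hypothetical helper written for this illustration, not a std API.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not a std API): converts a `char` to `i32`
/// by going through its `u32` scalar value.
fn char_to_i32(c: char) -> i32 {
    // Scalar values top out at 0x10FFFF, which is below i32::MAX,
    // so this conversion never actually fails.
    i32::try_from(u32::from(c)).expect("Unicode scalar values fit in i32")
}

fn main() {
    // Largest scalar value versus the i32 limit.
    assert_eq!(u32::from(char::MAX), 0x10FFFF);
    assert_eq!(i32::MAX, 0x7FFF_FFFF);
    assert!(char_to_i32(char::MAX) < i32::MAX);

    // The surrogate gap D800..=DFFF is excluded from `char`.
    assert!(char::from_u32(0xD7FF).is_some());
    assert!(char::from_u32(0xD800).is_none());
    assert!(char::from_u32(0xDFFF).is_none());
    assert!(char::from_u32(0xE000).is_some());

    println!("{}", char_to_i32('a'));
}
```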

tcsc · November 29, 2020, 1:36am

Hardly the only issue with this crate.

I took a very brief look at it in the past when deciding whether to use it, found 5 obviously unsound functions in that time, and filed https://github.com/RustSec/advisory-db/blob/master/crates/ncurses/RUSTSEC-2019-0006.md. The author's opinion at the time was that it's okay because it's a thin wrapper.

I have almost no doubt that there are more issues too, but haven't had the time to look further.

H2CO3 · November 29, 2020, 7:13am

Since people coming from non-Unicode-aware languages seem to be universally confused about what a "character" or a "byte" or a "code point" is, and how it's "just a number" or "just a byte" or "just a 16/32-bit int", I think perpetuating these myths is entirely the wrong and irresponsible thing to do. Based on this reasoning, the language might as well lack a char type completely and just use u32 for code points everywhere instead. That doesn't help with correct string manipulation at all, however.

I find the "but the conversion is lossless" argument weak at best, vacuous at worst. There are many, many injective or bijective conversions between types that could technically be defined without loss of information. This doesn't mean that their semantics need to match automatically.

When I'm developing a database model, I'm often using newtypes of larger integers (u32…u128) as primary keys, and so do many others. Does it mean that it would make sense to convert back and forth between a struct UserID(u32) and a char? Certainly not.

Rather than trying to use from as a rowhammer, we should think about whether we should, even if we can. Conversions are the very signs of doing something where types don't match up exactly, and thus they need extra caution in the general case. Sometimes, they don't, but that's only when we are lucky.

I'm otherwise a big advocate of ensuring type-level interoperability by implementing as many of the std traits as possible, but only so long as their correctness is immediately obvious. In a context filled with misconceptions like Unicode, the right choice is to let the user ask him/herself the question: "am I doing it right?", instead of letting him/her pull the trigger on the footgun without even being aware of the existence of that gun.

tspiteri · November 29, 2020, 10:02am

The only use cases I can think of for treating a char as a number are (a) to check whether a code point is within a particular range, and char already implements PartialOrd, and (b) to print the code point value, and u32 already implements From<char> for that.
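
Both use cases can be sketched with what std already provides. The function names below are made up for the example.

```rust
/// Hypothetical helper: range check using only `char` comparisons
/// (no integer conversion needed).
fn is_basic_latin_letter(c: char) -> bool {
    ('a'..='z').contains(&c) || ('A'..='Z').contains(&c)
}

/// Hypothetical helper: formats the code point value in the usual
/// U+XXXX notation via the existing `From<char> for u32` impl.
fn code_point_label(c: char) -> String {
    format!("U+{:04X}", u32::from(c))
}

fn main() {
    assert!(is_basic_latin_letter('q'));
    assert!(!is_basic_latin_letter('é'));

    println!("{}", code_point_label('a'));
    println!("{}", code_point_label('é'));
}
```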

So I agree: I see no reasonable use case which would be made simpler by implementing other conversions. For niche corner cases such as the example given of interfacing with the ncurses library, the two-step conversion u32::from(' ').into() is enough.
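
Spelled out, the two-step conversion looks roughly like this. `char_to_u64` is a hypothetical wrapper for illustration only.

```rust
/// Hypothetical wrapper: char -> u32 -> u64, using only the
/// lossless `From` impls for each step.
fn char_to_u64(c: char) -> u64 {
    u64::from(u32::from(c))
}

fn main() {
    // The same thing inline, with the target type driving `.into()`.
    let n: u64 = u32::from(' ').into();
    assert_eq!(n, 32);
    assert_eq!(char_to_u64(' '), n);
}
```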

In fact, the type required by waddch is not a normal C character, but a C character logical-ORed with video attributes. From the waddch man page:

Video attributes can be combined with a character argument passed to addch() or related functions by logical-ORing them into the character.

So the use case in the original post is invalid in my view.
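
To illustrate why a packed "character plus attributes" value is not the same thing as a code point, here is a toy sketch. The layout (low 8 bits for the character, higher bits for attributes) and the `HYPOTHETICAL_ATTR_BOLD` constant are assumptions made up for this example. They are not ncurses' actual definitions, which should be taken from its headers.

```rust
// ASSUMED layout for illustration only, not ncurses' real constants.
const CHAR_MASK: u32 = 0xFF;
const HYPOTHETICAL_ATTR_BOLD: u32 = 1 << 8;

/// Combines a one-byte character with attribute bits by OR-ing them.
fn pack(byte: u8, attrs: u32) -> u32 {
    u32::from(byte) | attrs
}

/// Splits a packed value back into (character bits, attribute bits).
fn unpack(value: u32) -> (u32, u32) {
    (value & CHAR_MASK, value & !CHAR_MASK)
}

fn main() {
    // Intended use: an ASCII byte plus an attribute.
    let bold_a = pack(b'a', HYPOTHETICAL_ATTR_BOLD);
    assert_eq!(unpack(bold_a), (0x61, HYPOTHETICAL_ATTR_BOLD));

    // Passing a raw code point instead: U+0100 has the same bits as
    // the assumed attribute flag, so it would be read as "NUL, bold".
    let raw = u32::from('\u{100}');
    assert_eq!(unpack(raw), (0, HYPOTHETICAL_ATTR_BOLD));
}
```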

toc · November 29, 2020, 5:31pm

Similar to the reasoning in Add non-`unsafe` `get_mut` for `UnsafeCell`, I'd suggest that the absence of the infallible conversion implies that there isn't a correct one. The error message could reasonably point us to use try_into.

I'm also sympathetic to the viewpoint that this particular conversion should be spelled something like

let n : u64 = 'a'.encode_utf32().into();

but I don't know if that's workable into a consistent API without deprecating some From impls.
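
For comparison, the reverse direction (u32 to char) is one where std offers only a fallible conversion, so try_from / try_into is the spelling there. A minimal sketch, with `scalar_from_u32` as a hypothetical helper name:

```rust
use std::convert::TryFrom;

/// Hypothetical helper: returns the `char` for `n` if `n` is a
/// Unicode scalar value, otherwise `None`.
fn scalar_from_u32(n: u32) -> Option<char> {
    char::try_from(n).ok()
}

fn main() {
    assert_eq!(scalar_from_u32(0x61), Some('a'));
    // Surrogates and values past 0x10FFFF are rejected.
    assert_eq!(scalar_from_u32(0xD800), None);
    assert_eq!(scalar_from_u32(0x11_0000), None);
}
```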

system · February 27, 2021, 5:31pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
