The `char` docs link to http://www.unicode.org/glossary/#unicode_scalar_value:
In other words, the ranges of integers 0 to D7FF₁₆ and E000₁₆ to 10FFFF₁₆ inclusive.
So it definitely fits in an `i32`, since `i32::MAX` is 7FFFFFFF₁₆.
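A quick way to check that claim, as a minimal sketch: `char_to_i32` is a hypothetical helper name, not something from std. It goes through `u32` first, since that is the `From<char>` impl that exists, and then does a checked conversion to `i32`.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: convert a `char` to `i32` by way of `u32`.
fn char_to_i32(c: char) -> i32 {
    // The largest scalar value is 0x10FFFF, well below i32::MAX (0x7FFFFFFF),
    // so the checked conversion cannot fail for any valid `char`.
    i32::try_from(u32::from(c)).expect("every Unicode scalar value fits in i32")
}

fn main() {
    assert_eq!(u32::from(char::MAX), 0x10FFFF);
    assert!(u32::from(char::MAX) < i32::MAX as u32);
    println!("{}", char_to_i32('a'));
}
```

This only shows that the value fits; it says nothing about whether an `i32` is the right type to carry a character around in.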
Hardly the only issue with this crate.
I took a very brief look at it in the past when deciding whether to use it, found 5 obviously unsound functions in that time, and filed https://github.com/RustSec/advisory-db/blob/master/crates/ncurses/RUSTSEC-2019-0006.md. The author's opinion at the time was that it's okay because it's a thin wrapper.
I have almost no doubt that there are more issues too, but haven't had the time to look further.
Since people coming from non-Unicode-aware languages seem to be universally confused about what a "character" or a "byte" or a "code point" is, and how it's "just a number" or "just a byte" or "just a 16/32-bit int", I think perpetuating these myths is entirely the wrong and irresponsible thing to do. Based on this reasoning, the language might as well lack a `char` type completely and just use `u32` for code points everywhere instead. That doesn't help with correct string manipulation at all, however.
I find the "but the conversion is lossless" argument weak at best, vacuous at worst. There are many, many surjective or bijective conversions between types that could technically be defined without loss of information. This doesn't mean that their semantics need to match automatically.
When I'm developing a database model, I'm often using newtypes of larger integers (`u32`…`u128`) as primary keys, and so do many others. Does it mean that it would make sense to convert back and forth between a `struct UserID(u32)` and a `char`? Certainly not.
Rather than trying to use `from` as a rowhammer, we should think about whether we should, even if we can. Conversions are the very signs of doing something where types don't match up exactly, and thus they need extra caution in the general case. Sometimes, they don't, but that's only when we are lucky.
I'm otherwise a big advocate of ensuring type-level interoperability by implementing as many of the std traits as possible, but only so long as their correctness is immediately obvious. In a context filled with misconceptions like Unicode, the right choice is to let the user ask him/herself the question: "am I doing it right?", instead of letting him/her pull the trigger on the footgun without even being aware of the existence of that gun.
The only use cases I can think of for treating a `char` as a number are (a) to check whether a code point is within a particular range, and `char` already implements `PartialOrd`, and (b) to print the code point value, and `u32` already implements `From<char>` for that.
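Both cases can be sketched with what std already offers. The function names here are made up for illustration, and the CJK Unified Ideographs block (U+4E00 to U+9FFF) is just an example range.

```rust
/// (a) Range check using `char`'s own ordering; no numeric conversion needed.
fn is_cjk_unified(c: char) -> bool {
    ('\u{4E00}'..='\u{9FFF}').contains(&c)
}

/// (b) Format the code point in the usual U+XXXX notation via `u32::from`.
fn code_point_label(c: char) -> String {
    format!("U+{:04X}", u32::from(c))
}

fn main() {
    println!("{} {}", is_cjk_unified('中'), code_point_label('中'));
    println!("{} {}", is_cjk_unified('a'), code_point_label('a'));
}
```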
So I agree: I see no reasonable use case which would be made simpler by implementing other conversions. For niche corner cases such as the example given of interfacing with the ncurses library, the two-step conversion `u32::from(' ').into()` is enough.
In fact, the type required by `waddch` is not a normal C character, but a C character logical-ORed with video attributes. From the `waddch` man page:
Video attributes can be combined with a character argument passed to addch() or related functions by logical-ORing them into the character.
So the use case in the original post is invalid in my view.
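To make the man page's point concrete, here is a rough sketch that does not link against ncurses at all. `ChType` and `ATTR_BOLD` are stand-ins I made up: the real `chtype` width and attribute bits come from the ncurses headers and may differ by platform.

```rust
// Assumption: a 64-bit stand-in for the C `chtype`; the real width varies.
type ChType = u64;
// Assumption: an illustrative attribute bit, not taken from any header.
const ATTR_BOLD: ChType = 1 << 21;

/// Build a character-plus-attributes value by going through `u32` explicitly
/// and then ORing in the attribute bits.
fn with_attr(c: char, attr: ChType) -> ChType {
    let code: ChType = u32::from(c).into();
    code | attr
}

fn main() {
    println!("{:#x}", with_attr('a', ATTR_BOLD));
}
```

The value passed to the library is a packed bit field, not a character, so a direct `char` conversion would not model it faithfully anyway.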
Similar to the reasoning in Add non-`unsafe` `get_mut` for `UnsafeCell`, I'd suggest that the absence of the infallible conversion implies that there isn't a correct one. The error message could reasonably point us to use `try_into`.
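For comparison, the opposite direction (`u32` to `char`) is a case where std only offers the fallible conversion, because not every `u32` is a scalar value. A small sketch, with `to_char` as a hypothetical wrapper name:

```rust
use std::convert::TryFrom;

/// Hypothetical wrapper: `Some(c)` if `n` is a Unicode scalar value, else `None`.
fn to_char(n: u32) -> Option<char> {
    char::try_from(n).ok()
}

fn main() {
    println!("{:?}", to_char(0x61));     // a valid scalar value
    println!("{:?}", to_char(0xD800));   // a surrogate, rejected
    println!("{:?}", to_char(0x110000)); // beyond 10FFFF, rejected
}
```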
I'm also sympathetic to the viewpoint that this particular conversion should be spelled something like
let n: u64 = 'a'.encode_utf32().into();
but I don't know if that's workable into a consistent API without deprecating some `From` impls.