Since people coming from non-Unicode-aware languages seem to be universally confused about what a "character" or a "byte" or a "code point" is, and how it's "just a number" or "just a byte" or "just a 16/32-bit int", I think perpetuating these myths is entirely the wrong and irresponsible thing to do. Based on this reasoning, the language might as well lack a char
type completely and just use u32
for code points everywhere instead. That doesn't help with correct string manipulation at all, however.
I find the "but the conversion is lossless" argument weak at best, vacuous at worst. There are many, many surjective or bijective conversions between types that could technically be defined without loss of information. This doesn't mean that their semantics need to match automatically.
When I'm developing a database model, I'm often using newtypes of larger integers (u32
…u128
) as primary keys, and so do many others. Does it mean that it would make sense to convert back and forth between a struct UserID(u32)
and a char
? Certainly not.
Rather than trying to use from
as a rowhammer, we should think about whether we should, even if we can. Conversions are the very signs of doing something where types don't match up exactly, and thus they need extra caution in the general case. Sometimes, they don't, but that's only when we are lucky.
I'm otherwise a big advocate of ensuring type-level interoperability by implementing as many of the std traits as possible, but only so long as their correctness is immediately obvious. In a context filled with misconceptions like Unicode, the right choice is to let the user ask him/herself the question: "am I doing it right?", instead of letting him/her pull the trigger on the footgun without even being aware of the existence of that gun.