Numeric .into() should not require everyone to support 16-bit and 128-bit usize

For conversion between numeric types the current policy is that From must never fail, for any platform, not just on the target platform. It’s a nice idea, and I think it’s beneficial for portability between 32-bit and 64-bit platforms.

However, I think it goes too far in requiring support for 16-bit and hypothetical 128-bit platforms as well, because this has a cost of promoting dangerous use of as for conversions that are common on 32-bit and 64-bit platforms.

IMHO the policy doesn’t help portability, because as does not avoid the problem Into was meant to avoid; it only hides these bugs from the compiler. I find as dangerous in the same way C casts are, and I would prefer never to use it in my code.

It causes broken code on 16-bit

For example, if I’m not allowed to convert u32 to usize with .into(), then I’ll have to use u32 as usize instead, even if I require the conversion to never truncate. The lack of this into() implementation does not make my code work on 16-bit platforms. It makes Rust compile code built on invalid assumptions without a warning.
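To make the hazard concrete, here is a minimal sketch. It uses u16 as a stand-in for a 16-bit usize (an assumption for illustration, so the example runs on ordinary 32/64-bit hosts), and the names Usize16, index_with_as and index_checked are hypothetical.

```rust
use std::convert::TryFrom;

/// Stand-in for `usize` on a 16-bit target (assumption for illustration).
type Usize16 = u16;

/// What `as` does: compiles everywhere and silently keeps only the low 16 bits.
fn index_with_as(n: u32) -> Usize16 {
    n as Usize16
}

/// What a checked conversion does: the loss is reported instead of hidden.
fn index_checked(n: u32) -> Option<Usize16> {
    Usize16::try_from(n).ok()
}

fn main() {
    let n: u32 = 70_000; // does not fit in 16 bits
    println!("as:      {}", index_with_as(n));
    println!("checked: {:?}", index_checked(n));
}
```

The as version returns a wrapped value with no diagnostic at all, which is the behaviour the post objects to.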

I think errors about missing .into() implementations would help supporting 16-bit platforms — they’d prevent broken programs from running and corrupting data, and make the compiler point out places where the code needs to be fixed first.

Prevents libraries from changing exposed types

The same risk that applies to conversions involving usize on various platforms also applies to conversions between a program’s types and its libraries’ types.

I may be forced to use as casts that are not truncating in the current version of a library. When the library changes the sizes of its types, the as cast will continue to compile but will start silently corrupting data. When the library is accessed through FFI, it becomes a safety problem.
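A small sketch of that upgrade scenario, assuming a hypothetical library whose Len alias shrinks from u32 to u16 between versions (the modules lib_v1 and lib_v2 and the function names are invented for illustration):

```rust
/// Hypothetical library, version 1: lengths are 32-bit.
mod lib_v1 {
    pub type Len = u32;
}

/// Hypothetical library, version 2: the alias shrank to 16 bits.
mod lib_v2 {
    pub type Len = u16;
}

fn to_v1(n: u32) -> lib_v1::Len {
    n.into() // compiles: u32 -> u32 is lossless
}

fn to_v2_with_as(n: u32) -> lib_v2::Len {
    n as lib_v2::Len // still compiles after the upgrade, but now truncates
}

// The `.into()` form stops compiling after the upgrade, because the
// standard library has no `From<u32>` impl for `u16`:
//
// fn to_v2(n: u32) -> lib_v2::Len {
//     n.into()
// }

fn main() {
    println!("{} {}", to_v1(70_000), to_v2_with_as(70_000));
}
```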

OTOH if I could use .into() everywhere, then such an incompatible change would be caught by the compiler (and only if my usage was actually incompatible).

I’m forced to support a platform that doesn’t even exist

Lack of conversion from usize to u64 causes problems for me on real platforms today. I’m unhappy about having to use as, which is dangerous today and not future-proof, only because .into() is meant to seamlessly support a hypothetical future platform.

I would rather fix compile errors when porting code to a new platform than have to use dangerous (and non-portable) casts on the platforms I currently use.

Proposal

  • Add conversion from usize to u64 on all current platforms. Leave it unimplemented on a future 128-bit platform.
  • Add conversion from u32 to usize on 32-bit and wider platforms. Leave it unimplemented on 16-bit platforms.
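As a rough sketch of what the two bullets could look like, here is the same gating written against a local trait. User code cannot add From impls between standard integer types, so LosslessInto is a hypothetical stand-in, and how std would actually gate such impls is an assumption here.

```rust
/// Hypothetical local trait standing in for the proposed `From` impls.
trait LosslessInto<T> {
    fn lossless_into(self) -> T;
}

// usize -> u64: offered wherever pointers are at most 64 bits wide.
#[cfg(any(
    target_pointer_width = "16",
    target_pointer_width = "32",
    target_pointer_width = "64"
))]
impl LosslessInto<u64> for usize {
    fn lossless_into(self) -> u64 {
        self as u64 // lossless under the cfg above
    }
}

// u32 -> usize: offered on 32-bit and 64-bit targets, absent on 16-bit ones.
#[cfg(any(target_pointer_width = "32", target_pointer_width = "64"))]
impl LosslessInto<usize> for u32 {
    fn lossless_into(self) -> usize {
        self as usize // lossless under the cfg above
    }
}

fn main() {
    let len: usize = 4096;
    let wide: u64 = len.lossless_into();
    let idx: usize = 7u32.lossless_into();
    println!("{} {}", wide, idx);
}
```

On a target where an impl is compiled out, the corresponding call becomes a compile error, which is the behaviour the proposal asks for.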

This is limited and keeps the current requirement that code must compile on both 32-bit and 64-bit platforms.

Alternative

The discussion “Pre-RFC: a vision for platform/architecture/configuration-specific APIs” mentions “compatibility targets”.

  • Implement all conversions that are lossless for all compatibility targets (and don’t restrict conversions because they’re lossy on platforms that aren’t the targets).

I am confused as to why as is apparently the only alternative to .into() that exists.

Does this really need to be in the standard library though? as will always be available. If you’re conscientious enough to forgo using as when it’s problematic, what’s stopping you from doing something like this and enforcing it in code reviews?

https://is.gd/jHv0k6

trait To<T: Copy> {
    fn to(self) -> T;
}

impl To<usize> for u64 {
    fn to(self) -> usize {
        self as usize
    }
}

impl To<usize> for u32 {
    fn to(self) -> usize {
        self as usize
    }
}

fn main() {
    let x = 80u64;
    let y: usize = x.to();

    // compiler error
    // let c: u64 = y.to();
    
    let a = 80u32;
    let z: usize = a.to();
    
    // compiler error
    // let b: u32 = z.to();
    
    println!("{} {}", x, y);
    println!("{} {}", a, z);
}

It’s not like adding the conversions to std will prevent users from using as, so I don’t see a reason to increase the API surface when it’s possible to easily write a tiny module (or a tiny crate if you want to use it everywhere) for “safe” integer conversions.

The one disadvantage here is that you don’t have Into b/c of the orphan rule (I think?). But that’s not IMO that big a deal if you’re already going to rely on a human layer of verification (or a clippy lint) to make sure that as isn’t used.

Perhaps the new TryFrom and TryInto will suit you. If you blindly unwrap those results, I’ll bet the optimizer can clean up the codegen where they are actually infallible, otherwise you’ll get a panic if an oddball usize converts out of range.
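For reference, a minimal sketch of that approach. The helper name to_index is hypothetical; TryFrom itself is the standard trait.

```rust
use std::convert::TryFrom;

/// Converts to `usize`, panicking if the value does not fit on this target.
fn to_index(n: u64) -> usize {
    usize::try_from(n).expect("value does not fit in usize on this target")
}

fn main() {
    let x = 80u64;
    let y = to_index(x);
    println!("{} {}", x, y);

    // A conversion that cannot succeed reports an error instead of truncating.
    println!("{}", u32::try_from(u64::MAX).is_err());
}
```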

Having something in the std library is different: people are encouraged to use it. Everyone uses the same standard thing. You see tiny examples of Rust code on all kinds of sites that use it (because it's standard, and available). You can even add a rustc warning that suggests using the library feature.

as has a benefit of having shorter, simpler syntax, working in many places where traits don’t, and being core part of the language. So while perhaps it’s not the only alternative, I think it’s fair to treat it as the strongest contender.

.try_into().unwrap() sort of works, but not quite:

  • it’s a run-time check, not a compile-time check. If the conversion unexpectedly becomes lossy (e.g. a library interface changes), the compiler won’t warn you about the breakage.

  • .into() syntax is already second-class compared to as, and .try_into().unwrap() is even more verbose. Explicit integer conversions are sometimes needed multiple times in a single expression, so verbosity matters a lot here. It’d be better if TryInto had syntactic sugar, e.g. as!.

As I implied in the user thread, I don’t think From / Into should be implemented for different numerics on different platforms. My own experience is that I’m usually dealing with values significantly lower than even u16::MAX, but I need to cast to and from usize to work with array indexing. I don’t want people in a similar situation thinking they should use into because it’s safer, and then having a library that doesn’t compile on e.g. 32-bit platforms because they cast a u64 to a usize.

But where does the u64 come from? If it’s coming from e.g. filesystem API, with as it’s going to screw things up on 4GB files. OTOH with into() it’d fail to compile which is good, because then the 4GB file corruption would be immediately found and fixed.

E.g. someone selected u64 to store a value which cannot in practice be greater than 2^12 or something.

I find it odd that someone would choose to use u64 to store 12-bit values. If values are known to never be larger than that, then why not choose a smaller type? If values are only typically that small, but there’s no hard limit, then as is still dangerous.

Either way, using as in such case doesn’t allow the programmer to express the difference between intentional truncation and conversion that is expected to be always lossless.

But you don't win much if it doesn't compile, because how do you fix it? There are two things you can do:

  • Not using into(). This is what you have to do now.
  • Don't support the platform. This is probably not a real option.

So, fixing it basically means returning to the current situation.

Knowing about the problem is necessary to fix it. If you use as and you don't know when it produces unexpected values, then you can't even begin to fix it. In some cases it may be fixable (e.g. switch other parts of the program to same-sized types rather than converting back and forth).

Note that the problem is not limited to cross-platform code, but also interaction with libraries and FFI that may change their type aliases. In that case it's fixable when you upgrade the library, you just need to know about the bug.

Even when the problem is not fixable, knowing about it may let you add better error handling, e.g. display "file too large" error when user attempts to open the file, rather than silently truncate the data or panic the whole program (and lose unsaved data).
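A sketch of that kind of error handling, assuming a hypothetical FileTooLarge error type and buffer_len helper. The helper is generic over the target integer type so that the failure path can be shown on a 64-bit host by using u32 as a stand-in for a 32-bit usize.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical error reported when a file length cannot be represented.
#[derive(Debug, PartialEq)]
struct FileTooLarge {
    len: u64,
}

impl fmt::Display for FileTooLarge {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "file too large ({} bytes)", self.len)
    }
}

/// Converts a file length to the in-memory size type `T`, or reports an error.
fn buffer_len<T: TryFrom<u64>>(file_len: u64) -> Result<T, FileTooLarge> {
    T::try_from(file_len).map_err(|_| FileTooLarge { len: file_len })
}

fn main() {
    // Fits on every target.
    println!("{:?}", buffer_len::<usize>(1024));

    // 5 GiB does not fit in 32 bits; `u32` stands in for a 32-bit `usize`.
    match buffer_len::<u32>(5 << 30) {
        Ok(n) => println!("allocating {} bytes", n),
        Err(e) => println!("error: {}", e),
    }
}
```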

Not supporting 16-bit is a very real option. It's cool when Rust is portable, but real, meaningful 16-bit support always requires much more than just having .into() work for usize. You need to rearchitect the whole program (e.g. you can't use 64KB+ lookup tables, you must stream data rather than buffer it whole, etc).

The program I'm writing currently has a fixed overhead of at least hundreds of KB, and in practice needs several MB of RAM to do any useful work. I can't even imagine porting it to 16-bit.

And I don't want my program to have fewer safety guarantees on the 32/64-bit platforms it's aimed at, only because Rust wants to compile it to broken garbage on 16-bit, without warnings.

Not sure if my comment is relevant for the overall discussion since I didn't read it fully, but I've seen examples of what you find odd with a valid reason on some 32b platforms (which might apply to 64b platforms).

If a 32b CPU doesn't have natural support for 16b operations, the compiler needs to insert extra masking and bit shifting when operating with 16b to maintain correctness. And in those cases, you could save a bunch of instructions by operating in the natural word of the CPU.


Even supposing someone has made the wrong choice by using a u64 to store small values, it seems preferable to me that their library compiles on 32-bit platforms rather than fails to. I also agree with @troplin that this is just going to compel people to go back to using as.

I also feel like the problem of lossy conversion is reasonably well known? It seems sort of intuitive to me that if you have more than 32 bits of data in a u64, converting it to a u32 will lose information.

As I've tried to explain, there are cases where it's unknown and unknowable.

In code such as my_variable as some_library::opaque_type it's not known from the code itself whether it'll lose information or not. Even if I go to the library's source and check it, it's still not guaranteed to be correct, because a future version of the library can change the type, and the incompatible change won't be caught by the compiler.

OTOH let x:some_library::opaque_type = my_variable.into() does encode my assumption that the types are compatible, and the compiler will catch the bug as soon as my assumption breaks. I'd love to use this form all the time everywhere, but I can't, because Rust has an arbitrary limit here.

I am definitely sympathetic. I don’t think we’ve satisfactorily resolved the “how to convert between integer types” questions yet. I’ve certainly found that when coding I wind up turning to as because there isn’t a clear right answer – or at least if there is I’m not aware of it (which sometimes happens with library conventions that I didn’t learn yet).

I also agree that we can’t expect just about anything to be truly portable to all platforms – this seems to be the consensus we’ve been working towards in terms of portability. Basically having the expectation that standard library options that are readily accessible work on “major” platforms, but that for more niche setups you may find things are missing (and maybe we want lints etc to detect these problems faster).

This is the convention I was referring to: Pre-RFC: a vision for platform/architecture/configuration-specific APIs


Thanks! That discussion sounds very promising, especially the concept of “compatibility targets” seems like a good way to configure which into() conversions are allowed without throwing 16-bit under the bus.