As of today there are no supported architectures with >64 bit pointers. There aren't any on the horizon and that doesn't seem likely to change in the foreseeable future.
At the time of the above thread there wasn't a way to write architecture-specific trait implementations. There now is. So we could implement From&lt;usize&gt; for u64 today, and if a 128-bit architecture is introduced in the future, the implementation can be made architecture-specific.
Obviously this would have the consequence that code assuming this conversion is lossless would fail to build on 128-bit architectures. However, I think this is preferable to the current situation, where the trait is simply unavailable. I don't think the trait's absence actually leads people to code to the assumption that a u64 might not hold a usize; instead they write try_into().unwrap(), which fails at runtime, or worse, as u64, which hides the bug altogether.
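For example (a minimal sketch; the length value is arbitrary), here is what each workaround looks like and how it would behave on a hypothetical >64-bit target:

```rust
use std::convert::TryInto;

fn main() {
    let len: usize = 1 << 20;
    // Compiles everywhere, but panics at runtime if usize ever exceeds u64:
    let checked: u64 = len.try_into().unwrap();
    // Compiles everywhere, but silently truncates on a >64-bit target,
    // hiding the bug altogether:
    let truncated = len as u64;
    assert_eq!(checked, truncated);
}
```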
There's a definition for 128-bit RISC-V. I'm not aware of specific products planned based on that definition, but it exists.
I don't, however, think we should make code on 64-bit platforms substantially harder on the basis of such hypothetical platforms. (We also, for instance, don't support 8-bit platforms, though such platforms exist.)
Could you provide more information or links about this?
That also raises the reasonable question of whether we should provide architecture-specific trait impls for 32-bit platforms. (I don't think we should.)
This also brings the "portability lint" to mind. Could we have a mechanism by which you can't rely on architecture-specific impls unless you declare that you will only run on architectures providing such impls?
Sure, you can annotate a declaration with:
#[cfg(not(target_pointer_width = "128"))]

This indicates that the annotated code should only be compiled when the pointer width is not 128 bits. Today it has no effect, since every target's pointer width is less than 128, but if such a target were added, the code would be excluded when building for it.
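As a minimal runnable sketch of how the gated impl could look: the orphan rule forbids implementing From&lt;usize&gt; for u64 outside the standard library, so a local wrapper type (hypothetical) stands in for u64 here.

```rust
struct Wide(u64); // stand-in for u64, since the orphan rule blocks the real impl

#[cfg(not(target_pointer_width = "128"))]
impl From<usize> for Wide {
    fn from(value: usize) -> Wide {
        // Lossless under this cfg: usize is at most 64 bits wide.
        Wide(value as u64)
    }
}

fn main() {
    let len: usize = 42;
    let n = Wide::from(len);
    println!("{}", n.0);
}
```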
Yes, agreed. In general we should not do that all over the place, because it would make it easy to write code that is accidentally architecture-dependent. In this specific case I think it makes sense, because there just aren't any 128-bit targets, and that assumption is likely already being made. That said, if I am wrong and a mainstream 128-bit architecture suddenly appears, it might make sense to backtrack and remove the trait implementation in the next edition to eliminate the inconsistency.
I don't think we have any mechanism that would allow limiting trait impls by edition, or any other backwards-compatible means of eliminating a trait impl.
Could we have some mechanism for a module to opt into using such a trait, by way of an error-by-default lint that can be disabled for code that can safely ignore hypothetical future 128-bit platforms?
Edit: I've been informed that Rust's usize is uintptr_t, not size_t. I'd assumed the inverse (it has "size" in the name! and bindgen and libc alias size_t to usize), so all my unsafe/FFI code is already broken on 128-bit Rust.
The motivation there is in making address space literally global. This is different from handling >64-bit sizes of data. Emphasis mine:
“A full 64-bit address space will last for 500 years if allocated at the rate of one gigabyte per second. We believe that 64 bits is enough "for all time" on a single computer, enough for a long time on a small network, and not enough for very long at all on the global network.”
Even 128-bit RISC-V limits byte offsets to 64-bit.
Rust is likely going to need to deal with 128-bit pointers or even 256-bit permission-tagged pointers for the CHERI architecture, but that affects size of uintptr_t, which is different from size_t.
There is a branch of rustc that supports CHERI, which has 128-bit wide pointers, such that usize is 128-bit wide.
Obviously this would have the consequence that code assuming this conversion is lossless would fail to build on 128-bit architectures.
That's the job of the portability lint. I think it is fine for the From implementations from usize to be gated on cfg(target_pointer_width), we gate many things in hardware-dependent ways in libcore already (libcore is target_os / target_env independent, but it is not target_arch independent).
(We also, for instance, don't support 8-bit platforms, though such platforms exist.)
AFAICT we can support without issues all 8-bit platforms with 16-bit pointers, which is most of them. I'm not sure it makes sense to program a target with 8-bit pointers in anything but assembly (if you even had a stack there, your stack pointer would be 8 bits as well...).
Ah, I didn't realize that would affect usize. That sort of makes the use of usize awkward for lengths/indexing of Vec, String, etc.
It sounds like the correct solution is to not implement From&lt;usize&gt; for u64, and to try to make this fact more widely known so that less code makes this assumption. (Hopefully this thread will help.)
Currently usize is used for both things, a pointer-sized integer, and for indexing. As long as that continues to be the case, targets where both these sizes do not match like CHERI or RV128 will be awkward to use from Rust, but that's kind of an orthogonal problem to the one being discussed here (whether it makes sense to impl From<usize> for u64 for the targets for which that is correct).
While 64 bit addresses are sufficient to address any memory that can ever be constructed according to known physics, there are other practical reasons to consider longer addresses.
It does seem like we should at least have u128: From<usize>...
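For illustration, as far as I know there is no From&lt;usize&gt; impl for u128 today either, so even that trivially lossless conversion is currently spelled as a cast:

```rust
fn main() {
    let len: usize = 42;
    // Lossless on every current target, and on any plausible future one,
    // yet still written as a cast rather than u128::from(len):
    let wide = len as u128;
    println!("{}", wide);
}
```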
Surely, as per scottmcm's link, it would make sense for all platforms if usize were defined only as an array index? Pulling double duty as both uintptr_t and size_t is what leads to this being an issue, no?
Since usize is defined to be uintptr_t (not size_t, nor both [0]), the From<usize> for u64 impl is correct for all targets with < 128-bit pointers, and we can just provide it there.
Some people think that the current definition of usize is a mistake, but that's a different discussion than the one about whether the impl could be provided or not.
[0] It is used in Rust as both, but it is defined to be uintptr_t. For all targets for which uintptr_t != size_t, uintptr_t size is always greater than size_t, so using a larger integer for indexing is correct, even though it isn't optimal.
For what it's worth (not much), I am one of those who think this should not have been the case. Almost all non-byte Vecs would be perfectly fine with a u32 index. This also caused an issue in the rand project: reproducibility of usize samples across architectures.
But this is somewhat off-topic, so I'll stop there.
I think what we really need are "[ui]size8/16/32/64/128" types in libcore that have max(K, USIZE_BITS) bits and can be converted from [ui]N and [ui]size.
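A rough sketch of how one such alias could be spelled with today's cfg machinery (the name is illustrative, not a real libcore API):

```rust
// max(32, pointer width) bits: u32 where pointers are at most 32 bits wide,
// u64 on 64-bit targets.
#[cfg(any(target_pointer_width = "16", target_pointer_width = "32"))]
type Usize32 = u32;
#[cfg(target_pointer_width = "64")]
type Usize32 = u64;

fn main() {
    let len: usize = 42;
    // Usize32 is at least pointer-sized, so this cast is lossless:
    let n = len as Usize32;
    println!("{}", n);
}
```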
Maybe we should branch this discussion into a different thread. I think it would be worth it to at least identify which current pains this is causing, and which constraints we have on solutions.
I don't have anything else to add, and don't see how changing the index type of Vec and slices is possible at this point, so it seems like a dead end. But if anyone wants ideas for a new language, make Vec templated over the index type (but defaulting to u32)?
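A rough sketch of what that could look like (hypothetical type, not a real crate API): a vector addressed by a caller-chosen index type, defaulting to u32.

```rust
use std::convert::TryInto;
use std::marker::PhantomData;

struct IndexVec<T, I = u32> {
    data: Vec<T>,
    _index: PhantomData<I>,
}

impl<T, I: TryInto<usize>> IndexVec<T, I> {
    fn new() -> Self {
        IndexVec { data: Vec::new(), _index: PhantomData }
    }

    fn push(&mut self, value: T) {
        self.data.push(value);
    }

    fn get(&self, index: I) -> Option<&T> {
        // Widen the narrow index to usize for the underlying Vec.
        index.try_into().ok().and_then(|i| self.data.get(i))
    }
}

fn main() {
    let mut v: IndexVec<&str> = IndexVec::new();
    v.push("hello");
    assert_eq!(v.get(0u32), Some(&"hello"));
}
```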
usize32 would be 32-bit on 32-bit CPUs (and 16-bit CPUs), and if that's not enough then you should of course use usize64 which would be 64-bit on 32-bit CPUs, and so on.
Sorry, I confused min/max. But the same logic holds: software may break because types are not the same on different architectures. (Besides which the main reason to use u8 / u16 is to save memory in structs/slices; forcing these types to be 64-bits on common machines would hit the memory bandwidth hard in some use-cases.)
Do they really exist? So-called “8-bit architectures” manipulate data in 8-bit units, but that doesn’t mean pointers are 8 bits; that would leave only 256 addressable bytes. For example, Wikipedia’s Intel 8008 article says addresses are 14 bits on that platform.
Even the first semiconductor microcomputer chips, such as the MOS Technology 6502, which preceded Intel's 808x, had addressing larger than their 8-bit data path.
According to https://en.wikipedia.org/wiki/Atmel_AVR_instruction_set#Memory_addressing_instructions some AVR chips only have 256 bytes or less of "data address space" so the stack pointer register may only physically store 8 bits. However as far as I can tell usize would still be 16 bits on those chips because it’s still the same instruction set as on larger ones, and because "program ROM" is larger and also addressable. (Consider taking the address of a static item, or casting an fn() function pointer.)