[Pre-RFC] usize is not size_t

josh · September 24, 2021, 12:52am

I think we have only three reasonable options here:

1. Decide not to support platforms with sizeof(size_t) != sizeof(uintptr_t). I don't think we should do this; CHERI seems useful to support.
2. Define usize to be size_t. Error on casts from pointer types to usize.
3. Define usize to be uintptr_t.
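
For reference, the premise behind these options can be checked directly. This is a minimal sketch with illustrative helper names (they are not from any library): on mainstream targets the two widths agree, and the options differ in what should happen on a target where they would not.

```rust
use std::mem::size_of;

/// Width in bytes of `usize` on the target being compiled for.
fn usize_width() -> usize {
    size_of::<usize>()
}

/// Width in bytes of a thin raw pointer on the same target.
fn pointer_width() -> usize {
    size_of::<*const u8>()
}
```

On a CHERI-like target the pointer width would be larger than the width needed for object sizes, which is the case the three options try to address.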

InfernoDeity · September 24, 2021, 1:08am

2 is probably ruled out, at least in edition<=2021, because it would be a huge breaking change. I can imagine that any kind of crater run would not go well.

3 would be status quo, as mentioned, though adding the anti-guarantee of usize not also being size_t would be status-quo + this proposal. In my opinion, this is the most realistic best case scenario.

I'm not a huge fan of 1, given my obviously biased perspective on it (I want to be able to code SNES games in Rust, but I also want SNES-Dev to be fast, and useful).

nacaclanga · September 24, 2021, 7:58am

We could go for option 2 by limiting the hard error to targets where size_t is not uintptr_t, and only issuing a warning ("casting pointers to usize is not supported on all targets") otherwise. This would, however, mean that these targets have only partial Rust support. Technically this wouldn't be too different from, say, attempting to compile a crate using "std" for a target that doesn't support it. I would also introduce a new primitive integer type representing uintptr_t in that case, to provide an easy migration path for the casts.

bjorn3 · September 24, 2021, 8:28am

Declaring usize as being smaller than a pointer will break core::ptr::align_offset and a lot of (function) pointer trait impls. It would also break pointer tagging implementations.
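
To illustrate the kind of API this refers to, here is a small sketch using the pointer method align_offset, which both takes and returns usize. The wrapper function name is made up for the example.

```rust
/// Returns how many bytes must be skipped from the start of `bytes`
/// to reach an address aligned to `align` (which must be a power of two).
/// `align_offset` takes and returns `usize`, so its signature assumes
/// that `usize` is wide enough for pointer alignment arithmetic.
fn bytes_until_aligned(bytes: &[u8], align: usize) -> Option<usize> {
    assert!(align.is_power_of_two());
    let offset = bytes.as_ptr().align_offset(align);
    // The documentation allows `usize::MAX` as a "cannot align" answer.
    if offset == usize::MAX {
        None
    } else {
        Some(offset)
    }
}
```

If usize were narrower than the pointer's address, a signature like this would need rethinking, which is presumably the breakage being described.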

kornel · September 24, 2021, 9:23am

In case of CHERI it won't break because of the size difference, because the pointers are larger due to extra metadata, rather than larger address space.

CHERI's pointers are more like Rust fat pointers. Maybe they could be modelled as such?

kornel · September 24, 2021, 9:31am

Collection offset and pointer size can be two different things, and Rust has only one type for both. Without adding a second type, whatever you choose for usize will be inaccurate half of the time.

So I think the choices are:

Define usize to be uintptr_t and add a type for size_t.
Define usize to be size_t and add a type for uintptr_t.

In Rust usize is very common and visible due to collection indexing done by usize. Changing it would be an enormous churn.

Having usize too large for size_t is a big language-wide overhead. It forces 128-bit arithmetic on platforms with 64-bit address space.

OTOH unsafe smuggling of pointers in integers is much less frequent. Such code would have to be reviewed anyway for platforms like CHERI where the integer value is not simply an address. Adding a new type for this would be an opportunity to make things like the tag explicit.
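
A short sketch of the two roles, with illustrative function names: the first uses usize the way C uses size_t, the second the way C uses uintptr_t.

```rust
/// Role 1: `usize` as a collection offset (what C calls `size_t`).
fn nth(items: &[u16], index: usize) -> Option<u16> {
    items.get(index).copied()
}

/// Role 2: `usize` as an integer holding a pointer's address
/// (what C calls `uintptr_t`).
fn address(items: &[u16]) -> usize {
    items.as_ptr() as usize
}
```

Nearly all Rust code touches the first role; the second appears mostly in low-level or unsafe code.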

InfernoDeity · September 24, 2021, 10:09am

True. It also forces 32-bit arithmetic on w65, which requires software arithmetic (and multiplication especially - you'd never want to multiply usize if you can help it).

CAD97 · September 24, 2021, 2:25pm

I do think that the only approach that manifests benefit of uintptr_t != size_t is to have usize == size_t and an error on such platforms for casting pointers to usize.

The goal of @InfernoDeity is that on w65, memory locations are 32 bit, but object sizes and offsets are 16 bit. Sure, Rust could target such a platform if usize == uintptr_t, but usize is the indexing type in Rust, which means that the actual w65 benefit of not making the two types the same width is completely lost in Rust, and they might as well be the same size.

That's the conflict here: Rust's core library design assumes that usize is sufficient both for size_t and uintptr_t.

Our one out (imho) is that while ptr as usize is fairly common, it should be niche. Or IOW, it's common because widely used libraries do it, not because everyone does it.

So imho the best way forward to optionally separate uintptr_t from size_t in Rust is

Admit that due to the existing practice and design of Rust, such platforms are considered "exotic" and will often not "just work" with any library doing pointer tricks,
Do our best to make such issues caught at compile time, thus
Declare that usize is size_t,
Introduce a new type (e.g. uaddr) which is declared to be uintptr_t,
When as casting from pointer to usize on such platforms, emit an error-by-default lint,
Provide the lint as allow-by-default on other platforms for the purpose of portability,
Declare that, at least for the time being, std assumes a standard platform where usize and uaddr are equivalent, and
Audit core for any library assumptions that usize is uaddr, and either fix them (if internal) or somehow make them loudly complain on platforms where this isn't the case.

The reasoning behind taking this angle is that while ptr as usize is required and endorsed by the language today, the use of usize as the indexing type is much more widespread.
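
As a rough sketch of what such a type could look like in user code today: UAddr below is a hypothetical newtype, not an existing or proposed std API, and on current targets it simply wraps usize.

```rust
/// Hypothetical stand-in for the proposed `uaddr` type. On today's targets
/// it can wrap `usize`; on a target where the two differ it would have to
/// wrap a wider integer instead.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct UAddr(usize);

impl UAddr {
    /// The only pointer-to-integer conversion used in this sketch.
    fn from_ptr<T>(ptr: *const T) -> Self {
        UAddr(ptr as usize)
    }

    /// Converts the stored integer back to a raw pointer.
    fn to_ptr<T>(self) -> *const T {
        self.0 as *const T
    }
}
```

Routing every pointer-to-integer conversion through one type like this would give a lint a single, searchable place to point to.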

H2CO3 · September 24, 2021, 3:28pm

Is it important enough to break everything else with the reasonable assumption of status quo? I highly doubt that we should be optimizing for niche platforms at a specifically high cost incurred upon the major ones.

Tom-Phinney · September 24, 2021, 4:07pm

I don't believe that anyone is proposing "optimizing for niche platforms". Rather the goal is to find a way to not preclude niche platforms while minimizing the impact on existing code.

josh · September 24, 2021, 4:31pm

That's a really interesting idea. What might that look like?

H2CO3 · September 24, 2021, 7:28pm

To be clear: It is my claim that retrospectively breaking fundamental assumptions on major platforms for the sake of niche platforms counts as "optimizing" for the latter — not in the performance engineering sense, but in the sense that it steers the general mindset towards dangerous design decisions.

jrtc27 · September 25, 2021, 9:27pm

CAD97:

I do think that the only approach that manifests benefit of uintptr_t != size_t is to have usize == size_t and an error on such platforms for casting pointers to usize.

The goal of @InfernoDeity is that on w65, memory locations are 32 bit, but object sizes and offsets are 16 bit. Sure, Rust could target such a platform if usize == uintptr_t, but usize is the indexing type in Rust, which means that the actual w65 benefit of not making the two types the same width is completely lost in Rust, and they might as well be the same size.

That's the conflict here: Rust's core library design assumes that usize is sufficient both for size_t and uintptr_t.

Our one out (imho) is that while ptr as usize is fairly common, it should be niche. Or IOW, it's common because widely used libraries do it, not because everyone does it.

So imho the best way forward to optionally separate uintptr_t from size_t in Rust is

Admit that due to the existing practice and design of Rust, such platforms are considered "exotic" and will often not "just work" with any library doing pointer tricks,

Do our best to make such issues caught at compile time, thus

Declare that usize is size_t,

Introduce a new type (e.g. uaddr) which is declared to be uintptr_t,

When as casting from pointer to usize on such platforms, emit an error-by-default lint,

Provide the lint as allow-by-default on other platforms for the purpose of portability,

Declare that, at least for the time being, std assumes a standard platform where usize and uaddr are equivalent, and

Audit core for any library assumptions that usize is uaddr, and either fix them (if internal) or somehow make them loudly complain on platforms where this isn't the case.

The reasoning behind taking this angle is that while ptr as usize is required and endorsed by the language today, the use of usize as the indexing type is much more widespread.

Please don't call it uaddr, rather something like uptr. CHERI capabilities (for 64-bit architectures) have a 64-bit integer part that is called the address (technically it's not always an address, so Arm's Morello calls it the value which avoids that slight abuse of terminology, but that's overall worse because the value should be the whole 128-bit (plus tag) quantity), and CHERI C/C++ defines a ptraddr_t type that is an integer big enough for any virtual address, i.e. a 64-bit integer on 64-bit targets (you could argue that we should just use size_t, but technically size_t only needs to be as big as the largest contiguous allocation you support, and we wanted to ensure the language extension was as general as possible rather than introducing a new conflation of types).

InfernoDeity · September 25, 2021, 9:32pm

I also agree. uaddr can be ambiguous as to what kind of address it is when multiple kinds exist. For example: on real mode x86, is it a logical address, or an offset in a segment? The same applies to w65: is it an absolute address (16-bit) or a long address (24-bit, possibly extended to 32-bit, which is what the pointers in the ABI are)?

jrtc27 · September 25, 2021, 9:34pm

Nobody's even remotely considering that existing editions have usize retroactively changed for existing targets; that's clearly not a good idea. But that doesn't mean that, if the amount of code affected is small and/or the upgrade path can be mostly automated, future editions can't provide a cleaner separation of types even on traditional targets. The current conflation of types, whilst technically correct, does seem a little at odds with Rust's philosophy of a powerful type system, and so I would argue that pushing for a new uptr everywhere would actually improve the Rust ecosystem even without CHERI (and maybe the distinction would even be beneficial for compiler optimisations due to less ambiguity over pointer provenance, if that's also an issue for Rust like it is for traditional C, where uintptr_t is just a typedef for a plain integer type?).

InfernoDeity · September 25, 2021, 9:43pm

Indeed. If you are targeting a platform that has uintptr_t is size_t, the proposal is not that you shouldn't use it - but that it isn't guaranteed to hold in general, for all targets, so if you are writing portable code it needs to keep in mind the distinction (and then use something like c_size_t, or the actual split of usize into usize and uptr).
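
One way code that relies on the equivalence could state that assumption explicitly, sketched here with an illustrative function name, is a compile-time assertion. The build then fails on any target where the assumption does not hold.

```rust
use std::mem::size_of;

// Compile-time check: refuse to build on a target where `usize`
// and a thin pointer have different widths.
const _: () = assert!(size_of::<usize>() == size_of::<*const ()>());

/// Relies on the assertion above: the cast keeps every bit of the address.
fn ptr_to_int(p: *const u8) -> usize {
    p as usize
}
```

This does not make the code portable; it only turns a silent assumption into a build error.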

steffahn · September 26, 2021, 12:41am

I've seen editions mentioned at least three times in this thread. I fail to see how this is something where editions can help in any way, shape, or form. Editions are a mechanism for changing the meaning of syntax in a semver-compatible manner, nothing more. The proposal to introduce another type besides usize has nothing to do with syntax at all.

CAD97 · September 26, 2021, 12:54am

In general I agree, but two minor asterisks on that agreement:

Editions also (can) serve as a pseudo lint group, changing what lints are enabled by default. If in some future we lint against ptr as usize (something something portability lint), it might make sense that the lint is only turned on by default in a new edition (due to large fallout).
Any proposal that changes the behavior of as w.r.t. pointers is a language change, and that would need to happen over an edition. The one exception is a change that only impacts as involving a new type (e.g. uptr) or a new platform (e.g. w65), since that would not be changing existing behavior, just extending it.

But yes, I agree that editions needn't be involved (except for the above two nits).

Aloso · September 26, 2021, 2:40pm

The only problem is that your crate's dependencies can be on a different edition than your crate. So if you migrate to a new edition that enables a deny-by-default lint, your code is still not portable because dependencies that haven't yet migrated don't have the lint and might still use ptr as usize.

I do think that there should be a lint. It could be warn-by-default or even allow-by-default at first. In new editions it could be upgraded to a deny-by-default lint and then to a hard error. But in the future when targets such as w65 and CHERI are supported, ptr as usize will be a bug, so it should become an error on previous editions as well at some point.

Regarding the uintptr_t type: Is that type really needed? What can you do with a pointer-sized integer that you can't achieve with a pointer?

mbrubeck · September 26, 2021, 3:25pm

One use case is packing metadata into unused bits of the pointer, as in smallbitvec.
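
A minimal sketch of the general technique follows. It is not how smallbitvec itself is implemented; the Tagged type is made up for illustration and assumes a target where u32 is at least 2-byte aligned, so the lowest address bit is always zero.

```rust
/// Stores a one-bit flag in the lowest bit of a `u32` pointer's address.
/// Assumes `u32` has an alignment of at least 2 on the target.
#[derive(Clone, Copy)]
struct Tagged(usize);

impl Tagged {
    fn new(ptr: &u32, flag: bool) -> Self {
        let addr = ptr as *const u32 as usize;
        debug_assert_eq!(addr & 1, 0, "alignment leaves the low bit free");
        Tagged(addr | flag as usize)
    }

    /// Reads the flag back out of the low bit.
    fn flag(self) -> bool {
        self.0 & 1 == 1
    }

    /// Clears the flag bit to recover the original pointer.
    fn ptr(self) -> *const u32 {
        (self.0 & !1) as *const u32
    }
}
```

The bit operations are done on an integer, which is why a plain pointer type alone does not cover this use case.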
