[Pre-RFC] usize is not size_t

RalfJung · September 27, 2021, 8:23pm

I'm glad that you like it.

To be clear, this is not at all my idea.^^ @digama0 has been rooting for a language like that (no int-to-ptr casts) for quite a while, and such an operation comes up every now and then in Rust Zulip discussions -- I cannot remember when and where I first saw it.

The operation can already be implemented in current Rust, though it would probably make sense as an intrinsic given its relevance for the language semantics:

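```rust
/// For general T, cast to u8 ptr and back.
fn ptr_from_int(addr: usize, provenance: *const u8) -> *const u8 {
    // Offset `provenance` by the (wrapping) distance to `addr`: the result has
    // address `addr` but keeps the provenance of `provenance`.
    provenance.wrapping_add(addr.wrapping_sub(provenance as usize))
}
```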
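A minimal usage sketch of that helper: the target address is computed purely as an integer, and provenance is re-attached from a pointer to the enclosing allocation. The helper `read_via_int` is hypothetical and only for illustration.

```rust
/// Rebuild a pointer to `addr` that carries the provenance of `provenance`.
/// Only pointer arithmetic (`wrapping_add`) is used, so no int-to-ptr cast occurs.
fn ptr_from_int(addr: usize, provenance: *const u8) -> *const u8 {
    provenance.wrapping_add(addr.wrapping_sub(provenance as usize))
}

/// Hypothetical helper: read `data[index]` by going through a plain integer address.
fn read_via_int(data: &[u8], index: usize) -> u8 {
    assert!(index < data.len());
    let base = data.as_ptr();
    // Integer arithmetic only; `addr` itself carries no provenance.
    let addr = base as usize + index;
    // Reattach the provenance of `base`, which covers the whole slice.
    let p = ptr_from_int(addr, base);
    // SAFETY: `p` is in bounds of `data` and derived from `base`.
    unsafe { *p }
}
```

For example, `read_via_int(&[10, 20, 30, 40], 2)` returns 30.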

Also, unfortunately I think in Rust we cannot fully remove int-to-ptr casts, so sadly we cannot use this approach to avoid the nasty problems those casts create in the language semantics. But maybe we can at least move in a direction where those casts only exist as a legacy operation and their arcane semantics do not affect most programs (and hopefully these semantics are reasonably isolated... though that could be hard to achieve).

But indeed, I think it would be great if we could set Rust on a path towards a language without int-to-ptr casts, even if we can never complete that journey -- it will make me lose much less sleep over the nightmare that is that cast operation, and it will help Rust-on-CHERI.

CAD97 · September 27, 2021, 8:24pm

Question for @InfernoDeity: would this be sufficient to work on w65? I would think so... take the bank from provenance and offset within the bank from addr.

I didn't think so initially, but this is enough to implement e.g. pointer-union, which uses alignment tagging to union multiple types of pointer, as the pointer only ever actually has provenance to one object.
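A rough sketch of alignment tagging on top of `ptr_from_int`, assuming the pointees have alignment of at least 2 so the low bit is free. The tag is added with pointer arithmetic and stripped in integer land, and provenance is then re-attached from the tagged pointer itself, so only one pointer with one provenance ever exists. The names `tag_ptr`, `untag_ptr`, and `read_tagged` are hypothetical, not the pointer-union crate's API.

```rust
/// Rebuild a pointer to `addr` with the provenance of `provenance`.
fn ptr_from_int(addr: usize, provenance: *const u8) -> *const u8 {
    provenance.wrapping_add(addr.wrapping_sub(provenance as usize))
}

/// Low bit used as the tag; assumes pointee alignment >= 2.
const TAG_MASK: usize = 0b1;

/// Hypothetical encoding: tag 0 = "points to a u32", tag 1 = "points to a u64".
fn tag_ptr(ptr: *const u8, tag: usize) -> *const u8 {
    assert!(tag <= TAG_MASK, "tag does not fit in the free low bit");
    assert_eq!((ptr as usize) & TAG_MASK, 0, "pointer is not aligned enough to tag");
    // Pointer arithmetic: the tagged pointer keeps the original provenance.
    ptr.wrapping_add(tag)
}

/// Split a tagged pointer into (untagged pointer, tag).
fn untag_ptr(tagged: *const u8) -> (*const u8, usize) {
    let addr = tagged as usize;
    // Clear the tag as an integer, then reattach the provenance of `tagged`.
    (ptr_from_int(addr & !TAG_MASK, tagged), addr & TAG_MASK)
}

/// Read through a tagged pointer, widening to u64.
/// Caller must pass a pointer produced by `tag_ptr` for a live pointee of the tagged type.
unsafe fn read_tagged(tagged: *const u8) -> u64 {
    let (ptr, tag) = untag_ptr(tagged);
    // SAFETY: the caller guarantees the pointee matches the tag and is live.
    unsafe {
        if tag == 0 {
            u64::from(*(ptr as *const u32))
        } else {
            *(ptr as *const u64)
        }
    }
}
```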

The one thing you couldn't do is e.g. split a pointer into separate bytes and hide them around in free bytes in a structure, but also, maybe, don't.

InfernoDeity · September 27, 2021, 8:26pm

The issue is that then addr is no longer an address. The bank information isn't magic shadow state represented in hardware, but is a meaningful part of a long address (which is what pointers in the proposed w65 psABI are: long addresses with an unused but zeroed-when-inbounds high byte).

InfernoDeity · September 27, 2021, 8:28pm

I technically do this in a couple of places inside a support library, mainly for setting up things like DMA, where using a pointer type doesn't work because of precise layout requirements. (Also, I'm pretty sure the bus A absolute address and bus A bank are in separate parts of the DMA structure.)

RalfJung · September 27, 2021, 8:32pm

Right, so ptr as usize would return the address within a bank, but would lose the bank information, if usize is too small.

One could imagine having an operation to extract "additional information" from pointers on platforms that have more than usize actual bits in their pointer, like the metadata API that was suggested above. This would be the bank in your case, and the capability in CHERI. But when and how would such an API be used or needed? Would it be sufficient if this information can be extracted but no way exists to "recombine" it?
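To make the bank/address split concrete, here is a hypothetical model (not the actual w65 psABI definition) that represents a long pointer as a `u32` with the bank in bits 16..24 and the in-bank address in bits 0..16. It shows that a 16-bit `addr()` alone cannot distinguish pointers into different banks, and that recombining requires both parts.

```rust
/// Hypothetical model of a w65 long pointer: bank in bits 16..24,
/// address within the bank in bits 0..16, high byte left as zero.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct LongPtr(u32);

impl LongPtr {
    /// Recombine a bank and an in-bank address into a long pointer.
    fn from_parts(bank: u8, addr: u16) -> LongPtr {
        LongPtr((u32::from(bank) << 16) | u32::from(addr))
    }

    /// What a 16-bit `ptr as usize` could return: the bank is dropped.
    fn addr(self) -> u16 {
        self.0 as u16
    }

    /// The "additional information" a metadata-style API would have to expose.
    fn bank(self) -> u8 {
        (self.0 >> 16) as u8
    }
}
```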

Note that a metadata API cannot replace ptr_from_int, since on most targets the "metadata" has size zero and so it cannot carry any information, not even "shadow provenance" -- at least if we want to maintain the idea that "information stored in memory" is organized in bytes.

riking · September 27, 2021, 8:33pm

Sure, but addr here is the only part of the address that it's valid to perform math on. Adding 1 to the bank gets you an entirely separate part of memory that can't be said to be "adjacent".

We might imagine writing code like this:

```rust
let map_data = ptr_from_int(map_number * BYTES_PER_MAP, w65::ROM_BANK_3);
```
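A sketch of how the snippet above would behave under the "bank as provenance" reading, using a hypothetical `u32` long-address model. `ROM_BANK_3`, `BYTES_PER_MAP`, and this `ptr_from_int` are illustrative stand-ins, not a real w65 API: the bank comes from the provenance pointer and only the 16-bit in-bank address comes from the integer, so in-bank arithmetic never reaches an adjacent bank.

```rust
/// Hypothetical long-address model: bank in bits 16..24, in-bank address in bits 0..16.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct LongPtr(u32);

impl LongPtr {
    fn bank(self) -> u8 {
        (self.0 >> 16) as u8
    }
}

/// Sketch of `ptr_from_int` where the bank is taken from `provenance`
/// and only the 16-bit in-bank address comes from `addr`.
fn ptr_from_int(addr: u16, provenance: LongPtr) -> LongPtr {
    LongPtr((u32::from(provenance.bank()) << 16) | u32::from(addr))
}

/// Hypothetical stand-ins for the constants in the snippet.
const ROM_BANK_3: LongPtr = LongPtr(0x03_0000);
const BYTES_PER_MAP: u16 = 0x0400;

/// Panics in debug builds if `map_number * BYTES_PER_MAP` overflows 16 bits.
fn map_data(map_number: u16) -> LongPtr {
    ptr_from_int(map_number * BYTES_PER_MAP, ROM_BANK_3)
}
```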

InfernoDeity · September 27, 2021, 8:35pm

Objects are allowed to be allocated contiguously across banks. An extreme example would be the allocator API on an extended memory map that spans the latter 3/4 of bank 7e and all of bank 7f: you can get an allocation that crosses the two banks. My choice with the ABI was not to treat the bank as a separate region, like with segmentation, but as a first-class part of the address, and to treat all addresses as linear.

riking · September 27, 2021, 8:36pm

Wait, doesn't that make objects larger than 16 bits in size legal? Therefore size_t should no longer be 16 bits?

InfernoDeity · September 27, 2021, 8:37pm

No, the maximum object size is 65535. However, those 65535 bytes need not be in the same bank.
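A small sketch of the linear model described here, assuming long addresses are plain 24-bit values held in a `u32` (a hypothetical representation, not the psABI's). Offsets within one object fit in 16 bits, but advancing an address carries from the in-bank part into the bank, so an object can start in bank 0x7E and end in bank 0x7F. The names `bank` and `long_add` are illustrative.

```rust
/// Bank of a linear 24-bit long address: just its top byte.
fn bank(addr: u32) -> u8 {
    (addr >> 16) as u8
}

/// Advance a long address linearly by an in-object offset (at most 65535 bytes),
/// carrying from the in-bank part into the bank and staying within 24 bits.
fn long_add(addr: u32, offset: u16) -> u32 {
    (addr + u32::from(offset)) & 0x00FF_FFFF
}
```

For an 8 KiB object starting at `0x7E_F000`, `long_add(0x7E_F000, 0x1FFF)` gives `0x7F_0FFF` in bank 0x7F. Doing the same arithmetic on the 16-bit address alone wraps to `0x0FFF` and loses the carry into the bank.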

steffahn · September 27, 2021, 8:41pm

What std APIs would need to be changed?

What does “autocasting” mean to you? Implicit coercion? AFAICT, the thread above mostly discussed explicit coercion between pointers and usize, I don’t quite understand the value of implicit coercion between usize and uptr.

Also, IMO the question of whether we eventually want warn-by-default is a separate one from whether we want to introduce uptr at all, and it can easily be discussed later. We already have things like no_std support that’s essentially an opt-in approach to supporting more platforms; there can be value in a uptr type without ever introducing any warn-by-default lints against pointer↔️usize coercions, so you can opt into supporting the “weird size_t != intptr_t” platforms by activating the lint and making sure your dependencies promise to do the same.

nico-abram · September 27, 2021, 9:43pm

I think the bit about autocasting is for backwards compatibility for code that uses APIs that use usize today, but should be using uptr in std/core

steffahn · September 27, 2021, 9:47pm

alright, that's why I'm asking: which API? I couldn't find any so far

talchas · September 27, 2021, 11:09pm

If this is "lossy" in the sense of "does not work" (as opposed to "loses optimizations"), that is a major break of promises made by Rust in the past, implied ones at the very least. Certainly when they were initially declared "pointer-sized integers", I guarantee you that was interpreted to mean "usable to store pointers".

If it does work then CHERI either has to be special or usize needs to actually be pointer compatible (and maybe an edition can rename it to uptr and introduce a new usize if that bit of confusion is considered better than the other confusion options)

RalfJung · September 28, 2021, 1:33am

Hm, okay, so that is in a sense an even bigger violation of the Rust platform assumptions than CHERI. With CHERI, the difference between any two addresses can still be stored in a usize; in your ABI, that is not the case.

I am using "lossy" in the sense of "loses information": when you cast a pointer to an integer and back, the resulting pointer is not in all aspects identical to the one you started with. This is pretty much a necessary truth in languages like C, C++, or Rust, but sadly not widely known. Many compilers get this wrong and optimize int2ptr(ptr2int(ptr)) to just ptr; that optimization is buggy, and my blog post explains why. "Buggy" here means "these compilers will miscompile valid code"; here is an example for LLVM.

digama0 · September 28, 2021, 1:34am

I think that the only mechanism that has any hope of being compatible with existing code on regular platforms is type uptr = usize;. Coercion is not strong enough because these types can appear inside function pointer types and type arguments in Vec or other collection types; thanks to type inference it is quite possible for this type equality to be asserted without mentioning either usize or uptr directly, and coercion does not work in all cases.
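A minimal sketch of why an alias is strictly stronger than coercion here, assuming a hypothetical `type uptr = usize;`. Because the two names denote the same type, `Vec<uptr>` can be passed where `Vec<usize>` is expected and `fn(usize) -> usize` fits where `fn(uptr) -> uptr` is expected. If `uptr` were a distinct type (say a newtype), the `sum_addresses(v)` call would not compile, since coercions do not reach inside `Vec<_>`.

```rust
/// Hypothetical alias: `uptr` is literally the same type as `usize`.
#[allow(non_camel_case_types)]
type uptr = usize;

fn sum_addresses(addrs: Vec<usize>) -> usize {
    addrs.iter().sum()
}

fn apply(f: fn(uptr) -> uptr, x: usize) -> usize {
    f(x)
}

fn double(x: usize) -> usize {
    x * 2
}

fn demo() -> (usize, usize) {
    // Vec<uptr> and Vec<usize> are the same type, so no conversion is needed.
    let v: Vec<uptr> = vec![1, 2, 3];
    let total = sum_addresses(v);
    // A fn(usize) -> usize is accepted where fn(uptr) -> uptr is expected.
    let doubled = apply(double, 21);
    (total, doubled)
}
```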

InfernoDeity · September 28, 2021, 1:35am

Is there a way to observe this at all in Rust, other than with a cast to usize? IIRC, there's no wrapping_offset_from.

RalfJung · September 28, 2021, 1:51am

AFAIK wrapping_offset_from does not exist because it would be redundant -- it is entirely equivalent to casting to usize and using wrapping_sub.
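Spelled out, the equivalence might look like this. Both functions are hypothetical free helpers, not std methods: the byte distance is a wrapping `usize` subtraction reinterpreted as `isize`, and the element-count variant divides by `size_of::<T>()`, assuming `T` is not zero-sized.

```rust
/// Hypothetical helper: byte distance from `origin` to `ptr`, computed by
/// casting both to usize and subtracting with wrapping.
fn wrapping_byte_offset_from<T>(ptr: *const T, origin: *const T) -> isize {
    (ptr as usize).wrapping_sub(origin as usize) as isize
}

/// Element-count variant; assumes T is not zero-sized.
fn wrapping_offset_from<T>(ptr: *const T, origin: *const T) -> isize {
    let size = std::mem::size_of::<T>();
    assert!(size != 0, "zero-sized types have no element offset");
    wrapping_byte_offset_from(ptr, origin) / size as isize
}
```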

InfernoDeity · September 28, 2021, 1:53am

In that case, it wouldn't be an additional fundamental change to require that to go through uptr on w65 instead, given that the usize cast would become a hard error on this platform (assuming usize became size_t and new uptr type was introduced).

RalfJung · September 28, 2021, 1:55am

CHERI doesn't require a uptr type though. It only requires a replacement for int-to-ptr casts specifically.

So that is the sense in which CHERI violates fewer of Rust's assumptions, and hence is easier to accommodate, than w65.

talchas · September 28, 2021, 2:19am

In that post the "not identical" behavior is entirely compiler-internal and is only visible to users as lost optimizations (once the llvm bugs are fixed). Or if provenance is added to standards under "integers don't track provenance" then it might be visible as "you can dodge the rules by casting through integers". This is the exact opposite sort of lossy as one that would justify removing int-to-ptr as already broken.
