To be clear, this is not at all my idea.^^ @digama0 has been rooting for a language like that (no int-to-ptr casts) for quite a while, and such an operation comes up every now and then in Rust Zulip discussions -- I cannot remember when and where I first saw it.
The operation can already be implemented in current Rust, though it would probably make sense as an intrinsic given its relevance for the language semantics:
/// For general T, cast to u8 ptr and back.
fn ptr_from_int(addr: usize, provenance: *const u8) -> *const u8 {
    provenance.wrapping_add(addr.wrapping_sub(provenance as usize))
}
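The doc comment says that for a general `T` you would cast to a `u8` pointer and back. A minimal sketch of that, assuming a hypothetical name `ptr_from_int_generic` (not an existing API) and `T: Sized`, might look like this:

```rust
/// Hypothetical generic wrapper around the idea above (the name is an
/// assumption for illustration). The result has the address `addr` and is
/// derived from `provenance`, so it may only be used to access memory that
/// `provenance` was allowed to access.
fn ptr_from_int_generic<T>(addr: usize, provenance: *const T) -> *const T {
    // Go through a byte pointer so the offset is counted in bytes.
    let base = provenance.cast::<u8>();
    base.wrapping_add(addr.wrapping_sub(base as usize)).cast::<T>()
}
```

Called with the address of `xs[2]` and a pointer to the whole array `xs`, this should give back a pointer that can legitimately read `xs[2]`.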
Also, unfortunately I think in Rust we cannot fully remove int-to-ptr casts, so sadly we cannot use this approach to avoid the nasty problems those casts create in the language semantics. But maybe we can at least move in a direction where those casts only exist as a legacy operation and their arcane semantics do not affect most programs (and hopefully these semantics are reasonably isolated... though that could be hard to achieve).
But indeed, I think it would be great if we could set Rust on a path towards a language without int-to-ptr casts, even if we can never complete that journey -- it will make me lose much less sleep over the nightmare that is that cast operation, and it will help Rust-on-CHERI.
Question for @InfernoDeity: would this be sufficient to work on w65? I would think so... take the bank from provenance and offset within the bank from addr.
I didn't think so initially, but this is enough to implement e.g. pointer-union, which uses alignment tagging to union multiple types of pointer, as the pointer only ever actually has provenance to one object.
The one thing you couldn't do is e.g. split a pointer into separate bytes and hide them around in free bytes in a structure, but also, maybe, don't.
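To make the alignment-tagging case concrete, here is a rough sketch of how tagging could go through `ptr_from_int` without any int-to-ptr cast. The helpers `tag_ptr` and `untag_ptr` and the constant `TAG_MASK` are hypothetical names for illustration, not the pointer-union crate's actual API, and the sketch assumes a 4-byte-aligned pointee:

```rust
/// Same helper as earlier in the thread.
fn ptr_from_int(addr: usize, provenance: *const u8) -> *const u8 {
    provenance.wrapping_add(addr.wrapping_sub(provenance as usize))
}

/// Low bits that are always zero in a 4-byte-aligned address (assumption).
const TAG_MASK: usize = 0b11;

/// Hypothetical: hide a 2-bit tag in the low bits of an aligned pointer.
/// The tagged pointer keeps the provenance of `p`.
fn tag_ptr(p: *const u32, tag: usize) -> *const u8 {
    debug_assert!(tag <= TAG_MASK);
    let addr = p as usize;
    debug_assert_eq!(addr & TAG_MASK, 0);
    ptr_from_int(addr | tag, p.cast::<u8>())
}

/// Hypothetical: recover the original pointer and the tag.
/// The provenance comes from the tagged pointer itself.
fn untag_ptr(tagged: *const u8) -> (*const u32, usize) {
    let addr = tagged as usize;
    let clean = ptr_from_int(addr & !TAG_MASK, tagged);
    (clean.cast::<u32>(), addr & TAG_MASK)
}
```

The point is that the tagged value never stops being a pointer, so there is always something to take the provenance from.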
The issue is then addr is no longer an address. The bank information isn't magic shadow state represented in hardware, but is a meaningful part of a long address (which is what pointers in the proposed w65 psABI are - long addresses with an unused but zeroed-when-inbounds high byte).
I technically do this in a couple places inside a support library, mainly for setting up things like DMA, where using a pointer type doesn't work because of precise layout requirements (also, I'm pretty sure the bus A absolute address and bus A bank are in separate parts of the DMA structure).
Right, so ptr as usize would return the address within a bank, but would lose the bank information, if usize is too small.
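A toy model may help to see what gets lost. This is only a sketch under assumptions of mine: `LongAddr` is a made-up type, pointers are modelled as 32-bit values with the bank in bits 16..24, and `usize` is modelled as 16 bits. It is not the actual w65 psABI.

```rust
/// Toy model of a long address: bank in bits 16..24, offset in bits 0..16.
/// (Hypothetical type, for illustration only.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct LongAddr(u32);

impl LongAddr {
    fn new(bank: u8, offset: u16) -> Self {
        LongAddr(((bank as u32) << 16) | offset as u32)
    }

    /// The "additional information" that does not fit into a 16-bit integer.
    fn bank(self) -> u8 {
        (self.0 >> 16) as u8
    }

    /// Models `ptr as usize` with a 16-bit usize: truncation drops the bank.
    fn to_usize16(self) -> u16 {
        self.0 as u16
    }

    /// Models the "bank from provenance, offset from addr" reading of
    /// ptr_from_int.
    fn with_offset(self, addr: u16) -> Self {
        LongAddr::new(self.bank(), addr)
    }
}
```

In this model two pointers into different banks can cast to the same integer, which is exactly the sense in which the cast loses information.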
One could imagine having an operation to extract "additional information" from pointers on platforms that have more than usize actual bits in their pointer, like the metadata API that was suggested above. This would be the bank in your case, and the capability in CHERI. But when and how would such an API be used or needed? Would it be sufficient if this information can be extracted but no way exists to "recombine" it?
Note that a metadata API cannot replace ptr_from_int, since on most targets the "metadata" has size zero and so it cannot carry any information, not even "shadow provenance" -- at least if we want to maintain the idea that "information stored in memory" is organized in bytes.
Sure, but addr here is the only part of the address that it's valid to perform math on. Adding 1 to the bank gets you an entirely separate part of memory that can't be said to be "adjacent".
We might imagine writing code like this:
let map_data = ptr_from_int(map_number * BYTES_PER_MAP, w65::ROM_BANK_3);
Objects are allowed to be allocated contiguously across banks. An extreme example would be the allocator API on an extended memory map that spans the latter 3/4s of bank 7e and all of bank 7f - you can get an allocation that crosses the two banks. My choice with the ABI was not to treat the bank as a separate region, like with segmentation, but as a first-class part of the address, and to treat all addresses as linear.
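As a rough illustration of the linear view, the sketch below does plain arithmetic on 24-bit addresses stored in a `u32`. The helper names are made up, and the 16-bit `usize` is an assumption for the sake of the example:

```rust
/// Only the low 24 bits of a long address are meaningful in this sketch.
const ADDR_MASK: u32 = 0x00FF_FFFF;

/// Bank byte of a 24-bit linear address.
fn bank_of(addr: u32) -> u8 {
    ((addr & ADDR_MASK) >> 16) as u8
}

/// Linear addition: a carry out of the low 16 bits moves into the next bank,
/// instead of wrapping around inside the current bank.
fn linear_add(addr: u32, bytes: u32) -> u32 {
    addr.wrapping_add(bytes) & ADDR_MASK
}

/// Would the distance between two addresses fit in a 16-bit usize?
/// (Assumes `start <= end`.)
fn span_fits_in_u16(start: u32, end: u32) -> bool {
    (end - start) <= u16::MAX as u32
}
```

With a region starting at `0x7E4000` and ending just past `0x7FFFFF`, stepping over the bank boundary stays inside the same allocation, and the region's size (`0x1C000` bytes) does not fit in 16 bits.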
What does "autocasting" mean to you? Implicit coercion? AFAICT, the thread above mostly discussed explicit coercion between pointers and usize; I don't quite understand the value of implicit coercion between usize and uptr.
Also, IMO the question of whether we eventually want warn-by-default is a separate one from whether we want to introduce uptr at all, and can easily be discussed later. We already have things like no_std support that's essentially an opt-in approach to supporting more platforms; there can be value in a uptr type without ever introducing any warn-by-default lints against pointer↔usize coercions, so you can opt into supporting the "weird size_t != int_ptr_t" platforms by activating the lint and making sure your dependencies promise to do the same.
If this is "lossy" in the sense of "does not work" (as opposed to "loses optimizations"), that is a major break of, at the very least, implied promises made by Rust in the past. At the very least, when they were initially declared "pointer sized integer", I guarantee you that was interpreted to mean "usable to store pointers".
If it does work, then CHERI either has to be special or usize needs to actually be pointer compatible (and maybe an edition can rename it to uptr and introduce a new usize, if that bit of confusion is considered better than the other confusion options).
Hm, okay, so that is in a sense an even bigger violation of the Rust platform assumptions than CHERI. With CHERI, the difference between any two addresses can still be stored in a usize; in your ABI, that is not the case.
I am using "lossy" in the sense of "loses information": when you cast a pointer to an integer and back, the resulting pointer is not in all aspects identical to the one you started with. This is pretty much a necessary truth in languages like C, C++, or Rust, but sadly not widely known. Many compilers get this wrong and optimize int2ptr(ptr2int(ptr)) to just ptr; that optimization is buggy and my blog post explains why. "Buggy" here means "these compilers will miscompile valid code"; here is an example for LLVM.
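One way to see why an address alone cannot determine a pointer: the sketch below builds two raw pointers with the same address that were derived from different subslices. The function name is made up, and the claim that the two pointers differ in what they may access assumes a provenance model along the lines discussed in this thread; the code itself only compares addresses and reads through the pointer that is clearly allowed to.

```rust
/// Returns two raw pointers with equal addresses but different origins:
/// `p` is one past the end of `a[..1]`, `q` is the start of `a[1..]`.
fn same_address_different_origin(a: &[u8; 2]) -> (*const u8, *const u8) {
    // Derived from the first subslice; under the assumed model it carries
    // no permission to read a[1], even though it points there.
    let p = a[..1].as_ptr().wrapping_add(1);
    // Derived from the second subslice; reading a[1] through it is fine.
    let q = a[1..].as_ptr();
    (p, q)
}
```

Casting both to `usize` gives the same integer, so whatever distinguishes them is already gone at that point; a cast back from that integer cannot know which of the two it is supposed to reproduce.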
I think that the only mechanism that has any hope of being compatible with existing code on regular platforms is type uptr = usize;. Coercion is not strong enough because these types can appear inside function pointer types and type arguments in Vec or other collection types; thanks to type inference it is quite possible for this type equality to be asserted without mentioning either usize or uptr directly, and coercion does not work in all cases.
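A small sketch of why the alias is the compatible option, assuming a hypothetical `uptr` alias and made-up function names: with `type uptr = usize;`, types that merely contain `usize`, such as `Vec<usize>` or `fn(usize) -> usize`, are the very same types as their `uptr` counterparts, so no conversion is needed anywhere.

```rust
// Hypothetical alias, as proposed above; not an existing std type.
#[allow(non_camel_case_types)]
type uptr = usize;

/// Written against `uptr`, but accepts a `Vec<usize>` unchanged,
/// because `Vec<uptr>` and `Vec<usize>` are the same type.
fn sum_addrs(addrs: Vec<uptr>) -> uptr {
    addrs.iter().sum()
}

/// Written against `usize` ...
fn double(x: usize) -> usize {
    x * 2
}

/// ... and usable where a `fn(uptr) -> uptr` is expected.
fn apply(f: fn(uptr) -> uptr, x: uptr) -> uptr {
    f(x)
}
```

With a distinct type instead (say `struct Uptr(usize)`), neither call would type-check: a `Vec<usize>` is not a `Vec<Uptr>`, and a coercion on the element type does not reach inside the collection or the function pointer type.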
In that case, it wouldn't be an additional fundamental change to require that to go through uptr on w65 instead, given that the usize cast would become a hard error on this platform (assuming usize became size_t and a new uptr type was introduced).
In that post the "not identical" behavior is entirely compiler-internal and is only visible to users as lost optimizations (once the LLVM bugs are fixed). Or if provenance is added to standards under "integers don't track provenance" then it might be visible as "you can dodge the rules by casting through integers". This is the exact opposite sort of lossy from the one that would justify removing int-to-ptr as already broken.