Pre-RFC: `usize` semantics

Perhaps this is too tangential or has been discussed elsewhere already, but here goes:

This also raises another question (related to intptr_t in particular), which is how to do pointer tagging on CHERI. After all, such techniques are a time-honoured tradition in interpreters and JIT compilers in particular (though they are found elsewhere as well, as I will discuss at the end).

I know of several variations of this:

  1. Use the least significant bits, relying on alignment to guarantee they are always zero for all pointers. Mask them off and you know what type your pointer points to (or whether it is actually not a pointer but something else).
  2. Use the high bit: user-space pointers on amd64 should always have the most significant bit cleared (otherwise it would be a kernel pointer). Not super-portable.
  3. Use the high bits: the virtual address space is less than 64 bits (and the physical one is even smaller). Quite risky as future generations of CPUs are released, and not very portable.
  4. I believe that V8 even goes all in on this and stuffs pointers into floats (counting on the address space being smaller than the 52-bit mantissa of a double), which is referred to as NaN-stuffing (also known as NaN-boxing). Really a variant of 3. I'm not very well read up on the details of this.
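To make variant 4 concrete, here is a rough sketch of the NaN trick that stores a plain 48-bit integer payload (deliberately not a pointer) inside a quiet NaN. The constants and the function names `box_payload` / `unbox_payload` are my own, and I am assuming a tag layout chosen for illustration, not the encoding of any particular engine.

```rust
// Exponent all ones plus the top two mantissa bits: a quiet NaN pattern that
// ordinary arithmetic is unlikely to produce. (Illustrative choice.)
const BOX_TAG: u64 = 0x7ffc_0000_0000_0000;
// Bits used to recognise a boxed value.
const TAG_MASK: u64 = 0xffff_0000_0000_0000;
// Assumption: payloads fit in the low 48 bits.
const PAYLOAD_MASK: u64 = 0x0000_ffff_ffff_ffff;

/// Hide a 48-bit payload in the mantissa of a quiet NaN.
fn box_payload(payload: u64) -> f64 {
    assert!(payload <= PAYLOAD_MASK, "payload needs more than 48 bits");
    f64::from_bits(BOX_TAG | payload)
}

/// Recover the payload, or `None` if `value` is an ordinary double
/// (including an ordinary NaN that does not carry our tag).
fn unbox_payload(value: f64) -> Option<u64> {
    let bits = value.to_bits();
    if bits & TAG_MASK == BOX_TAG {
        Some(bits & PAYLOAD_MASK)
    } else {
        None
    }
}
```

For example, `unbox_payload(box_payload(0x1234))` gives back `Some(0x1234)`, while `unbox_payload(1.5)` gives `None`. Doing the same with a real address means round-tripping a pointer through an integer, which is exactly the part that cannot carry CHERI's capability metadata.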

I seem to remember seeing some libraries in Rust that do SSO (small string optimisation) have variants of this for example. And I have seen it in some other heavily optimised data types (there was a tutorial blog post on reddit a few weeks(?) ago about this, I pointed out that the particular implementation was unsound since it was broken on big endian, forgot to mention CHERI).

And the concept of niches in Rust is of course related, though currently the compiler isn't quite as inventive as humans in this area from my understanding. For example, references don't currently have alignment niches, just the zero niche as I understand it. Proper alignment niches would avoid a lot of need for unsafe.
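A small sketch of the niche point, for anyone who wants to poke at it. The guaranteed part is that `Option<&T>` has the same size as `&T`, thanks to the null niche. The `Slot` enum is a hypothetical type of mine that needs two extra states; with only the null niche available I would expect it to be larger than a reference on current compilers, but that is an observation and not something the language promises.

```rust
use std::mem::size_of;

/// Hypothetical enum with one reference variant and two extra states.
/// An alignment niche could in principle encode both extra states in
/// the (always zero) low bits of the reference.
#[allow(dead_code)]
enum Slot<'a> {
    Ref(&'a u64),
    Empty,
    Tombstone,
}

/// Returns the sizes of `&u64`, `Option<&u64>` and `Slot`, in that order.
fn niche_sizes() -> (usize, usize, usize) {
    (
        size_of::<&u64>(),
        size_of::<Option<&u64>>(),
        size_of::<Slot<'static>>(),
    )
}
```

Printing the tuple from `niche_sizes()` on your own target shows whether the third number grows past the first two.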

All of this code (except compiler niches) will presumably fail miserably on CHERI. I would also guess that the V8 developers are unhappy giving up on the performance (and as much as I like Firefox, V8 is a Big Deal).

So what to do about all of this mess (keeping in mind I have always been a fan of bit twiddling and clever low level hacks)?

Using low bits, via the strict provenance APIs, should work perfectly fine on CHERI. That's, in fact, one of the original motivations for having these APIs.


You'd need to use with_addr (or map_addr, which is built atop with_addr) for your tagging-related bit-twiddling. In a CHERI implementation, this would preserve all the metadata associated with the pointer, while allowing you to change the address component of the pointer (and thus do the tagging dance).
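A minimal sketch of what that tagging dance could look like for a `*const u64`, assuming a toolchain where `addr` and `map_addr` are stable (Rust 1.84 or later, as far as I know). The helper names `tag` and `untag` and the `TAG_MASK` constant are mine, purely for illustration.

```rust
/// Low bits that are always zero in a well-aligned pointer to `u64`.
const TAG_MASK: usize = std::mem::align_of::<u64>() - 1;

/// Store `tag` in the unused low bits. Only the address changes;
/// provenance (and, on CHERI, the capability metadata) is carried along.
fn tag(ptr: *const u64, tag: usize) -> *const u64 {
    assert!(tag <= TAG_MASK, "tag does not fit in the alignment bits");
    assert!(ptr.addr() & TAG_MASK == 0, "pointer is not aligned");
    ptr.map_addr(|addr| addr | tag)
}

/// Split a tagged pointer back into the original pointer and its tag.
fn untag(ptr: *const u64) -> (*const u64, usize) {
    let tag = ptr.addr() & TAG_MASK;
    (ptr.map_addr(|addr| addr & !TAG_MASK), tag)
}
```

The pointer that comes back from `untag` can be dereferenced as usual, because it never went through a `usize` and back.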

These are both true: if we want to support those targets we have to allow for usize not simultaneously being size_t and being pointer-sized, but most programmers shouldn't have to deal with that (either because they aren't mixing those two concepts, or because they won't run on those targets).

There is a similar situation today with usize and u64 and u32. Today, it's reasonable to write let x: usize = some_u32.into();, but Rust doesn't provide that From impl, because Rust allows for targets where usize is 16 bits. However, most programmers shouldn't have to deal with that. That should be something that Rust defaults to allowing, and then code that wants to support such targets can opt-in to handling it.
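For reference, this is roughly what portable code has to write today instead of `.into()`. The function names are hypothetical; `TryFrom` between these integer types is in the standard library, and on every 32-bit or 64-bit target the `expect` calls below can never fire.

```rust
use std::convert::TryFrom;

/// u32 -> usize: no `From` impl exists, because usize may be 16 bits.
fn index_from_u32(n: u32) -> usize {
    usize::try_from(n).expect("u32 value does not fit in usize on this target")
}

/// usize -> u64: no `From` impl exists either, in case usize is ever wider.
fn len_as_u64(len: usize) -> u64 {
    u64::try_from(len).expect("usize value does not fit in u64 on this target")
}
```

Under the "default assumptions" idea, both of these would collapse to plain `.into()` for crates that have not opted in to supporting the unusual targets.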

I think, for both cases, we need a mechanism for "opt-in to supporting something beyond the normal Rust defaults". And the normal Rust defaults should be "size_t and pointers are the same size", because otherwise we'd break existing code relying on that assumption.

(Then, of course, there's the bikeshed of "what should usize mean on targets that require opting in to the non-default behavior". But I think there's an obvious answer to that: usize has size in the name, and it'd be incredibly confusing if it didn't match size_t.)


I remember there being talk about adding a formal mechanism for code to opt in to supporting experimental targets; is that what you have in mind? Did that discussion ever turn into anything more concrete?

I don't think we even need something that heavy-handed for the initial experiment. Just a way to make a target nightly only, by requiring something like -Znightly-target --target $CHERI or so.


Someone proposed implementing that a little while ago and it was rejected, so I'm not sure what the right way to do it would be: https://rust-lang.zulipchat.com/#narrow/stream/233931-t-compiler.2Fmajor-changes/topic/Add.20.60-Zexperimental-target.60.20compiler-team.23685/near/399124627

I think the right path forward here would be:

  • A mechanism for Rust to default to requiring particular properties of targets (such as "is either 32-bit or 64-bit", "size_of::<usize>() == size_of::<*const ()>()")
  • Allowing code to assume those properties by default (such as impl From<u32> for usize, impl From<usize> for u64)
  • Allowing code to explicitly opt into supporting targets where those properties don't hold ("this crate supports 16-bit targets", "this crate supports targets where size_of::<usize>() < size_of::<*const ()>()"), and then that code doesn't get to assume those properties.

I'm going to propose a project goal for this, but it will still need an owner to design that mechanism and propose an RFC.


I generally agree, but I would like to add the following:

  • The proposed mechanism should be general enough, e.g. it should allow the winapi/windows-* crates to specify that they can work only on Windows targets.
  • I think conditionally allowing From<u32> for usize impl looks somewhat similar to this proposal. It would be nice to have the same tooling for both these usecases.
  • We will need a rough outline of how the defaults would change in future editions. I don't think we should make the "not-CHERI" condition a default one, i.e. crates should have to explicitly opt out of CHERI support.

Perhaps, but there's a difference between crates being able to opt-in to supporting a broader set of targets and crates being able to opt-out of supporting targets that are default-supported. I completely agree that both should exist, and perhaps they should use the same mechanism, but the defaults are important: there needs to be a difference between what Rust assumes crates support by default and what Rust assumes crates don't support by default.

I do think we should have a mechanism for "I only support Windows" or "I only support Linux" or "I only support 64-bit targets", though.


I think not-CHERI would have to be the default for now, otherwise it would be a breaking change. That could be changed at an edition if CHERI ever becomes more than an interesting research project. Until it actually takes off and general consumers can buy such hardware (Cheri Pi anyone?) I don't see the point of burdening the whole ecosystem with it.


Those sound like nice ideas, but do they really have to block experimenting with CHERI on nightly?


No, but it sounds like the mechanism for making targets nightly-only was rejected, which prevents experimenting with this only on nightly.


"not-CHERI" is already the default, insofar as no existing target breaks the properties that CHERI does, and many crates already assume those properties.


Yes, this is why I wrote that the proposal should outline how it envisions defaults and their evolution.

This is why I wrote about potentially doing it in future editions. Disabling CHERI support in Rust crates by default, despite most of them being written in safe Rust without any of the troublesome pointer arithmetic stuff, would immediately make any hypothetical CHERI support in the Rust compiler impractical.

Even without CHERI being ready for the consumer market, we should probably guide Rust crates towards using strict provenance (when it's ready), which IIUC would help greatly with future CHERI support and will be useful even if CHERI does not gain adoption.


I think any design for this feature would still need to allow CHERI users to try using a given crate even if it doesn't opt in, just as no_std users can try using a crate that doesn't advertise no_std support.


This is unfortunately weirdly self-contradictory: the argument seems to be that we don't need an unstable target flag because target support is already not covered by stability guarantees, but also that we can't add a CHERI target because then it becomes available on stable. I feel it may be worth retrying this with a more stringent line of argumentation. CHERI is not even mentioned in that proposal, so an MCP for "nightly-only CHERI support" would still be a reasonable request to the compiler team, I think.

I'm happy to help draft such a proposal.


I think there is a difference, in that most targets don't impact what is considered valid the way CHERI does, and the extent of target support instability mostly amounts to how much of core/alloc/std/tools are supported, not any fundamental properties. A potential direction that is consistent with both "targets aren't a part of the stable/nightly stability" and "CHERI shouldn't work on stable because of impacts on stable guarantees" could be that there's a feature for a uptr type which needs to be enabled in order to target CHERI.

Along those lines, during the experimental support of CHERI it probably wants a compiler flag (unstable) to switch between usize being uintptr_t or size_t. Both having the same arithmetic range means the difference should just be layout and whether (the PVI subset of) ptr2int2ptr works with usize.

But, ultimately, I do agree with you — either target support is exempt from normal stability and thus there's no issue with CHERI being a tier 3 target available to stable toolchains, or CHERI is too experimental to be allowed on stable and thus there are some parts of target support which participate in normal stability (even if most targets don't touch them) and an unstable target support flag is justified.

This is already pseudo-possible with #[cfg] compile_error!() and/or assert! in a buildscript looking at $CARGO_CFG_*, and it's good practice to use one of these if you know you have specific requirements.
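A minimal sketch of the `#[cfg]` + `compile_error!` half of that, placed at the crate root. The error message and the helper function name are mine; the second check uses a const `assert!`, which rejects the build if the size assumption does not hold.

```rust
use std::mem::size_of;

// Refuse to build on targets whose pointers are neither 32 nor 64 bits wide.
#[cfg(not(any(target_pointer_width = "32", target_pointer_width = "64")))]
compile_error!("this crate assumes a 32-bit or 64-bit target");

// Evaluated at compile time: fails the build on targets where usize and
// a thin pointer have different sizes (as they would on CHERI).
const _: () = assert!(size_of::<usize>() == size_of::<*const ()>());

/// Runtime view of the same assumption, handy in a test or debug output.
pub fn usize_matches_pointer() -> bool {
    size_of::<usize>() == size_of::<*const ()>()
}
```

This only gives users an error message after they have already picked the crate, which is why structured metadata that cargo understands would still be an improvement.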

But, absolutely yes, it would be wonderful to present this information in some structured way that cargo understands and can be used to enable target-specific impls.

The obvious way would be to specify a cfg() predicate that a crate officially supports, but I don't think we should introduce more reliance on SAT solving[1] if there are other options available. Also it doesn't seem nice to stuff an unbound syntax into cfg predicates for this.

Definitely a case of perfect-as-enemy-of-good, but if Rust grows support for "deferred" cfg selection in order to provide conditionally available std trait impls, it'd be ideal to design it in such a way that all crates get access to this weaker form of conditional compilation where item signatures are still known/checked even when not used.

bikeshed

It's less general, but I think a decent way to spell requirements could be [package.required_cfg] key = ["allowed", "values"] (any() semantics). For removing a default bound, specify a replacement one or = false to remove it entirely. A requirement to be unset would be = []. A requirement to be set without value is = true.

bikeshed: by analogy with cfg(target_has_atomic = "ptr"), I think the inverse of this relaxation could be spelled as perhaps cfg(target_addr_width = "ptr").

I think the set of reasonable default bounds would be:

  • target_addr_width = "ptr" (usize is both size_t and uintptr_t)
  • target_pointer_width = ["32", "64"] (32-bit or 64-bit)
  • target_has_std = "full" (target has full std support)

target_has_atomic really wants for an all() semantic, which I don't provide for requiring in package metadata. Unfortunately I don't know a good solution — the obvious one would be to specify the key multiple times, but that's forbidden by the TOML format itself (for good reason).


Crazy idea: a way to test "usize isn't uintptr_t" support without full CHERI emulation could be to make usize smaller instead of making pointers bigger. An x86_64 target with 64-bit pointers but 32-bit usize would likely be an abomination, but perhaps also a useful one. It's already not uncommon to use u32 for optimized sizes (e.g. for 16-byte String[2] instead of 24) and FFI could still work if code uses c_size_t…


  1. e.g. "given this crate's cfg predicate, does it always hold that cfg(target_pointer_width = "64")?"

  2. Unfortunately for std to actually provide such, either the alignment of pointers would need to be lowered to 4 or std'd need to split the RawVec up to avoid RawVec still being padded out to 16 bytes. CHERI also wants this to keep Vec at only 32 bytes instead of 64; this is yet another way that code soft assumes usize is ptr sized.


Probably less useful than you think, since x32 was a thing but never gained any popularity. Granted, that had 32-bit pointers (but ran in Long Mode, so it could use more registers, RIP-relative addressing and other features not available in 32-bit mode). I doubt making the pointers 64-bit but keeping x32 otherwise would help. And wouldn't the kernel need to be in the know to make it work, just like with x32 (unless you do a bare metal target)?

So yeah, probably only useful for ersatz CHERI experiments.


As a user of and contributor to both mediumvec and string32, I'd be interested in this as well. For some of my memory-hungry processes I debate whether or not to refactor all of my Vecs and Strings. The payoff isn't obvious until after I've committed to the refactor, so a technique to quickly estimate it with a -m32-like flag is intriguing. (And of course it has the added benefit of facilitating some CHERI experiments.)
