Pre-RFC: `usize` semantics

talchas · May 24, 2024, 10:10pm

AIUI if you copy a pointer bytewise or even all at once to a non-aligned destination you won't preserve the hidden validity bit, and it will no longer be usable (even once it's all in a valid final destination).

How about no.

And now I'm done being polite about CHERI people trying to make their goals and compatibility the entire rust community's problem for the next month or two. Yet again.

newpavlov · May 24, 2024, 10:26pm

If you copy data back to T, then it's effectively ptr2int2ptr casting with extra steps. Such code will have all the provenance issues associated with such casting (and thus why it does not work as-is on CHERI). Relying on it should certainly be heavily discouraged.

Vorpal · May 24, 2024, 11:02pm

This is all assuming that cheri catches on and that there are good stable alternative APIs with good developer ergonomics.

Neither of those is true yet. And they are so far off in the future that there is really no point in planning for that yet.

The first step would be to test out the concept to see if it can carry its weight. This involves seeing how many actual bugs this would help prevent (and not just security bugs; safety bugs are arguably more important in some industries, such as my day job, where loss of human life by heavy equipment is a possible worst outcome, but everything runs airgapped, making security issues much less of a concern).

But it also involves seeing if the developer ergonomics and performance are good enough for this to catch on (everything involves tradeoffs, and if you can't get those right, "more secure" as a selling point won't help: a brick is more secure, yet people use computers and connect them to the Internet). And of course general hardware availability is going to be a key factor. CHERI is only relevant in academia at this point.

Only when and more importantly if that has been achieved is it worth planning for the demise of the rest of the computing landscape. Until then, this is just pointless arguing and getting angry at each other (from both sides).

It would be better to focus on the actionable next steps: how do we get to an experimental target where questions such as the size of usize, provenance models, etc. can be explored?

newpavlov · May 24, 2024, 11:22pm

Sigh... I wonder why people keep trying to tie this discussion to CHERI's success time and time again. As if there were absolutely no need for strict provenance without the existence of such targets.

I have pointed this out several times: strict provenance is important even without CHERI. Even if CHERI did not exist, a proper provenance model is important for correct optimizations (see Ralf's blog posts on it) and will be useful for tools like Miri, which track spatial and temporal memory safety.

In my opinion, actionable steps can look like this:

Develop and stabilize strict provenance APIs (including a model for dealing with the "master pointers" mentioned above).
Develop and stabilize strict provenance support in Miri.
Encourage crates to migrate to the strict provenance API. For example, by emitting warnings on raw casts.
Add experimental CHERI targets with panicking raw casts.
Require explicit opt-in for raw casts in a new edition, if the crates ecosystem has adopted strict provenance widely enough. Eventually, maybe remove them entirely in an edition further in the future.
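As an illustration of what "migrating to the strict provenance API" can mean in practice, here is a minimal sketch of a low-bit tagged pointer that never goes through an integer-to-pointer cast. The helper names `tag`, `untag` and `read_tagged` are made up for this example, and it assumes a toolchain where `addr` and `map_addr` are stable (Rust 1.84 or later).

```rust
/// Hypothetical helper: sets the lowest address bit as a tag.
/// `u32` is 4-byte aligned, so bit 0 of a valid `*const u32` is always free.
fn tag(p: *const u32) -> *const u32 {
    // `map_addr` changes the address but keeps the provenance of `p`.
    p.map_addr(|a| a | 1)
}

/// Hypothetical helper: clears the tag bit and reports whether it was set.
fn untag(p: *const u32) -> (*const u32, bool) {
    // `addr` yields only the address; nothing is exposed.
    let was_tagged = p.addr() & 1 == 1;
    (p.map_addr(|a| a & !1), was_tagged)
}

/// Tags a pointer to `x`, untags it again, and reads through the result.
fn read_tagged(x: &u32) -> (u32, bool) {
    let tagged = tag(x);
    let (p, was_tagged) = untag(tagged);
    // SAFETY: `p` has the same address and provenance as `x`, which is live.
    (unsafe { *p }, was_tagged)
}
```

The pre-migration version of this pattern would typically be `((p as usize) | 1) as *const u32`, which is the kind of raw cast the warning in the third step would flag.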

talchas · May 24, 2024, 11:56pm

There is in fact no need for strict provenance except on such targets, and frankly saying otherwise is just blatant lies.

There are uses for it otherwise, and it's reasonable to say that it should be the default, but saying that the expose operations should probably be completely removed in the foreseeable future, in a thread primarily about CHERI support... you're not being subtle here.

And a "proper provenance model" does not preclude something in the expose space (see, you know, C, which is developing one which supports it, and SB/TB, which are being developed to support it, and escape analysis being a fundamental part of an optimizing compiler).

No, in rust MaybeUninit<u8> explicitly is supposed to be able to maintain provenance (without exposing or removing it) as an arbitrary opaque byte (rather than "integer of the smallest addressable size") type. The same issues come up in C around char and char arrays being able to hold anything. I'd expect this sort of spec violation to just be irrelevant most of the time, just like in C. On the one hand in rust it's more common to do generic stuff where you want it; on the other rust has more proper support for doing generic stuff so you're less likely to need to just shove unknown data in char arrays.

newpavlov · May 25, 2024, 12:35am

IIUC you are referring to RFC 3559. Firstly, it does not explicitly state whether provenance is preserved when a [MaybeUninit<u8>; N] cast from a pointer-containing T is copied into an unaligned destination. Secondly, this part may be amended in the future. The RFC discussion mentions CHERI only in passing, so I believe it was just an oversight.

Suuuure... I will just leave this link:

Feel free to disagree with it or interpret it differently. The discussion goes in circles and it looks like you are set in your (seemingly C/C++-inspired) views, so it's likely my last comment in this thread.

talchas · May 25, 2024, 2:17am

Please actually read things you claim are supporting your position. That blog post is specifically working in a world where ptr2int2ptr roundtrips exist. (If they don't exist, then the problems it's discussing become much easier, after all!)

If your idea is that these problems are clearly unsolvable (at least without significant performance loss) and we need to get rid of it, that's an opinion but it isn't what that blog post says or a consensus that has developed in the years since.

fintelia · May 25, 2024, 6:41pm

If CHERI picks usize == u64 then the size of usize won't be the problem because the vast majority of Rust code runs on 64-bit systems. My point was that if CHERI picked usize == u128 then it would be the first target with that configuration, and thus the size would be an additional problem that caused some existing code not to work.

RalfJung · May 26, 2024, 8:04am

I am one of the people pushing for strict provenance and wrote a good part of its current documentation (and the RFC mentioned above). But even I don't think we should deprecate plain old as casts any time soon. Discourage, yes. Educate people about better alternatives where possible, yes. But it's way too early to consider deprecation.

Aside, with_exposed_provenance/expose_provenance behave exactly like as casts, so deprecating just the as casts achieves very little. And these methods are not going anywhere.
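To make that equivalence concrete, here is a minimal sketch, assuming a toolchain where these methods are stable (Rust 1.84 or later). The two function names are made up for illustration; both perform the same pointer-to-integer-to-pointer round trip.

```rust
use std::ptr;

/// Hypothetical helper: round trip written with plain `as` casts.
fn roundtrip_as(x: &u32) -> u32 {
    // ptr2int: only the address ends up in the integer, provenance is exposed.
    let addr = (x as *const u32) as usize;
    // int2ptr: provenance is picked up from previously exposed pointers.
    let q = addr as *const u32;
    // SAFETY: `q` has the address of `x`, which is live and was exposed above.
    unsafe { *q }
}

/// Hypothetical helper: the same round trip, spelled with the explicit methods.
fn roundtrip_exposed(x: &u32) -> u32 {
    let addr: usize = (x as *const u32).expose_provenance();
    let q: *const u32 = ptr::with_exposed_provenance(addr);
    // SAFETY: same reasoning as in `roundtrip_as`.
    unsafe { *q }
}
```

The only difference is that the second version names the operation, which makes such round trips easier to search for and audit.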

Please don't derail this discussion of CHERI with proposals such as this. It is entirely unnecessary to deprecate the as casts on all targets just to do some experimentation with CHERI. Entangling the discussions of how to make strict provenance more widely used and how to get started with CHERI experiments achieves nothing except getting people upset about CHERI. It is technically unnecessary and burns social capital for absolutely no gain.

newpavlov · May 26, 2024, 2:44pm

Note that I did not propose any concrete timeline for removing raw casts/with_exposed_provenance/expose_provenance (here and before I call all three of them "raw casts" for brevity, since they are effectively the same thing). In the list above, I proposed to require explicit opt-in (not even removal!) only after most of the ecosystem has migrated to strict provenance and it has been decided that the remaining use cases are minor enough or can be covered by the available API (i.e. the remaining crates do not want to migrate only because of convenience).

But I strongly believe that the goal of the Rust language should be eventual removal of raw casts in a new edition. We may not achieve this goal for various reasons, but at the very least, the end state should be one of heavily discouraged use. Also, as long as Rust supports old editions, raw casts will technically still be part of the language, so I do not propose "full" removal.

In the blog post you discuss the problems which raw casts cause for correctness of the language. You argue against the position "pointers are just integers" in language discussions, even if for hardware they are. You propose a way out of this mess. The documentation of with_exposed_provenance/expose_provenance discusses various complexities and vagueness which the associated "guessing" brings to the language specification table. Are those not reasons to strive to eliminate raw casts first from being used in the ecosystem, and then eventually from the language?

This situation looks to me suspiciously similar to the aliasing issues dealt with by C/C++. Because they did not have explicit annotations for shared and exclusive pointers/references (yes, there is const now, but it's hardly used, as can be seen by the plethora of bugs uncovered by Rust), compilers have to "guess" a lot during optimizations. It adds a lot of complexity to (optimizing) compilers and increases the chances of miscompilation bugs. And even after that there is a certain amount of performance still left on the table, since compilers have to be conservative. Obviously, adding Rust-like annotations and changing the defaults is not a practical option for those languages.

But, luckily, Rust is in a different situation with provenance! Only a very small portion of the ecosystem relies on raw casts and most of it can be migrated to strict provenance relatively easily. In other words, eventual removal of raw casts could be practical for Rust, since it affects only a small portion of an already small portion of the ecosystem.

Strict provenance and its status in the ecosystem are closely related to CHERI support. It's explicitly written in the OP. As I wrote in the list above, yes, we can start limited experiments by making raw casts panic on CHERI targets, but it's not a viable long-term solution. Even if in the future we have a compilation error for raw casts on CHERI targets, without discouraging raw casts for non-CHERI developers, it would still mean that a big chunk of the ecosystem will not be accessible for CHERI targets.

Finally, if I am being honest, I care mostly about strict provenance itself, CHERI support is, well... just the cherry on top. My point is that we should discuss CHERI support on the strict provenance foundations, which means that how strict provenance is handled in the language will be automatically translated to CHERI support.

RalfJung · May 26, 2024, 3:01pm

newpavlov:

But I strongly believe that the goal of the Rust language should be eventual removal of raw casts in a new edition. We may not achieve this goal for various reasons, but at the very least, the end state should be one of heavily discouraged use. Also, as long as Rust supports old editions, raw casts will technically still be part of the language, so I do not propose "full" removal.

In the blog post you discuss the problems which raw casts cause for correctness of the language. You argue against the position "pointers are just integers" in language discussions, even if for hardware they are. You propose a way out of this mess. The documentation of with_exposed_provenance/expose_provenance discusses various complexities and vagueness which the associated "guessing" brings to the language specification table. Are those not reasons to strive to eliminate raw casts first from being used in the ecosystem, and then eventually from the language?

There's currently no realistic proposal for how to get rid of them. with_exposed_provenance has pretty powerful semantics that can't easily be replaced by something else in all conditions; I expect it will remain a required loophole around pointer provenance trouble for a long time, if not forever.

The point of strict provenance is to provide an alternative to with_exposed_provenance where possible, but I don't think we can claim that with_exposed_provenance can be avoided in all cases. You are re-opening old wounds here; there was a heated discussion when the original strict provenance API was added because some people panicked thinking we'd declare all their old code with as casts wrong/UB. That is not what strict provenance is about.

I will also note that as long as with_exposed_provenance or the equivalent as casts exist in any edition, they are part of the language, with all complexities that entails. Basically nothing is gained by removing them only in some editions.

This is the wrong thread to discuss any proposal aimed at removing with_exposed_provenance from the ecosystem entirely. CHERI only requires them to be removed in the part of the ecosystem that is intended to run on CHERI.

In that case I ask you to kindly open a new thread. This thread has the goal of experimenting with CHERI as a target for Rust. Insisting on deprecation of with_exposed_provenance for all targets as part of that experiment is detrimental to that goal. Please do not try to hijack CHERI support to achieve your personal goal of removing with_exposed_provenance. This thread is already long enough without such unnecessary distractions.

No, that is incorrect. There is a big difference between ptr2int2ptr casting and funneling a pointer through a MaybeUninit. The latter perfectly preserves provenance -- @talchas is right on this point.
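To illustrate the difference, here is a minimal sketch of funneling a reference through an opaque byte buffer. The helper name `roundtrip_via_bytes` is made up for this example. No integer is involved at any point, so nothing needs to be exposed or guessed.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

/// Hypothetical helper: copies a reference into a byte buffer and reads it back.
fn roundtrip_via_bytes<'a>(r: &'a u32) -> &'a u32 {
    const N: usize = size_of::<&'static u32>();
    // Opaque bytes: `MaybeUninit<u8>` may hold a fragment of a pointer, provenance included.
    let mut buf = [MaybeUninit::<u8>::uninit(); N];
    // SAFETY: the source is a valid `&u32` occupying N bytes, `buf` has room for
    // N bytes, and the two regions do not overlap.
    unsafe {
        ptr::copy_nonoverlapping(
            (&r as *const &'a u32).cast::<MaybeUninit<u8>>(),
            buf.as_mut_ptr(),
            N,
        );
        // `buf` only has alignment 1, so an unaligned read is required.
        ptr::read_unaligned(buf.as_ptr().cast::<&'a u32>())
    }
}
```

Note that `buf` is not guaranteed to be pointer-aligned here, which is exactly the kind of destination the earlier comment about CHERI's hidden validity bit was concerned with.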

This is a CHERI discussion. You are the one trying to tie this to strict provenance.

I don't think this proposal came from a CHERI person, so please be careful whom you are accusing here.

RalfJung · May 26, 2024, 3:14pm

Given the >150 comments on this thread, and a title and initial comment that are vastly outdated at this point (there's currently no plan to re-define usize for existing targets, and indeed no need to do so as the old and new definitions are equivalent on those targets), I think it makes sense to write a summary of the current status and next plans (i.e., the comments around here), possibly open a new thread if anything still needs discussing, and then lock this thread before it goes even more off-topic.

I'm not sure who are the people here that want to push CHERI support forward -- @seharris you seem to be the most active, could you write such a summary?

seharris · May 28, 2024, 12:23pm

I'm worried about missing something with how long this thread is, so I'll make some time tomorrow morning to go over everything again quickly, and then try and sum things up.

seharris · May 29, 2024, 11:16am

Here's my attempt to summarise the discussion. Accurately summarising over 150 replies is hard so there could be things I've missed!

The question that started the thread was, in short, “the current definition of usize doesn't work well on CHERI, can we change it without breaking existing code?” The obvious choices for a 64 bit CHERI target are 64 or 128 bit usize, and the proposed solution was 64 bit. At this point, the answer seems to be that there are a number of unresolved concerns, and there is insufficient consensus to make language or documentation changes.

Specific concerns that were raised which I don't think we have good answers to:

Existing documentation (possibly inconsistently) says that size_of::<usize>() == size_of::<*const T>() (i.e. 128 bits on CHERI). Changing this could weaken confidence in Rust's stability. Maintaining this doesn't work out well for CHERI (degraded indexing efficiency, some code assumes usize is word-sized, some FFI assumes usize == size_t, some code may be assuming pointer as usize only returns an address).
Some code assumes size_of::<usize>() == size_of::<*const T>() in size or address calculations, which is hard to lint. This only actually becomes a problem when building for CHERI.
Justification for disruptive language changes is limited until CHERI hardware reaches consumers.
Changes to usize could complicate the situation for other proposed targets with complicated address semantics (w65, 8086).
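The size assumption in the first two concerns can be sketched as follows. The function names are made up for illustration. On today's mainstream targets the two size calculations agree, which is why the pattern is hard to spot; on a CHERI target with a 64-bit usize and 128-bit pointers they would be expected to diverge.

```rust
use std::mem::size_of;

/// True on targets where `usize` and a thin data pointer have the same size.
fn usize_matches_pointer() -> bool {
    size_of::<usize>() == size_of::<*const u8>()
}

/// Hypothetical helper: bytes needed for a table of `n` pointers, computed
/// under the assumption that a pointer is as large as `usize`.
fn table_bytes_assuming_usize(n: usize) -> usize {
    n * size_of::<usize>()
}

/// Hypothetical helper: the same calculation without that assumption.
fn table_bytes_portable(n: usize) -> usize {
    n * size_of::<*const u8>()
}
```

Code that relies on the assumption could at least state it explicitly, for example with `const _: () = assert!(size_of::<usize>() == size_of::<*const u8>());`, so that a build for a target where it does not hold fails loudly.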

Concerns that I think we have some sort of answer to:

It needs to be clear to crate developers that CHERI support is optional (there are a few options)
CHERI support should be built on strict provenance (yes it should!)

More recently, the idea that we could add an experimental tier three CHERI target without committing to semantic changes has been raised. I don't think there were any major objections to this, so it seems like a way forward.

To me, it looks like the next thing to do is to talk to the compiler team about what problems need solving for an experimental target to happen. Is a Zulip thread or an MCP the right place to do that?

RalfJung · May 29, 2024, 2:03pm

That sounds great, thanks.

MCP sounds good, just to make sure everyone is on board. There are plenty of existing MCPs for new tier 3 targets that you can follow. This one should probably have a section regarding the usize question, and point out that there's no final answer yet but part of the point of adding the target is getting some concrete experience with how things actually turn out in practice when usize and pointers have a different size -- without committing to anything on the lang side.

notriddle · June 5, 2024, 2:04pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
