Can we pass `Copy` values by immutable reference?

Is a known lack of mutation sufficient to overlap allocated objects which are known to be structurally equal for the entire time both are live?

Yeah, I appreciate and sympathize with the goal here, but unfortunately I don't know how to write an opsem that supports these semantics. I would also like to be able to say that addresses don't matter for silly reasons, but do when they're supposed to, but I don't know how to formalize that.

It really would be unfortunate if we can't make Rust more stack-efficient by default with Copy values because we need to do a bunch of defensive copies.

I'm not at all convinced that we need changes to our semantics in order to get a pretty good outcome here. There is a lot of information LLVM doesn't have (and lots of known cases of LLVM not doing enough with the information it does have).

Arc's eq is not affected by this. It doesn't compare its own address, it only compares the address of the heap allocation it owns. That heap allocation includes the reference count, so it's not empty, and has to be unique.

Note that even "heap" allocations don't have unique addresses in Rust thanks to ZSTs:

fn main() {
    let a = Box::new(());
    let b = Box::new(());
    println!("{:?} {:?}", Box::into_raw(a), Box::into_raw(b));
    // prints 0x1 0x1
}

I'm not sure that's exactly the case. Functions take up address space, so if it's a problem that three malloc(isize::MAX)s can "succeed", then three functions needing that many bytes of instructions succeeding seems like an equivalent issue.

We took it out because it actually made things worse -- it made LLVM optimize less because it made it look like the address mattered, and didn't actually make things faster in practice because comparing exactly the same slice is pretty rare. If people want that, they're usually doing proper interning in the first place and not comparing the values.
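For reference, roughly the shape that fast path took (a hedged reconstruction, not the current std implementation; the helper name is made up):

fn slices_eq(a: &[u8], b: &[u8]) -> bool {
    // Same starting address and same length means both slices cover the
    // exact same memory, so they're trivially equal.
    if a.as_ptr() == b.as_ptr() && a.len() == b.len() {
        return true;
    }
    // Otherwise fall back to the usual element-wise comparison.
    a == b
}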


That heap allocation includes the reference count, so it's not empty, and has to be unique.

I'm a little bit confused. You seem to be agreeing here that simultaneously live allocations are non-overlapping. But if that's the case then that's enough to validate my point in my original post, no?

Functions take up address space

Probably not as far as any Rust code can tell (if you have some example showing otherwise, please let me know).

We took it out because it actually made things worse -- it made LLVM optimize less because it made it look like the address mattered, and didn't actually make things faster in practice because comparing exactly the same slice is pretty rare

Yeah, that makes sense. In any case, the fact that std was doing this makes it pretty clear imo that people do expect some amount of uniqueness guarantees when it comes to addresses.

@JakobDegen Why, in the OP example, are the lifetimes of the two bigs overlapping? To me it looks like the first ends before the call and the second starts with the call. How can things like Arc's specialization hurt here?

No, I don't think this is relevant. In the case of Arc's heap allocation the data is mutable (via atomics), and the optimization discussed here is about immutable data.

Second, heap allocations (except ZSTs) belong to an allocator, and the allocator interface lacks the ability to express such an optimization. The stack and the ABI, however, are entirely under the compiler's control.

(edit: my original explanation was incorrectly about non-Copy types).

Aside from that, in the case of Arc::eq, I was trying to explain that you've confused the address of the Arc instances themselves (the cloneable smart pointers that may be on the stack) with the address of Arc's heap payload (the shared refcount and data storage). These are different pointers, and the one that Arc compares is not the one affected by the ABI and moves.

Quirks of Deref make it hard to explain this using Rust syntax, so here's how Rust's Arc looks in C:

struct Arc {
    struct ArcInner *ptr;
};
struct ArcInner {
    size_t strong, weak;
    char data[]; // stored inline, not a pointer
};

bool Arc_eq(struct Arc *a, struct Arc *b) {
    return a->ptr == b->ptr; // not a == b
}

struct Arc arc1 = { malloc(…) };
struct Arc arc2 = arc1;
Arc_eq(&arc1, &arc2); // address of arc1 or arc2 doesn't matter.
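
The same point as runnable Rust (a small sketch): the two Arc handles live at different addresses, but Arc::ptr_eq compares the shared heap allocation, not the handles.

use std::sync::Arc;

fn main() {
    let arc1 = Arc::new(42u32);
    let arc2 = Arc::clone(&arc1);
    assert!(!std::ptr::eq(&arc1, &arc2)); // distinct handles, distinct stack addresses
    assert!(Arc::ptr_eq(&arc1, &arc2));   // same ArcInner heap allocation
}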

The point is that the example under discussion isn't a move, it's a copy. You can still get the address of a, because the copy doesn't invalidate the source place, and if a reference to a ever escaped, its validity is allowed to continue until a is used by-mut (possible) or by-move (impossible), so we can't turn the last copy into a move without violating AM semantics.
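
A minimal sketch of why that is (sizes and names are made up): the by-copy use leaves the source place intact, so a previously escaped pointer can still be read afterwards.

fn consume(_x: [u8; 64]) {}

fn main() {
    let a = [0u8; 64];
    let p = &a as *const [u8; 64];
    consume(a);                      // by-copy use of `a`: does not invalidate `p`
    let still_there = unsafe { *p }; // still fine; `a` is still a live, valid place
    assert_eq!(still_there, [0u8; 64]);
}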

The refinement here is that it could potentially make sense to treat heap allocation and stack allocation differently, guaranteeing that heap allocated objects always have disjoint addresses, but not guaranteeing that for stack allocated objects.

This would mean ptr_eq works to compare allocation identity for any heap allocated objects (e.g. Arc::ptr_eq) but not for potentially stack allocated objects (e.g. <[_]>::eq, where it could also return true for distinct allocated objects which are structurally equivalent and thus (potentially) allowed to overlap).

This is weakly related to “target guaranteed behavior” and “unrealized spec nondeterminism”: if the spec allows some nondeterminism (e.g. overlapping allocation addresses), it should ideally be relatively simple to realize that nondeterminism; otherwise it de facto doesn't exist and people will (potentially accidentally) rely on it not being introduced.

Heap move elision is typically done via manual specialization (e.g. vec.into_iter().collect()) and is difficult to do automatically (because of side effect ordering in the face of divergence, AIUI), but there's no way to do this for stack moves (other than writing code using references instead). This makes realizing overlapping heap allocated objects much rarer than for stack allocated objects, potentially justifying treating them differently.
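
For instance, this is the kind of manual-specialization round trip meant above; whether the original allocation actually gets reused is an internal std detail, not a guarantee:

fn shout(words: Vec<String>) -> Vec<String> {
    // std's in-place-collect specialization can reuse `words`' heap allocation
    // here instead of allocating a fresh Vec.
    words.into_iter().map(|w| w.to_uppercase()).collect()
}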

An even weaker version (though more difficult to specify) would be to allow overlapping objects only for T: Copy in the source[1], but guarantee for ?Copy generics that the address is properly disjoint from any other allocated object.

Implementation-wise, the Rust ABI would have an invisible flag between two kinds of by-reference passing — callee copy/clobbered (today's semantics) and caller copy/maintained, with the latter used only for any by-value arguments which are allowed to have overlapping address allocation.


The specification question can be summarized as: is the following snippet allowed to evaluate to true? Optimizing other examples unifies with this example in the face of analysis-escaping references[2].

type BigCopy = [u8; LARGE_ENUF];

fn do_args_overlap(a: BigCopy, b: BigCopy) -> bool {
    ptr::eq(&a, &b)
}

let data = BigCopy::default();
do_args_overlap(data, data)

Alternatively, the optimization could be made possible by making escaping references not cause simpler examples to unify with do_args_overlap, i.e. by declaring that by-copy stack-place usage invalidates extant pointers, making the following UB:

let place = 0_i32;
let p = ptr::addr_of!(place);
{place};
unsafe { &*p };

This seems highly unintuitive. The “don't invalidate pointers” copy could still be written as *&place... which also suggests the “do invalidate pointers” copy could be written today as *&mut place.
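
A hedged sketch of that distinction, assuming Stacked-Borrows-style reasoning (the final aliasing model may differ):

fn main() {
    let mut place = 0_i32;
    let p = std::ptr::addr_of!(place);

    let _shared_copy = *&place;      // shared reborrow; `p` remains usable
    let _ok = unsafe { *p };

    let _unique_copy = *&mut place;  // unique reborrow; invalidates `p`
    // unsafe { *p }                 // would now be UB under Stacked Borrows
}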


  1. Doing so for post-monomorphization T: Copy is easier to specify but not particularly helpful, since given some generic T the value might still be potentially overlapping, because T might be Copy. ↩︎

  2. e.g. because the extern calls stash pointers and then do this comparison to observe the overlap, and the pointers are still valid to dereference because by-copy place access doesn't invalidate pointers to the place like by-mut or by-move place access does. ↩︎


The stipulation, however, is that although the value is Copy, it doesn't need to be copied in specific cases. For example, when the value is immediately dropped following the call to g in the OP, or the call to callee in many of the following examples. In this scenario, the copy acts like a move.

... Unless I'm way off base and this hypothesis is demonstrably untrue?

I recognize that the analysis required to determine there is a drop is nontrivial. I don't even know if the problem is intractable from an implementation perspective. It's just that I'm unaware of anything that makes this kind of optimization absolutely impossible, not even for very narrowly applicable situations.

It's also possible I'm just barking up the wrong tree! Is this discussion limited to external ABI requirements? If the compiler can't guarantee what both sides of the function call do, then fine, no other optimizations are possible. Or is this about "what happens with big Copy types when passing by value in the general case, sans FFI"? Then I think it would be interesting to fully understand why some are saying this copy-elision optimization is not possible (or practical).


Also, FWIW, do_args_overlap(data, data) does not appear to fit the shape of the code in the OP. It clearly performs a copy at the call site. With the second argument removed (or replaced by something else) then it would feel like a better fit.


The reason it's not generally possible to replace a copy with a move is that copies don't invalidate pointers. The do_args_overlap example just makes this immediately visible.

If the compiler can see the whole world, it can do inlining and trivially improve things. We're discussing what happens when inlining doesn't happen.

So consider the compilation unit (a lot of unsafe elided for brevity)

extern "Rust" {
    fn by_ref(x: &BigCopy);
    fn by_copy(x: BigCopy);
    fn query() -> bool;
}

fn main() {
    let data = BigCopy::default();
    by_ref(&data);
    by_copy(data);
    dbg!(query());
}

by_copy passes the BigCopy ABI-by-reference. Currently this is a reference to a temporary copy. We'd like to say that since this is the last use of data, it can be treated like a destructive move and doesn't need to make a defensive copy with a unique address, right?

Well it can't be a destructive move, because the other side is

static EVIL: Mutex<State> = Mutex::default();
struct State {
    first: *const BigCopy,
    eq: bool,
}

fn by_ref(x: &BigCopy) {
    EVIL.lock().unwrap().first = x;
}

fn by_copy(x: BigCopy) {
    let mut state = EVIL.lock().unwrap();
    state.eq = ptr::eq(state.first, &x);
}

fn query() -> bool {
    let state = EVIL.lock().unwrap();
    dbg!(&*state.first);
    state.eq
}

No operation has invalidated that first pointer given to fn by_ref — we only use the place by-ref (by-copy is by-ref). As that pointer has not been invalidated, it's still valid to access, so the data place is valid and can't be optimized out until the place is at least used by-mut (or by-move, but that's impossible for Copy types), invalidating the original escaped reference.

Even more fun: I could've started up a thread that's constantly reading the reference after calling by_ref that keeps going until query, and there's still no UB to say you can ignore this possibility.

The key thing to note is that println! has been used in a few of the examples as the "complicated enough" place to stick a reference that we have to consider it escaped and in use until the dynamic borrowing rules invalidate that reference's provenance.

But even if we know we've invalidated all references, there's still a potential problem with treating it as a destructive move and reusing the space: the value is still logically live, so if its address has ever been taken, putting another object there means I can observe that object overlapping a live object (the variable whose address I previously took but am no longer using).

There are absolutely more opportunities for optimizing out redundant copies without changing any semantics. But for the more general question of changing the ABI to eliminate copies, it's not so simple, because in general we need to consider the worst case. If we aren't in the worst, know-nothing case, inlining and targeted copy elision get most of the benefit.


I feel like there’s a reasonable analogy here with arguments passed in registers. Formally, you can get a reference to every Rust value; formally, that reference has the same representation as a pointer, which means even a by-value u8 argument “has an address” in Rust. But we’re okay with the compiler never actually putting that argument into memory if its address is never used. If you ever try to check if the address overlaps with anything else, well, it won’t, but that act of observation has changed the program.
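
A toy illustration (hypothetical function, not one of the thread's earlier examples): drop the address-taking line and x never needs to touch memory at all.

fn takes_byte(x: u8) -> usize {
    // Observing the address forces `x` to be spilled to a stack slot; the act
    // of observation is what changes the program.
    &x as *const u8 as usize
}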

So it seems to me to not break any rules to say that formally there is always a copy of an indirect argument to a local allocation, but that the compiler is allowed to eliminate that copy if the address is never observed. (EDIT: to be clear, this is a possible ABI change, not a description of the current ABI.)

Now, that’s not necessarily very satisfactory without a lot of inlining, since pass-by-reference is sort of the default non-mutating convention in Rust (especially when the Copy type is an array but the use is as a slice). But I think that it’s not controversial, right?


Distinguishing between stack and heap here honestly seems kind of ugly. First of all, there's no reason to believe that this is what users expect - keep in mind that the standard library used to have code which would be wrong under these semantics. But also, this makes it even harder to turn heap allocations into stack allocations, for example.

Are you referring to slice's eq using pointer comparison as an optimization? It would still be a valid implementation, because the optimization proposed here is for immutable data only, so the two "copies" would remain equal for as long as they share an address.


It's extremely unclear to me if "are these two referenced places distinct allocated objects" is ever a semantically meaningful question to ask. Obviously it's a valid question to ask (assuming allocated object addresses are disjoint), but I feel that it's always a proxy for a more semantically meaningful derived property.

There's two major properties which I'm relying on to make the argument overlapping objects is potentially acceptable:

  • ptr::eq already doesn't allow you to substitute one pointer for the other, because of provenance.
  • Overlapping is only potentially allowed if both objects are structurally equivalent and have the exact same abstract contents (including provenance shadow state) for the entire lifetime of the shorter-lived object.

Allowing objects to overlap in this way means that ptr::eq answers a dual question along the lines of "can accesses through one pointer affect the provenance validity of the other" plus "in the case provenance stays valid, can writes through one pointer be seen through the other" (equiv. "while they have valid provenance, do reads through either pointer retrieve the same data").

But of course, allowing the overlap means let a = 0; let b = 0; ptr::eq(&a, &b) can be true. I cannot think of any halfway-reasonable unsafe that relies on this being false, but opsem isn't really concerned with "reasonable," and it certainly is surprising to allow overlap, compared to C opsem. (On the other hand, the &'static version absolutely is allowed to and does in practice alias.)
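
The 'static version as runnable code (promotion is allowed to, and in practice may, deduplicate the two literals, so this can print true today):

fn main() {
    let a: &'static i32 = &0;
    let b: &'static i32 = &0;
    println!("{}", std::ptr::eq(a, b)); // may print true: identical promoteds can be merged
}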


I actually remembered a somewhat standard use just now — taking the address of a local to semiuniquely[1] identify distinct call sites. If identical objects are allowed to overlap, this doesn't function to distinguish the locals anymore.

Along that line, escaped (noninvalidated) pointers to locals inhibit TCO as well, since the (pointers to the) locals mustn't be invalidated until after the tail function call. So while it won't help for non-tail calls, an explicit become tail call which ends locals' liveness before the tail function call should be able to help with optimization here as well (and serve as additional motivation to providing become), thus allowing the trivial OP example to more easily (if it uses become) optimize codegen as just a jmp (or even just a symbol alias).
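
A sketch of the TCO-inhibition point (names and sizes are placeholders; imagine stash publishes the pointer somewhere it can be read later):

fn stash(_p: *const [u8; 1024]) {
    // pretend this stores the pointer for later reads
}

fn tail() -> u64 {
    0
}

fn caller() -> u64 {
    let buf = [0u8; 1024];
    stash(&buf);
    // Syntactically a tail call, but `buf` (and the escaped pointer to it)
    // must stay valid until `caller` returns, so the frame can't be reused
    // for a plain jmp. A hypothetical `become tail()` that ends `buf`'s
    // liveness first would permit it.
    tail()
}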

I still do think allowing byte-identical objects to overlap (if they're guaranteed byte-identical for the entire liveness overlap) is a viable opsem. However, despite playing devil's advocate and arguing for it, I remain (and always have been) unconvinced that it's a desirable semantic. Though there are concrete optimizations it justifies, it's certainly a surprising bit of opsem, and much of the motivating benefit can be justified separately, though at a higher proof burden.

We should probably consider "blessing" *&mut place via MIR optimization as a way to get a destructive final use of Copy-typed places as an optimization tool[2], though. Or perhaps a surface-level move(place) would be clearer and could enforce that it is in fact a final use. become is a heavy hammer and only works for optimizing tail calls.


  1. Of course it's only unique within a single call stack; once you end the liveness of the object you took the address of previously, its address is freely available for latter objects again. ↩︎

  2. Specifically only intended as an optimization tool to use if you notice lack of copy elision causing a problem, though I'm sure some people will end up using it just in case. I'd almost even recommend a default-warn (clippy) lint against it, as it's by definition semantically unnecessary and exclusively serves to turn things we don't want the compiler to have to worry about into UB. ↩︎


There is a pattern of using pointers as arbitrary unique IDs (not specifically for call sites). Most of the time, there is some sort of meaningful data being pointed to, which would usually be different for different allocations, and address uniqueness is just a bonus - e.g. in the case of string interning. But it’s also possible for the pointer to point to nothing meaningful. For example, a custom Condvar implementation might not store any data in the Condvar itself, instead using a global map from ‘pointer to Condvar’ to ‘list of threads waiting for the Condvar’. This way the Condvar only needs to be 1 byte large (and only takes up space in the map while some thread is actually waiting for it).
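
A hedged sketch of that address-as-identity idea (the map and types here are placeholders, not a real Condvar implementation):

use std::collections::HashMap;
use std::sync::Mutex;
use std::thread::Thread;

// No meaningful payload; only the address is used as an identity.
struct TinyCondvar(u8);

// Hypothetical global wait-queue map, keyed by the Condvar's address.
static WAITERS: Mutex<Option<HashMap<usize, Vec<Thread>>>> = Mutex::new(None);

impl TinyCondvar {
    fn key(&self) -> usize {
        // Relies on distinct live objects never sharing an address.
        self as *const TinyCondvar as usize
    }
}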


This could in theory still be permitted by the Condvar holding UnsafeCell<u8> and restricting the overlap allowance further from "known to always be structurally identical" to also "is shared and Freeze"... but now the case where overlap is allowed is so restricted that the benefit almost certainly cannot outweigh the surprise cost when it does happen.

(Because I initially forgot about it: you can't use the address of the mutex, because it's perfectly allowed to have multiple Condvar on the same mutex for different condition signals.)

The key reason this defeats any scheme for allowing overlap is that Condvar very specifically doesn't care at all about holding state itself (it could just be a byte of padding if that were expressible in Rust); all it wants is a globally unique address from other Condvar while being waited on. Technically violating that would just result in performance degradation (spurious wakeups are allowed), but that's still highly undesirable.

If we really wanted to preserve a lack of unique address guarantee, the solution would be to invert C++'s [[no_unique_address]] and provide a #[unique_address] attribute for types to opt-in to a guarantee of unique address (opt-out of overlapping optimizations).

(I start to somewhat understand the C and C++ choice to just say comparing pointers to separate allocated objects is Undefined Behavior. Comparing such pointers is a form of ptr2int and every use of ptr2int seems to be some degree of cursed, even this mitigated-by-allocation-plane-provenance-ordering one (since only one allocation plane exists at runtime).)

But stating it again to be explicit: that makes the cost of enabling "observable overlap" optimizations too much to be justified, especially since it's unclear how much of the benefit can't be achieved just by smarter optimizations for unobservable overlap.

For static objects we could potentially apply the "structurally equal" rule even if it doesn't apply to other allocated objects... but that would break static Condvar, so probably can't happen. If people want global deduplication they'll probably need to use const instead and tolerate duplication as well.

It's certainly unfortunate to leave optimization potential on the table, but it's significantly more important to not break people's code. Perhaps if experimentation shows it to be beneficial, we could require the use of #[unique_address] in future editions for Condvar-like types (applying it to all types in edition migration).

... Interestingly enough, address overlap may be tangentially related to the use of near/far pointers as well (since near pointers with different regions would compare equal), so cc @InfernoDeity to ask whether you've thought about the implications of allowing standard pointers to distinct allocated objects to compare equal. (IIRC you said lccc has a default-near-ptr Rust mode where *mut T is *near mut T instead of *far mut T. Although I suppose, as an explicitly nonstandard flag, default-near-ptr mode can just change the overlapping-address rules relative to the default-far-ptr mode.)


It depends on the ABI, which would be part of the target. For 8086 though, lccc will support both flat and segmented, with near and far pointers being a choice. In the former case, though, you won't have multiple data-type segments, but the cs may be different (if using near fn-ptrs as well as near data-ptrs). Technically, ss may contain a different base from ds, but they'd ultimately be overlapping, since obviously you need to be able to grab a pointer to ss:[bp+n]. Actual (data) pointer values to real allocations would be relative to ds, though, so they won't compare equal when non-overlapping. They may compare equal to fn-ptrs under this model, though.
