Are AsRef and Borrow supposed to be deterministic?

kornel · November 25, 2022, 12:05am

The docs for AsRef and Borrow don't say they are supposed to always return the exact same value. (Borrow is supposed return something that compares the same, but technically it could still return different values that compare equal.)

I know for safety they can never be trusted, but where safety isn't an issue, would it be appropriate to treat these methods as always returning the exact same value?

For example, could slice::concat be changed from this:

fn concat<T, V: Borrow<[T]>>(slice: &[V]) -> Vec<T> {
    let size = slice.iter().map(|slice| slice.borrow().len()).sum();
    let mut result = Vec::with_capacity(size);
    for v in slice {
        result.extend_from_slice(v.borrow())
    }
    result
}

to this:

fn concat<T, V: Borrow<[T]>>(slice: &[V]) -> Vec<T> {
    let size = slice.iter().map(|slice| slice.borrow().len())
        .try_fold(0usize, |s,l| s.checked_add(l))
        .unwrap_or(!0);
    let mut result = Vec::with_capacity(size);
    for v in slice {
        let tmp = v.borrow();
        if tmp.len() <= result.capacity() - result.len() {        
            result.extend_from_slice(tmp)
        }
    }
    result
}

The capacity check optimizes out vec reallocation code without compromising safety, but a weird borrow implementation could result in some items getting dropped.

tczajka · November 25, 2022, 8:23am

I think you wanted your example to be more generic than this. For slice: &[&[T]], slice.borrow().len() is equivalent to just slice.len().

I'm not sure even this is true. The docs say "In particular Eq , Ord and Hash must be equivalent for borrowed and owned values" but I think it's meant to mean "if both types implement these traits".

kajacx · November 25, 2022, 9:38am

I'm all for semantic requirements that make sense. That's why we have Eq and not just a Fn(T, T) -> bool, or Ord and not just Fn(T, T) -> Ordering. Or course these cannot be relies upon in unsafe code for soundness, but you should be able to rely on them in safe code for "correctness" let's say.

I'm not sure if that is the right word, but you code should be free to panic, loop infinitely, leak an arbitrary amount of memory or just return wrong results if someone gives you a trait implementation that doesn't meet these semantic constraints.

When it comes to determinism, I'm not sure I can imagine a case where Borrow would benefit from not being deterministic. On the flip side, I'm not sure your example even benefits from it being deterministic that much:

Isn't the "capacity check" just moved out of the extend_from_slice? Is the implementation of extend_from_slice really anyhing else than "check if there is space, if there is not, extend the vector. Then copy the values". If you change that "check if there is space, if there is not, do nothing, otherwise copy the values" i don't see how that is any faster if no actual reallocations happen in either case (which they wouldn't if Borrow was deterministic).

kornel · November 25, 2022, 10:53am

When I wrote "compare equal" I specifically meant it as a short for "Eq , Ord and Hash must be equivalent for borrowed and owned values".

Yes, the example could be simplified, but the original is doomed to use Borrow for back-compat:

kornel · November 25, 2022, 10:57am

The difference is in code bloat.

The first version is:

alloc
loop {
    if no capacity { 
       calculate new size and panic if capacity overflows
       realloc storage
       copy data over to new storage 
    }
    append
}

the second is:

alloc
loop {
    if capacity { append }
}

The code for reallocating the vector is a bit much, especially if used in a simple case like [a, b].concat(). The version with hoisted capacity check is almost just pure memcpys, with an extra branch that has to be there for safety.

quaternic · November 25, 2022, 11:03am

Indeed, and that interpretation is then insufficient for the assumptions in the OP because you can have

struct NoEq { ... }
struct WeirdBorrow { ... }
impl Borrow<[NoEq]> for WeirdBorrow {
    // borrow that randomly returns
    // different length slices every call
}

and that's "OK" because [NoEq] doesn't implement Eq when NoEq doesn't.

kornel · November 25, 2022, 11:08am

You could also make the newtype implement Eq et al, but in a way that is insensitive to the length changes, and then make borrow() return random-length slice.

This is obviously very weird and doesn't seem useful at all. But my question is: can the libstd have a policy of garbage-in-garbage-out in such case?

And should the docs for Borrow and AsRef be updated to explicitly say that their methods are expected to return exactly the same reference every time?

tczajka · November 25, 2022, 11:10am

You can also have:

struct NoEq { ... }

impl Borrow<i32> for NoEq

that returns a reference to a different i32 every time you call borrow, and this doesn't violate the rules because NoEq doesn't implement Eq.

quaternic · November 25, 2022, 11:22am

Eq has to be reflexive, so with T: Eq and x: T, x == x, but you can't make two slices equal if their lengths aren't.

kornel · November 25, 2022, 11:33am

Ok, but what about AsRef?

kajacx · November 25, 2022, 12:20pm

kornel:

Summary

The first version is:

alloc
loop {
    if no capacity { 
       calculate new size and panic if capacity overflows
       realloc storage
       copy data over to new storage 
    }
    append
}

the second is:

alloc
loop {
    if capacity { append }
}

So the first version will jump "far away" after the if evaluates to false, whereas the second will just follow to the next instruction? I'm no expert on how pipelining or branch prediction works to be able to tell if than is faster or not, but I guess that makes sense.

Also less code means happier cache, ram and binary size, right?

kornel · November 26, 2022, 1:51am

Yes, smaller binary size is generally better. The needless realloc code makes concat() not a zero-cost abstraction.

kornel · November 30, 2022, 2:49pm

Is there a valid use-case for a non-deterministic AsRef or Borrow?

Could it be some lazy-asynchronous-init container? Something multi-threaded where another thread could change what is dereferenced?

NoamB · November 30, 2022, 4:30pm

Maybe an RCU implementation?

bestouff · December 1, 2022, 7:51am

Borrow semantics apart, it looks like you'd need a Vec::extend_within_capacity() method ...

simonbuchan · December 1, 2022, 9:35am

It's something I was thinking about when I saw this earlier. AsRef and Borrow return naked refs that could live as long as self, so you could call them a second time before dropping the first ref. This means the only way to have it return multiple refs is for it to contain or otherwise have access to multiple separate objects, and to be picking between them using some side channel. Certainly a strange combination of (pardon the pun) traits!

quaternic · December 1, 2022, 5:24pm

kornel:

For example, could slice::concat be changed from this :

fn concat<T, V: Borrow<[T]>>(slice: &[V]) -> Vec<T> {
    let size = slice.iter().map(|slice| slice.borrow().len()).sum();
    let mut result = Vec::with_capacity(size);
    for v in slice {
        result.extend_from_slice(v.borrow())
    }
    result
}

Also not relevant to the trait discussion, but in the original example, concat isn't handling a possible overflow when summing the slice lengths (when compiling --release). Even on 64-bit systems it is possible with something like

let data = vec![1u8; (1 << 34) + 1];        // ~16GB of data and
let slice = vec![&data[..]; (1 << 30) + 1]; // 16GB of references to the data
slice.concat();                             // for a total length over 2^64

In the original concat, this just leads to reserving too little space (the wrapped length) and then possibly reallocating a few times before eventually failing to allocate more memory, but checking for overflow early would remove one potential reason for reallocations.

In the variant that only extends up to capacity, the same oversight could result in incorrectly returning a truncated result.

Kixunil · December 2, 2022, 10:45pm

Oh, yes, I wanted to discuss this too. Deref as well, BTW.

I think it's fine for safe code to assume the value is the same because anything else seems unreasonable. However I have a bunch of cases with unsafe code that needs to rely on this property and I have to use an ad-hoc trait for this.

I'd really love if core had this:

/// Marks that implementations of `Deref`, `DerefMut`, `AsRef`, `AsMut`, `Borrow`, and `BorrowMut` are deterministic.
///
/// This trait is intended to enable usage of the aforementioned traits by `unsafe` code relying on some additional properties:
/// * Each of the traits, if implemented, returns the same reference each time it is called unless the value was moved or mutated by a method *outside of these traits*.
/// * If the value was moved then the returned reference still points to the same value, it just may or may not be on a different memory address.
/// * The traits are equivalent - each returns the same reference.
/// * Calling any method *on the returned reference* does **not** change the reference returned by these traits.
///
/// In practice almost all implementations of these traits naturally do have these properties (e.g. all `std` types do), they just don't guarantee them. This marker trait signals that guarantee.
///
/// # Safety
///
/// Implementing this trait is only allowed if all of the properties above hold.
/// Implementing this trait without satisfying those properties causes undefined behavior.
pub unsafe trait DeterministicRef<T> {}

The reason I want it in core is so that different crates can communicate this without depending on each-other.

kornel · December 7, 2022, 3:43pm

Would it be okay to submit a documentation change that specifies that AsRef and Borrow are supposed to be deterministic?

Can you help me with the exact wording for "don't do weird stuff with it, because you'll get garbage-in-garbage-out, but not UB, and unsafe code still needs to be careful about implementations that are weird?"

Kixunil · December 8, 2022, 11:29am

Who are you asking?

Topic		Replies	Views
Semantics of AsRef language design	18	5091	December 3, 2022
Replace Borrow and BorrowMut traits with AsRef and AsMut	8	2177	March 25, 2019
Faster PartialEq for small arrays of non 2^n length compiler	25	749	September 14, 2024
[Discussion] Hybrid borrow language design	15	775	October 4, 2024
Ergonomics initiative discussion: Allowing owned values where references are expected language design	29	4448	March 25, 2019

Are AsRef and Borrow supposed to be deterministic?

Related topics