The docs for AsRef and Borrow don't say they are supposed to always return the exact same value. (Borrow is supposed return something that compares the same, but technically it could still return different values that compare equal.)
I know for safety they can never be trusted, but where safety isn't an issue, would it be appropriate to treat these methods as always returning the exact same value?
fn concat<T, V: Borrow<[T]>>(slice: &[V]) -> Vec<T> {
let size = slice.iter().map(|slice| slice.borrow().len()).sum();
let mut result = Vec::with_capacity(size);
for v in slice {
result.extend_from_slice(v.borrow())
}
result
}
to this:
fn concat<T, V: Borrow<[T]>>(slice: &[V]) -> Vec<T> {
let size = slice.iter().map(|slice| slice.borrow().len())
.try_fold(0usize, |s,l| s.checked_add(l))
.unwrap_or(!0);
let mut result = Vec::with_capacity(size);
for v in slice {
let tmp = v.borrow();
if tmp.len() <= result.capacity() - result.len() {
result.extend_from_slice(tmp)
}
}
result
}
The capacity check optimizes out vec reallocation code without compromising safety, but a weird borrow implementation could result in some items getting dropped.
I think you wanted your example to be more generic than this. For slice: &[&[T]], slice.borrow().len() is equivalent to just slice.len().
I'm not sure even this is true. The docs say "In particular Eq , Ord and Hash must be equivalent for borrowed and owned values" but I think it's meant to mean "if both types implement these traits".
I'm all for semantic requirements that make sense. That's why we have Eq and not just a Fn(T, T) -> bool, or Ord and not just Fn(T, T) -> Ordering. Or course these cannot be relies upon in unsafe code for soundness, but you should be able to rely on them in safe code for "correctness" let's say.
I'm not sure if that is the right word, but you code should be free to panic, loop infinitely, leak an arbitrary amount of memory or just return wrong results if someone gives you a trait implementation that doesn't meet these semantic constraints.
When it comes to determinism, I'm not sure I can imagine a case where Borrow would benefit from not being deterministic. On the flip side, I'm not sure your example even benefits from it being deterministic that much:
Isn't the "capacity check" just moved out of the extend_from_slice? Is the implementation of extend_from_slice really anyhing else than "check if there is space, if there is not, extend the vector. Then copy the values". If you change that "check if there is space, if there is not, do nothing, otherwise copy the values" i don't see how that is any faster if no actual reallocations happen in either case (which they wouldn't if Borrow was deterministic).
alloc
loop {
if no capacity {
calculate new size and panic if capacity overflows
realloc storage
copy data over to new storage
}
append
}
the second is:
alloc
loop {
if capacity { append }
}
The code for reallocating the vector is a bit much, especially if used in a simple case like [a, b].concat(). The version with hoisted capacity check is almost just pure memcpys, with an extra branch that has to be there for safety.
You could also make the newtype implement Eq et al, but in a way that is insensitive to the length changes, and then make borrow() return random-length slice.
This is obviously very weird and doesn't seem useful at all. But my question is: can the libstd have a policy of garbage-in-garbage-out in such case?
And should the docs for Borrow and AsRef be updated to explicitly say that their methods are expected to return exactly the same reference every time?
So the first version will jump "far away" after the if evaluates to false, whereas the second will just follow to the next instruction? I'm no expert on how pipelining or branch prediction works to be able to tell if than is faster or not, but I guess that makes sense.
Also less code means happier cache, ram and binary size, right?
It's something I was thinking about when I saw this earlier. AsRef and Borrow return naked refs that could live as long as self, so you could call them a second time before dropping the first ref. This means the only way to have it return multiple refs is for it to contain or otherwise have access to multiple separate objects, and to be picking between them using some side channel. Certainly a strange combination of (pardon the pun) traits!
Also not relevant to the trait discussion, but in the original example, concat isn't handling a possible overflow when summing the slice lengths (when compiling --release). Even on 64-bit systems it is possible with something like
let data = vec![1u8; (1 << 34) + 1]; // ~16GB of data and
let slice = vec![&data[..]; (1 << 30) + 1]; // 16GB of references to the data
slice.concat(); // for a total length over 2^64
In the original concat, this just leads to reserving too little space (the wrapped length) and then possibly reallocating a few times before eventually failing to allocate more memory, but checking for overflow early would remove one potential reason for reallocations.
In the variant that only extends up to capacity, the same oversight could result in incorrectly returning a truncated result.
Oh, yes, I wanted to discuss this too. Deref as well, BTW.
I think it's fine for safe code to assume the value is the same because anything else seems unreasonable. However I have a bunch of cases with unsafe code that needs to rely on this property and I have to use an ad-hoc trait for this.
I'd really love if core had this:
/// Marks that implementations of `Deref`, `DerefMut`, `AsRef`, `AsMut`, `Borrow`, and `BorrowMut` are deterministic.
///
/// This trait is intended to enable usage of the aforementioned traits by `unsafe` code relying on some additional properties:
/// * Each of the traits, if implemented, returns the same reference each time it is called unless the value was moved or mutated by a method *outside of these traits*.
/// * If the value was moved then the returned reference still points to the same value, it just may or may not be on a different memory address.
/// * The traits are equivalent - each returns the same reference.
/// * Calling any method *on the returned reference* does **not** change the reference returned by these traits.
///
/// In practice almost all implementations of these traits naturally do have these properties (e.g. all `std` types do), they just don't guarantee them. This marker trait signals that guarantee.
///
/// # Safety
///
/// Implementing this trait is only allowed if all of the properties above hold.
/// Implementing this trait without satisfying those properties causes undefined behavior.
pub unsafe trait DeterministicRef<T> {}
The reason I want it in core is so that different crates can communicate this without depending on each-other.
Would it be okay to submit a documentation change that specifies that AsRef and Borrow are supposed to be deterministic?
Can you help me with the exact wording for "don't do weird stuff with it, because you'll get garbage-in-garbage-out, but not UB, and unsafe code still needs to be careful about implementations that are weird?"