Mutating the contents of `Vec::as_ptr()`

The documentation for Vec::as_ptr() contains the following nicely worded warning:

The caller must also ensure that the memory the pointer (non-transitively) points to is never written to (except inside an UnsafeCell) using this pointer or any pointer derived from it. If you need to mutate the contents of the slice, use as_mut_ptr.

So I'm asking, am I really not allowed to write into it?

Of course there is an invariant here. The question is, is it a safety invariant or a validity invariant?

If it is a validity invariant (aka language UB), writing to the pointer is never safe, under any circumstances.

However, if this is a safety invariant, things get more complicated.

The standard library reserves the right to escalate a safety invariant to a validity invariant. I'm asking whether it can do that in this case too.

In other words, if I write through the pointer but always restore the original value(s), and I promise never to call any Vec methods before I do, do I still have UB?

The documentation also says:

This method guarantees that for the purpose of the aliasing model, this method does not materialize a reference to the underlying slice, and thus the returned pointer will remain valid when mixed with other calls to as_ptr and as_mut_ptr. Note that calling other methods that materialize mutable references to the slice, or mutable references to specific elements you are planning on accessing through this pointer, as well as writing to those elements, may still invalidate this pointer. See the second example below for how this guarantee can be used.

Which makes me think that this cannot be a validity invariant, but I'm not sure if this is guaranteed.
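
To make the quoted guarantee concrete, here is a minimal sketch modeled on the second example in the std docs. The function name is made up for illustration. It only shows the documented, permitted mixing of as_ptr and as_mut_ptr (reads through one, a write through the other to a different element); it does not write through the as_ptr pointer itself.

```rust
/// Hypothetical demo function: reads through `as_ptr`, writes through
/// `as_mut_ptr`, then reads through the first pointer again.
fn mix_as_ptr_and_as_mut_ptr() -> (i32, i32) {
    let mut v = vec![0i32, 1, 2];
    let first_after;
    // SAFETY: both pointers point into the same live allocation of 3
    // initialized elements, and no reference to the buffer is created
    // while the raw pointers are in use.
    unsafe {
        let ptr1 = v.as_ptr();
        let _first_before = ptr1.read();
        // A pointer that is allowed to write, aimed at a different element.
        let ptr2 = v.as_mut_ptr().add(2);
        ptr2.write(7);
        // Per the docs, `ptr1` is still valid here because neither call
        // materialized a reference to the underlying slice.
        first_after = ptr1.read();
    }
    (first_after, v[2])
}
```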


Context: I'm implementing a pycall!() macro to conveniently call Python functions in the context of PyO3. This macro can unpack an iterable (or multiple iterables) into a call and act as if all of the iterable's items were passed one by one (similar to unpacking in Python calls). I specialize unpacking for a single slice/Vec to pass the arguments directly without doing any conversion.

Now, Python has a nice flag called PY_VECTORCALL_ARGUMENTS_OFFSET. When it is added to a call, the interpreter is allowed to modify (but promises to restore) args[-1], or args[0] in the case of a method call. This can help with the performance of bound methods.

Now, in the case of a function, I obviously cannot allow Python to modify args[-1], since I have no idea what's in there. But in the case of a method, args[0] is under my control. If I am passed a mutable reference (to a slice or a Vec), everything is fine. If I am passed a shared reference to a slice, I cannot modify it; that would be language UB. But the question is, what happens if I am passed a shared reference to a Vec?

That is exactly this question: am I allowed to write through the pointer returned by Vec::as_ptr()?
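
For reference, the uncontroversial case (a mutable reference is available) can be sketched as below. This is only an illustration under assumptions: the helper name and the closure standing in for the Python callee are hypothetical, plain usize values stand in for object pointers, and nothing here is PyO3 or CPython API.

```rust
/// Hypothetical helper (not part of PyO3 or std): temporarily replaces
/// `args[0]` with `temp`, hands the raw buffer to `callee`, then restores
/// the original value before returning.
fn call_with_slot0_replaced<R>(
    args: &mut Vec<usize>,
    temp: usize,
    callee: impl FnOnce(*const usize, usize) -> R,
) -> R {
    assert!(!args.is_empty(), "need at least one argument slot");
    let len = args.len();
    // `as_mut_ptr` is the documented way to get a pointer that may be
    // written through.
    let p = args.as_mut_ptr();
    // SAFETY: `p` points to `len` initialized elements, and the `&mut`
    // borrow means no other Rust code can touch the Vec while slot 0
    // holds the temporary value.
    unsafe {
        let saved = p.read();
        p.write(temp);
        let result = callee(p as *const usize, len);
        // Restore the original value. A panic in `callee` would skip this,
        // which is harmless for plain `usize` but matters for owned types.
        p.write(saved);
        result
    }
}
```

The open question in this thread is whether the same overwrite-and-restore could be done starting from `&Vec` and as_ptr(); the sketch deliberately does not attempt that.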

By your terminology, validity. You cannot mutate something that was obtained behind an immutable reference. The only exception to this is UnsafeCell, which is not used here. This is a language-level restriction.

This is basically a fancy way of saying "the vec may reallocate, rendering the previously acquired pointer dangling".

It's a validity invariant. Violating a safety invariant is not instant UB; writing through vec.as_ptr() is instant UB.

That would be true if I wrote to the data pointer, length, or capacity, since they are contained inside a shared reference. But by language rules alone, since the data is behind a raw pointer, and as_ptr() doesn't materialize a reference to it (as the docs say), it cannot be language-level UB. In other words, the compiler cannot detect that I'm writing through a pointer I got from as_ptr() and turn the whole thing into unreachable_unchecked(). That would be an incorrect optimization. Or, to put it differently, if I created my own Vec and copied the contents of as_ptr() one by one, I would be free to declare that this is not UB.

The problem is that the standard library may reserve the right to turn this into a validity invariant at any point in the future. And here comes my question: does it, actually? Or maybe because we guarantee it won't materialize a reference to the data, it cannot?

Since you are not the library author, it really doesn't matter whether it is library UB or language UB. If you do such a write, you are breaking your contract with the Vec author, and your code is broken.

But if I understand it correctly, the whole idea of distinguishing between safety and validity invariants is that safety invariants can be temporarily broken (that was why str being UTF-8 was changed from validity to safety). Of course, the standard library is free to use that. But am I too? I understand that we wouldn't want to declare, for every invariant in every std type, whether it's a safety or a validity invariant and which methods break it. However, Vec is so fundamental, and as_ptr() is so simple, that it may be worth it.

Also, there is the question of whether the standard library can escalate this into language UB at all, given the docs guarantee it won't materialize a reference, and I don't know a way to mark the pointer read-only without this having effects for the aliasing model too, which the method is not allowed to do. And if we can never escalate the UB, what does it mean for it to be possibly-language UB? And is there actually a value in that?

Safety invariants can be temporarily broken by the module that defines these invariants. But not by anyone outside that module, except if that is explicitly permitted by the documentation.

It is easy to cause UB by writing through the as_ptr pointer even if it does not materialize a reference, e.g. if two threads do this concurrently. So to permit code like this we would need a doc change that spells out the conditions under which this is permitted. But permitting such mutation goes entirely against the intended value model of Vec, so I don't think we should do that.
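
A small sketch of why the concurrent case matters: `&Vec<u32>` can be handed to several threads by entirely safe code, so a holder of `&Vec` cannot assume it is the only one looking at the buffer. The function name is made up for illustration, and the threads below only read; the comment marks where a write through as_ptr() would become a data race.

```rust
use std::thread;

/// Hypothetical demo: two scoped threads share the same `&Vec<u32>`.
fn sum_in_two_threads(v: &Vec<u32>) -> (u32, u32) {
    thread::scope(|s| {
        // Both closures borrow `v` at the same time; this is safe code
        // because shared references to a `Vec<u32>` can cross threads.
        let a = s.spawn(|| v.iter().sum::<u32>());
        let b = s.spawn(|| v.iter().sum::<u32>());
        // If either thread instead wrote through `v.as_ptr() as *mut u32`,
        // that write would race with the other thread's reads, which is
        // UB regardless of whether the value is restored afterwards.
        (a.join().unwrap(), b.join().unwrap())
    })
}
```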

Without criticism, I’m also curious about the scenario where you have a Vec where you can promise mutation is okay because it will be put back before returning to Rust (thus ensuring no re-entrant or parallel use) but can’t use as_mut_ptr in the first place.

Oh that's right! I didn't think about reentrancy.

The question is interesting regardless IMHO.