"Tootsie Pop" model for unsafe code

I think the idea would be that it's miri-based (the MIR interpreter in progress). Miri's virtual heap does most of the work for us. As for libc, I'm relying on the assumption that the functions we use will not malloc internally (allocation itself is special-cased by miri), and passing pointers to miri-allocated locations should preserve the properties of the virtual heap, provided the FFI function doesn't do something ridiculous.

I've been wondering about this. I think the suitability hinges very much on what you call a type-system invariant.

I've been trying to write up some thoughts in response to this thread, and one of the insights I came to is that I was conflating various definitions of "abstraction boundary". For example, one definition is:

  • the scope where you find the justification for why things are safe

This lends itself to "bigger" boundaries. For example, in a function like this one:

fn foo() {
    let mut x: i32 = 22;
    let p: *mut i32 = &mut x;
    let q = unsafe { *p };
}

Here, the unsafe block itself is not a boundary in this "logical" sense. Rather, the entire function is, since it is only on the basis of the whole function body that I can prove that p is a valid pointer for me to dereference here.

But the words you used were different. They were:

  • the scope within which the compiler cannot trust types

In that version, the code I wrote above is fine.

This comes back, to some extent, to the age-old debate about whether you should write "big" or "small" unsafe blocks. The "big" unsafe blocks people often argue for correspond more to the "justification" style, and the smaller blocks to "the points where I need to prove something" -- which roughly aligns with "the scope where the compiler can't trust types". At minimum, an unsafe block actually seems to me to act more like a traditional barrier (that is, code can't be moved across it), but one that is scoped to the enclosing "justification boundary".
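To make the contrast concrete, here is a hypothetical sketch (the function and names are invented, not from the thread): the unsafe block is tiny, but the justification for its safety is established by the surrounding safe code, so the whole function acts as the "justification boundary":

```rust
// Hypothetical example: the unsafe block is small, but the proof that
// `idx` is in bounds spans the entire function body.
fn first_half_sum(v: &[u64]) -> u64 {
    let mut sum = 0;
    // Safe code establishes the invariant: idx < v.len() / 2 <= v.len().
    for idx in 0..v.len() / 2 {
        // SAFETY: idx is in bounds, proven by the loop range above.
        sum += unsafe { *v.get_unchecked(idx) };
    }
    sum
}
```

The unsafe block here covers only the single operation that needs it, but auditing it requires reading the loop header -- the "small block" style leans on the enclosing function for its justification.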

Put another way, I think that the unsafe keyword was designed to act as the "justification boundary" -- but that's not really the boundary that the compiler cares about when optimizing. We can align the two -- which is basically what my proposal was saying -- but it will result in less optimization than we would otherwise get.

Hope this makes sense; I'm trying to write this up in a more structured way, but these are my off-the-cuff current thoughts.

Ah, this is not quite right. I think what I should say is that the compiler cares about both boundaries, potentially.

To attempt to clarify, I mean invariants such as "& and &mut references are guaranteed to be non-null and point to valid data", "a live &mut reference cannot be aliased", "the data behind a & reference won't change unless it contains an UnsafeCell", and "a reference &'a T is valid for all of 'a". I believe these should always hold when outside an unsafe block. (Some have argued that these should always hold, period, but I think this can make writing correct unsafe code too subtle, and it leads to questions such as "can I operate on a *mut if the &mut from which it was obtained is still live?")
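That last question can be made concrete with a hypothetical snippet (the function is made up for illustration); whether this pattern is allowed is exactly the kind of subtlety in question:

```rust
// Hypothetical illustration: write through a raw pointer while the
// &mut it was derived from is still live, then use the &mut again.
fn write_both(r: &mut i32) -> i32 {
    let p: *mut i32 = &mut *r; // raw pointer derived from the &mut
    unsafe { *p += 1; }        // write through the raw pointer...
    *r += 1;                   // ...then through the original &mut
    *r
}
```

This compiles, but whether an optimizer may assume `*r` was unchanged across the raw-pointer write depends precisely on which aliasing invariants are taken to hold unconditionally.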

I explicitly do not mean invariants of a given type, such as "this raw pointer is always dereferenceable if capacity is more than zero", "the first N elements of this memory are initialized, where N is the value of the length field", "the raw pointers in this iterator point to valid nodes within the same data structure", et cetera. As you and others have noted, upholding these invariants currently requires a wider boundary that relies on privacy, and they furthermore seem unrelated to optimizing based on type-system invariants.
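As a hypothetical illustration of such a type invariant (the type and names are invented), here the invariant "the first `len` entries are meaningful and `len <= 8`" is upheld by privacy, not by the type system, and the unsafe code relies on it:

```rust
// Hypothetical sketch: the invariant `len <= 8` is maintained by every
// method in this module; privacy prevents outside code from breaking it.
mod fixed {
    pub struct FixedVec {
        buf: [u32; 8], // backing storage
        len: usize,    // invariant: len <= 8, first `len` entries set
    }

    impl FixedVec {
        pub fn new() -> Self {
            FixedVec { buf: [0; 8], len: 0 }
        }

        pub fn push(&mut self, v: u32) -> bool {
            if self.len == 8 {
                return false;
            }
            self.buf[self.len] = v;
            self.len += 1;
            true
        }

        pub fn get(&self, i: usize) -> Option<u32> {
            if i < self.len {
                // SAFETY: i < len <= 8 by the module-wide invariant.
                Some(unsafe { *self.buf.get_unchecked(i) })
            } else {
                None
            }
        }
    }
}
```

The unsafe block in `get` is only sound because no code outside the module can set `len` larger than 8 -- a privacy boundary, not a type-system one.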


Actually, I think what I wrote was wrong. I probably should have said "I think the suitability depends on how you view the role of an unsafe block" or something like that.

I wrote a part of a memory allocator, and found myself struggling with aliasing issues and indeterminate lifetimes. This forced me to use a bunch of raw pointers.

If you’re trying to get memory from the operating system yourself, and thus are also responsible for giving it back, I don’t see why that’s a problem.
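For what it's worth, a minimal bump-allocator sketch (entirely hypothetical, not the code discussed above) shows why raw pointers are natural here -- the returned blocks have no lifetime the borrow checker can express:

```rust
// Hypothetical bump allocator: hands out raw pointers into a buffer.
// Raw pointers are used because each block's "lifetime" is decided by
// the caller, not by any scope the borrow checker can see.
struct Bump {
    buf: Box<[u8]>,
    next: usize,
}

impl Bump {
    fn new(size: usize) -> Self {
        Bump { buf: vec![0u8; size].into_boxed_slice(), next: 0 }
    }

    /// Returns a raw pointer to `n` fresh bytes, or null if exhausted.
    fn alloc(&mut self, n: usize) -> *mut u8 {
        if self.next + n > self.buf.len() {
            return std::ptr::null_mut();
        }
        // SAFETY: next + n <= buf.len(), so the offset stays in bounds.
        let p = unsafe { self.buf.as_mut_ptr().add(self.next) };
        self.next += n;
        p
    }
}
```

Handing out `&mut [u8]` slices instead would force an aliasing story (every outstanding block borrows the allocator), which is exactly the struggle described above.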

My largest issue with unsafe Rust is that there isn’t a clear set of semantics for the operations that require unsafe blocks. Trying to define a model for a boundary between safe and unsafe code for optimization purposes doesn’t make sense without understanding the semantic guarantees of the operations on one side of the boundary, does it? For example, the C++ standard doesn’t define ‘boundaries’ where the behavior of the optimizer changes; it just defines the guaranteed semantics of operations given the state of the program when the operation takes place.

At a higher level, I’m not sure it’s reasonable to try to make unsafe code easy to write at the cost of performance. If someone types the unsafe keyword, they should ‘know’ what they are doing or accept the consequences. The goal of the Rust team should be to provide the knowledge and guarantees required to write code that works as expected. If someone wants training wheels, that’s what safe code is for.

