Next steps for the unsafe code guidelines


#10

Is it possible to join a Hangout without a Google account?


#11

never mind, let’s use this: https://meet.jit.si/rust-unsafe-code-guidelines


#12

be there in 2mins :slight_smile:


#13

because I am wildly optimistic about web tech, let’s try https://talky.io/unsafe-code-guidelines


#14

So, after numerous attempts, we wound up falling back to IRC. How shall we talk in the future? IRC is not that bad, but it’s also got some limitations.

In any case, here are some vague notes in an etherpad. We discussed the fate of the rust-memory-model repository. I think the consensus was that we should move towards a set of executable .rs tests (with a main() function) that demonstrate UB or lack thereof for various patterns. Each test would have copious comments and be organized in some sensible and TBD way.

I proposed toward the end that we should try to get started by attempting to canvas “real world” unsafe code in the wild. It seemed that there was general agreement. Therefore, I plan to try to assign to each participant (and anyone else who wants!) a crate or two to look at. There are some notes in the etherpad about what to look for, but here they are reproduced (and any additional comments would be fine):

  • Things to watch out for:
    • “escaping” – e.g. functions like fn foo(x: &u32) { .. }, where the memory at *x is somehow used after foo returns
    • aliasing between raw pointers and references – e.g., a *mut u32 and &mut u32 both pointing at same memory, used in intermingled ways
    • new capabilities that could not be modeled in safe code (e.g., mutex, rayon)
    • hidden assumptions in unsafe code (e.g., rayon assumes dtors of local variables execute)
    • type-based aliasing violations
    • “too strong” types – as in RefCell, declaring a &T when in fact the value is not always safe to use for the lifetime
    • “uninitialized memory” – how does it get used?
    • transmutes of all kinds

The idea would be to gather up notes and then meet to discuss what we found. I think this could be helpful in terms of helping us to restructure the repo too.

Thanks everyone! When should we meet again? This time slot isn’t great for me as a repeating thing (it overlaps with a regular meeting I have), but maybe we could do something somewhat earlier? Also, how frequently should we meet?


Canvas unsafe code in the wild
#15

OK, I failed to get organized this week, but here are some initial assignments:

Anyone else who wants to participate, pick a crate from issue 18 and leave a comment here.

Maybe we can meet again on March 30th with some observations? (1 week)


Canvas unsafe code in the wild
#16

I was planning on talking about Servo’s SM bindings, since that kicks up all kinds of fun.


#17

er, that’s what I meant. Sounds good.


#18

I recently proved Rc sound in my model, so it’d probably make sense for me to take that one. I think in terms of “things they do with borrows and ownership”, Rc and Arc are fairly similar anyway?


#19

There is a subtle detail about the implementation of Arc: decrementing the reference count is effectively equivalent to freeing the allocated object (since another thread might come in immediately afterwards and do just that). However, this decrement is done while holding a &AtomicUsize reference, which in theory is supposed to remain valid after the decrement.


#20

Since we are largely ignoring multiple threads to start, seems reasonable, though @Amanieu raises an interesting point.


#21

So, I wrote contradictory things here. It’s been 1 week, but it’s not March 30th. Do we think we want to try and meet today?

I think people should go ahead and leave notes on their observations in the thread for discussion. In fact, maybe I should start a new thread (just so it has a more appropriate subject line)? I think I’ll do that.


#22

I realize now that (a) I cannot do March 30th, owing to the GNOME coding sprint, and (b) in general the time we elected (2pm EST on Thursdays) is not especially good for me (I have another meeting at that time). Still, I think that synchronous meetings are maybe still of value. We certainly don’t have much of a rhythm yet. Perhaps we could establish a more regular time (earlier on Thu seemed like it would generally work).

I could probably do a meeting today but I’d love to hear from others – has anyone been able to do the research etc? (If so, I encourage you to leave a comment on this thread I spawned.


#23

Responding to your “unsafe code in the wild” thread here so as not to contaminate it:

This looks like a case of missing abstractions and I’m sure it’s not the only one. In this case, a write-only UninitializedVec<T> abstraction would probably help. Something like:

#![feature(untagged_unions)]

use std::ops::{Index, IndexMut};
use std::ptr::drop_in_place;

pub struct UninitializedVec<T: Sized> {
    data: *mut T,
    len: usize,
    capacity: usize,
}

impl<T: Sized> UninitializedVec<T> {
    // All the normal methods except that they operate over `Uninit<T>` instead of `T`...
    // And:
    pub unsafe fn into_vec(self) -> Vec<T> {
        Vec::from_raw_parts(self.data, self.len, self.capacity)
    }
}

impl<T: Sized> Index<usize> for UninitializedVec<T> {
    type Output = Uninit<T>;
    fn index(&self, idx: usize) -> &Uninit<T> {
        unsafe {
            assert!(idx < self.len, "out of bounds");
            &*(self.data.offset(idx as isize) as *const Uninit<T>)
        }
    }
}

impl<T: Sized> IndexMut<usize> for UninitializedVec<T> {
    fn index_mut(&mut self, idx: usize) -> &mut Uninit<T> {
        unsafe {
            assert!(idx < self.len, "out of bounds");
            &mut *(self.data.offset(idx as isize) as *mut Uninit<T>)
        }
    }
}

// I assume the alignment/size will match T...
#[allow(unions_with_drop_fields)]
pub union Uninit<T> {
    inited: T,
}

impl<T> Uninit<T> {
    /// Set to `value`. This will forget the existing value, if any.
    pub fn set(&mut self, value: T) {
        unsafe {
            self.inited = value;
        }
    }
    /// Set to `value` dropping the existing one.
    pub unsafe fn replace(&mut self, value: T) {
        drop_in_place(&mut self.inited);
        self.inited = value;
    }
    pub unsafe fn get_unchecked(&self) -> &T {
        &self.inited
    }
    pub unsafe fn get_mut_unchecked(&mut self) -> &mut T {
        &mut self.inited
    }
}

#24

Actually that’s pretty unclear to me. That is, the existing code…is fairly clean, it’s not obvious to me that it would add much to have an “uninitialized vector”. Put another way, that’s kind of what we wrote. =)


#25

Unless I misunderstood your post, you’re creating references to uninitialized types which may be UB (depending on the final the rules) and is definitely a bit dangerous (you’re producing safe references to types with “safe” methods that are actually unsafe to use). Isn’t that why you initially started out by using *mut T pointers everywhere? The fact that you had to make this choice leads me to believe that there’s something missing so I’m trying to propose a less “unsafe” middle ground (“unsafe” code doesn’t have to be all or nothing).

After thinking about it, you don’t actually need a special vector to deal with this; a simple Uninit<T> wrapper and a Vec<Uninit<T>> should suffice. I’ve forked off a new topic as it doesn’t really seem to belong here:


#26

Indeed. My point was merely that a more direct fix might be for us to use &mut Uninit<T> (where union Uninit<T> { value: T, uninitialized: () }), but otherwise keep working with slices as we have been doing. I guess that’s what you’re saying in your thread though. =)


#27

Sorry, meeting today will not work out for me. And I haven’t yet had the time to do my share of the survey, either… :confused:


#28

Yeah, let’s try to communciate over the thread for now. We can meet again in a few weeks – my schedule is a bit crazy for next two weeks, between the GNOME thing, some Mozilla obligations, and the ECOOP PC meeting.


split this topic #29

10 posts were split to a new topic: Overflow checks and unsafe code