Readable unsafe is safe unsafe


#1

Rust is awesome. It’s fast and safe and unicorns and rainbows. Right until you type unsafe, which is… a less than positive experience. The problem, as I see it, is that unsafe code has been neglected under various battle cries like “unsafe should be ugly so people don’t use it”, “let’s implement inheritance first” and “meh”. I’d like to offer another perspective:

### When the only way to verify code is to read it, only readable code can be safe.

This is not a new idea, but the core philosophy of OpenBSD, and their track record speaks for itself. Those of you who haven’t peeked into their code yet, do it now, it’s a thing of beauty.

I want Rust to be the language that makes them drool. We have the potential: Rust is already a viable alternative to C in a lot of areas. But we also have code like this:

*(offset(dest as *const u8, i as int) as *mut u8) = *offset(src, i as int);

With that in mind, enjoy my wishlist:

  • Introduce visual distinction between various scalar casts that deliberately lose information in different ways. This probably just needs some traits (.as_unsigned(), .truncate() etc.). Require unsafe for scalar as and point people towards the traits in the error message. Casting to same-signed larger ints should be done implicitly.

  • The basic C types must be provided by the compiler. They come from the OS, not libc: we want to implement kernels and libc itself and presumably talk to C code on both sides. All kernels define their interfaces in terms of C types. A pure Rust kernel can ignore them either way.

  • Even if the above is rejected, the C types MUST NOT, under ANY circumstances, remain just typedefs.

  • Require unsafe for anything that mentions raw pointers. The compiler cannot reason about raw pointers, therefore passing them around is unsafe too. This is a completely safe example from the docs:

// wrong (the CString will be freed, invalidating `p`)
let p = foo.to_c_str().as_ptr();

  • Get rid of the idea of a “const pointer”. This is counterintuitive, but in practice they only serve as visual clutter. Explicit casts for half the arguments of every single C call are not readable, and thus not safe. Right now, my code actually seems nicer if I just transmute all the things. A pointer is a pointer is a pointer. C’s const is a guideline at best, it has no impact on linkage or what the code will actually do, and the binding writers will be more concerned with the API they present to the rest of Rust. The main reason we might want *const T is to distinguish between “source” and “destination” pointers, but we can easily fix that with any one of: named arguments, documentation, IDE support, #[const], wrapping

copy_memory(Dst(buf), Src(input), Size(rem));

or simply by keeping C invocations uncluttered enough that the reader can actually keep track.

  • Allow unsafe code full access to private items/struct fields/whatever. Yes, it’s unsafe and unstable and rabid wolves might eat my kitten, but I already know that from typing unsafe. This would make low-level code much cleaner, as well as eliminate at least half the wonderful raw functions like

#[inline]
#[stable]
pub unsafe fn set_len(&mut self, len: uint) {
     self.len = len;
}
    
#[experimental]
pub unsafe fn from_raw_parts(length: uint, capacity: uint,
                             ptr: *mut T) -> Vec<T> {
    Vec { len: length, cap: capacity, ptr: ptr }
}

At least when I do it by hand, I won’t be surprised by the lack of asserts. All problems I could think of are easily solved by introducing an unsafe use construct and feature gating it.

Of course, this will also need all private stuff to be exported, but I expect those will come in handy once we start stabilizing the ABI and wrapping Rust libraries for other languages.
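
To make the "wrapping" idea from the wishlist concrete, here is a minimal sketch in current Rust syntax. `Dst`, `Src`, `Size`, this `copy_memory` and `copy_prefix` are hypothetical names invented for illustration, not standard library items; the sketch simply forwards to `std::ptr::copy_nonoverlapping`.

```rust
use std::ptr;

// Hypothetical wrapper types (not part of std): each one names a role
// so the call site says which pointer is which.
struct Dst(*mut u8);
struct Src(*const u8);
struct Size(usize);

/// Copies `size` bytes from `src` to `dst`.
///
/// # Safety
/// Both pointers must be valid for `size` bytes and the regions must not overlap.
unsafe fn copy_memory(dst: Dst, src: Src, size: Size) {
    unsafe { ptr::copy_nonoverlapping(src.0, dst.0, size.0) }
}

// Hypothetical safe helper that exercises the sketch: copies the first
// `n` bytes of `input` into a fresh buffer.
fn copy_prefix(input: &[u8], n: usize) -> Vec<u8> {
    assert!(n <= input.len());
    let mut buf = vec![0u8; n];
    // SAFETY: `buf` and `input` are distinct allocations, each at least `n` bytes long.
    unsafe { copy_memory(Dst(buf.as_mut_ptr()), Src(input.as_ptr()), Size(n)) };
    buf
}

fn main() {
    let out = copy_prefix(b"hello world", 5);
    assert_eq!(out, b"hello".to_vec());
}
```

Swapping the first two arguments is a type error here, which is the property the wishlist wants from *const T without the casts.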


#2

Are method calls more readable than as?

Passing them around is not unsafe, there is no way to get memory unsafety from just throwing a raw pointer here and there; the code you have there is only unsafe if p is offset’d or dereference’d, it’s not unsafe by itself.

I strongly disagree with this as it is likely to make writing low level code much harder to be correct: at the moment there is at least some assistance for correctly passing a &T into C (i.e. avoiding mutating data that shouldn’t be). IMO a better approach is just allowing *mut to coerce to *const, RFC #361.


#3

Are method calls more readable than as?

When they give more information, yes. u32 as u64 is a different operation from u64 as i8, with different implications on correctness.
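
A small sketch of that difference, written in current Rust syntax rather than the syntax of this thread. The helper names `widen`, `truncate` and `checked_narrow` are made up for illustration; `From` and `TryFrom` are the standard conversion traits that current Rust provides, which may or may not match what the wishlist had in mind.

```rust
use std::convert::TryFrom;

// Widening u32 -> u64 can never lose information, so `From` is implemented.
fn widen(x: u32) -> u64 {
    u64::from(x)
}

// `as` keeps only the low 8 bits and reinterprets them as signed.
fn truncate(x: u64) -> i8 {
    x as i8
}

// `TryFrom` reports the loss instead of hiding it.
fn checked_narrow(x: u64) -> Option<i8> {
    i8::try_from(x).ok()
}

fn main() {
    assert_eq!(widen(300), 300);
    assert_eq!(truncate(300), 44); // 300 = 0x12C, low byte 0x2C = 44
    assert_eq!(truncate(255), -1); // bit pattern 0xFF read as a signed byte
    assert_eq!(checked_narrow(300), None);
    assert_eq!(checked_narrow(100), Some(100));
}
```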

Passing them around is not unsafe, there is no way to get memory unsafety from just throwing a raw pointer here and there; the code you have there is only unsafe if p is offset’d or dereference’d, it’s not unsafe by itself.

If it wasn’t unsafe, it would be a reference.

as it is likely to make writing low level code much harder to be correct

Pointers have no semantics, they just point to things. There is no difference between *const T and *mut c_void because they’re one transmute away in a block where we need to do it back and forth on every single line. *const T is just a weird crippled *mut T that tries to play with the type system on a level where we’ve already explicitly thrown away the type system. It tries to force rules on us after we’ve explicitly told the compiler to let us break all the rules.


#4

Ah, I see what you are getting at. I think this is trying to use unsafe to model forms of correctness beyond memory safety. There’s no way a truncation or cross-signedness cast of integers can (in isolation) cause memory unsafety. Forcing these other behaviours under the umbrella of unsafe seems… unnecessary.

The code you give

let p = foo.to_c_str().as_ptr();

is not unsafe. Executing that chunk has zero possibility of memory unsafety, the problem is trying to use p later.
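
For comparison, a sketch of the pattern that keeps the pointer valid, using the current `std::ffi::CString` API instead of the old `to_c_str()`. `c_strlen` is a hypothetical stand-in for a C function and `length_via_ffi` is an invented helper; the point is only that the owning `CString` is bound to a variable that outlives every use of `p`.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Hypothetical stand-in for a C function that reads a NUL-terminated string.
unsafe fn c_strlen(p: *const c_char) -> usize {
    unsafe { CStr::from_ptr(p) }.to_bytes().len()
}

fn length_via_ffi(s: &str) -> usize {
    // Bind the CString to a name so it is not dropped at the end of the statement.
    let owned = CString::new(s).expect("input must not contain a NUL byte");
    let p = owned.as_ptr();
    // SAFETY: `owned` is still alive here, so `p` points at a valid C string.
    let n = unsafe { c_strlen(p) };
    n
}

fn main() {
    assert_eq!(length_via_ffi("hello"), 5);
}
```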

There’s possibly something to be said for using &libc::c_char here (using the fact that it coerces to *const c_char) to help avoid the most common instances of CStrings being deallocated too early, but I don’t know of any other place in the stdlib that uses references in this manner. (Certainly something for us to think about, though.)

You’re not allowed to break the rules just because there is an unsafe; violating certain rules, like modifying the int behind a &int, is undefined behaviour and code can and will be “miscompiled” (it’s not really a miscompilation, since the code is fundamentally invalid, but it’s certainly not doing what the programmer wants). E.g.

use std::mem;

#[inline(never)]
fn foo(x: &int, y: &int) -> int {
    let mut z = *x;
    unsafe {
        *mem::transmute::<_, &mut int>(y) += 10;
    }
    z += *x;
    z
}

fn main() {
    let x = 1;
    println!("{}", foo(&x, &x));
    println!("{}", x);
}

prints 2 11. (The “correct” thing would be to print 12 11, since one might expect the update via the y pointer to be reflected via the x pointer.) Exactly the same thing can happen with unsafe/C functions, passing a & into a parameter that mutates it.

IMO, unsafe code is when programmers need the most assistance and should be “playing the types” as much as possible, since this is when the compiler cannot be mechanically ensuring safety (i.e. when human error kicks in). In this vein, *mut vs. *const is a very useful tool. The standard library and the libc bindings have historically been full of const-incorrectness (probably still are), where declarations were incorrectly using *const when they meant *mut. This has led to instances of &Ts being passed to functions that mutate them: removing *mut vs. *const will just make this far worse and far harder to get right.

(I know of no specific bugs caused by that (lack of) const-correctness of the standard lib, but, as demonstrated above, we do tell LLVM that &Ts are unaliased, that is, that no modifications will be observable through that pointer, so it’s very easily possible.)


#5

Btw, I just noticed: this can/should be written as *dest.offset(i as int) = *src.offset(i as int);. The methods are polymorphic and so return their input type.

Of course, this doesn’t fix the general case, but it seems to me that most of the worst instances of that are resolved with *mut automatically coercing to *const.
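
A runnable sketch of the method form in current Rust, where `add` is the unsigned counterpart of `offset`. The function `copy_bytes` and its slice-based signature are assumptions made so the example is self-contained.

```rust
// Copies as many bytes as fit, element by element, through raw pointers.
fn copy_bytes(dest: &mut [u8], src: &[u8]) {
    let len = dest.len().min(src.len());
    let d: *mut u8 = dest.as_mut_ptr();
    let s: *const u8 = src.as_ptr();
    for i in 0..len {
        // SAFETY: `i < len`, and `len` does not exceed either slice's length.
        // The slices cannot overlap because `dest` is a unique borrow.
        unsafe { *d.add(i) = *s.add(i) };
    }
}

fn main() {
    let mut out = [0u8; 3];
    copy_bytes(&mut out, &[1, 2, 3, 4]);
    assert_eq!(out, [1, 2, 3]);
}
```

Each method returns the pointer type it was called on, so no cast appears on either side of the assignment.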


#6

I don’t understand that line of reasoning. How can a dangling raw pointer possibly be safe? Can you give me an example when we would want to just return a pointer into safe code instead of a cheap safe abstraction like a tuple struct? Which is exactly what CStr is to begin with. Just no.

It does print 12 11 without optimizations. The “miscompilation” here results from the unsafe code breaking the Rust semantics without saying so. Change the arguments to *const T and it always prints 12 11.
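
A minimal sketch of that variant in current Rust syntax (`i32` in place of `int`). The `run` helper is invented for the example. Because raw pointers carry no aliasing guarantee, the compiler has to reload `*x` after the write through `y`.

```rust
/// # Safety
/// Both pointers must point at a live `i32` that is not borrowed by any reference.
#[inline(never)]
unsafe fn foo(x: *const i32, y: *mut i32) -> i32 {
    unsafe {
        let mut z = *x;
        *y += 10; // may alias `x`; raw pointers make no promise otherwise
        z += *x;
        z
    }
}

// Hypothetical driver: passes the same pointer for both arguments.
fn run() -> (i32, i32) {
    let mut x = 1;
    let p: *mut i32 = &mut x;
    // SAFETY: `p` points at `x`, which is alive and not otherwise borrowed during the call.
    let r = unsafe { foo(p, p) };
    (r, x)
}

fn main() {
    let (r, x) = run();
    println!("{}", r); // prints 12
    println!("{}", x); // prints 11
}
```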

*const T is useless because we only use it after we’ve already given up on our fantasies regarding immutable memory. The *const T of one function call will become the *mut T of the next. When we can afford to attach semantics to pointers, we call them “references” and “safe code”. It could have been useful if we didn’t already have safe, but as it is, *const T provides less utility than its cost in casting overhead.

It can catch some errors when you mix up two pointers, but you shouldn’t use pointers in the first place, and in the places where you do end up using pointers, you should know what you’re doing anyway. Make it a lint or something, not a type error.


#7

A raw pointer by itself is just a few bytes storing some number, creating such a thing is perfectly safe.

fn main() {
    let x = 1 as *const uint;
}

is a perfectly safe Rust program. Of course, such a thing is normally not useful, but the construction is not memory unsafe.

I opened #382 with the idea I mentioned above.

Yes, my point is removing *const would allow this to happen in unsafe code accidentally, in a hard-to-diagnose way, by just passing an immutable reference into a function that modifies it via a raw pointer. Saying “you should know what you’re doing” doesn’t cut it: unsafe code is hard to write correctly and throwing away an important piece of assistance seems very strange.

(Differing between optimisation levels is also surely worse than being always-broken.)

I don’t understand this at all. *const is definitely used in contexts where immutable memory makes sense, e.g. the second argument to memcpy. Using unsafe code does not mean “freely do anything you want”, it means “the compiler cannot verify this as safe, I am taking on the burden of proof”; in this context, keeping track of immutable memory and handling aliased data right is very important. Having a distinction between *const and *mut is useful for the programmer to get it right.

use std::{ptr, mem};

#[inline(never)]
fn foo(x: &int, y: &int) -> int {
    let mut z = *x;
    unsafe {
        let new = *y + 10;
        ptr::copy_memory(y as *const _ as *mut int, &new, mem::size_of::<int>());
    }
    z += *x;
    z
}

fn main() {
    let x = 1;
    println!("{}", foo(&x, &x));
    println!("{}", x);
}

has exactly the same problem as the transmute version. Imagine if it could be written as just ptr::copy_memory(y, &new, mem::size_of::<int>())? Debugging that seems hard: it could manifest as the result of some calculation being wrong occasionally, or some C call sometimes returning an error for no apparent reason (i.e. not necessarily appearing close to the code that is wrong).

If we were to only have a single pointer *, the problem above would be lessened by allowing implicit coercions from &mut to *, but disallowing coercions from & to *, but I feel like requiring explicit casts for that is likely to be worse than the status quo.

The main reason I know for casting *const to *mut is when using the freestanding offset function, which can be avoided via the method. Most other casts are taking a *mut to *const, which is (AFAICT) perfectly OK to perform implicitly, just like &mut to &.
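
As a sketch of how this looks in current Rust, which postdates this thread: *mut T coerces to *const T implicitly, while the reverse direction needs an explicit cast. The functions `read_first`, `write_first` and `demo` are invented for illustration.

```rust
/// # Safety
/// `p` must point at a readable byte.
unsafe fn read_first(p: *const u8) -> u8 {
    unsafe { *p }
}

/// # Safety
/// `p` must point at a writable byte.
unsafe fn write_first(p: *mut u8, v: u8) {
    unsafe { *p = v }
}

fn demo() -> (u8, u8) {
    let mut buf = [7u8, 8];
    let m: *mut u8 = buf.as_mut_ptr();
    // *mut -> *const: accepted without a cast.
    let before = unsafe { read_first(m) };
    let c: *const u8 = m;
    // *const -> *mut: must be spelled out, which makes it easy to search for.
    // SAFETY: `c` was derived from a mutable pointer into `buf`.
    unsafe { write_first(c as *mut u8, 9) };
    (before, buf[0])
}

fn main() {
    assert_eq!(demo(), (7, 9));
}
```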

I’m interested to know where other casts from *const to *mut occur; these should be looked on very suspiciously in general and we should try to make them necessary less often.

Rust references attach a very specific set of semantics to pointers, it’s very very reasonable to want to be attaching some other information/avoid the restrictions of references.

I don’t see how you can say “you shouldn’t use pointers in the first place”: there’s no alternative to using pointers for things like calling into C. Yes, you should know what you’re doing, but the chronic problems in C & C++ code demonstrate that it is not reasonable to expect correctness always.

I can’t think of any way that this can reasonably be a lint other than via types, but am very interested to hear if you have a good proposal for how to handle the linting specifically.


#8

More about C interfacing than unsafe code per se, but the asymmetry between nullable pointers to functions and nullable pointers to objects is pretty irksome. Look at this beautifully regular C function call:

void maybe_call_callback_with_context(void (*cb)(void*), void* context);

maybe_call_callback_with_context(
    NULL,
    NULL
);

and now, shudder at the sight of its Rust equivalent:

extern {
    fn maybe_call_callback_with_context(cb: Option<extern "C" fn(*mut c_void)>,
                                        context: *mut c_void);
}

maybe_call_callback_with_context(
    None,
    std::ptr::null_mut()
);


The two NULLs look completely different from each other! How can anyone write code under such circumstances! Bah!

Edit: I think I forgot an unsafe there but that’s beside the point.
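
A self-contained sketch of the same asymmetry in current Rust. Here `maybe_call_callback_with_context` is implemented in Rust as a hypothetical stand-in for the C function, and `Callback`, `bump` and `run` are invented names. The size check shows that `None` occupies the same single pointer-sized slot a C NULL would.

```rust
use std::os::raw::c_void;

type Callback = extern "C" fn(*mut c_void);

// Hypothetical Rust-side stand-in for the C function from the post.
extern "C" fn maybe_call_callback_with_context(cb: Option<Callback>, context: *mut c_void) {
    if let Some(f) = cb {
        f(context);
    }
}

extern "C" fn bump(context: *mut c_void) {
    // SAFETY: in this sketch the callback is only ever given a pointer to a live i32.
    unsafe { *(context as *mut i32) += 1 };
}

// Runs the call with a counter as context and returns the counter afterwards.
fn run(cb: Option<Callback>) -> i32 {
    let mut counter = 0i32;
    maybe_call_callback_with_context(cb, &mut counter as *mut i32 as *mut c_void);
    counter
}

fn main() {
    // The two "nulls" are spelled differently, as the post complains.
    maybe_call_callback_with_context(None, std::ptr::null_mut());
    assert_eq!(run(Some(bump as Callback)), 1);
    assert_eq!(run(None), 0);
    // Option around a function pointer adds no space: None is the null value.
    assert_eq!(
        std::mem::size_of::<Option<Callback>>(),
        std::mem::size_of::<Callback>()
    );
}
```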


#9

That also ties into the old debate about making *T non-nullable. (Which I still think would be a good idea – for one thing it would allow the “null pointer optimization” for enums to extend naturally to any smart pointer types using *T (which most do), instead of requiring special effort.) The alternatives would be to instead make fn also nullable, which I don’t think anybody supports, to perhaps have separate nullable and non-nullable pointer types (overly complicated, but perhaps not terrible), and of course to do nothing, always our favorite option.
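
For reference, current Rust (well after this thread) ended up with something like the "separate nullable and non-nullable pointer types" option: `std::ptr::NonNull<T>` sits alongside the nullable raw pointers. A short sketch of how the null pointer optimization applies to it; `from_nullable` and `to_nullable` are hypothetical helper names.

```rust
use std::ptr::{self, NonNull};

// Nullable raw pointer -> Option of a non-null pointer.
fn from_nullable(p: *mut i32) -> Option<NonNull<i32>> {
    NonNull::new(p)
}

// And back again: None becomes the null pointer.
fn to_nullable(p: Option<NonNull<i32>>) -> *mut i32 {
    p.map_or(ptr::null_mut(), |nn| nn.as_ptr())
}

fn main() {
    let mut value = 5;
    let raw: *mut i32 = &mut value;

    assert!(from_nullable(ptr::null_mut()).is_none());
    assert_eq!(to_nullable(from_nullable(raw)), raw);
    assert!(to_nullable(None).is_null());

    // The Option wrapper costs nothing: null is used to represent None.
    assert_eq!(
        std::mem::size_of::<Option<NonNull<i32>>>(),
        std::mem::size_of::<*mut i32>()
    );
}
```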

Another way forward might be to try to make fn into some kind of unsized type, as mentioned in #252, so then you would have &fn, &'static fn, *const fn, and so on, and the nullability would be in the pointer/reference type.


#10

People apparently don’t want non-nullable raw pointers, because there are systems where there’s data at address 0. I guess there are no systems where there’s a function at address 0.

fn() doesn’t want to be nullable because then calling can possibly segfault, right? How about only making unsafe fn() nullable? 0 as unsafe extern "C" fn() still looks weird and doesn’t get you any closer to ptr::null_mut(), I suppose.

Making fns unsized like they sorta are in C, where function types are distinct from function pointer types, seems like an equally principled solution to making raw pointers non-nullable, but doesn’t buy you the smart pointer benefits. Weren’t there some RFCs floating around for lifetime-having function references anyway? Does Rc<fn()> with an unsized fn() make any sense?


#11

Yeah… but the same thing holds for & references. If we have the null pointer optimization for &, it can’t point to address 0. It’s not clear why this would be OK in one case and not the other.

I was thinking more that it’s because this is a native Rust type used in normal, safe Rust code, and there it’s the expected (and right) thing for types to be non-nullable and use Option to get nullability, and it would be kind of bizarre if fn() were an exception to this. This is different for *T because those are only used in unsafe code, where there’s some demand for matching the behavior of C. (But what you write may also be true… either calling a null fn would segfault, or every call would have an imposed performance cost due to compiler-inserted null checks, and presumably neither is acceptable.)

Right. I think the two are actually orthogonal, it’s just that either alone is sufficient to resolve the discrepancy between the nullability of data and function pointers in the FFI. Either data pointers become non-nullable to match the behavior of function pointers, or function pointers are reinterpreted so that both of them use the same pointer types, in which case those pointer types may be either nullable or not, and they’ll be consistent either way. But you could also do both. Then the equivalent of C’s bool (*)(bool) would be Option<*const fn(bool) -> bool>.

Yes, this would address that as well. Instead of baking a lifetime into the existing fn type, the appropriate lifetime would be on the reference to the unsized fn type.

…and this is the hard part, i.e. how can the whole thing actually make any sense. One possibility would be that Rc<fn()> does in fact have a meaning, but there’s no way you could ever construct one, given that you can’t copy or move out of an unsized type, and there’s no way to create a value of an unsized fn type at runtime, you can only take references to the static instances created by fn declarations. (It’s not clear if JIT would change this. Maybe that’s exactly when you would want Rc<fn()>? But if you’re codegenerating multiple functions into a single memory block, that seems like it wouldn’t work any more.)

Another possibility might be to somehow interpret fn() as being a trait, except trait objects of it only have the vtable pointer, but no data pointer. But this is kind of strange in the respect that usually hidden vtable pointers are assumed to be &'static references, and the logical extension of this would be that in the case of *const fn(), you have a *const data pointer which doesn’t exist and points to nothing, and a hidden &'static vtable pointer to the fn body, but this is not what we want.

cc @Ericson2314, perhaps based on e.g. this comment you might also have some ideas?


#12

In principle you could have the “entry point” fn at the beginning of the code block and any helper functions later on, leaving the Rust-visible type as Rc<fn> or, more likely, of a struct ending with fn. Not that it sounds like a super important use case.


#13

I don’t agree with the specific recommendations here, but strongly agree with this statement:

Making unsafe code ‘deliberately verbose/ugly’ is not the way to go; unsafe{} is enough, IMO. It jumps out, it’s easy to search for, and it’s easy for a project to ban unsafe blocks in certain sources etc. Other than that, it would be great if Rust could match C for the convenience of writing unsafe code.