Relaxing the improper_ctypes lint to allow passing ZSTs behind a raw ptr


#1

Currently, rust does not like it when you pass Zero Sized Types (ZSTs) to extern "C" functions, even behind a raw pointer reference like *const ZeroSized. For example, the following code:

struct ZeroSized;

extern "C" {
    fn send(opaque_ref: *const ZeroSized);
    fn recv() -> *const ZeroSized;
}

Produces these warnings on today’s nightly:

warning: found zero-size struct in foreign module, consider adding a member to this struct, #[warn(improper_ctypes)] on by default
 --> <anon>:7:25
  |
7 |     fn send(opaque_ref: *const ZeroSized);
  |                         ^^^^^^^^^^^^^^^^

warning: found zero-size struct in foreign module, consider adding a member to this struct, #[warn(improper_ctypes)] on by default
 --> <anon>:8:18
  |
8 |     fn recv() -> *const ZeroSized;
  |                  ^^^^^^^^^^^^^^^^

It is true that zero sized structs cannot exist in C/C++ code, and thus it makes sense to prevent passing them by value. The primary use case for passing these ZSTs behind a raw pointer would be as an opaque data type. The current type of choice for that situation is

enum Impossible {}

This type has a serious disadvantage compared to a normal ZST, which is that if it is ever exposed to safe code as a &Impossible, that safe code can match against it and trigger Undefined Behavior. This means that these opaque C++ objects, if they want to be passed around within rust by reference, have to be wrapped in an inconvenient and tedious to write wrapper type:

#[derive(Copy, Clone)]
struct ImpossibleRef<'a> {
    ptr: *const Impossible,
    _marker: PhantomData<&'a Impossible>,
}

// etc...

In contrast, it would be much nicer if they could look like normal rust types, except that their exact backing layout is unknown. One way to represent that would be to instead use a zero sized type with a private ZST member. In rust this has the advantage of:

  • not triggering Undefined Behavior when passed behind a reference to safe rust code
  • making it impossible for safe rust functions like mem::swap() to ruin the data layout of the backing C++ type

But using these types in this way causes the improper_ctypes lint to complain.

I’d like to suggest relaxing the lint to allow for this use case, and make it possible to create nicer, safer, and more rust-like abstractions around opaque C/C++ style types, in less code.

(Anecdotally, while writing code which performs interop between C++ code in gecko, and rust code, I have often wanted to be able to use ZSTs like this)


#2

Zero sized structs can be created in safe rust which is unsafe if it is a backing storage for foreign data, while empty enums can’t be made in safe rust.


#3

The specific code which I would actually see would look something more like this, to prevent safe rust code from outside the module from creating one of these structs:

mod foo_mod {
    // This type is zero-sized, but cannot be constructed from outside of the
    // module foo_mod, as it has a private member.
    pub struct FooStruct(());

    extern "C" {
        fn FooStruct_BarIt(*const FooStruct) -> i32;
        fn FooStruct_BazIt(*mut FooStruct) -> i32;
        fn FooStruct_MakeIt() -> *mut FooStruct;
    }

    impl FooStruct {
        // Normally I would have an owning wrapper struct which would deref to
        // FooStruct, but for simplicity, let's leak FooStruct and get a 'static
        // reference.
        pub fn new() -> &'static mut FooStruct {
            unsafe {
                &mut *FooStruct_MakeIt()
            }
        }

        pub fn bar_it(&self) -> i32 {
            unsafe {
                FooStruct_BarIt(self)
            }
        }

        pub fn baz_it(&mut self) -> i32 {
            unsafe {
                FooStruct_BazIt(self)
            }
        }
    }
}

// From out here, code cannot construct a fake FooStruct object, but it can get,
// and pass around references to one. The FooStruct object represents an opaque
// handle to an object which exists only in C-land, and which we cannot interact
// with directly from C++, only through the methods defined in the foo_mod
// module.

#4

What you’ve shown is usually done like this:

mod foo_sys { // actually a foo_sys crate
    pub enum CFooStruct {}

    extern "C" {
        pub fn FooStruct_BarIt(foo: *const CFooStruct) -> i32;
        pub fn FooStruct_BazIt(foo: *mut CFooStruct) -> i32;
        pub fn FooStruct_MakeIt() -> *mut CFooStruct;
    }
}

mod foo_mod {
    use foo_sys::*;

    pub struct FooStruct(*mut CFooStruct);

    impl FooStruct {
        pub fn new() -> FooStruct {
            unsafe {
                FooStruct(FooStruct_MakeIt())
            }
        }

        pub fn bar_it(&self) -> i32 {
            unsafe {
                FooStruct_BarIt(self.0)
            }
        }

        pub fn baz_it(&mut self) -> i32 {
            unsafe {
                FooStruct_BazIt(self.0)
            }
        }
    }
}

CFooStruct is not exposed to safe code at all.


#5

This approach unfortunately creates a double indirection when using borrowed &FooStruct. If we want to make C interop really zero-cost, this shoud be avoided. This approach also creates a problem if you want to provide different set of methods for borrowed and owned versions of the struct (if you want to use Deref).

I had a thought that struct FooStruct(u8); may be a workaround, but it could be dereferenced, and the pointer may not be really valid pointer (eg. C library may store some addidional info in lowest bits), so I’m pretty convinced that pub struct FooStruct(()); should be an idiomatic way to define opaque types.. I’ve realized that it can’t really be dereferenced if the type is not marked Copy, so it’s not a bad workaround.


#6

I’d consider this a bug.

fn bar(f: &foo_mod::FooStruct) {
    f.bar_it();
}

fn main(){
    let f = foo_mod::FooStruct::new();
    f.bar_it();
    bar(&f);
}

Compiles to this and it doesn’t look like indirection to me:

	callq	FooStruct_MakeIt@PLT
	movq	%rax, %rbx
	movq	%rbx, %rdi
	callq	FooStruct_BarIt@PLT
	movq	%rbx, %rdi
	popq	%rbx
	jmp	FooStruct_BarIt@PLT

IME representing a borrowed version requires jumping through a bunch of unsafe hoops so the particulars of owned representation don’t matter much.


#7

Well, everything just got inlined, since every function is defined in single crate. That’s no surprise. I know that cross-crate inlining is possible (with #[inline] annotations) and argpromotion can usually help with getting rid of indirection, but that doesn’t make it zero-cost. I consider it an antipattern just as &Vec<T> is an antipattern. It’s true that in most cases the indirection will be compiled out or won’t pose a performance problem, but I find no reason why we shouldn’t make it truly zero-cost. I imagine that there are usecases where it matters, eg. if you have &[&FooStruct] on some critical path.

Yes, representation of the owned representation doesn’t really matter, but representation of the borrowed one does, and if you don’t want to store any custom metadata for the borrowed version, ability to use just &FooZST makes life easier.


#8

Um, I got carried away. The following probably is unsound because nothing’s stopping you from dereferencing &FooRef. So I guess I’m still going to wait for an “unsized” non-DST (thin pointer) type… But in the meantime the hacky CStr way is not that bad.

Huh, we might actually have all the building blocks for making a custom reference:

    pub struct FooRef(u8);
    
    impl FooRef {
        unsafe fn from_ptr<'a>(ptr: *const CFooStruct) -> &'a FooRef {
            mem::transmute(ptr)
        }
        
        fn as_ptr(&self) -> *const CFooStruct {
            unsafe { mem::transmute(self) }
        }

        pub fn bar_it(&self) -> i32 {
            unsafe {
                FooStruct_BarIt(self.as_ptr())
            }
        }
    }   

    impl Deref for FooStruct {
        type Target = FooRef;
    
        fn deref(&self) -> &FooRef {
            unsafe { FooRef::from_ptr(self.0) }
        }
    }

fn bar(f: &FooRef) {
    f.bar_it();
}

#9

If you were to do mutable references that way with the type behind the reference not being zero-sized, then you are not providing a safe abstraction. For example, suppose you have the following code:

mod frm { pub struct FooRefMut(u8); /* ... */ }
use frm::FooRefMut;
use std::mem::swap;

fn foo() {
    let a: &mut FooRefMut = /* ... */;
    let b: &mut FooRefMut = /* ... */;
    swap(a, b);
}

I have, with fully safe code, swapped the first byte of the backing FooRefMut objects. This will almost certainly break something important. If we make mem::size_of::<FooRefMut>() == 0, then we can make this safe, as swap(a, b) will be a no-op and not break anything.

I have some ideas for a RFC which will make this type of pattern easier to write which doesn’t involve (ab)using ZSTs as opaque types.


#10

How about !?

#![feature(never_type)]

#[repr(C)]
pub struct CType(!);

#11

I believe that code which holds a reference to a ! type is considered unreachable, as ! has no values, so passing around &CType could cause the optimizer to decide that any code which sees that type is unreachable. I believe that it would be safe today, but it might trigger undefined behavior in the future.