Field aliases: alternative idea for anonymous unions for FFI


#1

The “anonymous struct and union types” pre-RFC doesn’t seem to be going anywhere, and we badly need something to address the problem of C system headers defining structs whose documented fields are actually preprocessor macros referring to members of a nested union. I came up with an alternative idea in the shower this morning; please let me know what you think.

To motivate the problem, here is probably the simplest case found in real headers: struct sigaction as defined by OSX sys/signal.h.

/* union for signal handlers */
union __sigaction_u {
        void    (*__sa_handler)(int);
        void    (*__sa_sigaction)(int, struct __siginfo *,
                       void *);
};

/*
 * Signal vector "template" used in sigaction call.
 */
struct  sigaction {
        union __sigaction_u __sigaction_u;  /* signal handler */
        sigset_t sa_mask;               /* signal mask to apply */
        int     sa_flags;               /* see signal options below */
};

/* if SA_SIGINFO is set, sa_sigaction is to be used instead of sa_handler. */
#define sa_handler      __sigaction_u.__sa_handler
#define sa_sigaction    __sigaction_u.__sa_sigaction

They get much messier than that; just look at siginfo_t from the same header (but on FreeBSD or Linux, not OSX).

Right now, Rust can represent all of that except for the #defines:

use std::os::raw::{c_int, c_void};

#[repr(C)] struct sigset_t { ... }
#[repr(C)] struct __siginfo { ... }
#[repr(C)]
union __sigaction_u {
    __sa_handler: extern "C" fn(c_int),
    __sa_sigaction: extern "C" fn(c_int, *mut __siginfo, *mut c_void),
}
#[repr(C)]
struct sigaction {
    __sigaction_u: __sigaction_u,
    sa_mask: sigset_t,
    sa_flags: c_int,
}

My proposal, then, is to add a construct that expresses exactly what the #defines express, only more hygienically. I call them “field aliases”. They look like this:

#[repr(C)]
pub struct sigaction {
    __sigaction_u: __sigaction_u,
    pub sa_mask: sigset_t,
    pub sa_flags: c_int,
    pub let sa_handler = __sigaction_u.__sa_handler,
    pub let sa_sigaction = __sigaction_u.__sa_sigaction,
}

The abstract syntax is

(pub)? let NAME = ( FIELD '.' )* FIELD

as a new alternative production for struct fields. NAME is anything acceptable as a field name, and the chain of FIELDs must refer to an existing (possibly nested) field within the struct. Accessing NAME is exactly the same as accessing FIELD.FIELD.FIELD, except that the visibility of NAME is independent of the visibility of what it’s sugar for. (As shown above, the normal usage would be that the implementation-detail union and its fields are private, but the aliases are public.)
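For comparison, here is a rough sketch of what the two aliases have to look like when written by hand today, assuming accessor methods are an acceptable stand-in. The placeholder definitions of sigset_t and __siginfo, the with_handler constructor, the accessor names, and the record_signal handler are all invented for illustration; none of them come from a real header or from the libc crate.

```rust
use std::os::raw::{c_int, c_void};
use std::sync::atomic::{AtomicI32, Ordering};

// Placeholder types (assumptions): the real layouts are platform-specific.
#[allow(non_camel_case_types)]
pub type sigset_t = u32;

#[allow(non_camel_case_types, dead_code)]
#[repr(C)]
pub struct __siginfo {
    _opaque: [u8; 0],
}

#[allow(non_camel_case_types, dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
union __sigaction_u {
    __sa_handler: extern "C" fn(c_int),
    __sa_sigaction: extern "C" fn(c_int, *mut __siginfo, *mut c_void),
}

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct sigaction {
    __sigaction_u: __sigaction_u, // private implementation detail
    pub sa_mask: sigset_t,
    pub sa_flags: c_int,
}

impl sigaction {
    // Hypothetical constructor so callers never name the private union.
    pub fn with_handler(handler: extern "C" fn(c_int)) -> Self {
        sigaction {
            __sigaction_u: __sigaction_u { __sa_handler: handler },
            sa_mask: 0,
            sa_flags: 0,
        }
    }

    // Stand-in for `pub let sa_handler = __sigaction_u.__sa_handler`.
    // Unsafe because only the caller knows which union member is live.
    pub unsafe fn sa_handler(&self) -> extern "C" fn(c_int) {
        unsafe { self.__sigaction_u.__sa_handler }
    }

    // Writing a Copy union member is safe, so the setter is safe too.
    pub fn set_sa_handler(&mut self, handler: extern "C" fn(c_int)) {
        self.__sigaction_u.__sa_handler = handler;
    }
}

// Hypothetical handler that records the signal number it was called with.
pub static LAST_SIGNAL: AtomicI32 = AtomicI32::new(0);

pub extern "C" fn record_signal(signum: c_int) {
    LAST_SIGNAL.store(signum, Ordering::SeqCst);
}
```

The public accessors sit on top of a private union, which is the independent-visibility property the alias syntax is meant to provide, but callers have to write sa.sa_handler() and sa.set_sa_handler(h) instead of plain field syntax.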

Why is this better than anonymous unions?

Because it is syntactic sugar, we don’t have to make any decisions about what anonymous unions mean, when you’re allowed to access which fields, etc.

Also, I suspect it will be easier to machine-generate from C headers.

Couldn’t we do this with macros?

Probably, but then you would have to write sa.sa_handler! = handler_fn, which is weird considering you don’t put an exclamation point on sa_mask or sa_flags. Also, I don’t know whether you can get the independent visibility effect with macros.
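To make the macro option concrete, here is a minimal macro_rules! sketch. With today's macro syntax the invocation would be spelled sa_handler!(sa) rather than sa.sa_handler!. The macro name, the simplified union (the siginfo pointer is reduced to *mut c_void), the u32 stand-in for sigset_t, and the demo function are assumptions made for illustration only.

```rust
use std::os::raw::{c_int, c_void};
use std::sync::atomic::{AtomicI32, Ordering};

#[allow(non_camel_case_types, dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
union __sigaction_u {
    __sa_handler: extern "C" fn(c_int),
    // Simplified (assumption): the real second argument is a siginfo pointer.
    __sa_sigaction: extern "C" fn(c_int, *mut c_void, *mut c_void),
}

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct sigaction {
    __sigaction_u: __sigaction_u,
    pub sa_mask: u32, // placeholder for sigset_t
    pub sa_flags: c_int,
}

// Hypothetical macro standing in for `#define sa_handler ...`.
// It expands to a place expression, so it can be read or assigned.
macro_rules! sa_handler {
    ($sa:expr) => {
        $sa.__sigaction_u.__sa_handler
    };
}

pub static LAST_SIGNAL: AtomicI32 = AtomicI32::new(0);

extern "C" fn record_signal(signum: c_int) {
    LAST_SIGNAL.store(signum, Ordering::SeqCst);
}

extern "C" fn ignore_signal(_signum: c_int) {}

pub fn macro_demo() -> c_int {
    let mut sa = sigaction {
        __sigaction_u: __sigaction_u { __sa_handler: ignore_signal },
        sa_mask: 0,
        sa_flags: 0,
    };
    // Assigning a Copy union member needs no unsafe block.
    sa_handler!(sa) = record_signal;
    // Reading a union member does.
    let handler = unsafe { sa_handler!(sa) };
    handler(15);
    LAST_SIGNAL.load(Ordering::SeqCst)
}
```

My understanding is that macro_rules! checks field privacy at the call site, so this only compiles where __sigaction_u is already visible. If that is right, the macro route does not give the independent visibility that the alias proposal does.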

Can this be used in any kind of struct?

I don’t see why not, but I don’t know of any use for it other than the FFI scenario.


#2

Interesting alternative! While I’d still prefer to have actual anonymous unions and structs, in the absence of that, this does provide some of the same benefits. (Though doing it in the general case really wants a binding generator.)


#3

FWIW, it should be possible to achieve a similar effect with sufficiently horrid Deref abuse.
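A minimal sketch of one way that could look, assuming the union's members are renamed to the public names and the outer struct dereferences to the union. The renamed members, the simplified sa_sigaction signature, the u32 stand-in for sigset_t, and the demo function are all assumptions for illustration.

```rust
use std::ops::{Deref, DerefMut};
use std::os::raw::{c_int, c_void};
use std::sync::atomic::{AtomicI32, Ordering};

// Members carry the names the C macros expose (assumption).
#[allow(non_camel_case_types, dead_code)]
#[repr(C)]
#[derive(Clone, Copy)]
pub union __sigaction_u {
    pub sa_handler: extern "C" fn(c_int),
    pub sa_sigaction: extern "C" fn(c_int, *mut c_void, *mut c_void),
}

#[allow(non_camel_case_types)]
#[repr(C)]
pub struct sigaction {
    __sigaction_u: __sigaction_u,
    pub sa_mask: u32, // placeholder for sigset_t
    pub sa_flags: c_int,
}

// Field lookup falls back to the Deref target, so `sa.sa_handler`
// resolves to the union member even though the union field is private.
impl Deref for sigaction {
    type Target = __sigaction_u;
    fn deref(&self) -> &__sigaction_u {
        &self.__sigaction_u
    }
}

impl DerefMut for sigaction {
    fn deref_mut(&mut self) -> &mut __sigaction_u {
        &mut self.__sigaction_u
    }
}

pub static LAST_SIGNAL: AtomicI32 = AtomicI32::new(0);

extern "C" fn record_signal(signum: c_int) {
    LAST_SIGNAL.store(signum, Ordering::SeqCst);
}

extern "C" fn ignore_signal(_signum: c_int) {}

pub fn deref_demo() -> c_int {
    let mut sa = sigaction {
        __sigaction_u: __sigaction_u { sa_handler: ignore_signal },
        sa_mask: 0,
        sa_flags: 0,
    };
    sa.sa_handler = record_signal; // goes through DerefMut
    let handler = unsafe { sa.sa_handler }; // goes through Deref
    handler(2);
    LAST_SIGNAL.load(Ordering::SeqCst)
}
```

Reads of the member still need unsafe, as with any union access, and the union type itself has to be public because it appears as the Deref target.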


#4

Wouldn’t it be better to extend the existing aliasing syntax? See https://doc.rust-lang.org/reference.html#use-declarations:

A use declaration creates one or more local name bindings synonymous with some other path. Usually a use declaration is used to shorten the path required to refer to a module item. These declarations may appear in modules and blocks, usually at the top.

We just need to lift the restriction above. No need to invent new syntax forms such as let inside structs.

BTW, modern C++ does the same with its using declarations (instead of the legacy C typedef).

Edit: I forgot to put in the actual example:

#[repr(C)]
pub struct sigaction {
    __sigaction_u: __sigaction_u,
    pub sa_mask: sigset_t,
    pub sa_flags: c_int,
    pub use __sigaction_u.__sa_handler as sa_handler,
    pub use __sigaction_u.__sa_sigaction as sa_sigaction,
}

The “as identifier” part is of course only needed if we want to use a different name.


#5

@josh

While I’d still prefer to have actual anonymous unions and structs, in the absence of that, this does provide some of the same benefits.

Yeah. This is not meant to supersede anonymous unions and structs; this is just the Easy Thing We Can Do Quickly that will allow me to add complete definitions of siginfo_t to the libc crate so I can get back to the thing I was originally trying to do. (It seems to be my fate to discover yaks that cannot be shaved without first inventing the razor blade.)

@yigal100

pub use __sigaction_u.__sa_sigaction as sa_sigaction

I find this less legible: the most important thing, the name you can actually use, is on the far right, instead of on the far left with all the other names. (Understand that to first order this will always involve assigning a new name.)

However, I hear the argument from “this feature already exists in this other context” and will go along with this if the consensus likes it better.


#6

Unless the struct has two anonymous structs/unions inside it, which is unfortunately quite common.