Stabilizing a machine-readable -Zprint-type-sizes

TL;DR: the Gecko developers would like a stable version of -Zprint-type-sizes, so that we can automatically generate C(++) bindings to Rust code that can pass repr(Rust) types by value. For example, we would like to be able to store Vecs in our C++ code, to be passed to Rust code on demand.

It would be nice if we could use this system to access fields and enum discriminant/variant values, but this is a “bonus” we’re willing to give up. This would let us manipulate the fields of a Rect type, or switch on an Option. We are willing to respect Rust’s privacy system (although I expect we’ll cheat for a few special types?). I will proceed on the assumption that providing this is desirable.

We do not need destructor generation, because there’s no way in C++ to encode Rust’s version of move semantics without drop flags. We can manually create wrapper types for the cases where we want destructors.

We do not need to be able to handle all types natively. For instance, we don’t care about being able to handle fully zero-sized types. However, we can easily handle zero-sized fields such as PhantomData or () in enums: we can just ignore them, since their alignment and offset implications are visible on the non-zero-sized parts.

We do not need support for calling extern "rust" functions/methods.

This must work with cross-compiling, so we can’t actually execute any Rust code to e.g. print these properties. Using const fn to create a bin/lib with the desired values stuffed in constants also doesn’t work, because our build platform can’t read the target format. Similarly, it would be too much work to try to parse the target platform’s debug info.

As such, we believe the best path forward is to stabilize a machine-readable version of print-type-sizes, so that the compiler can produce these values for us for a given target triple. Our build system would take this information and generate headers from it (ideally with static assertions that the produced C(++) structs have the layout we expect).

The Flag

What we desire can reasonably be considered a kind of output, in the same vein as bin or lib. As such, I think it would be reasonable to expose it under:

--emit typeinfo

This would make it easy to integrate into any build system: just add the flag to your existing build.

It’s possible something better could exist for cargo/RLS. Not an expert here.

The Format

JSON seems to be the format of choice, and I have no issue with this.

Informal grammar:

// Final output is an array of types
$output: [ $type* ]

// I think this is it...? (primitive distinction I think matters...?)
$type: $struct | $enum | $union | $primitive

$struct: {
    "kind": "struct",
    "name": $string,
    "public": $bool,
    "size": $int,
    "align": $int,
    "fields": [ $field* ]
} // optional `packed` bool? Technically implied by layout details...

$field: {
    "name": $string,
    "type": $type_name,    
    "public": $bool,
    "offset": $int
}

// References to types can have pointer/array modifiers
// (& and &mut are lowered to *const and *mut)
// also I'm being lazy here -- only one set of quotes, 
// despite what this grammar implies
$type_name: 
    "*const $type_name"  
  | "*mut $type_name" 
  | "[$type_name; $int]"
  | $string
  
// types like Option<&T>, we could overlap the discriminant/payload, 
// and the parser needs to handle that (or give up and produce an opaque type).
// For more complex types we might want to support `discriminant: [$field*]`?
$enum: {
    "kind": "enum",
    "name": $string,
    "public": $bool,
    "size": $int,
    "align": $int,
    "discriminant": $field, 
    "cases": [ $case* ]
}

// Same as $field, but with a discriminant_value 
// (some overlap with unions... 🤔)
$case: {
    "discriminant_value": $int,
    "name": $string,
    "type": $type_name,
    "public": $bool,
    "offset": $int
}

$union: {
    "kind": "union",
    "name": $string,
    "public": $bool,
    "size": $int,
    "align": $int,
    "fields": [ $field* ] // offset always 0?
}

$primitive: {
    "kind": "primitive",
    "name": $string,
    "size": $int,
    "align": $int
}

// $int, $bool, $string are JSON primitives
// NOTE: tuples get lowered to rust::Tuple<A, B, C>
// NOTE: slices get lowered to rust::Slice<T> and rust::SliceMut<T>
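
The `$type_name` rule above is recursive, so a consumer needs a small parser for it. The following is a minimal sketch in Rust, assuming the pointer and array modifiers are spelled exactly as in the grammar; the `TypeName` enum and `parse_type_name` function are hypothetical names for illustration and not part of the proposal.

```rust
/// A parsed `$type_name`. The variant names are illustrative only.
#[derive(Debug, PartialEq)]
enum TypeName {
    ConstPtr(Box<TypeName>),
    MutPtr(Box<TypeName>),
    Array(Box<TypeName>, u64),
    Named(String),
}

/// Parse a `$type_name` string; returns None for malformed input.
fn parse_type_name(s: &str) -> Option<TypeName> {
    let s = s.trim();
    if let Some(rest) = s.strip_prefix("*const ") {
        return parse_type_name(rest).map(|t| TypeName::ConstPtr(Box::new(t)));
    }
    if let Some(rest) = s.strip_prefix("*mut ") {
        return parse_type_name(rest).map(|t| TypeName::MutPtr(Box::new(t)));
    }
    if let Some(inner) = s.strip_prefix('[').and_then(|r| r.strip_suffix(']')) {
        // The length follows the last ';', so nested arrays such as
        // "[[u8; 2]; 3]" split at the outermost separator.
        let (elem, len) = inner.rsplit_once(';')?;
        let len: u64 = len.trim().parse().ok()?;
        return parse_type_name(elem).map(|t| TypeName::Array(Box::new(t), len));
    }
    if s.is_empty() {
        return None;
    }
    // Anything else is treated as a plain (possibly generic) type path.
    Some(TypeName::Named(s.to_string()))
}
```

Anything that is not a pointer or array is kept as an opaque name, so paths like `std::vec::Vec<u8>` pass through unchanged and can be looked up among the emitted types.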

A consumer of this format would presumably parse the whole file into a HashMap<String, Type>.
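
As a rough sketch of that consumer side, assuming the JSON has already been deserialized by some means: the `Type` struct below is a hypothetical, trimmed-down stand-in that keeps only the keys shared by every kind in the grammar, and `index_by_name` is an illustrative helper name.

```rust
use std::collections::HashMap;

/// Minimal stand-in for one parsed entry. A real consumer would also
/// carry fields, cases, and the discriminant where applicable.
#[derive(Debug, Clone, PartialEq)]
struct Type {
    kind: String,
    name: String,
    size: u64,
    align: u64,
}

/// Index the flat `$output` array by type name so that the `type`
/// strings in fields and cases can be resolved with a map lookup.
fn index_by_name(types: Vec<Type>) -> HashMap<String, Type> {
    types.into_iter().map(|t| (t.name.clone(), t)).collect()
}
```

With this index, resolving a field such as `"type": "usize"` is a single `map.get("usize")`, and a missing entry signals a type the tool should treat as opaque.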

Example of a program that only uses Vec<u8> and Option<u32> on x64:

[
    {
        "kind": "struct",
        "name": "std::vec::Vec<u8>",
        "public": true,
        "fields": [ 
            {
                "name": "buf",
                "type": "alloc::raw_vec::RawVec<u8>",
                "public": false,
                "offset": 0
            },
            {
                "name": "len",
                "type": "usize",
                "public": false,
                "offset": 16
            }
        ],
        "size": 24,
        "align": 8
        
    },
    {
        "kind": "struct",
        "name": "alloc::raw_vec::RawVec<u8>",
        "public": false,
        "fields": [
            {
                "name": "ptr",
                "type": "*const u8", // ehh fudging
                "public": false,
                "offset": 0
            },
            {
                "name": "cap",
                "type": "usize",
                "public": false,
                "offset": 8
            }
        ],
        "size": 16,
        "align": 8
    },
    {
        "kind": "enum",
        "name": "std::option::Option<u32>",
        "public": true,
        "size": 8,
        "align": 4,
        "discriminant": {
            "name": "discriminant",
            "public": true,
            "type": "u32",
            "offset": 4
        },
        "cases": [
            {
                "discriminant_value": 0,
                "name": "None",
                "type": "()",
                "public": true,
                "offset": 0
            },
            {
                "discriminant_value"
                "name": "Some",
                "type": "u32",
                "public": true,
                "offset": 0
            }
            
        ]
    },
    {
        "kind": "primitive",
        "name": "usize",
        "size": 8,
        "align": 8
    },
    {
        "kind": "primitive",
        "name": "u8",
        "size": 1,
        "align": 1
    },
    {
        "kind": "primitive",
        "name": "u32",
        "size": 4,
        "align": 4
    }
]

From this, our tool might produce the following:

// usize, u32, and u8 have been converted to appropriate C types 
// (likely hardcoded mappings, but we can use the primitive entries to
//  validate ABI)

#include <cstddef>  // size_t
#include <cstdint>  // uint8_t, uint32_t

namespace rust {

namespace alloc {
namespace raw_vec {

// leading _ for "don't use me"?
struct _RawVec_u8 {
private:
    const uint8_t* ptr;
    size_t cap;
};

}; // namespace raw_vec
}; // namespace alloc

namespace std {
namespace vec {

struct Vec_u8 {
private:
    alloc::raw_vec::_RawVec_u8 buf;
    size_t len;
};

}; // namespace vec

namespace option {

enum class _Option_u32_Payload: uint32_t {
    None = 0, 
    Some = 1
};

struct Option_u32 {
    union {
        uint32_t Some;
    } payload;
    _Option_u32_Payload tag;
};

}; // namespace option

}; // namespace std
}; // namespace rust
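
The static assertions mentioned earlier could be emitted by the generator next to each struct. Below is a minimal sketch of that step in Rust, assuming the tool has already chosen a C++ name for the type; `StructInfo` and `layout_assertions` are hypothetical names, and the C++ name mangling shown is only the one used in the example output above.

```rust
/// Hypothetical, trimmed-down view of one entry from the JSON,
/// paired with the C++ name the generator picked for it.
struct StructInfo {
    cpp_name: String, // e.g. "rust::std::vec::Vec_u8" (naming is an assumption)
    size: u64,
    align: u64,
}

/// Emit C++ static assertions that pin the generated struct to the
/// size and alignment reported by the compiler.
fn layout_assertions(info: &StructInfo) -> String {
    format!(
        "static_assert(sizeof({name}) == {size}, \"size mismatch for {name}\");\n\
         static_assert(alignof({name}) == {align}, \"align mismatch for {name}\");\n",
        name = info.cpp_name,
        size = info.size,
        align = info.align,
    )
}
```

For the Vec<u8> entry this yields two lines checking `sizeof(...) == 24` and `alignof(...) == 8`. Field offsets could be checked similarly, though private members would need the assertions placed inside the struct or a friend.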

Notes on Stability/Privacy

This feature forces Rust to “admit” various unspecified details like private types and how repr(Rust) is implemented. However, this does not mean any of these details can be considered stable or reliable. Users of this API will be expected to regenerate their bindings on every build. If they don’t, we can and will break them.

Responsible users of this API will also be expected to respect “public” markers. In my example output you can see we use _'s and private: wherever possible.

Ideally the output can be easily cached when the types don’t change. Even more ideally the rust compiler (or RLS) can just directly tell you “nothing changed”. Even more ideally rustc could directly generate the headers, but for now we’re happy to do this ourselves (it gives us valuable flexibility).

Open Questions:

  1. What canonical paths should be used for each type? I assume rustc has something already developed here for error messages.

  2. Should we also expose re-exports and/or type aliases as a “kind” (which would be emitted as a typedef)? This would give us an “exact” copy of every crate’s facade, which is nice (since you shouldn’t need to care “where” a type really came from).

  3. Should we try to support generics at all? I don’t think so, but you could imagine:

{ 
  "kind": "generic_struct", 
  "name": "std::collections::HashMap<K, V, H>", 
  "type_vars": ["K", "V", "H"] 
  ...
}

Main problem is different monomorphs can have different layouts. Also associated type projections are a mess.

  4. How do we decide what “root” types are emitted? Types that show up in public APIs?

Isn’t this incompatible with existing and future enum layout optimizations? For example Option<&Foo> doesn’t have an explicit discriminant.
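
The layout optimization in question can be observed directly. This small Rust sketch (the function names are made up for illustration) compares sizes: a reference is never null, so `Option<&u32>` can use the null pointer to represent `None` and needs no separate tag, whereas `Option<u32>` has no spare bit pattern and must grow.

```rust
use std::mem::size_of;

/// True when `Option<&u32>` is exactly pointer-sized, i.e. `None`
/// is encoded in the pointer itself rather than in a separate tag.
fn option_ref_has_no_tag() -> bool {
    size_of::<Option<&u32>>() == size_of::<&u32>()
}

/// True when `Option<u32>` is larger than `u32`, i.e. the tag
/// occupies storage of its own.
fn option_u32_needs_extra_space() -> bool {
    size_of::<Option<u32>>() > size_of::<u32>()
}
```

Both functions return true, which is why a single `discriminant` field at a fixed offset cannot describe every enum.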

Adding some background/context:

Here’s what -Zprint-type-sizes does now:

print-type-size type: `std::process::Command`: 224 bytes, alignment: 8 bytes
print-type-size     field `.inner`: 224 bytes
print-type-size type: `std::sys::imp::process::process_common::Command`: 224 bytes, alignment: 8 bytes
print-type-size     field `.program`: 16 bytes
print-type-size     field `.args`: 24 bytes
print-type-size     field `.env`: 40 bytes
print-type-size     field `.argv`: 24 bytes
print-type-size     field `.envp`: 24 bytes
print-type-size     field `.cwd`: 16 bytes
print-type-size     field `.closures`: 24 bytes
print-type-size     field `.uid`: 8 bytes
print-type-size     field `.gid`: 8 bytes
print-type-size     field `.stdin`: 12 bytes
print-type-size     field `.stdout`: 12 bytes
print-type-size     field `.stderr`: 12 bytes
print-type-size     field `.saw_nul`: 1 bytes
print-type-size     end padding: 3 bytes
print-type-size type: `std::result::Result<Library, Error>`: 152 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 144 bytes
print-type-size         field `.0`: 144 bytes
print-type-size     variant `Err`: 88 bytes
print-type-size         field `.0`: 88 bytes
print-type-size type: `std::result::Result<Library, std::string::String>`: 152 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 144 bytes
print-type-size         field `.0`: 144 bytes
print-type-size     variant `Err`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size type: `std::result::Result<std::fs::Metadata, std::io::Error>`: 152 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 144 bytes
print-type-size         field `.0`: 144 bytes
print-type-size     variant `Err`: 16 bytes
print-type-size         field `.0`: 16 bytes
print-type-size type: `std::result::Result<std::sys::imp::fs::FileAttr, std::io::Error>`: 152 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 144 bytes
print-type-size         field `.0`: 144 bytes
print-type-size     variant `Err`: 16 bytes
print-type-size         field `.0`: 16 bytes
print-type-size type: `Library`: 144 bytes, alignment: 8 bytes
print-type-size     field `.libs`: 24 bytes
print-type-size     field `.link_paths`: 24 bytes
print-type-size     field `.frameworks`: 24 bytes
print-type-size     field `.framework_paths`: 24 bytes
print-type-size     field `.include_paths`: 24 bytes
print-type-size     field `.version`: 24 bytes
print-type-size     field `._priv`: 0 bytes
print-type-size type: `libc::unix::bsd::apple::stat`: 144 bytes, alignment: 8 bytes
print-type-size     field `.st_dev`: 4 bytes
print-type-size     field `.st_mode`: 2 bytes
print-type-size     field `.st_nlink`: 2 bytes
print-type-size     field `.st_ino`: 8 bytes
print-type-size     field `.st_uid`: 4 bytes
print-type-size     field `.st_gid`: 4 bytes
print-type-size     field `.st_rdev`: 4 bytes
print-type-size     padding: 4 bytes
print-type-size     field `.st_atime`: 8 bytes, alignment: 8 bytes
print-type-size     field `.st_atime_nsec`: 8 bytes
print-type-size     field `.st_mtime`: 8 bytes
print-type-size     field `.st_mtime_nsec`: 8 bytes
print-type-size     field `.st_ctime`: 8 bytes
print-type-size     field `.st_ctime_nsec`: 8 bytes
print-type-size     field `.st_birthtime`: 8 bytes
print-type-size     field `.st_birthtime_nsec`: 8 bytes
print-type-size     field `.st_size`: 8 bytes
print-type-size     field `.st_blocks`: 8 bytes
print-type-size     field `.st_blksize`: 4 bytes
print-type-size     field `.st_flags`: 4 bytes
print-type-size     field `.st_gen`: 4 bytes
print-type-size     field `.st_lspare`: 4 bytes
print-type-size     field `.st_qspare`: 16 bytes
print-type-size type: `std::fs::Metadata`: 144 bytes, alignment: 8 bytes
print-type-size     field `.0`: 144 bytes
print-type-size type: `std::sys::imp::fs::FileAttr`: 144 bytes, alignment: 8 bytes
print-type-size     field `.stat`: 144 bytes
print-type-size type: `std::str::pattern::StrSearcher`: 104 bytes, alignment: 8 bytes
print-type-size     field `.haystack`: 16 bytes
print-type-size     field `.needle`: 16 bytes
print-type-size     field `.searcher`: 72 bytes
print-type-size type: `std::fmt::Formatter`: 96 bytes, alignment: 8 bytes
print-type-size     field `.width`: 16 bytes
print-type-size     field `.precision`: 16 bytes
print-type-size     field `.buf`: 16 bytes
print-type-size     field `.curarg`: 16 bytes
print-type-size     field `.args`: 16 bytes
print-type-size     field `.flags`: 4 bytes
print-type-size     field `.fill`: 4 bytes
print-type-size     field `.align`: 1 bytes
print-type-size     end padding: 7 bytes
print-type-size type: `std::result::Result<std::string::String, Error>`: 96 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size     variant `Err`: 88 bytes
print-type-size         field `.0`: 88 bytes
print-type-size type: `Error`: 88 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `EnvNoPkgConfig`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size     variant `CrossCompilation`: 0 bytes
print-type-size     variant `MSVC`: 0 bytes
print-type-size     variant `Command`: 40 bytes
print-type-size         field `.command`: 24 bytes
print-type-size         field `.cause`: 16 bytes
print-type-size     variant `Failure`: 80 bytes
print-type-size         field `.command`: 24 bytes
print-type-size         field `.output`: 56 bytes
print-type-size     variant `__Nonexhaustive`: 0 bytes
print-type-size type: `core::str::SplitInternal<char>`: 72 bytes, alignment: 8 bytes
print-type-size     field `.start`: 8 bytes
print-type-size     field `.end`: 8 bytes
print-type-size     field `.matcher`: 48 bytes
print-type-size     field `.allow_trailing_empty`: 1 bytes
print-type-size     field `.finished`: 1 bytes
print-type-size     end padding: 6 bytes
print-type-size type: `std::iter::Filter<std::str::Split<char>, [closure@/Users/ABeingessner/.cargo/registry/src/github.com-1ecc6299db9ec823/pkg-config-0.3.9/src/lib.rs:409:35: 409:50]>`: 72 bytes, alignment: 8 bytes
print-type-size     field `.iter`: 72 bytes
print-type-size     field `.predicate`: 0 bytes
print-type-size type: `std::iter::Map<std::iter::Filter<std::str::Split<char>, [closure@/Users/ABeingessner/.cargo/registry/src/github.com-1ecc6299db9ec823/pkg-config-0.3.9/src/lib.rs:409:35: 409:50]>, [closure@/Users/ABeingessner/.cargo/registry/src/github.com-1ecc6299db9ec823/pkg-config-0.3.9/src/lib.rs:410:32: 410:61]>`: 72 bytes, alignment: 8 bytes
print-type-size     field `.iter`: 72 bytes
print-type-size     field `.f`: 0 bytes
print-type-size type: `std::str::Split<char>`: 72 bytes, alignment: 8 bytes
print-type-size     field `.0`: 72 bytes
print-type-size type: `std::str::pattern::StrSearcherImpl`: 72 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Empty`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size     variant `TwoWay`: 64 bytes
print-type-size         field `.0`: 64 bytes
print-type-size type: `std::fmt::rt::v1::Argument`: 64 bytes, alignment: 8 bytes
print-type-size     field `.position`: 16 bytes
print-type-size     field `.format`: 48 bytes
print-type-size type: `std::result::Result<std::process::Output, std::io::Error>`: 64 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 56 bytes
print-type-size         field `.0`: 56 bytes
print-type-size     variant `Err`: 16 bytes
print-type-size         field `.0`: 16 bytes
print-type-size type: `std::str::pattern::TwoWaySearcher`: 64 bytes, alignment: 8 bytes
print-type-size     field `.crit_pos`: 8 bytes
print-type-size     field `.crit_pos_back`: 8 bytes
print-type-size     field `.period`: 8 bytes
print-type-size     field `.byteset`: 8 bytes
print-type-size     field `.position`: 8 bytes
print-type-size     field `.end`: 8 bytes
print-type-size     field `.memory`: 8 bytes
print-type-size     field `.memory_back`: 8 bytes
print-type-size type: `unwind::libunwind::_Unwind_Exception`: 64 bytes, alignment: 8 bytes
print-type-size     field `.exception_class`: 8 bytes
print-type-size     field `.exception_cleanup`: 8 bytes
print-type-size     field `.private`: 48 bytes
print-type-size type: `Config`: 56 bytes, alignment: 8 bytes
print-type-size     field `.atleast_version`: 24 bytes
print-type-size     field `.extra_args`: 24 bytes
print-type-size     field `.statik`: 2 bytes
print-type-size     field `.cargo_metadata`: 1 bytes
print-type-size     field `.print_system_libs`: 1 bytes
print-type-size     end padding: 4 bytes
print-type-size type: `std::process::Output`: 56 bytes, alignment: 8 bytes
print-type-size     field `.stdout`: 24 bytes
print-type-size     field `.stderr`: 24 bytes
print-type-size     field `.status`: 4 bytes
print-type-size     end padding: 4 bytes
print-type-size type: `std::fmt::Arguments`: 48 bytes, alignment: 8 bytes
print-type-size     field `.pieces`: 16 bytes
print-type-size     field `.fmt`: 16 bytes
print-type-size     field `.args`: 16 bytes
print-type-size type: `std::fmt::rt::v1::FormatSpec`: 48 bytes, alignment: 8 bytes
print-type-size     field `.precision`: 16 bytes
print-type-size     field `.width`: 16 bytes
print-type-size     field `.fill`: 4 bytes
print-type-size     field `.flags`: 4 bytes
print-type-size     field `.align`: 1 bytes
print-type-size     end padding: 7 bytes
print-type-size type: `std::result::Result<std::string::String, std::string::FromUtf8Error>`: 48 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size     variant `Err`: 40 bytes
print-type-size         field `.0`: 40 bytes
print-type-size type: `std::str::pattern::CharEqSearcher<char>`: 48 bytes, alignment: 8 bytes
print-type-size     field `.haystack`: 16 bytes
print-type-size     field `.char_indices`: 24 bytes
print-type-size     field `.char_eq`: 4 bytes
print-type-size     field `.ascii_only`: 1 bytes
print-type-size     end padding: 3 bytes
print-type-size type: `std::str::pattern::CharSearcher`: 48 bytes, alignment: 8 bytes
print-type-size     field `.0`: 48 bytes
print-type-size type: `std::collections::HashMap<std::ffi::OsString, (usize, std::ffi::CString)>`: 40 bytes, alignment: 8 bytes
print-type-size     field `.hash_builder`: 16 bytes
print-type-size     field `.table`: 24 bytes
print-type-size     field `.resize_policy`: 0 bytes
print-type-size type: `std::option::Option<std::collections::HashMap<std::ffi::OsString, (usize, std::ffi::CString)>>`: 40 bytes, alignment: 8 bytes
print-type-size     variant `Some`: 40 bytes
print-type-size         field `.0`: 40 bytes
print-type-size type: `std::string::FromUtf8Error`: 40 bytes, alignment: 8 bytes
print-type-size     field `.bytes`: 24 bytes
print-type-size     field `.error`: 16 bytes
print-type-size type: `std::option::Option<(&str, &str)>`: 32 bytes, alignment: 8 bytes
print-type-size     variant `Some`: 32 bytes
print-type-size         field `.0`: 32 bytes
print-type-size type: `std::option::Option<(alloc::allocator::Layout, usize)>`: 32 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `None`: 0 bytes
print-type-size     variant `Some`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size type: `std::ptr::swap_nonoverlapping_bytes::Block`: 32 bytes, alignment: 32 bytes
print-type-size     end padding: 32 bytes
print-type-size type: `std::ptr::swap_nonoverlapping_bytes::UnalignedBlock`: 32 bytes, alignment: 8 bytes
print-type-size     field `.0`: 8 bytes
print-type-size     field `.1`: 8 bytes
print-type-size     field `.2`: 8 bytes
print-type-size     field `.3`: 8 bytes
print-type-size type: `std::result::Result<*mut u8, alloc::allocator::AllocErr>`: 32 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 8 bytes
print-type-size         field `.0`: 8 bytes
print-type-size     variant `Err`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size type: `std::result::Result<std::ptr::Unique<(&str, &str)>, alloc::allocator::AllocErr>`: 32 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 8 bytes
print-type-size         field `.0`: 8 bytes
print-type-size     variant `Err`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size type: `std::result::Result<std::ptr::Unique<std::ffi::OsString>, alloc::allocator::AllocErr>`: 32 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 8 bytes
print-type-size         field `.0`: 8 bytes
print-type-size     variant `Err`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size type: `std::result::Result<std::ptr::Unique<std::path::PathBuf>, alloc::allocator::AllocErr>`: 32 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 8 bytes
print-type-size         field `.0`: 8 bytes
print-type-size     variant `Err`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size type: `std::result::Result<std::ptr::Unique<std::string::String>, alloc::allocator::AllocErr>`: 32 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 8 bytes
print-type-size         field `.0`: 8 bytes
print-type-size     variant `Err`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size type: `std::result::Result<std::ptr::Unique<u8>, alloc::allocator::AllocErr>`: 32 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 8 bytes
print-type-size         field `.0`: 8 bytes
print-type-size     variant `Err`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size type: `std::result::Result<std::string::String, std::env::VarError>`: 32 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size     variant `Err`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size type: `std::result::Result<std::string::String, std::string::String>`: 32 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Ok`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size     variant `Err`: 24 bytes
print-type-size         field `.0`: 24 bytes
print-type-size type: `[closure@DefId { krate: CrateNum(2), node: DefIndex(2147489028) => core/6597f18::str[0]::traits[0]::{{impl}}[17]::index[0]::{{closure}}[0] } 0:&&str, 1:&usize, 2:&usize]`: 24 bytes, alignment: 8 bytes
print-type-size     end padding: 24 bytes
print-type-size type: `[closure@DefId { krate: CrateNum(2), node: DefIndex(2147489063) => core/6597f18::str[0]::traits[0]::{{impl}}[19]::index[0]::{{closure}}[0] } 0:&&str, 1:&usize, 2:&usize]`: 24 bytes, alignment: 8 bytes
print-type-size     end padding: 24 bytes
print-type-size type: `alloc::allocator::AllocErr`: 24 bytes, alignment: 8 bytes
print-type-size     discriminant: 8 bytes
print-type-size     variant `Exhausted`: 16 bytes
print-type-size         field `.request`: 16 bytes
print-type-size     variant `Unsupported`: 16 bytes
print-type-size         field `.details`: 16 bytes
print-type-size type: `std::collections::hash::table::RawBucket<std::ffi::OsString, (usize, std::ffi::CString)>`: 24 bytes, alignment: 8 bytes
print-type-size     field `.hash_start`: 8 bytes
print-type-size     field `.pair_start`: 8 bytes
print-type-size     field `.idx`: 8 bytes
print-type-size     field `._marker`: 0 bytes
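
For comparison, this is roughly what scraping the current text output involves. The sketch below is in Rust and assumes the `type:` lines always have the shape shown above; `parse_type_line` is a hypothetical helper, not an existing tool.

```rust
/// Parse a `print-type-size type:` header line into
/// (type name, size in bytes, alignment in bytes).
/// Returns None for field, variant, and padding lines.
fn parse_type_line(line: &str) -> Option<(String, u64, u64)> {
    let rest = line.strip_prefix("print-type-size type: `")?;
    // Closure names contain ": " themselves, so split at the last
    // closing backtick followed by ": ".
    let (name, tail) = rest.rsplit_once("`: ")?;
    let (size, align) = tail.split_once(" bytes, alignment: ")?;
    let align = align.strip_suffix(" bytes")?;
    let size: u64 = size.parse().ok()?;
    let align: u64 = align.parse().ok()?;
    Some((name.to_string(), size, align))
}
```

Even with such a parser, the field lines report only sizes, not offsets, types, or visibility, which is the information the proposed JSON adds.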

And here’s the Rust -> C FFI generator we use now, and that we would rewrite to use this new API: https://github.com/rlhunt/cbindgen (note: cbindgen, not bindgen)

I addressed this a bit in this comment:

// types like Option<&T>, we could overlap the discriminant/payload, 
// and the parser needs to handle that (or give up and produce an opaque type).
// For more complex types we might want to support `discriminant: [$field*]`?

I’m in favor of the general idea. I don’t have a problem with exposing this information. But I think we should probably include some disclaimers, possibly in the form of comments in the output itself, that indicate that the layouts of #[repr(Rust)] types etc may change in future compiler releases:

// This file describes the memory layout of various Rust types.
// Note that these layouts are not stable across compiler releases --
// unless the type is marked with an explicit `#[repr(C)]`, we currently
// reserve the right to change the layout of Rust types, even in a minor
// release.

This would be nice. But we can also get this information easily from parsing the crates.

I think having an entry for each monomorph is fine. In the generated bindings we might be able to use template specialization to hide all of that.

Currently in cbindgen, we create a dependency graph starting at the extern "C" fn's in the crate we're generating bindings for. As this flag won't emit those, we'll still presumably need to parse to find those. I'd be fine with getting type information for everything (unless that is too massive, which now thinking of the stdlib, it may be), and just filtering.
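
The filtering step described here amounts to a reachability walk over type names. A minimal sketch in Rust, assuming the consumer has already built a map from each type name to the type names its fields mention; `reachable` and the `deps` map are hypothetical and are not how cbindgen is actually structured.

```rust
use std::collections::{HashMap, HashSet};

/// Collect every type reachable from `roots`, where `deps` maps a
/// type name to the names of the types used by its fields.
fn reachable(roots: &[&str], deps: &HashMap<String, Vec<String>>) -> HashSet<String> {
    let mut seen = HashSet::new();
    let mut stack: Vec<String> = roots.iter().map(|r| r.to_string()).collect();
    while let Some(name) = stack.pop() {
        // `insert` returns false if the name was already visited.
        if !seen.insert(name.clone()) {
            continue;
        }
        if let Some(children) = deps.get(&name) {
            stack.extend(children.iter().cloned());
        }
    }
    seen
}
```

The roots would be the types appearing in the signatures of the extern "C" functions; everything outside the returned set can be dropped from the generated header.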

Note that the layout doesn’t just change across compiler releases; it also changes depending on options passed to the compiler, like optimization fuel (see the thread “Rolling out (or unrolling) struct field reorderings”).

Therefore, I think your note should also say that such files need to be regenerated on every compile, and shouldn’t be checked into VCS.

Could you give an example of what this would look like for Option<&i32>? In particular, which value would the discriminant_value field have for the Some case?

Btw, in your example, Option<u32> looks wrong. Doesn't that JSON say that the discriminant is at offset 4, but the fields are at offset 0? And for Some, there's no value for discriminant_value, is that valid JSON?


One more high-level comment: Is it conceivable that one day, rustc will have a layout that cannot be expressed by this format? I would find it hard to exclude this option. So, I think there should be a version number somewhere that we can bump up in this case, so that old tools can notice that they just do not understand what is going on here.
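
On the consumer side, such a version check could be as small as the following Rust sketch. The `version` value is assumed to come from a hypothetical top-level field that the grammar above does not define, and `SUPPORTED_VERSION` and `check_version` are made-up names.

```rust
/// The only format version this (hypothetical) tool understands.
const SUPPORTED_VERSION: u32 = 1;

/// Refuse to generate bindings from a format version the tool does
/// not know, instead of silently misreading a newer layout encoding.
fn check_version(found: u32) -> Result<(), String> {
    if found == SUPPORTED_VERSION {
        Ok(())
    } else {
        Err(format!(
            "unsupported typeinfo version {} (expected {})",
            found, SUPPORTED_VERSION
        ))
    }
}
```

Failing loudly here matters more than for most formats, because a misread layout produces bindings that compile but corrupt memory.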

I’ve changed the planned details a fair bit in implementation. I have a WIP PR up here with details of what changed: https://github.com/rust-lang/rust/pull/43761

Ah, thanks. Opaque types take care of my concern :)
