- Yes, this is effectively opt-in reflection that additionally allows specifying/describing types from other languages.
- Yes, this should probably be evaluated in a crate. (I'm still interested in your thoughts on it)
Also see Pre-RFC: Runtime reflection
There are a bunch of things to consider with regard to using reflection within a single process/crate. This is not about that, but primarily about encoding, FFI and sandboxing (e.g. via WASM).
At the moment, basically all FFI data exchange (unless the other language is C) goes through a third, less flexible memory layout, or the data needs to be serialized/deserialized. One (partial) remediation would be a (cross-target) stable memory representation (e.g. #[repr(v1)]), which works as long as the other side can understand it. Another option is type descriptions, similar to what is needed for reflection and similar to how many encodings work (e.g. JSON, binary JSON, ASN.1/DER, ..., which keep the description right next to the data; protobuf, which has a separate human-readable DSL to describe it; and others that store it separately from the data but in binary form).
I think it would be amazing for FFI, serialization (if not everything is known at compile time) and compatibility in general if types (including non-Rust memory layouts) could be described (opt-in generation of static data). This is conceptually similar to how traits describe functions on types, which are then stored in the vtable. The problem with traits and vtables is that the exact type must be known on both sides at compile time, while a type description can be serialized.
In practice this would likely be a #[derive(TypeDescription)], adding a method to get an immutable reference to the static type description. Using this at runtime isn't the only way it could be useful; it could also serve macros that only need to care about fields/memory layout. See reflect - Rust, which provides its own type description intended for compile-time use only.
I think it might make sense to have the type description and derive macro in std, and its usage like deserialization or FFI types in crates.
As an example of how it could be represented:
use std::any::TypeId;
use std::collections::HashMap;

struct TypeDescription {
    // I'm not 100% sure whether this usage is valid or whether we need String/Vec.
    // The list of opcodes manipulates the address to get to the value.
    fields: HashMap<&'static str, (TypeId, &'static [Opcode])>,
}

#[non_exhaustive]
enum Opcode {
    Offset(usize),        // addr += arg0
    OffsetReadU32(usize), // addr += read_u32_at(addr) * arg0
    Pointer,              // addr = read_usize_at(addr)
}

// Proposed derive; it does not exist yet.
#[derive(TypeDescription)]
struct MyType {}

#[derive(TypeDescription)]
enum MyEnum {}
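To make the opcode idea more concrete, here is a minimal sketch of how such a list might be interpreted at runtime. The `resolve` helper, the `Inner`/`Outer` example types and the `"inner.value"` key are hypothetical and not part of the proposal; the sketch also uses `Vec<Opcode>` instead of `&'static [Opcode]` and drops the `TypeId` to stay short.

```rust
use std::collections::HashMap;
use std::mem::align_of;

// Mirrors the `Opcode` enum from the post.
#[derive(Clone, Copy)]
enum Opcode {
    Offset(usize),        // addr += arg0
    OffsetReadU32(usize), // addr += read_u32_at(addr) * arg0
    Pointer,              // addr = read_usize_at(addr)
}

// Hypothetical helper: applies each opcode to `base` and returns the final address.
// Safety: `base` must point to live data that really has the described layout.
unsafe fn resolve(base: *const u8, ops: &[Opcode]) -> *const u8 {
    let mut addr = base;
    for op in ops {
        addr = match *op {
            Opcode::Offset(n) => unsafe { addr.add(n) },
            Opcode::OffsetReadU32(scale) => unsafe {
                let count = (addr as *const u32).read_unaligned() as usize;
                addr.add(count * scale)
            },
            Opcode::Pointer => unsafe { (addr as *const *const u8).read_unaligned() },
        };
    }
    addr
}

#[allow(dead_code)]
#[repr(C)]
struct Inner { id: u16, value: u32 }

#[allow(dead_code)]
#[repr(C)]
struct Outer { tag: u8, inner: *const Inner }

// Reads `outer.inner.value` using only the base address and an opcode list.
fn read_nested_value() -> u32 {
    let inner = Inner { id: 7, value: 0xDEAD_BEEF };
    let outer = Outer { tag: 1, inner: &inner };

    let mut fields: HashMap<&'static str, Vec<Opcode>> = HashMap::new();
    fields.insert("inner.value", vec![
        // With #[repr(C)], `inner` sits at the pointer's alignment (after `tag` + padding).
        Opcode::Offset(align_of::<*const Inner>()),
        // Follow the pointer to the `Inner` value.
        Opcode::Pointer,
        // `value` follows `id` + padding; assumes align_of::<u32>() >= 2.
        Opcode::Offset(align_of::<u32>()),
    ]);

    let base = &outer as *const Outer as *const u8;
    unsafe { (resolve(base, &fields["inner.value"]) as *const u32).read_unaligned() }
}

// Uses the u32 stored at the base address as an index, scaled by 4 bytes.
fn read_indexed() -> u32 {
    let data: [u32; 4] = [2, 10, 20, 30];
    let ops = [Opcode::OffsetReadU32(4)];
    unsafe { (resolve(data.as_ptr() as *const u8, &ops) as *const u32).read_unaligned() }
}
```

Calling `read_nested_value()` walks from `Outer` through the pointer to `Inner::value`, and `read_indexed()` lands on `data[2]`. Neither needs to know the Rust type at the call site, only the base address and the opcodes.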
Describing types from other languages (and even most encodings) this way might even be easier than trying to write the corresponding C type on both sides. Similarly, it may provide a more flexible abstraction for crates like serde (which could use this type information at compile time to generate code, instead of having to use and derive its own trait-based view of Rust types). All of these field accesses are of course only safe if the underlying data actually is of the described type.
As far as I can tell this would be flexible enough for all existing Rust types and most types from other languages. Accessing the fields does have an overhead of course, but when the information is available at compile time, efficient get/get_mut functions for zero-copy access can (probably) be generated via macros, since this is really close to what needs to be done in assembly anyway.
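As a rough illustration of the compile-time case, the sketch below shows what generated get/get_mut functions might boil down to when the description is a single `Offset`. The `field_accessors!` macro and the `Player` type are made up for this example; `std::mem::offset_of!` (Rust 1.77+) stands in for the offset a derive would otherwise take from the type description.

```rust
use std::mem::offset_of;

#[allow(dead_code)]
#[repr(C)]
struct Player {
    id: u16,
    health: u32,
}

// Hypothetical stand-in for derive output: one getter pair per described field.
macro_rules! field_accessors {
    ($ty:ty, $field:ident : $fty:ty, $get:ident, $get_mut:ident) => {
        // Safety: `base` must point to a valid, properly aligned `$ty`.
        unsafe fn $get<'a>(base: *const u8) -> &'a $fty {
            // The offset is a compile-time constant, so this is a single pointer add.
            unsafe { &*(base.add(offset_of!($ty, $field)) as *const $fty) }
        }
        // Safety: as above, and no other reference to the field may be live.
        unsafe fn $get_mut<'a>(base: *mut u8) -> &'a mut $fty {
            unsafe { &mut *(base.add(offset_of!($ty, $field)) as *mut $fty) }
        }
    };
}

field_accessors!(Player, health: u32, health, health_mut);

// Reads and then modifies `health` through the raw base address, without copying.
fn damage_player() -> (u32, u32) {
    let mut player = Player { id: 1, health: 100 };
    let base = &mut player as *mut Player as *mut u8;
    let before = unsafe { *health(base) };
    unsafe { *health_mut(base) -= 25 };
    (before, player.health)
}
```

Here `damage_player()` returns the value before and after the write, both accessed in place. A real derive would have to emit longer address computations for descriptions that involve `Pointer` or `OffsetReadU32`.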
Why would this need to be in std? Deriving this type description (although possible) is likely more difficult in a crate, which is limited in how much information it can get about the type layout. Additionally, some of this information already needs to be present in compiler internals to generate an intermediate representation, so exposing it as a static (on a per-type opt-in basis) might not even cost much in terms of compilation time (it still impacts binary size, of course). Doing this in a proc-macro (while possible) may break with the addition of new syntax and would have to redo work the compiler already needs to do. I see this as similar to FormatArgs, which is a compact representation of what to do (containing the data itself) as a list of opcodes (iirc).
This is (by design) both more limiting and more flexible than having reflection like in other languages, since you don't get the abstract view of what is a struct, what is an enum, but instead only a way to access specific parts. This should also make the TypeDescription data smaller.
The main goal here (especially when combined with memory layout guarantees) is accessing types/data you may not even know at compile time (for example for an Inspector/Editor showing data defined + used primarily by a plugin/extension/dll) without the need for a third data representation (e.g. C), which could result in multiple memcpy calls to get the data into the required format.
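For the plugin/inspector case, the description itself has to cross the boundary as plain data. A minimal sketch of what that could look like follows; the byte format (one tag byte, then a little-endian `u32` argument) and the `encode`/`decode` helpers are purely assumptions for illustration, and the arguments are `u32` rather than `usize` so the encoding does not depend on the target's pointer width.

```rust
// Same shape as the `Opcode` enum from the post, with fixed-width arguments.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Opcode {
    Offset(u32),        // addr += arg0
    OffsetReadU32(u32), // addr += read_u32_at(addr) * arg0
    Pointer,            // addr = read_usize_at(addr)
}

// Hypothetical wire format: tag 0/1 followed by a 4-byte argument, tag 2 alone.
fn encode(ops: &[Opcode]) -> Vec<u8> {
    let mut out = Vec::new();
    for op in ops {
        match *op {
            Opcode::Offset(n) => {
                out.push(0);
                out.extend_from_slice(&n.to_le_bytes());
            }
            Opcode::OffsetReadU32(n) => {
                out.push(1);
                out.extend_from_slice(&n.to_le_bytes());
            }
            Opcode::Pointer => out.push(2),
        }
    }
    out
}

// Returns `None` on an unknown tag or a truncated argument.
fn decode(mut bytes: &[u8]) -> Option<Vec<Opcode>> {
    let mut ops = Vec::new();
    while let Some((&tag, rest)) = bytes.split_first() {
        bytes = rest;
        let op = match tag {
            0 | 1 => {
                if bytes.len() < 4 {
                    return None;
                }
                let (arg, rest) = bytes.split_at(4);
                bytes = rest;
                let n = u32::from_le_bytes([arg[0], arg[1], arg[2], arg[3]]);
                if tag == 0 { Opcode::Offset(n) } else { Opcode::OffsetReadU32(n) }
            }
            2 => Opcode::Pointer,
            _ => return None,
        };
        ops.push(op);
    }
    Some(ops)
}
```

A plugin could hand the host `encode(&ops)` next to a pointer to its data, and the host could `decode` it without sharing any Rust type or vtable with the plugin. This is the part a trait-based approach cannot do.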
What are your thoughts on this?