Structural types again -- desugar to named struct in std

copying this here so discussing it is less cluttered:

a quick idea: instead of desugaring to tuples, how about desugaring to an instance of a named struct in std:

// in std::somewhere
pub struct AnonymousStruct<Fields: std::marker::Tuple, const NAMES: &'static [&'static str]>(pub Fields);

impl<Fields: std::marker::Tuple, const NAMES: &'static [&'static str]> AnonymousStruct<Fields, NAMES> {
    /// Self::get_field_index("abc") = Ok(5) means self.abc desugars to self.0.5
    pub const fn get_field_index(name: &str) -> Result<usize, ()> {
        const { assert!(NAMES.is_sorted(), "NAMES must be sorted"); };
        NAMES.binary_search(&name).map_err(|_| ())
    }
}
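
The lookup itself doesn't depend on the unstable parts of the idea (a const generic of type &'static [&'static str], std::marker::Tuple). A minimal stable sketch, assuming the names are passed as an ordinary slice; `field_index` is a hypothetical free function for illustration, not an existing std API:

```rust
/// Hypothetical stand-in for `AnonymousStruct::get_field_index`: looks up
/// `name` in a slice of field names that is assumed to be sorted.
pub fn field_index(names: &[&str], name: &str) -> Result<usize, ()> {
    // `binary_search` is only meaningful on sorted input, so check that first.
    assert!(names.windows(2).all(|w| w[0] <= w[1]), "names must be sorted");
    names.binary_search(&name).map_err(|_| ())
}
```

With names `["a", "blah", "x"]`, looking up "blah" gives `Ok(1)`, so `self.blah` would desugar to `self.0.1`.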

pseudo-code:

fn desugar_anonymous_struct_type(fields: Vec<(Member, Type)>) -> Result<Type> {
    // sorts by field name
    let mut fields: BTreeMap<Member, Type> = fields.into_iter().collect();
    let mut tuple_fields: Punctuated<Type, _> = (0..).map_while(|i| fields.remove(&Member::Unnamed(i.into()))).collect();
    if fields.is_empty() {
        // actually a tuple, since all fields are named using successive integers
        return Ok(TupleType { paren: Paren::default(), elems: tuple_fields }.into());
    }
    if !tuple_fields.is_empty() {
        bail!("all fields must be named");
    }
    let mut keys = vec![];
    let mut types = vec![];
    for (k, v) in fields {
        let Member::Named(k) = k else {
            bail!("all fields must be named");
        };
        keys.push(LitStr::new(&k.to_string(), k.span()));
        types.push(v);
    }
    Ok(parse_quote!(::std::somewhere::AnonymousStruct<(#(#types,)*), { &[#(#keys),*] }>))
}

I would expect the compiler's handling of the field access operator . to be structured so that a special case for an lhs of type AnonymousStruct is easy to add: it already has to resolve the lhs type in order to resolve field names on existing types.

Desugaring to a named struct in std, instantiated with a tuple of the field types (ordered by field name) and a const generic slice of the sorted field names, neatly solves nearly all of the implementation and how-to-impl-traits-for-all-anonymous-structs issues, afaict.

example desugaring:

{ a: u32, x: f32, blah: T }

desugared -- note how blah is moved before x since "blah" < "x":

::std::somewhere::AnonymousStruct<(u32, T, f32), { &["a", "blah", "x"] }>
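
To make the reordering concrete, here is a rough stable sketch of just the sorting step, with types represented as plain strings. `canonical_order` is a hypothetical helper, not part of the proposal:

```rust
/// Hypothetical helper: sorts `(field name, type name)` pairs by field name
/// and splits them into the two lists that would parameterise `AnonymousStruct`.
pub fn canonical_order<'a>(
    mut fields: Vec<(&'a str, &'a str)>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    // Only the field name takes part in the ordering, never the type.
    fields.sort_by(|x, y| x.0.cmp(y.0));
    fields.into_iter().unzip()
}
```

Feeding it `[("a", "u32"), ("x", "f32"), ("blah", "T")]` yields the names `["a", "blah", "x"]` and the types `["u32", "T", "f32"]`, matching the desugaring above.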

example trait impl:

impl<Fields: Tuple + Clone, const NAMES: &'static [&'static str]> Clone for AnonymousStruct<Fields, NAMES> {
    fn clone(&self) -> Self {
        Self(self.0.clone())
    }
}
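
The same shape can be approximated on stable Rust, assuming a marker type carries the name list in place of the const generic. `Names`, `Anon` and `ABlahX` are hypothetical names used only for this sketch:

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the const-generic name list.
pub trait Names {
    const NAMES: &'static [&'static str];
}

/// Stable emulation of `AnonymousStruct<Fields, NAMES>`.
pub struct Anon<N: Names, Fields>(pub Fields, PhantomData<N>);

impl<N: Names, Fields> Anon<N, Fields> {
    pub fn new(fields: Fields) -> Self {
        Anon(fields, PhantomData)
    }
}

// One impl covers every emulated anonymous struct whose field tuple is `Clone`.
// Written by hand because `#[derive(Clone)]` would also demand `N: Clone`.
impl<N: Names, Fields: Clone> Clone for Anon<N, Fields> {
    fn clone(&self) -> Self {
        Anon(self.0.clone(), PhantomData)
    }
}

/// Marker for the sorted name list `["a", "blah", "x"]`.
pub struct ABlahX;

impl Names for ABlahX {
    const NAMES: &'static [&'static str] = &["a", "blah", "x"];
}
```

The point is only that a single blanket impl on the wrapper is enough; nothing has to be generated per anonymous struct.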

How would this order be decided? Lexicographically, based on the code provided?

Other than that single question, this should work and be reasonable enough to implement. Having support for tuples > 12 items is a separate issue, but even 12 fields in an anonymous struct would be incredibly useful.

yes, though any consistent ordering based only on field names will work. I picked sorting by field name so that {a: u8, b: i8} is the same type as {b: i8, a: u8}, since that seems useful to me. For generics to work properly the ordering must not depend on field types, since we want A and B to always be the same type:

type S<T> = {a: T, b: u8}; // substituting types must not change tuple-field order
type A = {a: i8, b: u8};
type B = S<i8>;
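
A stable approximation of that requirement, assuming a marker type stands in for the sorted name list (`NamesAB` and `Anon` are hypothetical). Because field a always sits at tuple index 0 regardless of T, substituting T = i8 produces exactly A:

```rust
use std::marker::PhantomData;

/// Marker standing in for the sorted name list `["a", "b"]` (hypothetical).
pub struct NamesAB;

/// Stable emulation of `AnonymousStruct<Fields, NAMES>`.
pub struct Anon<Names, Fields>(pub Fields, pub PhantomData<Names>);

// type S<T> = {a: T, b: u8};
pub type S<T> = Anon<NamesAB, (T, u8)>;
// type A = {a: i8, b: u8};
pub type A = Anon<NamesAB, (i8, u8)>;
// type B = S<i8>;
pub type B = S<i8>;

/// Accepts only `A`.
pub fn takes_a(value: A) -> (i8, u8) {
    value.0
}

/// Type-checks only because `B` and `A` are the same type.
pub fn same_type() -> (i8, u8) {
    let b: B = Anon((-1, 2), PhantomData);
    takes_a(b)
}
```

If the tuple order depended on the field types instead, the position of a inside S<T> could not be fixed before T is known.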

It definitely seems reasonable. The compiler optimizes the layout regardless, so it's not like the declared order matters for that.

I was thinking of something vaguely along the same lines. It's pretty far from being usable, but the general idea is to define the semantics of structural types using existing constructions, then possibly add a layer of (mostly) syntactic sugar to make it a bit nicer.

// First, define a field, along with a macro for a little syntactic sugar.
// Ideally, we'd like something like OCaml's polymorphic variants,
// but well, let's not make things even more complicated.

pub trait Field {
    type TYPE;
    fn take(self) -> Self::TYPE;
}
macro_rules! field {
    ($name:ident, $ty:ty) => {
        struct $name($ty);
        impl Field for $name {
            type TYPE = $ty;
            fn take(self) -> Self::TYPE {
                self.0
            }
        }
    };
}

// Examples of fields. If this were to become a language feature,
// we'd need to find a way to lift these fields from the record.
field!(Red, u8);
field!(Green, u8);
field!(Blue, u8);
// Now that we have fields, we can define a record
struct Record<F> where F: Fields {
    pub fields: F,
}

trait Fields {

}
impl Fields for () {}
impl<T, U> Fields for (T, U)
    where U: Fields, T: Field
{

}

Now, let's use this

fn do_stuff_with(color: Record<(Red, (Green, (Blue, ())))>) {
    let (red, (green, (blue, ()))) = color.fields; // Destructure the record into fields.
    let (red, green, blue) = (red.take(), green.take(), blue.take()); // Extract the data.
}

fn main() {
    do_stuff_with(Record { fields: (Red(255), (Green(255), (Blue(255), ()))) })
}
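
One thing the nested-pair encoding buys is recursion over the field list without variadics. A small sketch, assuming we add an associated constant to Fields; `LEN` and the `Rgb` alias are hypothetical additions, not part of the code above:

```rust
pub trait Field {
    type Type;
    fn take(self) -> Self::Type;
}

/// A list of fields, encoded as nested pairs ending in `()`.
pub trait Fields {
    /// Number of fields in the list (hypothetical addition for illustration).
    const LEN: usize;
}

// Base case: the empty record.
impl Fields for () {
    const LEN: usize = 0;
}

// Inductive case: one field followed by the rest of the list.
impl<T: Field, U: Fields> Fields for (T, U) {
    const LEN: usize = 1 + U::LEN;
}

pub struct Red(pub u8);
pub struct Green(pub u8);
pub struct Blue(pub u8);

impl Field for Red {
    type Type = u8;
    fn take(self) -> u8 { self.0 }
}
impl Field for Green {
    type Type = u8;
    fn take(self) -> u8 { self.0 }
}
impl Field for Blue {
    type Type = u8;
    fn take(self) -> u8 { self.0 }
}

pub type Rgb = (Red, (Green, (Blue, ())));
```

Here `<Rgb as Fields>::LEN` evaluates to 3; the same two-impl pattern would be the starting point for anything else that has to walk all fields.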

As I mentioned, very early stage. But maybe something useful?

two issues:

  • the field struct needs to be generic, so that given type S<T> = {a: T}; the type S<u8> is the same as {a: u8}
  • where are the field structs defined? They all need to live in a common crate/mod so that one crate's {a: u8} is always the same type as another, possibly unrelated, crate's {a: u8}, but Rust currently doesn't let you add things to a crate from outside that crate and I expect it never will...

Good point, yes.

Still working it out. I think that there is a way to do this (basically as OCaml handles polymorphic variants), but at this stage, it's still a hunch.

I understand that this is why you use const str, right? Assuming that they will be resolved at compile-time.

yes, but imho more importantly I'm using tuples, because tuples are not defined in any one crate since they're instead compiler builtins, so we don't run into the issue of defining types in a common place and needing to add new types to the common crate from other crates.
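
A small illustration of that difference, using two modules as stand-ins for two unrelated crates (the module names and the field struct `a` are made up for the example):

```rust
use std::any::TypeId;

mod crate_a {
    // This crate's own field struct for a field named `a`.
    #[allow(non_camel_case_types)]
    pub struct a(pub u8);
    // The same record expressed with a builtin tuple.
    pub type Rec = (u8,);
}

mod crate_b {
    // A second, independently declared field struct with the same name.
    #[allow(non_camel_case_types)]
    pub struct a(pub u8);
    pub type Rec = (u8,);
}

/// Separately declared structs are distinct types, even with identical definitions.
pub fn field_structs_match() -> bool {
    TypeId::of::<crate_a::a>() == TypeId::of::<crate_b::a>()
}

/// Tuples are compiler builtins, so both modules name the very same type.
pub fn tuples_match() -> bool {
    TypeId::of::<crate_a::Rec>() == TypeId::of::<crate_b::Rec>()
}
```

`field_structs_match()` is false and `tuples_match()` is true, which is the whole reason for building on tuples plus names rather than on per-field structs.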

It depends on where on the nominal/structural axis you want these to fall.

With pub struct name<T>(pub T);, the name itself is nominal, and anyone who wants a structural type with a field with this name needs to use the name as defined here.

With pub struct Field<const NAME: &str, T>(pub T);, the name is structural, as anyone can name this type with this name.

JavaScript's type system is highly structural, in that every field on a standard hash/object[1] is publicly accessible by name. But you can get a nominal field that is harder for others to access[2] by utilizing Symbol to get a field key guaranteed not to clash with any other field key; that's essentially the same as using a new struct name instead of the shared Field<"name">.

Allowing "object names" could be desirable, for the same reason Symbol is nice to have in JS. On the other hand, since Rust is much less dynamic than JS, it'd be much less nice to work with an object with hidden fields; you need to know how many there are in order to be generic over them.

So as much as representing anonymous structs as roughly (so much imaginary syntax :smiley:)

pub trait Field {
    type Key;
    type Type;
    fn into_inner(self) -> Self::Type;
    fn as_inner(&self) -> &Self::Type;
}

pub struct Nominal<const NAME: &str, T>(pub T);
impl<T> Field for Nominal<_, T> {
    type Key = &str in NAME;
    type Type = T;
    fn into_inner(self) -> T { self.0 }
    fn as_inner(&self) -> &T { &self.0 }
}

pub struct Struct<Fields...: Field>(Fields...,);

impl<Fields...: Field>
type for (i: usize, F: type) in type(Fields...).iter().enumerate() {
    impl Index<F::Key> for Struct<Fields...> {
        type Output = F::Type;
        fn index(&self, _: F::Key) -> &F::Type {
            &self.#i
        }
    }
}

may be possible (you could model a normal struct as this same shape, just with nominal field names instead of structural field names), it's really not worth the complexity over just giving each field a const &str name.

To be frank, I don't think we're going to get structural records before variadic generics, because manipulating structural records has much the same problems to overcome as variadics. And if we have variadics, I'd personally prefer Struct<((const NAME: &str, T)...,)>(T...,) (if that kind of ad hoc association is possible) to Struct<(T...,), const NAMES: &[&str]>(T...,). If we don't have variadics, I don't see the relevant impl detail types being any more stable than the Fn traits currently are (i.e. you can refer to a specific applied instance, but not fully generically).


  1. Classes actually support private features with #name, nowadays.

  2. IIUC you can still list all symbol field keys of an object and get any used Symbol keys, so this can't completely hide fields, but it's essentially as good as it's possible to get in a language with access to nearly unbounded runtime reflection.

I would be very surprised if Rust gained the ability to put both const generics and type generics directly in a tuple; I'd expect to have to use a wrapper type to convert from const generics to types:

#[repr(transparent)]
pub struct Field<const NAME: &'static str, T>(pub T);
// inspired by C++ syntax
pub struct AnonymousStruct<const NAMES..: &'static str, Fields..>(pub (Field<NAMES, Fields>, ..));

Just for the sake of scope-limiting (a bit) this conversation, do we agree that for a proof of concept, any implementation that lets us implement the following would be satisfying?

setup!(fn do_something(temporary_name = {red: u8, green: u8, blue: u8}) {
    destruct!{temporary_name, red: u8, green: u8, blue: u8}
    // We now have defined `red`, `green` and `blue`.
});

pub fn test() {
    call!(do_something, {red: 1, green: 2, blue: 3});
}

Rationale: at this stage, I believe that we're more interested in defining semantics than syntax or performance. Once we do have reasonable semantics, an RFC could suggest syntactic sugar and/or implementation optimizations.

Here's another variant using &'static str and tuples, as suggested by @programmerjake. It does not implement field reordering or deduplication, but I suspect that both would be pretty easy to add with a procedural macro.

// Associated const equality lets us compare by `const NAME` with static type-checking.
#![feature(associated_const_equality)]

// Truly, a field is just a name.
pub trait Field<T> {
    const NAME: &'static str;
    fn take(self) -> T;
}

// And an auxiliary macro to make implementing `Field` transparent.
macro_rules! call {
    ($callee:ident, $($field:ident = $value:expr),*) => {
        $callee(($(
            {
                // Compile-time cost: probably negligible.
                // Runtime cost: zero.
                #[allow(non_camel_case_types)]
                struct $field<T>(T);
                impl<T> Field<T> for $field<T> {
                    const NAME: &'static str = stringify!($field);
                    fn take(self) -> T {
                        self.0
                    }
                }
                $field($value)
            }
        ),*))
    };
}

We desugar our setup!(fn do_something ...) to

#[inline(always)]
fn do_something_outer<A, B, C>(temporary_name: (A, B, C)) 
    where A: Field<u8, NAME="red">, // <- This is how we maintain static type-checks.
          B: Field<u8, NAME="green">,
          C: Field<u8, NAME="blue">,
{
    // Compile-time cost: probably negligible.
    // Runtime cost: zero.
    let (red, green, blue) = temporary_name;
    let red = red.take();
    let green = green.take();
    let blue = blue.take();
    do_something_inner(red, green, blue);
}

fn do_something_inner(_red: u8, _green: u8, _blue: u8) {
    unimplemented!()
}

Let's check that this is type-safe:

pub fn test() {
    // Correct call: should pass
    call!(do_something_outer, red = 0, green = 1, blue = 2);    // It passes!

    // Incorrect call: missing arg.
    call!(do_something_outer, red = 0, green = 1);
       // mismatched types
       //   expected tuple `(test::red<u8>, test::green<u8>, _)`
       //   found tuple `(test::red<u8>, test::green<u8>)`

    // Incorrect call: too many args.
    call!(do_something_outer, red = 0, green = 1, blue = 2, alpha = 3);
       // mismatched types
       //    expected tuple `(test::red<u8>, test::green<u8>, test::blue<u8>)`
       //    found tuple `(test::red<u8>, test::green<u8>, test::blue<u8>, test::alpha<{integer}>)`

    // Incorrect call: invalid arg name.
    call!(do_something_outer, red = 0, green = 1, yellow = 2);
        // type mismatch resolving `<yellow<u8> as Field<u8>>::NAME == "blue"`
        //  expected constant `"blue"`
        //  found constant `"yellow"`

    // Incorrect call: invalid expr type
    call!(do_something_outer, red = 0, green = 1, blue = 0.1);
        // the trait bound `test::blue<{float}>: Field<u8>` is not satisfied
        // the trait `Field<T>` is implemented for `test::blue<T>`
}
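
For anyone who wants to play with this without nightly: the NAME="red" bound can be roughly approximated on stable with a const assertion that is evaluated when the function is instantiated. This is a swapped-in technique, not what the code above does, and `str_eq`, `Named` and `take_red` are hypothetical helpers:

```rust
use std::marker::PhantomData;

pub trait Field<T> {
    const NAME: &'static str;
    fn take(self) -> T;
}

/// Byte-wise string comparison usable in const contexts (hypothetical helper).
pub const fn str_eq(a: &str, b: &str) -> bool {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    if a.len() != b.len() {
        return false;
    }
    let mut i = 0;
    while i < a.len() {
        if a[i] != b[i] {
            return false;
        }
        i += 1;
    }
    true
}

/// Hypothetical helper: evaluating `IS_RED` fails unless `F::NAME == "red"`.
pub struct Named<F, T>(PhantomData<(F, T)>);

impl<T, F: Field<T>> Named<F, T> {
    pub const IS_RED: () = assert!(str_eq(F::NAME, "red"), "expected a field named `red`");
}

#[allow(non_camel_case_types)]
pub struct red<T>(pub T);

impl<T> Field<T> for red<T> {
    const NAME: &'static str = "red";
    fn take(self) -> T {
        self.0
    }
}

pub fn take_red<F: Field<u8>>(field: F) -> u8 {
    // Referencing the constant forces the name check for this particular `F`.
    let () = Named::<F, u8>::IS_RED;
    field.take()
}
```

The trade-off is that a wrong name is reported only once `take_red` is instantiated with the offending type, and the error is less readable than the associated-const-equality mismatch shown above.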

Feels... almost usable?

Missing at this stage:

  • the naming of do_something_inner/do_something_outer is pretty fragile, in particular if any kind of recursion is needed in do_something -- doesn't sound too complicated to handle with a procedural macro;
  • we hardcode the order of fields when we desugar setup!(do_something...) -- doesn't sound too complicated to handle with a procedural macro (basically order by field name), could also possibly be managed by some type-level shenanigans, but I'm not sure it's really useful;
  • we don't detect duplicate field names -- again, doesn't sound too complicated to handle with a procedural macro;
  • doesn't allow method calls -- that would require complicating the macros a little, but doesn't feel unreasonable;
  • no clear path towards supporting default values -- I have a few ideas but I suspect that error messages would be really bad;
  • doesn't support
if foo {
   struct!({red: 1})
} else {
   struct!({red: 2}) // Different type for `red`!
}

(might be possible to alleviate this by lifting red to the toplevel of a function? module? crate? is it necessary/useful? or by letting users define it manually?)

  • and of course some syntactic sugar would be useful.
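
On the if/else limitation in the list above: a rough sketch of the "lift red to an enclosing scope" option, assuming the field struct is declared once by hand instead of once per macro expansion. `pick` is a made-up function for the example:

```rust
pub trait Field<T> {
    const NAME: &'static str;
    fn take(self) -> T;
}

// Declared once, outside the `if`, instead of once per macro expansion.
#[allow(non_camel_case_types)]
pub struct red<T>(pub T);

impl<T> Field<T> for red<T> {
    const NAME: &'static str = "red";
    fn take(self) -> T {
        self.0
    }
}

pub fn pick(flag: bool) -> (red<u8>,) {
    // Both branches now mention the same `red`, so the `if` has a single type.
    if flag {
        (red(1),)
    } else {
        (red(2),)
    }
}
```

Whether the lifting should happen per function, per module or per crate is exactly the open question; this only shows that a shared declaration is enough for the branches to unify.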

Feedback welcome!

One issue with using &'static str for field names is that it doesn't handle hygiene correctly. Identifiers and field names are normally hygienic, such that if you generate several of them in a macro then they are formally distinct to the compiler. The exact scoping rules for these field names are not clear to me and would presumably need to be hashed out in the RFC, but a forward-compatible way to handle it would be to have an Atom type with:

#[derive(PartialOrd, Ord, PartialEq, Eq)]
struct Atom { /* private */ }
impl Atom {
  const fn name(&self) -> &'static str;
  const fn unhygienic(name: &'static str) -> Atom;
}

as a minimal API to allow types with hygienic fields to exist.


EDIT: Actually it seems that structs can't have hygienic fields:

macro_rules! def_struct {
    ($a:ident) => {
        struct Foo {
            $a: u32,
            a: u32,
        }
    };
}
def_struct!(a); // ERROR: field `a` is already declared

This is by contrast to local variables:

macro_rules! def_let {
    ($a:ident) => {
        let ($a, a) = (1, 2);
    };
}
def_let!(a); // ok
let (a, a) = (1, 2); // ERROR: identifier `a` is bound more than once in the same pattern
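
A runnable variant of the local-variable case that also returns both bindings, to show they really are distinct. The wrapper function `hygienic_locals` is made up for the example:

```rust
macro_rules! def_let {
    ($a:ident) => {{
        // `$a` is the caller's identifier; the bare `a` belongs to the macro.
        let ($a, a) = (1, 2);
        ($a, a)
    }};
}

pub fn hygienic_locals() -> (i32, i32) {
    // Both bindings are spelled `a`, but they differ in hygiene context.
    def_let!(a)
}
```

`hygienic_locals()` returns `(1, 2)`: the first component comes from the caller-supplied `a`, the second from the macro's own `a`.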

I like the idea, but it feels incompatible with the above trick

   where A: Field<u8, NAME="red">

Unless I'm missing something?

A large part of the point of structural record types is that they aren't hygienic. Everyone writing the type { r: u8, g: u8, b: u8 } is referring to the same type even though they are in unrelated contexts.

You're still thinking in terms of nominal types, not structural types.

No, I meant what I said, that's still a structural type since there is no name for the "head" of the structure itself. Now arguably you could use hygienic field names as a poor approximation of nominal typing in a world where this was all you had, but it is primarily an argument from consistency: identifiers in rust have a great deal of information packed in them besides just the sequence of characters (hygiene and edition), so if you have a thing that uses identifiers it should be able to handle identifiers created in any way.

But since the example shows that normal struct field names aren't hygienic, this is mostly moot anyway (assuming that was a deliberate decision).

In the grand scheme of things, macro_rules! hygiene ("mixed site" hygiene) is actually relatively weak; local variables, labels, and $crate are properly hygienic (resolve at "def site") but everything else isn't (resolve at "call site"); notably this means essentially everything defined at item scope isn't hygienic.

Using #![feature(decl_macro)] is required to get experimental "full" def-site hygiene, and with that, you do currently get hygienic field names: [playground]
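
For comparison, the item-scope side of macro_rules! hygiene on stable: a function defined inside a macro is visible by name at the call site even though the name was never passed in. `define_helper`, `helper` and `call_helper` are made-up names:

```rust
macro_rules! define_helper {
    () => {
        // An item: its name resolves at the call site, so it is not hygienic.
        fn helper() -> u32 {
            42
        }
    };
}

define_helper!();

pub fn call_helper() -> u32 {
    // `helper` was never passed to the macro, yet it is visible here.
    helper()
}
```

Doing the same with a `let` inside the macro would not work, since local bindings resolve at the definition site.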

So, using the ideas discussed above, I have put together a very, very, very early/incomplete prototype, codename "obstruct".

    use obstruct_macros::{instruct, destruct};

    // Create an anonymous `struct`.
    let structured = instruct! { red: 0, green: 1.0, blue: 2 };

    // Get data from that struct. Note that we have altered the order of fields.
    destruct! { let {red, blue, green} = structured };

    // Confirm that altering the order didn't cause any trouble.
    assert_eq!(red, 0);
    assert_eq!(green, 1.0);
    assert_eq!(blue, 2);

This implementation doesn't use unsafe and should have 0 runtime cost.

Note that instruct!{ ... } produces a value of a Voldemort (unnameable) type, so you cannot write

if flag {
   instruct!{ red: 0 }
} else {
   instruct!{ red: 1 }
}