Pre-RFC: Struct constructor name inference

SOF3 · September 18, 2019, 4:10am

By struct constructor I'm referring to the let a = A { b: c }; syntax.

Rust already supports inference of types based on subsequent usages, so it might be nice to infer the struct name as well:

struct OneLongStructName { a: i32, b: i32, c: i32 }
fn bar(foo: OneLongStructName) {}

fn main() {
    bar(_ {a: 1, b: 2, c: 3});
}

This would be useful to achieve something like JavaScript's (TypeScript in particular) named-parameters-by-object pattern, like this:

function ajax(options: jQuery.AjaxOptions) {}
ajax({ url: "https://example.com" })

This is especially useful when the parameter type has a long signature (including generics), or requires imports (which reduces coding fluency).
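For comparison, here is a rough sketch of what the named-parameters-by-struct pattern looks like in today's Rust, where the struct name has to be written at the call site. `AjaxOptions`, its fields, and `ajax` are hypothetical names made up for illustration and do not correspond to any real crate.

```rust
// Hypothetical options struct standing in for jQuery.AjaxOptions.
#[derive(Debug, Default, PartialEq)]
struct AjaxOptions {
    url: String,
    method: String,
    timeout_ms: u32,
}

// Hypothetical function: it only describes the request it would make.
fn ajax(options: AjaxOptions) -> String {
    let method = if options.method.is_empty() {
        "GET"
    } else {
        options.method.as_str()
    };
    format!("{} {} (timeout {} ms)", method, options.url, options.timeout_ms)
}

fn main() {
    // Today the type name must be spelled out here; the proposal would
    // allow `AjaxOptions` to be replaced by `_`.
    let summary = ajax(AjaxOptions {
        url: "https://example.com".to_string(),
        ..Default::default()
    });
    println!("{}", summary);
}
```

Only `url` is named at the call site; the remaining fields come from `Default`.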

Underscore

I suggested _() or _{} instead of () or {} because of the potential ambiguity with tuple expressions or block expressions.

Underscore is the idiomatic "infer type" identifier in type context, so it is apparent that _ is there to replace an omitted type name.
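As a small illustration of how `_` already means "infer this" in type position, consider the sketch below. The function name `squares` is just an example.

```rust
fn squares(n: u32) -> Vec<u32> {
    // `_` asks the compiler to infer the element type of the Vec.
    let v: Vec<_> = (1..=n).map(|i| i * i).collect();
    v
}

fn main() {
    // `_` in a turbofish: the target type comes from the annotation on the left.
    let parsed: i64 = "42".parse::<_>().unwrap();
    println!("{:?} {}", squares(3), parsed);
}
```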

Functions rarely directly require passing in a unit struct (usually it's passed to specify a type parameter, which cannot be inferred anyway), so this RFC should not cover unit struct construction.

Enums

It might also be useful to have _::EnumVariant syntax when it's clear that the context requires an enum type, but that belongs in another RFC. I remember reading a similar RFC before. This thread appears to be related.

Patterns

It might be useful to be able to destructure a known type using a similar syntax, e.g. let _ { a, b, c } = foo;, but this is also out of scope.

mcy · September 18, 2019, 4:13am

This is needlessly sigiltastic in the case of named-function-arguments (which, frankly, is the only reason anyone would use this) and a total readability footgun in all other contexts (the pattern version is especially questionable).

I think you should check out the discussion on anonymous aggregate types for use in named function arguments... I believe there's an RFC floating around for that.

SOF3 · September 18, 2019, 4:14am

Sorry, I can't find "sigiltastic" in the dictionary. Definition, please?

scottmcm · September 18, 2019, 4:30am

Another syntax possibility would be struct {, and similarly enum::A.

I could see omitting the type name as being nice in lets and matches too:

let struct { x, y, z } = get_point();

match foo {
    struct { x: 10, .. } => ...
}

since the type is already known from the value; it doesn't need to be repeated in the pattern -- especially if there are multiple arms over the same type in a match.
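For reference, this is roughly what the same patterns look like in today's Rust, where the struct name is repeated in every pattern. `Point`, `get_point`, `describe`, and `sum` are hypothetical names chosen for the sketch.

```rust
struct Point {
    x: i32,
    y: i32,
    z: i32,
}

fn get_point() -> Point {
    Point { x: 10, y: 2, z: 3 }
}

fn describe(p: &Point) -> &'static str {
    match p {
        // `Point` is repeated in each arm even though the scrutinee's type is known.
        Point { x: 10, .. } => "x is ten",
        Point { y: 0, .. } => "on the y = 0 plane",
        Point { .. } => "somewhere else",
    }
}

fn sum() -> i32 {
    // Destructuring in a `let` also needs the type name today.
    let Point { x, y, z } = get_point();
    x + y + z
}

fn main() {
    println!("{} {}", describe(&get_point()), sum());
}
```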

Ryan1729 · September 18, 2019, 4:36am

"sigiltastic" is a neologism/portmanteau coming from a combination of "sigil" and "fantastic", which is being used here to mean "too many sigils". You probably would find the word sigil in a dictionary, but the meaning in a programming context may not be clear. In this case a reasonable synonym would be "punctuation". So symbols like _ and {} as opposed to letters.

spunit262 · September 18, 2019, 5:13am

Constant declarations are often quite repetitive.

const FOO: Bar = Bar { .. };

"and a total readability footgun in all other contexts"

I can't think of any time this would affect readability that wouldn't error on inference.

"(the pattern version is especially questionable)."

How so? It can't be used unless you already have a value, and most uses of a value don't require you to name its type.

josh · September 18, 2019, 5:15am

I personally wouldn't have any objection to the elimination of repetition in a constant, but I'd rather do so by eliminating the type:

const FOO = Bar { a: 1, b: 2 };
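To make the repetition concrete, here is a minimal sketch of the situation in today's Rust: a `const` item must carry an explicit type, while a `let` binding with the same initializer does not. The fields of `Bar` and the helper `local` are assumptions for the example.

```rust
#[derive(Debug, PartialEq)]
struct Bar {
    a: i32,
    b: i32,
}

// A `const` item requires a type annotation, so `Bar` is written twice.
const FOO: Bar = Bar { a: 1, b: 2 };

fn local() -> Bar {
    // A `let` binding infers its type from the initializer.
    let foo = Bar { a: 1, b: 2 };
    foo
}

fn main() {
    println!("{:?} {:?}", FOO, local());
}
```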

SOF3 · September 18, 2019, 8:36am

Looks good, but it might be a bit confusing when compared to other languages though.

In JavaScript, function(){} (which is basically an item definition omitting the identifier) defines an anonymous function, which is equivalent to a reference to the function. But here we aren't referencing the struct type.

In PHP, new class {} instantiates a new object of a new anonymous class, which is close enough here, but still doesn't give the idea of "inferred type".

bill_myers · September 18, 2019, 9:09am

Can we do it without the _ if we give up using : as type ascription? (or at least parse {a: 1} as an anonymous struct instead)

SOF3 · September 18, 2019, 9:11am

Right now {a: 1} does not really parse into anything. But this is for future-proofing reasons.

In addition, we still have the same problem with tuples.

And single-valued unnamed structs? let y = (x) is totally ambiguous (and does not resolve into a tuple anyway; we need (x, ) to make it a tuple), while let y = _(x) is not so ambiguous.
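A short sketch of the distinction in current Rust: parentheses alone only group an expression, a trailing comma makes a one-element tuple, and a one-field tuple struct is constructed by name. `Meters` and the function names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
struct Meters(i32);

fn parenthesized(x: i32) -> i32 {
    // Parentheses only group; `y` is still a plain i32.
    #[allow(unused_parens)]
    let y = (x);
    y
}

fn one_tuple(x: i32) -> (i32,) {
    // The trailing comma is what makes this a tuple.
    let y = (x,);
    y
}

fn wrap(x: i32) -> Meters {
    // Today the struct name is required; the proposal would write `_(x)`.
    Meters(x)
}

fn main() {
    println!("{} {:?} {:?}", parenthesized(5), one_tuple(5), wrap(5));
}
```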

Not sure if _(x) needs to be reserved for anything else related to functions though.

Nemo157 · September 18, 2019, 9:18am

There is an existing RFC to make { a: 1, } a structural record, a.k.a. anonymous struct, which could fit the mentioned use case better. Personally I would prefer that an _ placeholder is used for inference, as it is in every other location that has explicit inference (but I also haven't encountered this situation enough that I would really want it implemented).

With RFC 2584:

type OneLongStructName = { a: i32, b: i32, c: i32 };

fn bar(foo: OneLongStructName) {}

fn main() {
    bar({ a: 1, b: 2, c: 3 });
}

idanarye · September 18, 2019, 10:29am

Another useful usage of this is with enums. It's a common pattern, instead of using a named fields variant:

enum Foo {
    Bar {
        a: u32,
        b: u64,
        // ...
    },
    // ...
}

To have a named fields struct and have the enum use it:

struct Bar {
    a: u32,
    b: u64,
    // ...
}

enum Foo {
    Bar(Bar),
    // ...
}

Constructor inference could help in removing redundant repetition here, where Foo::Bar(Bar { a: 1, b: 2 }) becomes Foo::Bar(_ { a: 1, b: 2 }). These repetitions can be quite annoying when constructing or matching against deeply nested enum/struct combinations (like when you work with syn).
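The repetition in question can be sketched in today's Rust as follows. The `Empty` variant and the functions `make` and `total` are hypothetical additions so that the example is complete.

```rust
#[derive(Debug, PartialEq)]
struct Bar {
    a: u32,
    b: u64,
}

#[derive(Debug, PartialEq)]
enum Foo {
    Bar(Bar),
    Empty,
}

fn make() -> Foo {
    // `Bar` appears twice: once as the variant, once as the struct name.
    Foo::Bar(Bar { a: 1, b: 2 })
}

fn total(foo: &Foo) -> u64 {
    match foo {
        // The same repetition shows up when matching.
        Foo::Bar(Bar { a, b }) => u64::from(*a) + *b,
        Foo::Empty => 0,
    }
}

fn main() {
    println!("{:?} {}", make(), total(&make()));
}
```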

Centril · September 18, 2019, 1:24pm

There are two ways this could be done:

1. Use the expected: Ty<'tcx>. We rely on the fact that we know what type the function wants. This makes the implementation easy; just use the expected: Ty<'tcx>.
   - The type checking code that needs to change is here: https://github.com/rust-lang/rust/blob/64c09694a6ecc434cd3a61ade89beb1de17770c5/src/librustc_typeck/check/expr.rs#L1024-L1035 + some minor fallout.
   - The parsing code is in the vicinity of https://github.com/rust-lang/rust/blob/64c09694a6ecc434cd3a61ade89beb1de17770c5/src/libsyntax/parse/parser/expr.rs#L907.
   - This entails that let x: _ = _ { ... }; will result in a "cannot infer type" error.
   From a compiler / language-spec complexity POV, this is easy and contained.
2. Use 1. but when faced with let _ = _ { ... };, use a search based on the specified fields and look for types in the scope that could fit... this is complicated, probably unnecessary, and more of a readability footgun.
3. I said 2 ways... but if we have structural records then we can use a coercion instead, or use 1. in case the expected type is a nominal struct (this is what I'd probably do).
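As a loose analogy for the first option, inference that exists today already lets a function's parameter type determine the type of an argument expression. The sketch below uses `Default::default()` for this. It only illustrates the user-visible behavior and is not a description of how the compiler implements it internally; the return value of `bar` and the helper name are assumptions for the example.

```rust
#[derive(Debug, Default, PartialEq)]
struct OneLongStructName {
    a: i32,
    b: i32,
    c: i32,
}

fn bar(foo: OneLongStructName) -> i32 {
    foo.a + foo.b + foo.c
}

fn call_with_expected_type() -> i32 {
    // The parameter type of `bar` tells the compiler which `Default`
    // impl to use, so no type name is written here.
    bar(Default::default())
}

fn main() {
    // With nothing constraining the type, inference has no answer and
    // the following line would not compile:
    // let x: _ = Default::default();
    println!("{}", call_with_expected_type());
}
```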

From my perspective, we either have both or none. It's a feature of Rust that bidirectional patterns correspond with their expression syntax. We should not introduce inconsistencies here.

_ { ... } solves the backtracking issue but both it and especially struct { ... } seem like syntactic salt. In the latter case, it negates much of the ergonomic benefit purely due to length of struct (6 letters, I could just write out the type name instead...).

You don't necessarily have the function in your window when reading GitHub diffs, so you wouldn't know the expected type. That said, this is not really different in nature from other cases involving type inference. The name of the struct could be considered a boring detail in some cases, and necessary in others. This depends on the application domain, and one can make judicious use of annotations, providing the name where it is important to do so. This is, I think, best left up to the author/reviewer.

Both solutions are compatible and neither fully subsumes the other. For constants, I'd be fine with doing this for non-pub things.

We can have both using a bit of backtracking. (See https://github.com/Centril/rfcs/blob/rfc/structural-records/text/0000-structural-records.md#backtracking for more.)

ExpHP · September 18, 2019, 1:25pm

Not just specifically functions, but _(x) would conflict with any form of "_ expressions," whatever we might want those to mean. (The idea at least came up in the context of generalized lvalues, to make stuff like (a, b) = (b, a); work.)

Having a uniform word that can be put there is still an ergonomic improvement in some situations. (for the same reason that many people enjoy using Self { ... })
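For context, a brief sketch of the `Self { ... }` usage mentioned above, which works in today's Rust both as a constructor and as a pattern inside an `impl` block. The method names `new` and `sum` are hypothetical.

```rust
#[derive(Debug, PartialEq)]
struct OneLongStructName {
    a: i32,
    b: i32,
    c: i32,
}

impl OneLongStructName {
    fn new(a: i32) -> Self {
        // `Self` stands in for the long struct name when constructing.
        Self { a, b: 0, c: 0 }
    }

    fn sum(&self) -> i32 {
        // `Self` also works in patterns; the bindings here are references.
        let Self { a, b, c } = self;
        a + b + c
    }
}

fn main() {
    println!("{}", OneLongStructName::new(4).sum());
}
```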

Centril · September 18, 2019, 1:29pm

This one is easy, I think. It is value inference (for const generic contexts where the value is inferrable, or singleton types, e.g. as in "[DRAFT] RFC: Infer singleton values" by Centril, Pull Request #14 in Centril/rfcs.)

Err... strike that; _(...) has _ as a QPath in this context.

ExpHP · September 18, 2019, 1:34pm

You can see that we already have different ideas for what it should mean (see the sentence you didn't quote), hence the uncertainty.

Centril · September 18, 2019, 1:44pm

The question is whether it is enough of an improvement to justify the addition. If we insist on keeping LL(k) (I think we have something like k = 4 today), I'd just use _ { ... } for structural records as well. I do think that avoiding backtracking for some pathological corner cases is the wrong technical decision as:

- grepping will still be easy as the type ascription interpretation is pathological,
- simple text editors don't (currently, even with LL(4)) fully faithfully highlight Rust syntax and don't really need to in corner cases, and
- compiler perf is unlikely to regress notably as it is still pathological and little time is spent in parsing.

I just read too fast -- _(...) is syntactically different than a bare _ so it can have a different meaning.

mcy · September 18, 2019, 1:44pm

Assuming this inference is backflowing (why the hell not), I think with something like

fn foo() -> T {
  let x = _ { .. };
  // A whole lot of garbage.
  x
}

there is now significant work involved for a reader in figuring out what the type of x is... and this isn't even a contrived example!

Centril · September 18, 2019, 1:53pm

I think the blame lies with "a whole lot of garbage".

Solution: don't write huge functions -- they are no fun to review, read, or maintain, irrespective of _ { ... }. It seems to me that more inference helps with reducing function body lengths so arguably it advances readability (or at least long term maintainability, which is not the same as readability! and more important imo).

Here's a similar example which will work today:

fn foo() -> u8 {
    // A whole lot of garbage.
    let x = <_>::default();
    // Some more garbage.
    x
}

mcy · September 18, 2019, 2:25pm

Saying "don't do that" is great when your project has a strong readability approval ethic. However, I've read and reviewed enough C++ to know that the language should not encourage it... And anyway, a lot of projects don't have world-class readability review.

Examples like the one you list are already pretty bad.

