Pre-RFC: Optional Fields

lolgeny · February 19, 2021, 8:30pm

Optional Fields

Feature Name: Optional Fields
Start Date: (fill me in with today's date, YYYY-MM-DD)
RFC PR: rust-lang/rfcs#0000
Rust Issue: rust-lang/rust#0000

Summary

Adds an optional ? syntax after a struct field's type, which is equivalent to wrapping the type in an Option. Additionally, in struct literals, fields declared this way may be omitted and are automatically filled in as None. If they are specified, they do not need to be wrapped in Some.

Motivation

A common pattern is having a large struct with many optional fields, which are often used for configuration. This is then passed to a function which matches against them, and acts accordingly. Currently, there are two options:

Using a builder

Builders are a very common way to generate a struct with many optional fields. The builder is created via Default, or another constructor method, then every field has its own associated function which configures the struct. There are, however, a few drawbacks:

For every field, you need to create a corresponding method. This can get unwieldy when using a large number of them. Of course, macros can help here, but they have limits and also can get messy.
Also, each field's type has to be wrapped in Option, which is tedious and gets in the way too.
The associated functions can either take a reference (&mut self), or an owned self (and return the same). References are great because they not only allow one-liner configuration, but also allow easy optional configuration (foo.bar(3);). However, if the object needs to be owned to configure upon, one-liner configuration is impossible (you need to store the builder in a variable).

For an owned self, the opposite problem occurs: one-liner configuration is still possible, but at the cost of annoying optional configuration (foo = foo.bar(3);).
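
The trade-off between the two builder styles can be sketched in current Rust. This is only an illustration: Config, RefBuilder and OwnedBuilder are made-up names, and build is assumed to take self by value, which is the "needs to be owned" case described above.

```rust
/// Hypothetical configuration struct, used only for illustration.
#[derive(Debug, Default, PartialEq)]
pub struct Config {
    pub foo: Option<u32>,
    pub bar: Option<u32>,
}

/// Builder whose setters borrow the builder (`&mut self`).
#[derive(Default)]
pub struct RefBuilder {
    cfg: Config,
}

impl RefBuilder {
    pub fn foo(&mut self, value: u32) -> &mut Self {
        self.cfg.foo = Some(value);
        self
    }

    pub fn bar(&mut self, value: u32) -> &mut Self {
        self.cfg.bar = Some(value);
        self
    }

    /// Consumes the builder, so it cannot be called on the `&mut Self`
    /// returned at the end of a setter chain.
    pub fn build(self) -> Config {
        self.cfg
    }
}

/// Builder whose setters take and return `self` by value.
#[derive(Default)]
pub struct OwnedBuilder {
    cfg: Config,
}

impl OwnedBuilder {
    pub fn foo(mut self, value: u32) -> Self {
        self.cfg.foo = Some(value);
        self
    }

    pub fn bar(mut self, value: u32) -> Self {
        self.cfg.bar = Some(value);
        self
    }

    pub fn build(self) -> Config {
        self.cfg
    }
}

pub fn configure_by_ref(with_bar: bool) -> Config {
    // The builder has to live in a variable because `build` needs ownership.
    let mut builder = RefBuilder::default();
    builder.foo(3);
    if with_bar {
        builder.bar(5); // conditional configuration is a plain call
    }
    builder.build()
}

pub fn configure_owned(with_bar: bool) -> Config {
    // A one-line chain works here...
    let mut builder = OwnedBuilder::default().foo(3);
    if with_bar {
        builder = builder.bar(5); // ...but conditional steps need reassignment
    }
    builder.build()
}
```

Both functions produce the same Config; the difference is only in how awkward the call site is.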

Using a struct literal

The other option is to not use a builder. The struct is created in largely the same fashion, but without the hassle of implementing a method for each field. Then, the struct is built using a literal, often using struct update syntax, like:

{
    foo: Some(3),
    bar: Some(5),
    ..Default::default()
}

This:

Means you need to wrap every value in Some, which can get messy when configuring many fields
Means you need to write ..Default::default() at the end of a configuration. For many nested configuration structs, this gets out of hand pretty quickly
The problem of wrapping every type in Option still persists.
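
For reference, a compilable version of the literal above with one level of nesting might look like the following. Config, Limits and their fields are assumptions made up for the example, not types from any real crate.

```rust
/// Hypothetical nested configuration struct.
#[derive(Debug, Default, PartialEq)]
pub struct Limits {
    pub min: Option<u32>,
    pub max: Option<u32>,
}

/// Hypothetical top-level configuration struct.
#[derive(Debug, Default, PartialEq)]
pub struct Config {
    pub foo: Option<u32>,
    pub bar: Option<u32>,
    pub name: Option<String>,
    pub limits: Option<Limits>,
}

pub fn example_config() -> Config {
    Config {
        foo: Some(3),
        bar: Some(5),
        // Every nested struct repeats both the `Some` wrapper
        // and the `..Default::default()` tail.
        limits: Some(Limits {
            max: Some(10),
            ..Default::default()
        }),
        ..Default::default()
    }
}
```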

Why specifically `Option`? Why does this merit a special case?

Option is used far more than other types in long structs, mostly because of configuration structs as described above. You get massive builders that consist of many, many Options. Being able to specify them in a struct literal removes the need for builder methods (which have other downsides, as explained in this RFC). And if the literal didn't automatically wrap the value in Some, builder methods would keep a rather large advantage; nobody wants to write 20 Somes for configuration.

Guide-level explanation

When defining a struct, you can add an "option modifier", represented by a question mark (?) modifier after a field type. This will convert the type T to be Option<T>. For example, say we had a user struct:

pub struct User<'a> {
    pub user_name: &'a str,
    pub full_name: &'a str?,
    pub description: &'a str?
}

is equivalent to

pub struct User<'a> {
    pub user_name: &'a str,
    pub full_name: Option<&'a str>,
    pub description: Option<&'a str>
}

This is useful in structs with many Options, like configuration structs.

Additionally, when you are using a struct literal with an option modifier field, you may omit it (note this doesn't apply to normal types specified with Option). If you do specify it, then you don't wrap it in Option; you only specify the inner type. For example, you could create the above User with

User {
    user_name: "Ferris",
    full_name: "Ferris The Crab",
    description: "Rust's unofficial mascot"
}

or

User {
    user_name: "scoobydoo",
    full_name: "Scooby Doo"
}

In some cases you might need to use an option here (specifying full_name: Some("Scooby Doo") is invalid). In this case, you can use the special syntax of an option modifier after the field name, i.e

fn get_full_name() -> Option<&str>
User {
    user_name: "abc123",
    full_name?: get_full_name()
}
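
As a rough sketch of the intended meaning, the three literals above would presumably correspond to the following current Rust. The body of get_full_name is a placeholder assumption, and the 'static lifetimes are only there to keep the example self-contained.

```rust
#[derive(Debug, PartialEq)]
pub struct User<'a> {
    pub user_name: &'a str,
    pub full_name: Option<&'a str>,
    pub description: Option<&'a str>,
}

/// Placeholder standing in for `get_full_name`; the real lookup is not specified.
fn get_full_name() -> Option<&'static str> {
    None
}

pub fn ferris() -> User<'static> {
    // All fields given: each optional value is wrapped in `Some`.
    User {
        user_name: "Ferris",
        full_name: Some("Ferris The Crab"),
        description: Some("Rust's unofficial mascot"),
    }
}

pub fn scooby() -> User<'static> {
    // The omitted `description` is filled in as `None`.
    User {
        user_name: "scoobydoo",
        full_name: Some("Scooby Doo"),
        description: None,
    }
}

pub fn abc123() -> User<'static> {
    // `full_name?:` passes the `Option` through without wrapping it again.
    User {
        user_name: "abc123",
        full_name: get_full_name(),
        description: None,
    }
}
```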

An example to clarify how this would work, using generics:

struct Foo<T> {
    bar: T?
}
Foo {
    bar: None // implies that `T` must be an `Option`. The field `bar` is Option<Option<...>>
}
Foo {
    bar?: None // does not imply info about `T`, since it's `None`.
}
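
In current Rust terms, the two literals would presumably mean something like the sketch below. The inner type u32 is an arbitrary choice made so the example compiles; the literals alone do not determine it.

```rust
#[derive(Debug, PartialEq)]
pub struct Foo<T> {
    pub bar: Option<T>,
}

pub fn wrapped_none() -> Foo<Option<u32>> {
    // Proposed `bar: None`: the value is wrapped, giving `Some(None)`,
    // so `T` has to be some `Option<_>`.
    Foo { bar: Some(None) }
}

pub fn plain_none() -> Foo<u32> {
    // Proposed `bar?: None`: the field itself is `None`,
    // and `T` has to be determined from elsewhere.
    Foo { bar: None }
}
```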

Note, this syntax isn't supported at all for tuple structs. These aren't complex enough to merit this behaviour (complex tuple structs should be converted to structs with named fields), and can have ambiguity (How do you make a tuple struct (u32, u32?, u32?) with the second left out?)

Reference-level explanation

Option modifier

The new syntax for struct fields would be

StructField:
    OuterAttribute*
    Visibility?
    IDENTIFIER ":" Type
    OptionModifier?

OptionModifier: "?"

Fields with option modifiers would be equivalent to an Option<Type> in all places except struct literals. This means, for the User struct above, to change a field, you would do

let mut user = User {...};
user.full_name = Some("John Smith");

Accessing fields, setting them, referring to them etc. all use the Option type. Struct literals are just a special case.
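
A small sketch of what "everything outside the literal uses Option" means in practice, written against the desugared User struct. The helper names display_name and rename_and_clear are invented for the example.

```rust
pub struct User<'a> {
    pub user_name: &'a str,
    pub full_name: Option<&'a str>,
    pub description: Option<&'a str>,
}

/// Reads see an ordinary `Option`, so the usual combinators apply.
pub fn display_name<'a>(user: &User<'a>) -> &'a str {
    user.full_name.unwrap_or(user.user_name)
}

pub fn rename_and_clear() -> User<'static> {
    let mut user = User {
        user_name: "jsmith",
        full_name: None,
        description: Some("placeholder"),
    };
    user.full_name = Some("John Smith"); // assignment still needs `Some`
    user.description = None; // clearing a field is explicit
    user
}
```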

Rustdoc would also have to change to show these struct definitions correctly.

Struct literal syntax

The new syntax for struct literal fields would be

StructExprField:
    IDENTIFIER OptionModifier?
    | (IDENTIFIER OptionModifier? | TUPLE_INDEX) ":" Expression

As seen in the syntax above, we can do something like

fn get_full_name() -> Option<&str>
let full_name = get_full_name();
User {
    user_name: "abc123",
    full_name?
}

The option modifier after a field's name, for an optional field, prevents the auto-Some behaviour. The reason it goes here is because if it were to go after the value, it would be ambiguous with the try operator.
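
To illustrate the ambiguity: in current Rust a ? after the value already has a meaning, as in this sketch. The lookup table inside get_full_name and the function lookup_user are assumptions for the example.

```rust
#[derive(Debug, PartialEq)]
pub struct User<'a> {
    pub user_name: &'a str,
    pub full_name: Option<&'a str>,
    pub description: Option<&'a str>,
}

/// Hypothetical lookup; returns `None` for unknown user names.
fn get_full_name(user_name: &str) -> Option<&'static str> {
    match user_name {
        "scoobydoo" => Some("Scooby Doo"),
        _ => None,
    }
}

/// The trailing `?` on the value is the try operator: a `None` here returns
/// early from `lookup_user` instead of being stored in the field.
pub fn lookup_user(user_name: &str) -> Option<User<'_>> {
    Some(User {
        user_name,
        full_name: Some(get_full_name(user_name)?),
        description: None,
    })
}
```

So `full_name: get_full_name()?` could not also mean "store this Option as-is" without changing existing code.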

If a struct is composed purely of optional fields, the unit syntax (i.e. the struct name without curly braces) still can't be used, even though no fields need to be set. This is because it would signal a different intent: the unit syntax implies a zero-sized type, whereas this type would have every field set to None but would still take up space.

The reason using an option without the option modifier flag (e.g User {full_name: Some(""), ...}) is invalid syntax is because it could be ambiguous with generics (the compiler would struggle to infer full_name, if it were a generic type T).

If an option modifier is supplied after a field name that isn't an optional field, then it should result in an error, since the user probably meant for something else.

Drawbacks

Fields have to be tracked to check if they have an option modifier
The struct syntax is slightly more complex
It could be tougher for beginners to understand, though this is not necessary to explain until later on in the book

Rationale and alternatives

This design is simple, not breaking, and solves the problems well
Not doing this would mean configuration structs, especially nested ones, remain annoying to define and instantiate
The two different positions for this operator show different intents, as they represent different things.

Reasoning footprint

The rust blog has talked about the reasoning footprint, and how implicit features (this being one of them) should act. There are 3 categories:

Applicability: The implicit wrapping of Option is explicit, since the option modifier is used. In a struct literal, this is less explicit, as it looks like a regular field declaration, but there is some heads-up since the field is marked explicitly in the definition. (And the prevention of this auto conversion, with the option modifier after the field name in a struct literal, is explicit)
Power: This is quite powerful, changing a field's type and a value's type. This is probably the largest category, limiting the other two.
Context-dependence: In the struct declaration, no context is required at all. In the struct literal, some context is required to see if the field is optional, though this context is already available and used anyway to determine the available fields and types.

Why specifically structs?

This idea could definitely be applied to other contexts, but it would have to be done carefully. Structs are something easy to update now, and would benefit most from this.

Function parameters having this syntax would allow optional arguments of sorts, and this would be great, though not everyone likes this
Function return types could also benefit from this, though it'd probably need some adaptation for Results too.

Alternatives

A small note: you can use .into() instead of wrapping a value in Some. It's not really an alternative: it looks slightly nicer, but it's still verbose and doesn't save characters.
This RFC pre-draft for default values would help with the option modifier in struct definitions: essentially you could write = None instead of ? as proposed. However, in a large struct with many Options, this is quite long and annoying, taking up a lot more space than ? (though I support that RFC, other default values would be nice).

Another problem with only setting defaults is the need to still wrap the values in Some, which doesn't solve the problem on the user's side. This RFC also proposes a .. syntax which is equivalent to ..Default::default(), which would help partly but still be annoying in deeply nested structs.
This RFC pre-draft suggests using ..* as syntactic sugar for ..Default::default(), similar to above. Again, it's annoying within nested structs and while it shows intent a bit clearer for more default values, for optional fields it doesn't really help.
Add more support for Options with syntax in general. Combined with a shortened ..default() this could work well
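
For the .into() note in the list above, a minimal sketch: it relies on the standard library's From<T> for Option<T> impl, while Config and with_into are made-up names.

```rust
/// Hypothetical configuration struct.
#[derive(Debug, Default, PartialEq)]
pub struct Config {
    pub foo: Option<u32>,
    pub bar: Option<u32>,
    pub name: Option<String>,
}

pub fn with_into(foo: u32, name: &str) -> Config {
    Config {
        foo: foo.into(),             // same as `Some(foo)`
        name: name.to_owned().into(), // same as `Some(name.to_owned())`
        ..Default::default()
    }
}
```

The wrapping is hidden, but each field still carries a call, and the ..Default::default() tail is still needed.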

Prior art

This sort of syntax is common in many languages, such as TypeScript or Kotlin. TypeScript has the same syntax for declaring them, and their omission/inclusion in object literals represents their presence or absence. Of course, Rust's Option needs the option modifier in the struct literal syntax, whereas in TypeScript, a value is effectively Some already and undefined is None. undefined represents something that hasn't been initialised, which corresponds well to Rust's Option.

It's so integrated into typescript that there are even standard library types (that you can implement yourself, using type mapping) that, say, convert all properties in a class to be optional. This is not a good idea for rust since it can greatly complexify types, and type mapping would be quite hard to implement, but that's not for this RFC to discuss.

Languages like TypeScript and Kotlin also have syntax very similar to Rust's for handling null objects: they have the optional chaining operator ?. (called the safe call operator in Kotlin), which is roughly equivalent to Option::map. In Rust, a trailing ? invokes the try operator, which has similar intentions, but returns from the whole function if the value is None.
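
The difference between the two can be sketched in Rust as follows; name_len and initial are illustrative names only.

```rust
/// Comparable to `fullName?.length` in TypeScript: absence is propagated
/// as a value and the function carries on.
pub fn name_len(full_name: Option<&str>) -> Option<usize> {
    full_name.map(|name| name.len())
}

/// With the try operator, a `None` leaves the whole function early.
pub fn initial(full_name: Option<&str>) -> Option<char> {
    let name = full_name?; // returns `None` from `initial` if absent
    name.chars().next()
}
```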

Unresolved questions

Is the option modifier on struct literals indeed the best design?
How exactly should rustdoc show this?
Should the error for specifying an option modifier in a struct literal on a non-optional field be hard or toggleable? Macros may wish to, say, automatically add it.

Future possibilities

This is quite a simple syntax change, so not much. Maybe some way to map attributes to multiple fields in a struct (e.g when serializing a struct composed with optional fields, add a #[serde(skip_serializing_if = "Option::is_none")]). Alternatively, expanding on the auto-wrap of Some, should there be more auto conversions in struct literals?

mbrubeck · February 19, 2021, 8:52pm

While it takes a different approach, it's probably worth comparing this with Default Field Values and similar proposals, because it seems to have some overlapping use cases and effects.

That proposal was most recently discussed in this very long thread: Pre-pre-RFC: syntactic sugar for `Default::default()`

lolgeny · February 19, 2021, 9:03pm

Thanks, that's definitely something to mention. Default field values would get long and unwieldy for structs with many Options, since specifying = None is a lot longer than ?. And for the second, while ..* improves it, with many nested structs (I came up with this RFC after implementing minecraft's predicates) it's still quite verbose. I'll add that now

scottmcm · February 19, 2021, 10:33pm

I think my first instinct here is that I'd like to see ekuber's RFC for = None in struct definitions before considering this.

They're touching enough of the same areas that I'm not sure it'd be worth doing the specific sugar here until we have experience with how that RFC changes the way people write code.

That said, I'll propose this meta-question for you: What is it about Option and Some that make you think it worth doing specifically this conversion in the struct literal?

(And a preemptive plea to those who reply to this: please be more specific than just "explicit is better". Remember the reasoning footprint.)

And some ideas for follow-ups: Are there other cases where it would also be appropriate? Is it equally useful for literals as for variables? Are there other types where it would make sense? ekuber's RFC has an example of .to_string()ing some things in the initializer. Is that a good conversion? Would it be reasonable to say that everything gets .to_owned()ed in a struct literal? Or .into()ed (which would cover Some-wrapping)? There are often parallels between Some and Ok; would it make sense to auto-Ok-wrap values put into a Result field?

lolgeny · February 19, 2021, 10:58pm

Thanks for reading over it! As discussed on discord I made some changes but I'll answer the question here:

What is it about Option and Some that make you think it worth doing specifically this conversion in the struct literal?

Option is used far more than other types in long structs, mostly because of cases like the configuration structs mentioned. You get massive builders that consist of many, many Options. Being able to specify them in a struct literal removes the need for builder methods (which have other downsides, as explained in the pre-RFC). And if the literal didn't automatically wrap the value in Some, builder methods would keep a rather large advantage; nobody wants to write 20 Somes for configuration.

I'll probably add that, and some about your footnote too in the future section.

lolgeny · February 19, 2021, 11:11pm

Ok, added

That paragraph into motivation
Info on errors
Info on the reasoning footprint and how it's balanced (rationale)
Future possibilities, based off scottmcm's footnote

felix.s · February 19, 2021, 11:48pm

struct WhereIsYourGodNow<T>(T?);

What is WhereIsYourGodNow::<Option<u32>>(None).0? Is it None or Some(None)?

(Confusion over this is my primary objection to most implicit wrapping proposals – not to say there aren’t any ways to avoid or mitigate this problem.)

Consider also this:

struct WhereIsYourGodNow2(u32?, u32, u32?);

What is the syntax to construct WhereIsYourGodNow2 with the first field omitted, but the last specified?

lolgeny · February 20, 2021, 12:04am

I should probably explicitly state that it's not supported for tuple structs. I don't think these get complex enough to merit these and have the problems you said.

As in your first question, which could be changed slightly to a supported struct:

struct Foo<T> {
    bar: T?
}
Foo {
    bar: None // implies that `T` must be an `Option`. The field `bar` is Option<Option<...>>
}
Foo {
    bar?: None // does not imply info about `T`, since it's `None`.
}

steffahn · February 20, 2021, 12:49am

hmm.. this seems so inconsistent though (the ? randomly appearing in different places)...

how about

struct Foo<T> {
    bar?: T
}

instead? Although I’d still find this confusing and unnecessarily terse. I think I’d much rather write

struct Foo<T> {
    bar: Option<T> = None
}

and

Foo {
    bar: None // could just as well be left out
}

or

Foo {
    bar: Some(None) // implies that `T` must be `Option`. The field `bar` is Option<Option<...>>
}

quinedot · February 20, 2021, 12:53am

This proposes both None-elision and Some-wrapping. Is there any reason they couldn't be separate?

If this was implemented as proposed, you couldn't look at a struct expression containing no FRU and conclude that all the fields were listed. Have you considered an explicit syntax to enable None-elision? E.g.

User? { username: "abc123", }
// n.b. this could just be an extension to the referenced Default RFCs,
// and not require a new form of struct field declaration

Along those same lines, you could no longer look at field_name or field_name: value and be sure it's not an Option anymore. (E.g. the only part of the OP visible to me right now contains user_name: "abc123", and I cannot recall if this is one of the optional fields or not.) Why not signal the Some-wrapping with the new suffix instead?

User { full_name?: "Beatrix Kiddo", /* gets `Some`-wrapped */  ... }
User { full_name: None, /* works like today */ ... }

Am I missing something when I think the above two changes (and dropping the new form of struct field declaration) would make everything a local syntactical effect? I feel it's better to avoid "action at a distance" where one has to refer back to the declaration to be able to reason about what's going on.

quinedot · February 20, 2021, 1:36am

Apologies for the two consecutive posts, but I recalled something else I meant to mention.

This is inconsistent with how unit-versus-empty structs work today. This may be due to unit structs occupying both the value namespace and type namespace, like tuple struct constructors (though I'm not 100% certain of that). So it's possible your suggestion is a breaking change or otherwise non-trivial to implement.

struct Unit;
struct Empty {}
    
// These work
let unit = Unit;
let unit = Unit {};
let empty = Empty {};

// This does not
// let empty = Empty;

// Nor does this
// #[allow(non_snake_case)]
// let Unit = "foobar";

// But this does
#[allow(non_snake_case)]
let Empty = "foobar";

Nokel81 · February 20, 2021, 2:34am

Given that you have fully defined the type I would have assumed that it is None since the other option isn't a valid Option<u32>.

mjbshaw · February 20, 2021, 2:36am

Do you have concrete/real-world examples for how "messy" and "out of hand" this gets?

H2CO3 · February 20, 2021, 3:07am

That's a pretty weak motivation. While the RFC saves characters, one should be very careful about adding one-off syntax changes like this.

For one, shortness doesn't necessarily help reading. (Ease of reading the code is far more important than writing convenience – code is read a lot more than it is written.) In this case, I find the proposed syntax harder to read than Option. Rust already has a proliferation of symbols, I'm glad at least most types don't hide behind sigils (well, except pointers and references, which are builtins, unlike Option). Adding yet another overload for the meaning of ? would be a mistake.

Second, the proposal doesn't carry its weight when put in the context of other parts of the language, either. Why just struct fields? Option can be used in any place where a type is expected. To me, this just signals that the idea is not really fleshed out, and it's not considerate nearly enough with regards to its impact on and interaction with everything else.

Finally, there's also the question of priority. There are far more important things in Rust development to worry about. The compiler wants its soundness bugs fixed, const generics and variadic generics are desperately needed, specialization has a fair number of unresolved correctness questions, the list goes on. It is especially ill-advised to push for superficial changes like this when the design, implementation, and testing of far more substantial features still somewhat lacks sufficient human resources.

If you want a syntax tweak, just use a macro. That's exactly what macros are for. You can write a procedural macro that transforms the proposed code into the currently-accepted style. It's better for you too, because you don't have to wait for it. And it's better for the community, because tools and libraries that operate over Rust syntax (e.g. rustfmt, syn, all (!!!) procedural macros, etc.) don't have to update their syntax and AST support code with yet another case.

There are countless syntax changes that one could reasonably propose. The question is always "why?" and not "why not?" – if all of these changes would be accepted and implemented, the developers would do nothing else other than changing the parser all the time. I'd therefore argue that whatever syntax can reasonably be implemented as a macro should in fact be a macro and not a core language change.

scottmcm · February 20, 2021, 3:27am

How would this argument evolve if default field values existed in general? For example, I could imagine that a configuration struct might no longer have a bunch of Nones, like one could be

struct DeflateOptions {
    level: CompressionLevel = CompressionLevel::Balanced,
    dictionary_size: usize = 1 << 15,
    word_size: usize = 32,
}
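
Without default field values, the closest current equivalent is presumably a hand-written Default impl, as in this sketch. The Fast variant and the fast_options helper are assumptions added for illustration; only Balanced appears in the post.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CompressionLevel {
    Fast, // hypothetical variant, not from the post
    Balanced,
}

#[derive(Debug, PartialEq)]
pub struct DeflateOptions {
    pub level: CompressionLevel,
    pub dictionary_size: usize,
    pub word_size: usize,
}

impl Default for DeflateOptions {
    // The defaults live here instead of next to the field declarations.
    fn default() -> Self {
        DeflateOptions {
            level: CompressionLevel::Balanced,
            dictionary_size: 1 << 15,
            word_size: 32,
        }
    }
}

pub fn fast_options() -> DeflateOptions {
    // Override one field and take the rest from `Default`.
    DeflateOptions {
        level: CompressionLevel::Fast,
        ..Default::default()
    }
}
```

No field here is an Option, which is the point of the question: with real defaults, fewer fields may need to be optional at all.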

lolgeny · February 20, 2021, 8:18am

One of my test cases for a library I'm working on (the structure has to be this way, so it's partly a special case, but there are many structs that could be ported from builders)

Predicate::EntityProperties {
    entity: PlayerContextEntity::This,
    predicate: EntityPredicate {
        distance: Some(DistancePredicate {
            horizontal: Some(Range {
                min: 0.0,
                max: 10.0,
            }),
            ..default()
        }),
        equipment: Some(EquipmentPredicate {
            mainhand: Some(ItemPredicate {
                count: Some(OptionalRange::Exact(32)),
                ..default()
            }),
            ..default()
        }),
        ..default()
    },
}
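
The ..default() tail in the test case presumably relies on a small free function like the one below. The trimmed-down Range and DistancePredicate, including the absolute field and the f64 field types, are guesses made for the sketch and not the library's actual definitions.

```rust
/// Free-function shorthand for `Default::default()`.
pub fn default<T: Default>() -> T {
    T::default()
}

/// Assumed shape of the range type; field types are a guess.
#[derive(Debug, Default, PartialEq)]
pub struct Range {
    pub min: f64,
    pub max: f64,
}

/// Assumed, trimmed-down predicate with two optional fields.
#[derive(Debug, Default, PartialEq)]
pub struct DistancePredicate {
    pub horizontal: Option<Range>,
    pub absolute: Option<Range>, // hypothetical extra field
}

pub fn nearby() -> DistancePredicate {
    DistancePredicate {
        horizontal: Some(Range { min: 0.0, max: 10.0 }),
        ..default()
    }
}
```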

lolgeny · February 20, 2021, 8:27am

That's a pretty weak motivation. While the RFC saves characters, one should be very careful about adding one-off syntax changes like this.

That quote was from a tiny note about using .into() instead of Some(), so that's taken out of context quite a bit. And while it's true that this RFC does save characters, that's not the only motivation listed.

Second, the proposal doesn't carry its weight when put in the context of other parts of the language, either. Why just struct fields? Option can be used in any place where a type is expected.

Potentially, something that could go into the future section, though I don't think it aligns with this particularly, since it's all a special case for structs.

To me, this just signals that the idea is not really fleshed out, and it's not considerate nearly enough with regards to its impact on and interaction with everything else.

Not necessarily. I think this deserves its own syntax, given how complex some structs can be, especially if it would mean people could port the builder pattern afterwards.

Finally, there's also the question of priority. There are far more important things in Rust development to worry about. The compiler wants its soundness bugs fixed, const generics and variadic generics are desperately needed, specialization has a fair number of unresolved correctness questions, the list goes on. It is especially ill-advised to push for superficial changes like this when the design, implementation, and testing of far more substantial features still somewhat lacks sufficient human resources.

I don't think this is a valid argument at all, similar to saying "why explore space when we have problems here on earth"? Yes, there are some problems in the compiler, and const generics are needed, but that shouldn't mean not adding new features, e.g the try block, which is (mostly) equivalent to an immediately invoked closure. But, calling it try is much nicer, signifies intent clearer, and does in fact save characters.

If you want a syntax tweak, just use a macro. That's exactly what macros are for. You can write a procedural macro that transforms the proposed code into the currently-accepted style. It's better for you too, because you don't have to wait for it. And it's better for the community, because tools and libraries that operate over Rust syntax (e.g. rustfmt, syn, all (!!!) procedural macros, etc.) don't have to update their syntax and AST support code with yet another case.

There's no reasonable way a macro could figure out if a field was flagged optional or not; any implementation of such a macro would be rather buggy.
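
To make the limitation concrete, here is a rough sketch of what a declarative macro can do today. Since it cannot see the struct declaration, the caller has to mark the optional fields at the call site (here, by listing them after a ;). The macro with_optional and its syntax are invented for this example, and it additionally requires the struct to implement Default.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct User<'a> {
    pub user_name: &'a str,
    pub full_name: Option<&'a str>,
    pub description: Option<&'a str>,
}

/// Hypothetical helper: fields before `;` are passed through unchanged,
/// fields after `;` are wrapped in `Some`, and anything not listed is
/// taken from `Default::default()`.
macro_rules! with_optional {
    (
        $ty:ident {
            $($req:ident : $req_val:expr),* $(,)?
            ;
            $($opt:ident : $opt_val:expr),* $(,)?
        }
    ) => {
        $ty {
            $($req: $req_val,)*
            $($opt: Some($opt_val),)*
            ..Default::default()
        }
    };
}

pub fn scooby() -> User<'static> {
    with_optional!(User { user_name: "scoobydoo"; full_name: "Scooby Doo" })
}
```

The macro never checks that the fields after the ; are actually Options; a mistake there only shows up as a type error in the expanded code.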

lolgeny · February 20, 2021, 8:32am

Sorry, I don't really understand the question. As in automatically wrapping other types that are defaulted? Like auto-boxing, or auto .intoing? Potentially, though of course we don't want to be too implicit. With Options alone, it's obvious what the conversion will be; with other types, potentially less so.

Unless the question is "if we get default field values, will we need to use options everywhere anymore?". That's a good point to bring up, and certainly one to consider.

lolgeny · February 20, 2021, 9:14am

Hmm. Ideally, the ? would appear after the value, but a try operator could potentially go there too. In a struct declaration, I think it going after the type makes more sense though, since after all it is wrapping the type. Also the different positions signify different things this way.

scottmcm · February 20, 2021, 9:28am

That's the direction I meant, yeah.

(Also part of what I was implying with "how that RFC changes the way people write code" in #4.)
