Simplify constructor function

I really want an (optional) simplification of constructor-like functions. Example:

cluttered and old:

fn new() -> MyCoolStruct {
    MyCoolStruct {
        member1: 3,
        member2: "Feature Request",
    }
}

concise and new:

fn new() -> MyCoolStruct {
    member1: 3,
    member2: "Feature Request",
};

The advantage is obvious: the code becomes much more readable. I saw a similar topic from 2014; however, that one was closed without much discussion. I assume there is a reason why this simple feature has not been implemented, but I can't find out why. I also feel like the semicolon at the end is important.

If I took matters into my own hands and created a pull request with that functionality, what are the chances that it would be added?

Extremely low. Syntax additions like this that serve only to make code shorter but don't enable anything new are unlikely to land. Furthermore, things like this generally require an accepted RFC first.

18 Likes

https://lang-team.rust-lang.org/frequently-requested-changes.html#fundamental-changes-to-rust-syntax

8 Likes

A similar syntax may be accepted, though:

fn new() -> MyCoolStruct = MyCoolStruct {
    member1: 3,
    member2: "Feature Request",
}
3 Likes

While I don't think this particular change is a likely one, I also think that FAQ entry doesn't apply here. That entry was meant to be a response to the really common calls we get for things like "allow single-statement if/else to omit braces". I also think it's worth looking past the specific syntax proposal here, and thinking about the general question of "should we make constructors easier to write". The answer may be "no", but the question shouldn't be dismissed out of hand.

7 Likes

Did you know that you can write Self in place of the type name, in any method?

impl MyCoolStruct {
    fn new() -> Self {
        Self {
            member1: 3,
            member2: "Feature Request",
        }
    }
}

That's particularly helpful when you'd otherwise have to repeat a long type name and/or generic parameters.
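
For instance (using a hypothetical generic Pair type purely for illustration), Self spares you from repeating both the type name and its parameters:

```rust
struct Pair<A, B> {
    first: A,
    second: B,
}

impl<A, B> Pair<A, B> {
    // Without Self, both the return type and the literal
    // would have to spell out Pair<A, B>.
    fn new(first: A, second: B) -> Self {
        Self { first, second }
    }
}
```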

Another shorthand you might not be aware of: in the common case where your constructor initializes a structure from function arguments, you don't have to write member1: member1; the field-init shorthand lets you omit the repetition:

impl MyCoolStruct {
    fn new(member1: u64, member2: String) -> Self {
        Self { member1, member2 }
    }
}

If you want something more concise than that, there are some crates in the ecosystem with macros that can write a constructor for you, such as derive-ctor. I personally think the more advanced usages of those obfuscate rather than simplify, but they make the simple cases simpler.
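
As an illustration of the kind of thing such crates automate, here is a hand-rolled macro_rules! sketch (this is NOT derive-ctor's actual API, just the general idea):

```rust
// Declares a struct and generates a field-for-field `new` constructor.
macro_rules! ctor {
    ($name:ident { $($field:ident: $ty:ty),* $(,)? }) => {
        struct $name {
            $($field: $ty),*
        }

        impl $name {
            fn new($($field: $ty),*) -> Self {
                // Field-init shorthand: each argument name matches a field.
                Self { $($field),* }
            }
        }
    };
}

ctor! {
    MyCoolStruct {
        member1: u64,
        member2: String,
    }
}
```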

11 Likes

Personally, I'm quite happy with this one. The option of removing just MyCoolStruct { } is not at all worth the debates about which things should be written which ways when. It's just a couple of tokens; not worth losing the consistency.

Inferring the typename I'd plausibly be interested in (Swift-style, or whatever) but saving a pair of braces is way too negligible to bother.

19 Likes

I agree. The one case I think might be both worthwhile and easily understood would be if we:

  1. Allow = functions, and
  2. Allow omitting the return type on an = function, if and only if the right-hand side immediately starts with a type name and has that type. This is not type inference; this would only work if the (possibly qualified) type name appears immediately after the =.

That would allow, for instance, fn new(...) = Self { ... }, which seems pretty clear. It would also allow fn new(...) = Self::default();.

1 Like

How would that work with scoped types? How do you know where to take the type from (syntactically)?

fn new() = submod::AnotherMod::weird_type();

where weird_type is a unit-tuple type. Macros tend to want to fully-specify types and not use scoping, so some shorthand that only works in local macros or assumes things about naming conventions seems…unfortunate.

I also question the example's usage of a new() method that just defers to Default::default()…isn't there a Clippy lint saying "just use Default"?

I was expecting that it walked the body expression until it hit the first type name, and permitted omitting the type iff that was the type of the resulting expression. I wasn't expecting it to be purely syntactic.

I'll edit the previous post to spell out that the type name can be qualified.

Clippy has a lint for implementing new() and not implementing Default. I'm not aware of one that complains about having both; that'd be oddly inconsistent.
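
For reference, the lint in question is Clippy's new_without_default, and the conventional way to satisfy it is to keep both and have one delegate to the other (a minimal sketch):

```rust
struct MyCoolStruct {
    member1: u64,
    member2: String,
}

impl MyCoolStruct {
    fn new() -> Self {
        Self {
            member1: 3,
            member2: "Feature Request".to_string(),
        }
    }
}

// Having both is fine; Clippy only complains when `new()` exists
// without a Default impl. Delegating keeps the two in sync.
impl Default for MyCoolStruct {
    fn default() -> Self {
        Self::new()
    }
}
```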

2 Likes

Rust doesn't have constructors. These functions are actually factory functions, strictly speaking.

While the stuttering, even in Josh's concise example, is a bit jarring, I don't think it's worth the complexity cost of adding a special syntax to avoid repeating a couple of tokens.

Instead of this syntactic debate, I reckon it would be more beneficial to discuss adding an actual user-defined semantic construction operation to Rust. I'm thinking of the postponed RFCs for an emplacement operator, for example, or the moveit crate that emulates this using unsafe code.

3 Likes

Your proposed syntax is ambiguous, at least without type resolution:

fn new() -> Name {
   name
}

Name could be a scalar type, followed by a regular function body: a block of code that returns a variable name. Disambiguating this would require not just basic syntax, but looking up the types or names that are in scope. This is similar to the wart that C syntax created with typedef, and Rust tries to avoid such issues.
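
Both readings are valid Rust today, which is why a purely syntactic rule can't distinguish them; a sketch of the two interpretations:

```rust
// Reading 1: `Name` is a struct type, and under the proposed sugar the
// braces would be a struct literal.
struct Name {
    name: i32,
}

// Reading 2: the identifier after `->` is just a return type, and the
// braces are an ordinary function body whose final expression happens
// to be a local variable called `name`.
fn new() -> i32 {
    let name = 42;
    name
}
```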

This would also be hard for macros, which would need to know whether Name is a type name or the start of a struct literal expression (macros match on :ty and :expr). But macros run before any types are known, so at that point it's not possible to check what the type of Name is (it may not even exist yet, and struct Name or type Name = () can be emitted by a macro, creating a chicken-and-egg problem).

There's a high bar to adding syntax sugar to Rust. The language generally prefers to be explicit, and simpler to parse (by humans and tools) than to have clever shortcuts.

7 Likes

The shortest constructor syntax is, of course, not having a constructor.

2 Likes

This reminds me of the conversation we had ages ago about allowing

fn foo(Wrapping(num: i32)) { … }

as a way to have a Wrapping<i32> parameter.

I originally liked that thought, but then the conversation about it convinced me that we shouldn't allow it.
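
For reference, destructuring a pattern in parameter position is already valid today when the type is spelled out; the proposal above only sought to drop the repetition:

```rust
use std::num::Wrapping;

// A pattern on the left of the `:` is already legal in a parameter;
// the hypothetical sugar would merely let `Wrapping` appear once.
fn foo(Wrapping(num): Wrapping<i32>) -> i32 {
    num
}
```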

Much better to lean into leaving the signature parts alone, and adding inside-the-body sugar to allow leaving pieces out there instead.

(So things like fn foo(.{ start, end }: Range<i32>) { … }, for example, as the way to avoid repeating Range.)

5 Likes

There's a reason my toy language strongly separates the item signature from the body bindings. To abuse Rust syntax, function items look more like

fn foo(Wrapping<i32>) -> Foo = |arg| { … };

where you assign a closure to the item declaration.

I don't think this is a usable choice for Rust — not only would it be a giant syntax shift for little/no semantic benefit, but the argument pattern names serve a documentation purpose, even if they aren't a semantic part of the function signature[1].

Elision of type pattern/literal names when they can be inferred is, I think, a much better approach for Rust. Using the .{ .. } syntax, while I doubt it'd be the style rustfmt chooses, OP's example could be written as:

fn new() -> MyCoolStruct {.{
    member1: 3,
    member2: "Feature Request",
}}

The rustfmt+clippy style would most likely become

default tall style
impl MyCoolStruct {
    pub fn new() -> Self {
        Self::default()
    }
}

impl Default for MyCoolStruct {
    fn default() -> Self {
        .{
            member1: 3,
            member2: "Feature Request",
        }
    }
}
max wide style
impl MyCoolStruct {
    pub fn new() -> Self { Self::default() }
}

impl Default for MyCoolStruct {
    fn default() -> Self {
        .{ member1: 3, member2: "Feature Request" }
    }
}
with delegation
impl MyCoolStruct {
    pub use Self::default as new;
}

impl Default for MyCoolStruct { /* … */ }

This shouldn't be overlooked. If a type is PODish and has all public fields, the canonical "constructor" is struct literal syntax. You don't need a fn new(). It's a nice convenience to have, especially if it's const, since Default::default can't be (yet), but it's far from a necessity, especially if it doesn't have any special documentation attached to it.

And in fact, even if there are private fields, the canonical "constructor" is still the struct literal syntax, as all other construction ultimately delegates to it (or transmute). Things can get a bit convoluted depending on exactly how strictly you want to apply terms, sure[2], but the important takeaway should be that fn new, while conventional, isn't special in any way and is just like any other function; adding special syntax for "constructor" functions is thus undesirable because it makes them look special when they aren't.
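
A minimal sketch of that point, with hypothetical names: when all fields are public, the literal is the canonical constructor, and fn new is just a convenience that can additionally be const:

```rust
pub struct Point {
    pub x: i32,
    pub y: i32,
}

impl Point {
    // Purely a convenience; unlike Default::default, this can be const.
    pub const fn new(x: i32, y: i32) -> Self {
        Self { x, y }
    }
}

// Callers can equally well use the literal directly.
const ORIGIN: Point = Point { x: 0, y: 0 };
```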

While it is definitely an interesting project to extend Rust in this direction, and guaranteed emplacement would certainly be a powerful tool for performance optimization, Rust's semantics are heavily based on the concept that objects can always be trivially relocated and the address of data isn't meaningful except if it's borrowed at that address.

digression on such

Perhaps unintuitively, a significant portion of the times where optimization fails to eliminate "redundant" copies is because the address identity properties being preserved are actually too strong! The canonical example:

let mut place = MaybeUninit::uninit();          // stack place #1
init(&mut place);                               // may capture &place
let mut value = unsafe { place.assume_init() }; // copies into stack place #2
utilize(&mut value);                            // may capture &value

place and value are semantically required to have disjoint addresses because both places are live simultaneously. Unless the compiler can prove that the pointer/address provided to either init or utilize are never captured, then it must preserve the two places' disjointness, because the behavior of the code might somehow depend on that property.

And no, lifetime analysis is not sufficient to justify LLVM's nocapture semantics if the lifetime isn't captured. Even if we were to ignore that all currently proposed models give lifetimes no semantic meaning, we want signatures like ptr::from_ref to be validly usable, and that signature doesn't even imply a requirement that '_ only encompasses the call, but instead that it outlives the call. Adding semantic meaning to the exact start/end timing of a lifetime is a terrible idea because the language only permits discussing bounds on any specific lifetime region.

potential solutions

A personal pet concept is to allow writing (move place) as an expression that explicitly moves the value from the place and deallocates the place, even if the type is Copy, thus allowing a new place to potentially reuse the address of the old place. It would need to be more complicated than just that because of partial moves and non-stack places, but that's the general idea — when you aren't doing tricky things with a place and the optimizer fails to deduce that, you can tell it. Similar thoughts apply to explicit become tail calls deallocating stack places earlier.

The other pet concept is to just say that addresses can be logically disjoint but map to the same usize. There are various ways to model this with different tradeoffs (allocation "planes" being likely the most realistic), but they're all either only weakly applicable ("observing" the address in any meaningful way will require the allocation to exist in the single physical allocation plane) or break "obviously true" properties (e.g. cmp(ptr1, ptr2) == cmp(ptr1.addr(), ptr2.addr()); see guaranteed_eq for a real example of this) and in-use techniques that rely on such properties (e.g. futexes).

I don't doubt there're ways Rust could be extended in a backwards-compatible manner to support ?Move and/or ?Drop types. But first class support, even if 100% compatible, is a significant enough semantic shift that it's as close to a "Rust 2" as the Rust project is likely to ever legitimately consider. Even extensions which are theoretically simpler like ?MetaSized or even just using A: Allocator generics for containers are pushing the boundary of what's "spirit of the law" compatible into "letter of the law" territory.

If "exotic" types are confined to only being supported by dedicated "exotic" containers, then first-class support isn't necessary nor particularly useful, and library support for encapsulating and enforcing the additional requirements such as provided by Pin and the moveit crate are plenty sufficient. Of course, other features like projection or &move can hugely improve such "second class" library-level support, but they're useful for more regularly shaped code as well.

(This gets far off the OP topic, so if you want to discuss on this concept further please open a new thread here (or on Discord or Zulip) and ping me.)


  1. It might work for my toy language because it uses argument labels semantically, similarly to how Swift does. But I'm also far from showing that it'll actually work; there are many kinks I'm still working on figuring out. ↩︎

  2. Rust memory isn't typed, so types aren't constructed/destructed in memory like they are by C's Effective Types model or C++'s Object Lifecycle model. Furthermore, Rust always evaluates all field values into temporaries before aggregating them into an aggregate type, unlike effectively every OOP language where the type's memory is created first and each field is assigned to by the constructor. In C++ parlance, Rust only has designated initializers and no constructors. ↩︎

5 Likes

This is a side track, but surely you can never guarantee unique addresses even today? Memory addresses on the stack will be reused between calls to functions. This may seem irrelevant, but inlining (and outlining, though I don't know whether LLVM does that) can change where function borders are, which means "in the same stack frame" isn't even a stable concept.

As such, it seems that the solution of "same usize" is already a foregone conclusion on real hardware as the only logical approach.

I used "unique address" perhaps a bit informally; what's actually meant is that while a place is allocated, that place has a range of addresses that do not alias with any other currently allocated place. It's the "obvious" rule that if object A exists at some location, object B can't also be in the same location, since operations on one don't affect the other. Stability is also relevant — the idea that the address of a place doesn't change while it still exists.

Inlining does change when function prologs/epilogs happen, but it doesn't change the logical allocation or deallocation time of stack places. Stack places already don't necessarily live for an entire function body. A lowering is of course free to extend how long storage is reserved, but if it wants to retain the ability to alias places without rederiving it, it needs to have the capability to encode stack place liveness more granularly than the whole function, e.g. MIR's StorageLive and StorageDead instructions. LLVM has similar lifetime instructions for its allocas.

1 Like

If this is implemented, I would also like to see it on constant items, at least those that are not publicly visible.

I think the big problem for consts is that we don't have const traits yet, so lots of the normal short-hands don't work.

Once you can .into() and such in a const, then "annotate the item; infer in the body" will work there too.

I'm not really sure what you're talking about here? I'm talking about stuff like
const THING: OnceLock = OnceLock::new(). Making Into const doesn't help.
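
For context, the form that works today spells out the type parameter on the item (the names here are illustrative); OnceLock::new is already a const fn, so the complaint above is purely about having to repeat the type annotation:

```rust
use std::sync::OnceLock;

// The type parameter must be written out on the item; this is the
// repetition the post is complaining about.
static THING: OnceLock<String> = OnceLock::new();
```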