[Idea] `.func(args...)` as sugar for the constructor `InferredType::func(args...)`

A function call that starts with a dot would desugar to a constructor call on the inferred type.

/// let t: Vec<i32> = .with_capacity(5);
// 👆 sugar to 👇
let t: Vec<i32> = Vec::with_capacity(5); 

More places:

/// let a = A {
///     s: .new(),
///     t: .new(1).unwrap(),
///     s1: .s1(2).unwrap(),
///     s2: .s2(),
///     e2: .e2(.s2()),
///     v1: .with_capacity(3),
///     v2: vec![ .A(4), .A0, .B, .e1(5), .e2(.s1(6).unwrap()).unwrap() ],
/// };
// 👆 sugar to 👇
let a = A {
    s: String::new(),
    t: std::num::NonZeroI32::new(1).unwrap(),
    s1: S::s1(2).unwrap(),
    s2: S::s2(),
    e2: E::e2(S::s2()),
    v1: Vec::with_capacity(3),
    v2: vec![
        E::A(4),
        E::A0,
        E::B,
        E::e1(5),
        E::e2(S::s1(6).unwrap()).unwrap(),
    ],
};

struct A {
    s: String,
    t: std::num::NonZeroI32,
    s1: S,
    s2: S,
    e2: E,
    v1: Vec<u8>,
    v2: Vec<E>,
}

struct S {
    f: i32,
}
impl S {
    fn s1(t: i32) -> Result<S, ()> { todo!() }
    fn s2() -> S { todo!() }
}

enum E {
    A(i32),
    B,
}
impl E {
    const A0: E = E::A(0);
    fn e1(t: i32) -> E { todo!() }
    fn e2(s: S) -> Option<E> { todo!() }
}

I do like the idea, but this syntax would create an inconsistency: in Rust, . requires an object to its left, whereas for types (or paths) :: is the correct operator, as can also be seen in your desugared examples.

FWIW, there have been multiple calls to allow eliding types with _ where they can be inferred. Your example would then become

let t: Vec<i32> = _::with_capacity(5);

Furthermore, if someone is interested in implementing the bones of the feature, it could be introduced to rustc as a targeted diagnostic with a structured suggestion for the correct code (just like fn foo() -> _ isn't valid code but is properly handled by the typechecker).


I like the general idea of allowing _ in more places, but it's not obvious how to define the inference rules. In this example, the only thing the compiler knows is that a function named with_capacity must return a Vec<i32>, but there's nothing stopping other dependencies from introducing SomeStruct::with_capacity that returns a Vec<i32>, which would make the call ambiguous. This would also mean that any new method on any struct would technically be a breaking change. It seems like inference would have to be restricted to a very narrow subset (perhaps "methods belonging to the exact type that is returned") to avoid creating compatibility hazards.
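To make the ambiguity concrete, here is a minimal sketch. SomeStruct is a hypothetical type standing in for something another dependency might add; it is not from any real crate. Both functions named with_capacity return Vec<i32>, so the expected return type alone could not pick one.

```rust
// Hypothetical type that some other dependency might introduce.
struct SomeStruct;

impl SomeStruct {
    // An unrelated associated function that also returns Vec<i32>.
    fn with_capacity(n: usize) -> Vec<i32> {
        vec![0; n]
    }
}

fn from_vec() -> Vec<i32> {
    // The constructor on the exact type being returned.
    Vec::with_capacity(5)
}

fn from_some_struct() -> Vec<i32> {
    // Same name, same return type, different owner and different behavior.
    SomeStruct::with_capacity(5)
}

fn main() {
    // If `_::with_capacity(5)` searched every type in scope, both of these
    // would be candidates for an expected type of Vec<i32>.
    println!("{} {}", from_vec().len(), from_some_struct().len());
}
```

Restricting the search to functions on the exact expected type would leave only the first candidate.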


Note that <_>::method() already infers the type if method is a trait method (not an inherent one).
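A small sketch of what that looks like in practice, assuming the traits involved (Default and From here) are in scope via the prelude. The function names are made up for illustration.

```rust
fn inferred_default() -> Vec<i32> {
    // `<_>` leaves the Self type to inference; the return type pins it
    // to Vec<i32>, so this resolves to <Vec<i32> as Default>::default().
    <_>::default()
}

fn inferred_from() -> String {
    // Same idea with From::from: Self is inferred as String.
    <_>::from("hello")
}

fn main() {
    println!("{:?} {:?}", inferred_default(), inferred_from());
}
```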

Apart from this, it's pretty weird that .e2(.s2()) should infer the e2 method on E, even though the expected type should be Option<E>


And, importantly, also requires that the trait has been used, so there's a limited search scope.

If foo(.bar()) worked, what's the scope of things in which the compiler would search for that method? Or even worse, what if it's a fallible constructor so it's foo(.bar().unwrap()) or foo(.bar()?)?

(The oft-discussed idea of allowing foo(.Blah) (or foo(_::Blah)) is limited to enum variants, which have the rule that even when they're "functions" they definitely return Self. It's not obvious to me that there's a good way to extend this to any arbitrary inherent method. Traits maybe, though?)
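As a quick illustration of the "variants are functions that return Self" rule, here is a sketch reusing the E enum from the first post (the derives and the wrap_all helper are added for the example):

```rust
#[derive(Debug, PartialEq)]
enum E {
    A(i32),
    B,
}

fn wrap_all(xs: &[i32]) -> Vec<E> {
    // A tuple variant can be passed anywhere a function i32 -> E is expected.
    xs.iter().copied().map(E::A).collect()
}

fn main() {
    // The variant constructor coerces to a plain function pointer,
    // and its return type is always the enum itself.
    let ctor: fn(i32) -> E = E::A;
    println!("{:?}", ctor(4));
    println!("{:?}", wrap_all(&[1, 2]));
    println!("{:?}", E::B);
}
```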


While at first glance the idea appeals to me, I see a usability issue with it: If adopted, it destroys code readability for those not using a full-blown IDE.

Let's take the 2nd example:

It's impossible to tell which new method is being called for e.g. a.s without looking at the definition of A or using an IDE that effectively does that for you.

Even if all of us accept that everybody should use an IDE (which definitely isn't the case today nor will it be soon), that still leaves things like GitHub PRs. If you can't read the code in a PR, you can't vet it and thus merging the PR becomes a problem.

The syntax issue has been addressed above, so I won't bother to comment on that.


I'll push back slightly on this one, because today one could do

let a = A {
    s: Default::default(),
    t: TryFrom::try_from(1).unwrap(),
};

which has the same don't-know-what-types problems, while working in stable.
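For anyone who wants to try it, a self-contained version of that snippet might look like the following. The struct definition is trimmed down from the first post, and the build function is just a wrapper added for the example.

```rust
// TryFrom is in the prelude from the 2021 edition on; the import keeps
// the sketch working on older editions too.
use std::convert::TryFrom;
use std::num::NonZeroI32;

struct A {
    s: String,
    t: NonZeroI32,
}

fn build() -> A {
    A {
        // Neither field names its type; both are inferred from A's definition.
        s: Default::default(),
        t: TryFrom::try_from(1).unwrap(),
    }
}

fn main() {
    let a = build();
    println!("{:?} {}", a.s, a.t);
}
```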

So not being able to see the types involved isn't necessarily a problem.


You're right about that. My answer to it is that code of that form is banned from my repos for the same exact reason as I stated above: it tends to unnecessarily force the reader to perform extra work, just to gain an understanding of what's going on there.

It's not a problem for rustc, no. But it can be for human consumption.


I never know where to draw the line here.

I used to write C++ and C# with types on every local variable because I thought it was helpful to let the reader know the types without forcing them to perform extra work.

But then I started using auto and var, and found out that, actually, I usually didn't care. Especially in C++, some article pointed out that I could write C++ code in templates that didn't care what the type was (just that I could push_back it or iterate it or whatever) and that code wasn't causing me problems, so why not do that elsewhere too?

So where's the line between good type inference and bad type inference? I don't know that it's reasonable to draw that line technically. It might be better to allow a bit more than is always the right thing to do, and leave it up to clippy and code reviews to discourage the specifically-unreasonable things. After all, any language powerful enough to be interesting will always make it possible to write bad code. And perhaps the real problem with A { s: Q::l() } is the names, not the type inference.

But of course it still does need to be done in a sturdy way that's resilient to new things in scope, new methods added to types, etc.


Yeah that's a fair point to make. Perhaps it's more of an issue of "it bothers me to have to perform that extra lookup" rather than that that's true for everybody. The end result is the same though, because it kept confusing me. What you say about just needing a piece of functionality provided by a trait is often true, and yet it still messes with my mental model.

I will however argue that using the explicit type when constructing values using struct literals is somewhat more in line with the rustic value of "explicit is better than implicit"; in this case it's just about the type rather than anything else. But I'll also concede that this can be argued both ways using exactly the argument you laid out about not always annotating locals with their type.


This is a perfect case for the obligatory links to https://blog.rust-lang.org/2017/03/02/lang-ergonomics.html#how-to-analyze-and-manage-the-reasoning-footprint and specifically https://boats.gitlab.io/blog/post/2017-12-27-things-explicit-is-not/#explicit-is-not-local.

That allows a more specific statement of your preference here (assuming I understood correctly): that you'd like it to be local.

Because (assuming this proposed sugar only works for (…) -> Self things on the exact type needed by the type context), this is still specified "explicitly" in the sense (as Boats describes) that you can go look at the definition of the function being called or type being constructed to see what the type is. It's just that the information is remote relative to the code in question.

Then Aaron's post gives a good way to look at some of the implications of that non-locality. For example, it mentions "is there always a clear place to look", which is why I bring up the previous parenthetical, as well as the root of the "search scope" I mentioned in a previous post. If it were any static method anywhere, that makes it particularly difficult to understand, since all you'd have to go off is the method name, which isn't used or anything. (Though even that isn't necessarily a blocker, as evidenced by trait calls getting away with it, though the lookup fragility of those is a good indication that anything else like that would need to pass quite a high bar.)

I'm not saying that I think we should have this sugar (in fact I think I'm personally skeptical of it), but I'd be interested in whether there's a generally-applicable rule for why this is too far. It might be an interesting exercise for you to try and see whether you could tease out specifically where your personal line is and whether you can define it precisely. For example, which of these are too specific, which are fine, and which are insufficiently specific? Why is that? Which parts are ok to need to check the definition of A, and which aren't?

1. A { b: <HashSet<String> as Default>::default() }
2. A { b: <HashSet<String>>::new() }
3. A { b: <HashSet<String>>::default() }
4. A { b: blah::qux() }
5. A { b: HashSet::default() }
6. A { b: HashSet::new() }
7. A { b: Default::default() }
8. A { b: Foo::bar() }

Unfortunately, $($e:expr)* is allowed in macros, so this new syntax may not work completely in macros.

POC
macro_rules! methods {
    ($($e:expr)*) => {
        $(
            println!("`{}`", stringify!($e));
        )*
    }
}

// Prints:
// `this.that().thus()`
// `another()`
methods!{
    this.that()
    .thus()
    another()
}

(playground)

Re: scottmcm's post just above,

I find all those fine. Where I prefer having the name of the type close at hand is more when I am calling methods on it. For example, I don't really like this:

let mut a = _::new(); // or .new(), etc.
a.insert(2);
a.insert(5);
a.insert(2);
println!("{}", a.len()); // what does this print?
takes_somethingerother(a);

The compiler agrees with you here, and the common version of the proposal which only works for T::fn(...) -> T won't change that. The error is

error[E0282]: type annotations needed
 --> src/main.rs:2:9
  |
2 |     let mut x = Default::default();
  |         ^^^^^
3 |     x.push(5);
  |     - type must be known at this point
  |
help: consider giving `x` an explicit type
  |
2 |     let mut x: _ = Default::default();
  |              +++

and that's not going to change, because at least in your example there's no bound that actually concretizes the type.
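For comparison, here is a minimal sketch of forms that do compile today, where an annotation concretizes the type before the first method call. The function names are only for illustration.

```rust
fn annotated() -> Vec<i32> {
    // The full annotation means `push` can be resolved immediately.
    let mut x: Vec<i32> = Default::default();
    x.push(5);
    x
}

fn partially_annotated() -> Vec<i32> {
    // Knowing it is *some* Vec is enough to resolve `push`; the element
    // type is filled in later from the return type.
    let mut x: Vec<_> = Default::default();
    x.push(5);
    x
}

fn main() {
    println!("{:?} {:?}", annotated(), partially_annotated());
}
```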

It might be the case that this could work in the future:

let x: Vec<u32> = {
    let mut x = Default::default();
    x.push(0);
    x
};

but nobody is pushing to have that be the case. (Plus, it precludes in-place typestate from being a thing if this were to be allowed.) The type must be concretized by context before any methods are called.


I actually started writing a post saying this, but I think this error is somewhat artificial.

Notably, this works on stable:

let x: Vec<u32> = {
    let mut x = Default::default();
    if false {
        x
    } else {
        x.push(0);
        x
    }
};
dbg!(x);

Which says to me that the inherent method restriction might just be an implementation artifact, not something foundational, and thus that a rewrite of inference might start to enable it.

After all, making the code less specific makes it compile:

let x: Vec<u32> = {
    let mut x = Default::default();
    Extend::extend(&mut x, [0]);
    x
};
dbg!(x);

So that also makes it seem non-obvious that inherent methods need to block type inference that way.


Indeed, in this case I'd say that's a fair assessment. It's also worth pointing out that I consider that an important property for such code specifically because it's constructing a value. Pretty much anywhere else I consider it perfectly fine to leverage traits to abstract over code, and indeed do so myself.

This might get at the heart of why I ban code like the snippet I mentioned earlier: the lookup fragility, by which I mean that a call to e.g. Default::default() can resolve to just about any type, since Default is pretty common to implement for types. I'm aware that the exact resolution is deterministic, and it's easily enough resolved for Default::default() verbatim because you can look at the type definition. However, resolution can become a bit of a puzzle when the trait in question is more complicated than the aforementioned Default, e.g. when the field is of a generic type. Suddenly that quick lookup you wanted to do to understand some piece of code (probably in service of understanding yet another piece of code) has turned into a proper subquest of its own. The problem is that it isn't just a "mere" context switch; it just kills my flow when that happens.

I'm thinking that that will prove trickier to define than desirable, and would effectively be a heuristic.

Assuming Foo is a type, all of these are fine by me except Default::default(), at least in terms of readability and "would I allow them in my repos". I think it really is the explicit and local naming of the type (or module in the case of 4) when constructing a value using struct literals that anchors such code for me. To be clear, I can still make sense of using traits there e.g. Default::default(), but it takes more effort with the resulting loss of flow.

In certain contexts I've also used

A { b: <HashSet<_>>::default() } 

i.e. number 3 except the type args are left to be inferred, either because at that point the type arg resolution was some hideously long monster that would obfuscate more than it would elucidate, or because it was unnameable.
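A runnable sketch of that form, assuming a struct A with a single HashSet<String> field (the field type is my choice for the example):

```rust
use std::collections::HashSet;

struct A {
    b: HashSet<String>,
}

fn build() -> A {
    // The container is named locally; the element type is left to be
    // inferred from the definition of A.
    A { b: <HashSet<_>>::default() }
}

fn main() {
    let a = build();
    println!("{}", a.b.len());
}
```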

As an aside: it feels like the fact that rustc needs to handle cases like that is part of why its build times are still pretty long, and why after many years and dedicated engineering efforts we still barely manage to implement intra-unit parallel compilation or non-trivial incremental compilation.

I wonder if there's a version of the language that's a little more harsh on these kinds of cases, but where the resulting compiler architecture is much easier to distribute and memoize.

I don't think this case is relevant at all for that. In some ways, that case is actually far easier than the common let mut v = vec![]; v.push(4);, because it's annotated that x is exactly Vec<u32>, and thus Rust doesn't need to figure that out, just run normal inference and trait logic.
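To illustrate the "common" case mentioned there, here is a small sketch in which nothing annotates the element type, so the compiler has to work it out from later use. With no other constraint the integer literal falls back to i32. The function name is made up for the example.

```rust
fn fallback_elem_size() -> usize {
    // No annotation anywhere: the element type is only discovered from
    // the later `push`, and the literal then falls back to i32.
    let mut v = vec![];
    v.push(4);
    std::mem::size_of_val(&v[0])
}

fn main() {
    println!("{}", fallback_elem_size());
}
```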

This is why Rust requires full type signatures. Technically Rust could have just run one great big inference over the whole crate (after all, SML and friends do that just fine) but intentionally chose not to do that, to get better error messages and make things easier to understand for both humans and machines.

Having the signature provides a "firewall" so that this kind of thing is separable†. Once the signatures of all the functions are known, then those can be used to typecheck the bodies, since what happens in the body of one function doesn't impact the body of another function.
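A sketch of that separation, with hypothetical function names. Inside make, inference is free to work out the local's type from the declared return type. The caller, total, is checked against make's signature only and never looks into its body.

```rust
// The declared signature is all that other functions get to see.
fn make() -> Vec<u32> {
    // Inference inside the body: the local's type comes from the
    // return type, not from anything outside this function.
    let mut x = Default::default();
    Extend::extend(&mut x, [1, 2]);
    x
}

// Type-checked against `make`'s signature alone, so the two bodies can be
// checked independently of each other.
fn total() -> u32 {
    make().iter().sum()
}

fn main() {
    println!("{}", total());
}
```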

There are lots of things that make parallel compilation hard, but I really don't think that inside-a-single-body inference is anywhere close to the top of the list. Type inference and unification is a well-studied problem for which we have good algorithms.

† Well, except for auto-trait leakage through impl Trait.

Related previous posts: [Pre-RFC] Inferred Enum Type