Seamless type inference for functions (weak generics)?

Abstract

I propose weak-generic functions for Rust.

Weak generics would allow appropriate code to be reduced and simplified.

This idea of weak generics is similar to what languages such as Python do: all functions are implicitly generic, and there is no static checking. However, because Rust is strongly typed, per-function-call type checking for weak-generic functions is plausible and sound.

Currently, functions must be fully annotated with types. This can clutter code and could be an additional hurdle to learning Rust. My proposal is to allow function parameters to not have a specified type, but instead a sort of weak-generic. This would allow for the following code to be equivalent:

fn x(a: i32, b: i32) -> i32 {
    a+b
}
fn y(a, b) -> i32 {
    a+b
}

Motivation

Adding weak-generics to Rust would make it easier for new programmers to learn Rust. They would not need to fully grasp the concepts of trait bounds and lifetimes to effectively use "weak-generic" functions.

In short, weak generics do not add syntax; they actually remove syntax and improve the developer experience.

Teaching

The following code does not compile because T has no trait bounds:

fn x<T>(a: T, b: T) -> i32 {
    a+b
}

However, the following code, that uses proposed weak-generics, would:

fn x(a, b) -> i32 {
    a+b
}
fn main() {
    let _ = x(1,2);
}

This could be taught (simply) as:

The Rust compiler takes the types of the input parameters (i32) and fills in the types for all intermediate results. Then, it checks to see if the types implement the Add trait so that the input parameters can be added.

Or, for a more in-depth view:

The Rust compiler builds up a graph of types through the execution flow of the function at compile time based on the input types (implicitly i32 here). It then validates the trait implementations to satisfy method and operator trait bounds.
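For comparison, here is a sketch of how the same function must be written in today's Rust, with the Add bound spelled out so the body type-checks at definition time (the bound shown is one way to express it, chosen for this sketch):

```rust
use std::ops::Add;

// The explicitly bounded version of `x` that compiles today: the
// `Add<Output = i32>` bound is what the weak-generic form would infer.
fn x<T: Add<Output = i32>>(a: T, b: T) -> i32 {
    a + b
}

fn main() {
    let _ = x(1, 2);
}
```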

Implementation overview

To accomplish this, the Rust compiler would need to type-check each individual call to a weak-generic function. Because where clauses would not be used here, it would be necessary to check all function bodies and execution paths at every call site.

Thoughts?

I would appreciate community thoughts on this proposal, as I believe it would really improve the developer experience and learning curve. I have not yet created a sample implementation, but plan on doing so. I would also appreciate pointers to resources that could help me implement this type of major syntactic and type-checking change in my fork of Rust.

Thank you!

2 Likes

Currently something like this is doable with a macro: add!(a, b) generating a closure.
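A minimal sketch of the macro approach mentioned above; expanding into an immediately-called closure is one possible implementation, not necessarily what the commenter had in mind:

```rust
// Hypothetical `add!` macro: expands to a closure call, so the closure's
// parameter types are inferred fresh at each use site.
macro_rules! add {
    ($a:expr, $b:expr) => {
        (|a, b| a + b)($a, $b)
    };
}

fn main() {
    let x = add!(1, 2);     // (i32, i32) -> i32
    let y = add!(1.5, 2.5); // (f64, f64) -> f64
    println!("{x} {y}");
}
```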

But I agree that some relaxation of generics would be useful. Today if you want to write numeric code working on various integer or float types, it requires a painful amount of std::ops trait bounds, or num_traits, and there's a lot of friction.
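To illustrate the friction: even a trivial generic sum needs explicit bounds, and since std has no Zero trait, this sketch makes the caller supply the starting value (num_traits solves this with its `Zero` trait):

```rust
use std::ops::Add;

// Without num_traits there is no standard way to name `0` generically,
// so this sketch takes the starting value as a parameter.
fn sum<T: Copy + Add<Output = T>>(xs: &[T], zero: T) -> T {
    xs.iter().fold(zero, |acc, &x| acc + x)
}

fn main() {
    assert_eq!(sum(&[1, 2, 3], 0), 6);
    assert_eq!(sum(&[1.5, 2.5], 0.0), 4.0);
}
```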

You'd probably still want <T> on the functions, since users may need to specify concrete types in cases where type inference can't or picks something unwanted (like i32 default for numbers).

1 Like

Yes, exactly. This proposal would only add an alternative to current generics. Perhaps a special syntax could be added so that, for example, <|T, E, ...|> could be specified. This would not allow where clauses, as that is what current generics are for.

A complete separate alternative could add a lot of duplication to the language. I think it'd be more useful if it was an extension to the current generics, so that these functions would work in existing generic contexts, and create minimal amount of new syntax to learn.

It could be something similar to gradual typing in languages like TypeScript or PHP. Or something like C++ templates with concepts.

fn add<T: Debug + ???>(a: T, b: T) -> T {
  dbg!(a);
  a + b
}

True, perhaps that is better! Maybe the ? operator can become a 'soft-keyword' for use in generics.

I posited roughly the same functionality previously as macro fn, using T: _ to specify a generic as what is called here "weakly bound." The trick is that you need instantiation-time[1] resolution/checking for any "weakly bound" generics for them to be of any real use. If you want "inferred bound" behavior instead, the use of a "weak" generic cannot use method syntax (or other auto(de)ref sugar) and can only be a function argument (including to operators which are auto(de)ref-free trait function call sugar).

I still believe that macro fn has a niche, more as a stronger alternative to expression-position macro_rules! than as a weaker alternative to generic fn.


  1. Rust generally would call such a post-monomorphization error, but I find that conflating "during monomorphization" errors (e.g. const evaluation panics) with "after monomorphization" errors (e.g. codegen/linker errors) to be a mistake. (IIUC, const evaluation errors are the only "during mono" error which is currently handled "post-mono".) The important difference is that post-mono errors may be hidden by optimizations which bypass monomorphization (including check builds only doing pre-mono checks), but I believe instantiation-time errors should always happen. (IIRC we're measuring what the cost would be before stabilizing inline const, which makes const-instantiation errors significantly more accessible.) ↩︎

3 Likes

I think this is the fundamental philosophical question between dynamic and static typing. The static camp believes that type and lifetime annotations aren't clutter, but provide useful information about what a function does.

(Though yes, you said "can", and it's true Rust probably doesn't want to become, say, Idris.)

Note that those aren't equivalent, because if the arguments are generic there are at least 4 ways you can call y: (i32, i32) as mentioned, but also (&i32, i32), (i32, &i32), and (&i32, &i32).
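This can be checked against today's generics; a sketch where `y` is generic over both arguments, relying on the reference impls of `Add` that std already provides:

```rust
use std::ops::Add;

// All four combinations compile because std implements `Add` for
// i32, &i32, and the mixed pairs, each with `Output = i32`.
fn y<A: Add<B, Output = i32>, B>(a: A, b: B) -> i32 {
    a + b
}

fn main() {
    assert_eq!(y(1, 2), 3);
    assert_eq!(y(&1, 2), 3);
    assert_eq!(y(1, &2), 3);
    assert_eq!(y(&1, &2), 3);
}
```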

I think the best response I have here is an article from Steve: Rust's Golden Rule

Rust could have had more global inference like this -- SML, Haskell, and many others allow it -- but today Rust intentionally doesn't.

Any proposal that wants to carve an exception to this rule needs to spend lots of text on exactly what was valuable about that rule, and why whatever loosening the proposal suggests is a place where losing those properties is ok.

Note that we do have let add = |a, b| a + b; today, where inference applies: the "firewall" of the signature is not needed for closures since they're expressions not items.
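A small sketch of that closure inference, and its limit: the parameter types are inferred once from the first use, not per call:

```rust
fn main() {
    // Inference fills in the closure's signature from its first use...
    let add = |a, b| a + b;
    let x = add(1, 2); // fixes the closure as (i32, i32) -> i32

    // ...so a second call at different types would not compile:
    // let y = add(1.5, 2.5); // error: expected integer, found f64

    assert_eq!(x, 3);
}
```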


One concrete thing: one place where having trait bounds is a huge win over C++ -- which does what you describe, and is infamous for horrible error messages because of it -- is that typos are caught at function definition time, not function use time.

So one thing that will never be accepted is changing it so that a normal fn can call foo.foroble() and the compiler can't know whether that is correct -- or should have been foo.frobble() -- until someone calls the function with some type.

It's a critical improvement in Rust that name resolution in functions operates when the function is defined, rather than every time the function is used. I'd say that's a fundamental part of a function being a function, and anything that doesn't work that way is more a macro.

Maybe there's space for tweaking when you find out about things not meeting trait bounds -- relaxing this for non-exported items is a common request -- but I think that name resolution not changing here is a critical requirement.


8 Likes

I agree, call-site resolution and checking are the only real way for this to be of use. My proposal is very similar to your macro fn proposal.

Perhaps Rust could allow the _ type to be used in function parameter lists instead of an explicit <T, ...> generic declaration. Then, trait bounds could be added on top, like _ + Trait. For example, if a function wants to add _ to _, the following code could be written:

fn x(a: _ + Add<Output = _>, b: _ + Add<Output = _>) -> i32 {
    a+b
}

This would accept any generic type for a/b that implements Add with any output _ such that the function returns i32.
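The closest thing expressible today is `impl Trait` in argument position, though the `Output` type must still be named rather than inferred from the body; a sketch under that limitation:

```rust
use std::ops::Add;

// `impl Trait` gets part of the way there: no named type parameter,
// but the associated `Output` type still has to be written out.
fn x(a: impl Add<i32, Output = i32>, b: i32) -> i32 {
    a + b
}

fn main() {
    assert_eq!(x(1, 2), 3);
}
```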

True, Rust is fundamentally static, and that is one of the reasons why it can encode so much information. I think that the macro fn RFC described by @CAD97 fits Rust better.

Personally I would prefer such code were "allowed" only in the sense of printing errors which fill in the blanks. Rust analyzer largely fills this gap for me though.

1 Like

The amalgamation of Rust and Idris would be the language to end all languages. I'm always open to DMs if someone wants to work on it.

2 Likes

The decision to limit inference to function bodies was a deliberate one, I believe.

If you could infer types in function signatures, that would mean that changes in one place in the program could break API in another place.

2 Likes

Right. This would be a dangerous feature for API stability, however ergonomic it is.

Isn't this just impl Trait? The only difference I can see from impl Trait is that you want the Output of the Adds to be inferred from the function body, which is currently not allowed in Rust. But if we are going to infer the signature from the function body, why not go all the way with the OP's suggestion?

Honestly I believe that this is a terrible idea. This basically amounts to giving up on static type checking of generic code and performing C++-style template specialization all the time, with a convenient syntax to encourage people to use it. The ergonomic costs are going to be staggering, and trying to make sense of this as a type-system feature (for example the proposal of gradual typing of type bounds) is a doctoral thesis' worth of work, not something you can reasonably hope to write down in a pre-RFC document (or you end up with a broken design without realizing).

Some languages are betting big on compile-time computations as a substitute for generics, notably Zig. It's great to explore these options in these languages. But adding this feature in addition to the pile of complexity that is already in Rust is a bad idea, you get the worst of both worlds and a ton of confusion and complexity.

This forum has a high volume of discussions and it is difficult to follow it properly. I think that some threads would deserve to be closed early to make it more manageable, and this one is one of them. (Note: this is not at all a criticism of the author or of posting the proposal. It's good that a lot of ideas are considered, but I think that it is helpful to give clear negative feedback when we have some.)

4 Likes

Syntax is just a means to an end. (Good) Syntax is only the embodiment, in text, of a Concept of the language.

This proposal does not remove a Concept: it adds a new one. A new Concept necessarily increases the amount of things to learn -- new users will also encounter signatures with fully qualified generics, the std docs are full of them.

Furthermore, removing Tokens does not "remove" Syntax. It just creates an alternative Syntax, which again means more to learn.

This does not mean the proposal has no merit, but the above "justification" is null and void, and best omitted.

1 Like

The way I see it, there's a trade-off between ease of writing generic functions vs ease of using them.

The current Rust approach of avoiding errors at monomorphisation (use) time goes all the way to favor ease of use over ease of definition. The macro or C++ template approach goes the other way.

I think there are cases where generic Rust code is too difficult and tedious to write — mainly around computation-heavy code that wants to be generic over integers or float sizes. With just std it's almost impossible to add enough std::ops trait bounds to exactly explain what the code needs. It makes use of numeric literals tedious, you can't even simply compare values to 0. And std has no traits for as casts, or most float operations, which makes crates like num_traits necessary in practice. And when a generic function has big complicated trait bounds, errors from it aren't even much clearer than monomorphisation-time type errors would be.

So I think some "gradual typing" for generics would let users find a compromise — use trait bounds for traits that are simple to define, like T: Copy + Debug, but not be forced into Add<Output = Add<Output = Add<Output = …>>> bounds nested 7 levels deep.
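For a flavor of that nesting: generically adding just three values already forces the intermediate `Output` types to be threaded through the signature (a sketch, simplified from real numeric code):

```rust
use std::ops::Add;

// Each `+` introduces another associated `Output` type that must be
// named and bounded; real numeric code compounds this quickly.
fn sum3<T>(a: T, b: T, c: T) -> <<T as Add>::Output as Add<T>>::Output
where
    T: Add,
    <T as Add>::Output: Add<T>,
{
    a + b + c
}

fn main() {
    assert_eq!(sum3(1, 2, 3), 6);
}
```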

2 Likes

Doesn't this sound like the solution would be std gaining some num_traits-like functionality?