Design space: could global inference **but limited internally to the current crate** would be acceptable in Rust?

This is a rather long post, in which I will discuss the design space that local and global inference gives to a programming language. As far as I understand C++ is much more on the side of global inference while Rust prefers local inference. I would like to know if a middle ground could be applied to Rust (global inference to the current crate only, any exported items would still follow the current rules).


Let's consider consider a binary, with a dependency to the library G, itself having a dependency to the library H. They provide those functions:

  • main, f2, f3: from the main binary
  • g1, g2, g3: from a first library G
  • h1, h2: from a second library H

And now, let's consider this call chain:

main -> f2() -> f3() -> g1() -> g2() -> g3() -> h1() -> h2()

  • g1 and h1 and respectively the entry point of G and H (technically I think that main could be considered the entry point of the binary too).
  • f3 and g3 are internal function that call function from external libraries
  • f2, g2 and h2 are purely interacting with the library/binary they are from

Furthermore:

  • all function are generics, and an object is passed as-it from main to h2
  • h2 returns errors that are handled in main

For this example we will consider the implication of those two use-cases:

  • the type of the object created in main is changed, and the internal of h2 must be modified to handle this new type
  • h2 can return a new type of error that must be handled by main.

Global inference (C++)

C++ choose to use global inference for many things like error handling (exception) and generics (template error are generated after mono-morphisation).

int main() {
    try {
        f2()
    } catch (SomeError& e) {
         // do something with e
    }
}
// SomeType could be a template or a C++20 concept, it doesn't really matter
// the input could also be passed by reference to allow dynamic polymorphism
auto h2(SomeType input) -> decltype(auto) {
    if (some_condition) {
         throw SomeError
    }
    // rest of the implementation
}

All the other functions would look like this:

template <class T>
auto foo(T&& input) -> decltype(auto) {
    return bar(std::forward(input))
}

With foo and bar being any pair of function in the call stack.


Since C++ use global inference, it's really easy to apply the two proposed changes. Only main() and h2() needs to be modified. That's a big upside. No code need to be changed for any of the other function, since they forward everything.

The main downside however is that changing the interface (input/output types/concept), as well of the possible exception returned of h2 can break silently semver. It's because the public interface h1 re-export that interface itself. For the same reason, updating H will break silently the semver of G since g1 re-export the interface of g2, and transitively up to f1.

And obviously the error messages are going to be much more complicated to debug, since modifying h2 can break main or vice-versa. What make it really bad is that the error can come from extremely far in the dependency chain.

Local Inference (Rust)

At the opposite of the spectrum, Rust uses local inference.

fn main() {
    let value = 42;  // 42 must implement G::SomeTrait
    match f2(value) {
        Ok(_) => // ...
        Error(G::SomeError) => // ...
    }
}

fn f2<T: G::SomeTrait>(input: T) -> Result<SomeType, G::SomeError> { /* ... */ }
// re-export of H types and traits
pub use H::SomeError;
pub use H::SomeTrait;

pub fn g1<T: SomeTrait>(input: T) -> Result<SomeType, SomeError> { /* ... */ }
fn g2<T: SomeTrait>(input: T) -> Result<SomeType, SomeError> { /* ... */ }
fn g3<T: SomeTrait>(input: T) -> Result<SomeType, SomeError> { /* ... */ }
pub struct SomeError; // implement error and possibly other traits

fn g2<T: SomeTrait>(input: T) -> Result<SomeType, SomeError> { /* ... */ }
fn g3<T: SomeTrait>(input: T) -> Result<SomeType, SomeError> {
    if some_condition {
        return Err(SomeError);
    }
    // ...
}

If we want to change the type of value in main, we can do it, as long as it implements the trait SomeTrait. If it doesn't, we will get a nice error message saying that the type of value should implements it. However, if we want to modify h2 to accept a NewTrait, we will need to modify the whole call stack (no matter if NewTrait is a superset, a subset or unrelated to SomeTrait).

The situation is exactly the same for the error returned by h2. If we want to return other error type, we must propagate this in the whole call stack. This is usually mitigated to use the same enum of error for the whole crate. Using a single enum has other downsides (for example, if foo() returns Result<_, AllErrors>, it's possible to downcast the error to BarError even if foo() will never returns such error).

Local inference has two big advantages: it's really hard to break semver without noticing it (you will modify the interface of the public API of the crate), and the error messages can be as good as possible! The price to pay is a lot of work and boilerplate when one wants to change any interface.

Something in between

I would like to know if a middle ground could be considered for the design of new Rust features:

  • Would it be possible to allow global inference, as long as the inference cannot cross the boundary of what is exported by the crate?

For errors, this would means that f2, g2, g3 and h2 could returns an anonymous enum of unknown types, but that anonymous enum should be converted to a regular enum when exported publicly in g1 and h1. Likewise the constraint of the type accepted by h2 could be any type as long as h1 convert it to T: SomeTrait (and likewise for g3/g1). The same strategy can be applied for ABI, only the public interface of the module must adhere to the ABI of the caller. Internally anything can be used.

In this middle ground, when applying the proposed modification, only main and h2 (obvisouly), as well as g1 and h1 would need to be updated. Neither of f2, f3, g2, g3 would need to be modified.

// all exported trait and types must be public and statically known like today
pub use H::SomeTrait;
pub use H::SomeError;

// public interface, everything is specified like today
pub fn g1<T: SomeTrait>(input: T) -> Result<SomeType, SomeError> {
    g2(input) // error are analyzed post-monomorphisation
}

// private interface, we can forward things like in C++
// I added the following hypothetical syntax:
//     T: ??? means any type
//     enum... is an anonymous enum
fn g2<T: ???>(input: T) -> Result<SomeType, enum...> { g3(input) }
fn g3<T: ???>(input: T) -> Result<SomeType, enum...> { h1(input) }

Both the utilization of T: ??? and enum... would result in post-monomorphisation errors (currently Rust as only pre-monomorphisation errors).

Not allowing global inference for any item publicly exported by the current crate guaranties that any changes to and internal function that would otherwise changes the public API (like we did in C++) would result in a compile time error.

Allowing (even if it's internal to the current crate) global inference has trade-off compared to the current situation.


To sum-up, I'm not asking for any specific feature that would require global inference, but to know if global inference as long as it's internal to the current crate is something that could be used when designing new Rust proposal.

I think there's an important point here that C++ only has deduction, not what I'd call full inference, because information can never flow "backwards" -- annotating the result that came out of foo can never affect the caller.

Rust's current inference is a fuller thing (otherwise let mut v = vec![]; wouldn't ever work) which means the spread of errors from a mistake would be broader in Rust.

It's a bit unclear which parts of the C++ thing you're looking to do in Rust. For example, the "it takes a concept" part could be "it's generic on a trait" without needing crate-wise inference. The "change the type in one place" could be a shared type alias.

The whole abstract type feature is, to some extent, already planned to have some degree of per-crate inference. So I also wonder whether that could do part of what you're looking for.

All that said, sometimes it would be nice to have little helpers that work like C++ templates in that they're duck typed. That can be done with macros today, but I've sometimes pondered a simpler version of that where you want it to work like a function call (no return-from-caller etc) but compile macro-style (so you don't have to bound all the generics or whatever). Strawman: macro fn add3(x, y, z) { x + y + z }.

5 Likes

I do sometimes find myself writing a local closure, not because I need a closure but because closures don't require specifying types.

As an initial step, I wouldn't mind seeing inference allowed for nested functions:

fn top_level(types: Are, required: Here) -> ReturnType {
    ...
    fn helper(inference, allowed, here) -> _ {
    }
}
3 Likes

Is this for ergonomics or expressiveness reasons? For ergonomics, a good IDE should be able to turn -> _ into a type for you.

2 Likes

I'm not Josh, but personally it's a bit of both. Sometime the exact types of the parameters/return type don't really matters and are just adding noice (especially when using generics).

This is also supported in rustc.

8 Likes

Both.

I don't use an IDE. And I care about the code as written. Sometimes I write closures where the types, if written out, would be several times longer than the body.

2 Likes

This would have a significant negative effect to incremental compilation, as you have to typecheck everything again when you change a single function. RPIT (return position impl trait) already has a bit of this problem, but in this case the global inference is much more limited. For example type information can only flow from callee to caller, but not the other way around. In addition the majority of typechecking and all type inference can be performed without knowing the concrete type.

6 Likes

I have a really hard time to understand what exactly is the difference between C++ type deduction and Rust type inference (my brain think in C++, it's really hard to grasp a concept that cannot be expressed in it :wink: ).


Let suppose that we consider that creating anonymous enum is something that we want. An anonymous could be returned from a function, and be forwarded by another function. However, it would only be possible to use the variant inside a match statement that makes the types of the variant explicit.

struct TypeA;
struct TypeB;

// The compiler need to analyses the body of this function to know that
// the return type is enum(TypeA, TypeB)
fn foo() -> enum... {
    if some_conditition {
        TypeA // I assume that implicit conversion is ok
    else {
        TypeB
    }
}

// Likewise, the compiler need to analyses the body of bar to know its
// concrete return type
fn bar() enum... {
    foo() // just forward the anonymous enum created by foo
}

fn baz() {
    let mut v = vec![];
    match bar() { // restore the real type of bar(): enum(TypeA, TypeB)
         // note: the types of all variant must be explicit 
         a @ TypeA => // ..
         b @ TypeB => // ..
    }
}

Such feature only need type deduction, not type inference, right? There is no way for the implementation of foo() to influence the type of baz::v. AFAIK the situation is nearly the same than RPIT with the additional difficulty that changing the implementation of foo() requires to re-typecheck lots of thinks, not only the signature.

Correct me if I'm wrong, but I think this wouldn't be that bad. Since all the types are explicit in the match, a first analysis can assume that this is correct (allowing local reasoning), and only when the whole analysis is finished a new global pass would be done to validate this hypothesis (even if this means that we need to re-compute the return type of bar() and foo()).

  • step 1: Local analysis: typecheck independently foo(), bar() and bar():
    • the return type of foo() is enum(TypeA, TypeB)
    • the return type of bar() is foo::return_type
    • in baz() assume that the types specified in the match are right, and do the analysis based on this assumption
  • step 2: Global analysis: propagate the dependent types computed on step 1 for the return types of the functions:
    • the return type of bar() is enum(TypeA, TypeB)
    • the return type of foo() was already computed at step1 (as well as the one of baz() witch is ())
  • step 3: Local analysis: validate that the assumption made on step 1 holds true.

Would this be true for such feature? As far as I understand, only step 2 and 3 would need to be re-done, and I don't expect them to be long, even if that feature would be used a lot. I want to be sure that I correctly understand the problem space.

Not exactly this, but "global type inference" is not what C++ does. An example of global type inferencing language is Haskell.

Not sure what you mean by exception handling being a kind of "inference", either. C++ exceptions are basically untyped (with the, well, exception of noexcept, sometimes).

As to the proposal: I can assure you Rust's designers know a whole lot about C++ and replacing global features with local ones is exactly one of the ways with which Rust guides programmers towards writing correct code. Let's not give up that.

8 Likes

The difference is that Rust can do:

let vec; // unknown type
vec = Vec::new(); // Vec<unknown type>
vec.push('a'); // now we know it's a Vec<char>

whereas in C++ it stops at:

auto vec; // ERROR: unknown type

In C++ the compiler can only fill in the blanks where type is already immediately known to the compiler (where it can look for an expression in a specific location, check what type it has, and copy that).

Rust can work with truly unknown types and figure them out after the fact based on multiple clues in multiple places (with some limitations).

8 Likes

I can assure you that I 100% know and appreciate it. I already said it, but I definitively loves your intervention against what I said, since I usually learn a lot.

And the reason I asked that question "Can a limited form of global inference be used in Rust?" is because I noticed that most of the ideas I have to improve Rust requires the use of it. If any form of global inference are banned from Rust (I would totally understand it, I appreciate a lot to be able to reason locally about my code), then I would simply stop considers them. And realizing that my ideas requires non-local reasoning was enlightening.

An example of global type inferencing language is Haskell.

I heard a lot about Haskell, but I still didn't took the time to learn it unfortunately.

I've been thinking about how to design a Rust-like language with type inference for years. Rust itself has a number of features that interact poorly with type inference, and it couldn't add global type inference without breaking backwards compatibility in any case, but I think there's a lot of potential to make a Rust-like language with type inference.

That being said, it's not exactly easy to design such a language. At the moment, I want to understand the Rustbelt work and learn about Coq and Iris, since that pretty much feels like a prerequisite to properly designing a new Rust, but that's a pretty daunting course on its own.

1 Like

you mean Idris?

No, I think @Storyyeller did mean Iris.

1 Like

Yes, I meant Iris. It's a big part of the Rustbelt project.