[lang-team-minutes] Implied bounds

I find it remarkable that the “no filters at all” rule raises several interesting questions with regard to other corners of your language design. This seems to suggest that, even if you don’t adopt it in the end, making sure that it would work well is a nice way to clean up other aspects of the language.

Interactions with inference

I think that the problem you describe is the sign of an implementation smell: if you know that both &'a T: Hash and for<'x> &'x T: Hash hold, only the latter should be considered; more generally, I would expect inference to be monotonic: the more you learn, the more general the types you infer.

Declaration-site impliciteness?

In Agda, you write (x : A) in a declaration to quantify over a variable whose value will be explicitly provided by the caller at the use site, and {x : A} to quantify over a variable that will be implicitly provided at the use site. It might make sense to let the person declaring a bound decide whether it should be implicitly implied at use sites or explicitly provided by the context.

I am not convinced, however, that this is a good idea. Is there an API-design case for having this flexibility, or is there rather a good default for everyone?

Implicitly implied from where?

In Haskell, the use in a function body of a type-class operation suffices to have the type-class constraint implied in the function signature. It works well for the Haskell community, but it is frequently described as “a bit too implicit” – in Scala for example implicit parameters must be explicitly abstracted over in the method declaration.

So far, you are only proposing implicit assumptions at the level of a single type signature: extra bounds would be implied by well-formedness constraints of types appearing in the function signature, not in other parts of the program (such as the code of the function with this signature). This is a more restricted form of inference and, I believe, significantly safer / more conservative.

Hash<NotHash>: Consistency of abstractions

In his PhD thesis¹, Julien Crétin worked on the question of “when can an assumption safely be made implicit?” and its relation to the question of “when is it sound to reduce/compute an assumption?”. He proposed² the idea of classifying abstractions (function parameters, be they lifetimes, type parameters, or values/arguments, or even bounds) into consistent abstractions and inconsistent abstractions. An abstraction abstracts over a class of objects (or over a logical proposition), and it is consistent when the class of objects is known to be non-empty, or when the proposition is true – otherwise it is inconsistent.

For example, abstracting over a lifetime 'a or a type T is always consistent, because we know that some lifetimes and types do exist. Subtyping-bounded abstractions, abstracting over “a type T such that T <= Foo holds”, are also consistent, as we know that Foo is in the class of valid types. Abstracting over a term variable of type Hash<NotHash> is inconsistent, as we know that there is no value of this type – the body of such a function would be dead code.
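A minimal Rust sketch of the distinction, using an empty enum as an uninhabited type (the names `Never`, `consistent`, and `inconsistent` are mine, not from Julien's work):

```rust
use std::hash::Hash;

// Consistent abstraction: many types implement Hash, so this function
// can actually be called (e.g. with T = i32).
fn consistent<T: Hash>(_x: &T) -> bool {
    true
}

// An uninhabited type: no value of `Never` can ever be constructed.
#[allow(dead_code)]
enum Never {}

// Inconsistent abstraction: this function can never be called,
// so its body is dead code. The empty match witnesses that fact.
#[allow(dead_code)]
fn inconsistent(x: Never) -> bool {
    match x {} // no arms needed: there is no value to match on
}

fn main() {
    // Only the consistent abstraction is usable.
    assert!(consistent(&42));
}
```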

An interesting aspect of this design is that grouping abstractions matters. It is not the same thing to abstract over all T <= Foo and to abstract over all T first, and then later assume that T <= Foo. In the latter case, the second abstraction is inconsistent, in the sense that for an abstract T we cannot prove that T <= Foo holds. (This is the shape of GADT assumptions, they add equalities/constraints over a pre-existing type parameter, and they are inconsistent abstractions that must block reduction.)

I think that a reasonable take-away from Julien’s work is that it makes sense to distinguish consistent from inconsistent abstractions in a programming language, and to force users to be explicit about the fact that a given abstraction is inconsistent. This makes it possible to show a warning or error message when an abstraction that they believe is consistent in fact is not, which often corresponds to a programming error. I believe that this approach is a good solution to the usual C++ templates problem of “we cannot see whether we made a mistake without trying to use the new definition in client code”.

In particular, allowing consistent implied bounds to remain implicit while asking for inconsistent implied bounds to be explicit would be an interesting “filter” to consider.

However, I think that the question of consistency of trait bounds should be treated with care: when is the abstraction on T where T: Trait consistent? Formally it should be consistent if there exists at least one T such that T: Trait holds, but what is the scope of declarations that should be searched for? This question interacts directly with the part of your language design on scoping of trait instances (orphan instances, etc.); it may or may not be the case that the type-checker can take a closed-world view of trait declarations and know for sure that a trait has no inhabitants.

¹: See the article-sized version of his work: System F with coercion constraints (PDF), Julien Crétin and Didier Rémy, 2014. I myself did a bit of work on the topic, looking at how to design language features for inconsistent abstractions (those that should block reduction), which is less relevant to the current discussion, but see these slides just in case.

²: The question of consistency of contexts was also discussed in other works around the same time, for example System FC with Explicit Kind Equality by Stephanie Weirich, Justin Hsu, and Richard A. Eisenberg, 2013, but there it is more on the side and in particular the question of which abstractions preserve consistency is not studied in depth.


Is this something you experience when reading other people's code, or only when writing your own? I know from my own experience that the kind of information that seems tedious to write is actually useful to have completely spelled out when trying to understand unfamiliar code.


Yes. We’re talking about bounds that are required for the receiver to be well-formed: I can’t possibly have a HashMap to call these methods on without the key being Hash + Eq. What I want is to quickly narrow in on the additional bounds, if any, that this impl adds.


As far as I’ve understood the discussion, it’s mostly about having redundant and tedious trait bounds vs. eliding them and getting unexpected behavior.

In this case I believe that there is a third option: Explicit Elision. Instead of writing the name of the mandatory trait, you’re allowed to elide the name explicitly with an underscore (_). This syntax is idiomatic for any Rust programmer, it is non-redundant, and it still clearly states the presence of a trait bound. Additionally, an IDE could figure out the elided bound and show it to the user.

As an example, this is how the function from the first post would look like using this syntax:

fn use_set<K: _>(set: HashSet<K>) { ... }

for might have more uses than simply reducing the bound boilerplate. It could also reduce the amount of effort needed to refactor code when a new lifetime / type parameter gets added. If I had some code like this:

pub struct Foo { ... }

impl Foo { ... }
impl Clone for Foo { ... }
impl PartialEq<Foo> for Foo { ... }

and one day I needed to add a new lifetime parameter to Foo, then I could change that to:

for<'a> {
    pub struct FooR<'a> { ... }
    type Foo = FooR<'a>;

    impl Foo { ... }
    impl Clone for Foo { ... }
    impl PartialEq<Foo> for Foo { ... }
}

It doesn’t work for all cases (in particular, impl PartialEq<Foo> for Foo is slightly less desirable than impl<'b> PartialEq<FooR<'b>> for Foo) but should handle most trivial situations.

(Bikeshed: I wonder if impl<'a> { ... } would be a suitable syntax for this? There is already precedent for impl enclosing large blocks of code.)

> Yes. We're talking about bounds that are required for the receiver to be well formed, I can't possibly have a HashMap to call these methods on without the key being Hash + Eq. What I want is to quickly narrow in on the additional bounds, if any, this impl adds.

Thinking about this, there's an interesting parallel between this discussion and the discussion about uninhabited types. Note that strictly speaking, adding bounds directly on the HashMap type would be a breaking change. This code is presently legal:

use std::collections::HashMap;

fn provide<K, V>() -> HashMap<K, V> {
    unimplemented!()
}

fn consume<K, V>(_: HashMap<K, V>) {}

struct NotHashEq;

fn main() {
    let m = provide::<NotHashEq, i32>();
    consume(m);
    println!("Done");
}

Adding the constraint to the HashMap type itself would of course make both of these functions ill-formed. Adding implied bounds would make the functions compile, but the calling code would be broken.

You could, recognizing that the basis of adding the implied bounds is the inhabitedness of HashMap<K, V>, follow the suggestion by @Fylwind to put the constraint on the HashMap constructor. This would mean this code would remain legal, you just wouldn't be able to meaningfully implement provide, which is exactly the case today due to the public interface of HashMap. Then, depending on how Rust ends up reasoning about uninhabitedness, the bounds could be deemed to hold for all uses of the result of provide, since either they do hold, or the code is unreachable anyways.
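A hedged sketch of that alternative: the `MyMap` wrapper below is hypothetical (it is not a proposed change to the real HashMap API), with the `Hash + Eq` bounds placed only on the constructor and methods, so the type itself stays unconstrained and nameable:

```rust
use std::collections::HashMap;
use std::hash::Hash;

// The struct itself carries no bounds, so MyMap<NotHashEq, i32>
// is a perfectly well-formed *type*.
struct MyMap<K, V> {
    inner: HashMap<K, V>,
}

// The bounds live only where values are actually created and used.
impl<K: Hash + Eq, V> MyMap<K, V> {
    fn new() -> Self {
        MyMap { inner: HashMap::new() }
    }

    fn insert(&mut self, k: K, v: V) {
        self.inner.insert(k, v);
    }

    fn len(&self) -> usize {
        self.inner.len()
    }
}

struct NotHashEq;

fn main() {
    // The unconstrained type can still be named...
    let _named: Option<MyMap<NotHashEq, i32>> = None;

    // ...but a map is only constructible when the bounds hold.
    let mut m = MyMap::new();
    m.insert("k", 1);
    assert_eq!(m.len(), 1);
    // `MyMap::<NotHashEq, i32>::new()` would fail to compile.
}
```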

This could of course create surprising results, since someone could design a whole system based around provide, and only later realize that provide can't be implemented meaningfully. This is somewhat worse than the status quo, since right now, you couldn't do anything meaningful with the result of provide, limiting the amount of code that could depend on the (effectively) uninhabited type. The compiler could warn you when the code is unconditionally unreachable, e.g. since HashMap is known to be uninhabited for the provided K, rather than simply not being universally inhabited because K lacks the appropriate bounds. Note how this relates to the above discussion about how the compiler reasons about uninhabited types/variants.

The fact that presently you can't do much with an unbounded HashMap is also probably sufficient to make the breaking change of adding the bounds to the type acceptable, since Rust has tended to be pragmatic about this sort of thing. I think it's just worth noting the breakage and the implications of trying to avoid it.


@nikomatsakis Any progress here? And are there opinions on my explicit elision proposal?

Sorry, nope. =) I haven't had time to dig any deeper here.

I haven't really made up my mind. It doesn't change any of the more fundamental tradeoffs -- in other words, removing bounds from struct definitions and so forth would still be a breaking change -- but I can imagine it might be helpful in terms of preventing surprises.

I think ultimately I'd like to experiment a bit to get a "feel" for how surprising the various options seem in practice. I'm not sure if K: _ seems better than applying a "filter" to type parameters, for example, but it well might. It does seem like the K: _ proposal cannot handle implicit bounds like Option<K>: Hash, which could arise from e.g. a HashSet<Option<K>>, for better or worse.
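To illustrate that last point in today's Rust: the bound such a signature really needs lives on `Option<K>`, not directly on `K`, so it is unclear which parameter a `K: _` would elide it onto (the function name here is illustrative):

```rust
use std::collections::HashSet;
use std::hash::Hash;

// The well-formedness of HashSet<Option<K>> demands Option<K>: Hash + Eq.
// That bound mentions Option<K>, not K itself, so a `K: _` elision has
// no obvious place to hang it.
fn count_somes<K>(set: &HashSet<Option<K>>) -> usize
where
    Option<K>: Hash + Eq,
{
    set.iter().filter(|k| k.is_some()).count()
}

fn main() {
    let mut set = HashSet::new();
    set.insert(Some(1));
    set.insert(None::<i32>);
    assert_eq!(count_somes(&set), 1);
}
```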

@nikomatsakis I’ve started looking into this issue and thinking about possible solutions. Are this thread and your blog post still the best sources of discussion around the issue?

I have a half-formed idea around inferring the necessary bounds on the types and propagating those constraints back to the type parameters as implied bounds, but I’m still working out the details. Will post more (or start a separate thread) if I have something more concrete to propose.

I’m also still trying to get a sense for when it would be okay for the compiler to leave the bounds implicit. My gut says there could be a problem when you expected the types to be unbounded, but the compiler (silently) added bounds for you, and then you only find out when you go to use the function. I need to think through some concrete examples to figure out if that could plausibly happen in real code, but that seems to be the possible drawback of implicitness.

Re: the NotHash example: seems to me like that’s a problem with checking well-formed-ness at the call site, right? If it was checked at the callee, then you’d get an error right away. So I’m not sure that the place where the check is done is a totally orthogonal issue.

Let me know if there’s a better venue for this discussion. Just thought I’d pick up the thread here for now.

This venue seems ok for now. Sorry for being slow to respond. Been traveling and am still digging myself out of the notification hole.

While this is a hack, you can totally do that in the existing type system:

// define marker trait:
pub trait Valid {}

// later in code, when you have some type:
pub struct S<T>(...);

// you can use marker to define constraint alias:
impl<T: Clone + Copy> Valid for S<T> {}

// and whenever you want to use that alias, just do:
impl<T> S<T> where Self: Valid { ... }
impl<T> PartialEq for S<T> where Self: Valid { ... }
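For concreteness, here is a filled-in, compilable version of this hack, with placeholder fields and methods of my own. One limitation worth noting: Rust does not elaborate `Self: Valid` back into `T: Clone + Copy`, so methods that need those bounds directly must still restate them.

```rust
// Marker trait acting as a constraint alias.
pub trait Valid {}

pub struct S<T>(T);

// The alias: S<T> is Valid exactly when T: Clone + Copy.
impl<T: Clone + Copy> Valid for S<T> {}

// Inherent methods gated on the alias.
impl<T> S<T>
where
    Self: Valid,
{
    pub fn get(&self) -> &T {
        &self.0
    }
}

// A trait impl gated on the alias. Note the extra T: PartialEq bound:
// `Self: Valid` alone is not enough for the compiler to know anything
// about T inside the impl body.
impl<T: PartialEq> PartialEq for S<T>
where
    Self: Valid,
{
    fn eq(&self, other: &Self) -> bool {
        self.0 == other.0
    }
}

fn main() {
    let a = S(7);
    let b = S(7);
    assert!(a == b);
    assert_eq!(*a.get(), 7);
}
```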
