[lang-team-minutes] Const generics

On Thursday at the language team meeting we discussed const generics, both in terms of RFC 1931 and the work that eddyb has done and plans to do on the const system.

First, the tl;dr version:

  1. We believe that we can have an RFC accepted and const generics available on nightly by the end of 2017. :tada:
  2. We discussed the major outstanding issues, which have to do with the theory of equality we use in type checking. Iā€™ll summarize the discussio below.
  3. One thing we want to throw to the community right now is the question of the exact syntax for const generics. We hope that this thread can be a good space for have that conversation.

Once we have the syntax figured out, weā€™re going to try to draft an RFC which includes that syntax and addresses the equality issues discussed here. This may be a new RFC or it may be an amendment to 1931 (the main reason to make it a new RFC is that weā€™ve had trouble trying to edit significant additions into existing RFCs in the past).

Equality of consts, unification & overlap

A major issue (indeed, probably the most major issue), is determining the equality of two constant expressions during typechecking. This is needed in two different ways:

  • Unification: Unification is the process of determining that two types are equivalent. In order to unify two types which have a const expression, we need to know: are the values of those consts equal?
  • Overlap: The overlapping impls checks needs to do essentially the inverse of unification. In order to guarantee that no two impls overlap, we have to ensure that the types of the input types to those trait impls cannot be unified. If those types have const expressions, we need to know: are the values of those expressions never equal?

These issues have been touched on in the comment thread for 1931, so if youā€™ve been following that closely this section probably doesnā€™t have too much new information for you.

Abstract expressions

The first big issue has to do with constant expressions in which one member is an abstract constant parameter. Consider these pairs of types: Do they unify? Do they overlap?

for<N: usize> [i32; {N + 1}];
for<N: usize> [i32; {N + 2}];
for<N: usize> [i32; {N + 1}];
for<N: usize> [i32; {1 + N}];
for<N: usize> [i32; {N + 1}];
for<N: usize> [i32; {N * 2}];

With these examples, you can probably determine the answer by analyzing the expressions, but whatever solution we arrive at needs to be general.

As a first pass, we propose a solution which treats any expression involving a const parameter other than the identity expression as an opaque function. This has a convenient analogy in associated types:

// Do these unify? Do they overlap?
for<T: IntoIterator + Deref> Vec<T::Item>
for<T: IntoIterator + Deref> Vec<T::Target>

The answer here is that these donā€™t unify and they do overlap, because we need to be conservative. Its possible (though unlikely) that T::Target == T::Item, and its possible (very likely) that T::Target != T::Item.

At least at first, we would treat all const expressions containing parameters the same way, except for the identity expression (e.g. N). Syntactically we may distinguish these expressions with braces, as I have done in my example.

This would mean, for example, that [T; {N + 1}] and [T; {1 + N}] would not unify, but would overlap. Its plausible that at a later stage we will develop more sophisticated rules that are aware of the commutatitivity of integer addition and similar well-understand properties, so that simple cases like this will ā€œjust workā€ how you expect them to.

Disallowing unconstrained const parameters

Consider this impl:

impl<const N: usize> Foo for [i32; {N * 2}] { }

A consequence of treating {N * 2} as an opaque function is that given eg. <[i32; 4] as Foo> we cannot resolve the correct impl, because we cannot determine the inputs of the function from its outputs. For this reason we will have a similar rule to the ā€˜unconstrained type parameterā€™ rule: every parameter must appear in the impl in an identity expression.

Structural equality, floats, etc

The other issue unification brings up is the exact definition of equality. In particular, it is necessary that this equality be reflexive; otherwise we cannot guarantee that T == T. Importantly, float equality is not reflexive, because NAN != NAN. This would mean, for example, that Foo<NAN> != Foo<NAN>.

For now, it seems the most conservative solution is to limit const params to types which have the structural_match attribute (as stated in RFC 1931). This means, for example, that you canā€™t have a floating point const param. Someday we may extend what types can be used (for example, by implementing equality of floats as bitwise equality), but that is a future extension.

This also means that a const generic cannot be of a generic type, because we have no way to constrain that type to implement structural equality (since it is an attribute and not a trait).

Syntax

Now, to the syntax. This has several parts:

  • How should a declaration of a const parameter look? (eg impl<$CONSTANT>) This needs to contain both the name and type of the constant.
  • How should a use of a const parameter look? (e.g. impl... for Foo<$CONSTANT>) This need only contain the name, not the type.
  • How should a const expression look? (Should it be enclosed in curly braces or not?)
  • Should we separate the const params from the other params with any special character (eg ;) or just a comma?
  • Stylistically, should const params be snake_case or SHOUTING_SNAKE_CASE, and should we lint about it?

To start the conversation, Iā€™ll give my own opinion (not discussed at the lang team meeting):

@withoutboatā€™s const generics syntax

Declaring a const param:

impl<T; const N: usize>

When declaring a const param, the const keyword is used, followed by the type. The use of const is to sharply visually distinguish the consts from the type parameters. A semicolon also separates the type parameters from the consts to be consistent with use sites.

Using a const param:

Foo<T; N>

When using a const parameter, just the name is used, no const keyword or type. A semicolon separates const params from type params, so that you can determine from the use site that N is a const, not a type.

The semicolon is also used to make this syntactically consistent with the construction of arrays: [T; N].

Const expressions:

Const expressions containing constant params are contained within braces, e.g. [T; {N * foobar(N, 2)}]. The only exception is the identity expression: N. This has several advantages:

  • This visually distinguishes those constant expressions which have to be dealt with specially by the compiler from those which do not.
  • This makes it easier to parse arbitrary expressions (e.g. comparisons: Foo<{N > 7}>).

Expressions containing no params can also be wrapped in braces, but it is not necessary except for parsing reasons (e.g. Foo<6 > 7>).

Casing:

Like all constants, constant parameters should by in SHOUTING_SNAKE_CASE, and any other case will be linted against just the same as other constants.

37 Likes

Are you suggesting that [T; N * foobar(N, 2)] is... an error?

To give some background, the use of braces came up as a solution to the parsing problem: e.g. Foo<CONST_A + CONST_B> is very annoying and complicated to parse correctly. You end up needing what is sometimes called a "cover grammar", where you have "type or expression" in your grammar which parses all valid types and all valid expressions, and some syntax overlaps in a way that requires name resolution output (or worse, type resolution for T::Assoc) to tell between types and expressions. But we can side-step that requirement for just literals and single identifiers, where {1} and {N} would be too jarring.

We came up with {expr} to avoid the whole question, but that makes it unnecessary and irrelevant for [T; expr]. The unification implications of {expr} were just a cute hack piggybacking on the fact that expressions require braces, but referring to a const parameter doesn't, so that can be reused to distinguish between "potentially unknowable (before instantiation) expressions" and "definitely a const generic parameter" quite easily.

But everything I've just said was in the context of not using semicolons to separate const generics from everything else. If you have ; there's no point in depending on braces, just as there's no point in writing associated types in a certain way or adding unnecessary parens around types.

1 Like

Yea this wasn't a good example, it was meant to be suggestive of how arbitrarily complex expressions are, but it would parse today.

My initial position was the that would be an error if N is a param, but it might be the case that my opinion changed mid-post. I really don't know if we should do that 'piggybacking.' Given that arbitrary expressions involving no params might be in semicolons, it doesn't seem like it actually means anything.

What if it only has const params? Would you have to write Foo<; N > 7>? I don't care for that.

I want the semicolon just for consistency with arrays and distinguishing the consts from the types (lifetimes and assoc type specifiers are already syntactically distinguished). I think they're strictly unnecessary with the rest of the syntax I want but I like them for clarity.

1 Like

I get the array thing, but I personally donā€™t see the value in it. Especially if, say, someone might want to interleave parameter kinds, like so:

pub struct SmallMat2<E,
                     const N: usize,
                     const M: usize,
                     A: Allocator = DefaultAllocator> {
    data: SmallVec<E, {N * M}, A>
}
7 Likes

My current opinion on syntax:

  • Drop the ;. Just let the ā€œfixed-length arrayā€ syntax be a special-case (it is anyhow).
  • Declare constant parameters using const N: Type: Foo<const C: usize>.
  • When providing a value for a constant parameter, the identity expression Foo<C> or Foo<22> is special-cased. Otherwise, use braces Foo<{C * 2}>.

This is not consistent with the [T; N] array syntax. Oh well. Arrays are special-case anyhow. Moreover, requiring semi-colons doesnā€™t work well in general. Consider Foo<const C: usize>ā€¦how would a semi-colon work there? Do you really want to type Foo<; 3>?

19 Likes

I need to be clearer. First:

  • I donā€™t want to use the semicolon if there are no type parameters; I firmly donā€™t want Foo<; 3>
  • consistency with arrays is not primarily the reason I want to include the semicolon

The reason I like the semicolon is that if you see Foo<T, N>, you canā€™t tell from the use site if T or N are types or consts. Today, lifetimes are distinguished at the use site through the ' syntax. The semicolon helps make that distinction.

Of course if it only contains types or only contains consts, you wonā€™t be able to tell them apart, but I guess I feel okay with that. :sweat_smile: I think this preference for the semicolon is a heuristic thing - I think it will be easier to quickly move through code if thereā€™s a ā€˜markerā€™ in long param lists to separate the types and consts, & I think it will be natural to write.

@eddyb We donā€™t allow you interleave type and lifetime params, it seems odd to allow you to interleave consts. And it exacerbates the concern I attempted to explain in this post.

1 Like

I've generally considered this a bug, not a feature... but I've also wondered if it is useful in its own way. Same with the 'foo syntax (in place of just foo or Foo).

2 Likes

I do think its useful. My syntax highlighter highlights lifetimes differently, for example. Its valuable to be able to distinguish kinds from one another at a glance when trying to read complex bounds.

I wish there were a ' like solution for consts (since the ; solution is only a partial solution), but it seems hard to do that when non-param consts are already established as just shouty snake case & most params will be one character long. If anyone knows a solution which makes consts always distinguishable from types Iā€™d be glad to hear it. :wink:

(I guess this could be a reason to make them non shouty snake caseā€¦)

But other than this I think our syntactic preferences are completely aligned.

2 Likes

I have the same preferences. I get the motivation behind the semicolon, but I think it is more important to treat all kinds as uniformly as possible and not introduce arbitrary per-kind special cases. This is just a design sense, but irregularity tends to breed irregularity, and one bit of unnecessary complexity can get caught on other things and end up compounding down the road. (For the record, I would also have preferred if lifetime parameters were introduced as lifetime a and used as just a, in the very same way, without the ticks.)

Another possibility for expressions-in-types syntax is Foo<const C * 2>. This mirrors the parameter-declaration site better, is more self-describing, and is less "syntaxy". Braces are probably still better though.

This makes me uneasy. Can't we make it a trait? In general I feel that non-macro attributes should be "only metadata" and not have any semantic in-language impact, because otherwise deciding which features should have "in-language syntax" and which ones should have "attribute syntax" becomes kind of arbitrary. (Yes I know there are already exceptions to this and I don't like them either, but this would be particularly egregious.) If we start out with the attribute could we backwards-compatibly change it to a trait later on?

If we wanted to be cute, we could call the trait regulating whether a type can be used for a const parameter... const. (: So T: const, etc.

Incidentally, don't we already allow associated constants of types like f64? Don't they potentially run into some of the same issues?

5 Likes

Foo<T, {N}> is quite good at making types and consts visually distinct. All we need to do is to not introduce the special case for generic arguments having form IDENT. :smile:

6 Likes

The attribute was intended (in the context of matches) to be the most conservative thing, and never to be stable. Structural match is a holding pattern already; someday later a more permanent proposal will be arrived at which will have to account for both match and const params.

AFAIK no there's no issue with associated consts. Since you can't have a type parameterized by f64 you can never do something like Foo<T::FLOAT>.

1 Like

You guys considered a trait bound like syntax for const generics already?

struct Foo<T,N> where N: const u64 { ... }
impl<T, N: const usize> Foo<T,N> { ... }

I suppose issues arise because N is a const not a type probably.

You might want not just compile time constants in these slots, but eventually immutable values, ala the discussions around alloca and unsized rvalues. That does not impact the discussion of withoutboatsā€™ ; right?

1 Like

I think the restriction really has to do with "input" types, right? Still, it seems like eventually we are going to want to permit floating point types (and indeed arbitrary types) as the values of constants. It also seems quite natural to me that we would want a very strict notion of equality when it comes to instantiating impls and so forth.

Note that we can be extra conservative around floating point values, in order to sidestep annoying questions like whether two distinct NaN bitpatterns are equal. In particular, we could require that two floating point values are only considered "equal" if they derive from the same constant.

Actually, this is an interesting question. I'm assuming that if you had

const C: u32 = 0;
const D: u32 = 0;

we would want [T; C] and [T; D] to be the same type, right? This isn't entirely obvious (since you might not want your code to stop working when D changes value, for example), but I think it's already true anyhow so we can't really avoid it.

I guess that puts us in a bit of a bind in that one might expect the same behavior with floating point constants (e.g., Foo<C> and Foo<D> are the same type, where C and D have type f32 instead).

Well, maybe it's trickier than I thought! Still, I imagine we are going to have to cross this bridge at some point. Personally I think I would be inclined to say that two floating point values are considered equal for our purposes if they are bitwise equal or both are NaN (I'm not a floating point expert, but I think that values like 0 and -0 have a unique bitpattern, right?).

I think we can punt on this question because the answer is the same as the answer for matching against constants, is it not? That is, we have to decide what to do about floats already, and weā€™ll just do the same thing there for constant parameters.

Yes: 0000 0000 0000 0000 = 0 8000 0000 0000 0000 = ā€“0

As to treating NaNs as equal at compile time, that's the opposite of their behaviour at runtime, which might be surprising behaviour? On the other hand NaN as a constant time parameter is not likely to see much use, I would think.

While I agree with allowing arbitrary types to become values of constants (in fact that would be really awesome!), I think we should require things like Eq to be implemented.

I think floats are far too shaky (inspiration) to be used as const type params. Rust should have a strong type system, where types are unified by applying deltas.

Either way, I don't want to make this discussion now, right now I would prefer to talk about a very basic version where only integers are supported. We can determine later what to do about other types like floats.

About ; vs ,, the ability to abandon {} which is more of a hack is appaling, but mixing , and ; would not look great imo, and it would also make it possible to confuse ; for some higher level grouping and , for some lower level, like it is in C/C++ (most times its used inside for loops).

Generally, I would really love it if this feature could arrive on stable as early as possible. End of this year is nice, but it would be nicer if it were already stable by that time.

2 Likes

That makes sense, and of course feels familiar from lifetimes, but somehow it doesn't strike me as so important in this case.

For one, thinking in terms of the "reasoning footprint", use sites mentioning a type/const variable tend to be close to where those variables were introduced (e.g., struct or impl headers, or on the same line for a fn signature). So you don't have to hold too much in your head. Also, I find that once a struct takes more than a couple of type arguments, I need to refer to its definition regardless any time I am futzing around with some application. YMMV.

1 Like

We could probably do something like that backwards-compatibly, and the introduction of const parameters would provide a good impetus for doing so. (We'd then also want to allow type for being explicit about type parameters, and just say that leaving off the qualifier defaults to type, and ' implies lifetime.)

In particular, as we continue to improve elision so that you need explicit lifetimes less often (the lang team has some early thoughts about this), having the explicit syntax become more verbose is not such a big deal.

7 Likes

It doesn't have to be. I would draw a distinction, since the comparison in one case is happening at compile time (generics) and, in other cases, at runtime (matching).

Ultimately, I agree with this. No reason to rush into anything.

1 Like