A question from @oli-obk got me to thinking that it is time to have some discussion about how to best handle constants. I wanted to get this started by laying out some ideas.
The state today
Today’s constant infrastructure seems to me to be fraught with problems. Among them:
- We have two constant evaluators, the one from typeck and the one from trans.
- The trans evaluator is more complete, but is based on LLVM, which is sort of non-portable and also just opaque.
- Many places that demand constants, like array types, actually bake
in a
usize
, under the assumption that constants can be reduced at type-checking time to an integer. This is not compatible with associated constants, where we might have to treat a constant a bit more abstractly. -
const fn
is, afaik, only supported in one evaluator.
At a high-level, I’d like to see us:
- Have a single constant evaluator that is independent of LLVM.
- Have a notion of constant equality that doesn’t require reducing to an actual value. This seems very, very similar to associated type equality and normalization – the primary difference is that we could choose to bake in some understanding of mathematical identities.
- Permit arbitrary constants in array lengths; if we want to know whether two types are equal, we would base that off of the equality relation described in the previous bullet.
- Support
const fn
uniformly and smoothly.
Two forks in the road
So as part of the MIR branch I’ve thought this over some. I primarily considered two big approaches:
- MIR-FTW: In this version, a constant value is represented as a MIR fn that would do the computation, along with a “tail expression”.
- AST-FTW: in this version, a constant value is represented as a kind of simplified AST. This probably winds up feeling something like a little calculus, since we can do various reductions to simplify the AST:
-
Actual math (of course):
2+2
could be replaced with3
. Er,4
. -
Fn calls:
foo(2, 3)
could be replaced with the definition offoo
. -
Projections:
i32::SIZE
might be replaced with4
(or other associated constants). -
Identities: we might do something like changing
X+0
toX
, normalizing the overall form, etc. This is needed only to help the equality relation be a bit more comprehensive. Honestly I am reluctant to go very far in this direction, since it seems like a rathole, but perhaps handling a few cases would be helpful.
The AST-FTW approach feels a bit more natural to me, if a bit less DRY
(after all, the rvalue type and constant AST have some amount of
overlap). Therefore, that is what I wound up doing. The MIR branch
today includes a type Constant<H>
which defines the mini-AST
(below). As you can see, it’s really not that big. If we wanted, we
could probably consolidate this with Rvalue<H>
by making one type
RvalueTree<H,L>
that is parameterized by a type L
representing the
arguments. Hence today’s Rvalue<H>
would be RvalueTree<H,Lvalue>
and Constant<H>
would be RvalueTree<H,Constant<H>>
. But I’m not
sure if it’s important.
pub enum ConstantKind<H:HIR> {
Literal(Literal<H>),
Aggregate(AggregateKind<H>, Vec<Constant<H>>),
Call(Box<Constant<H>>, Vec<Constant<H>>),
Cast(Box<Constant<H>>, H::Ty),
Repeat(Box<Constant<H>>, Box<Constant<H>>),
Ref(BorrowKind, Box<Constant<H>>),
BinaryOp(BinOp, Box<Constant<H>>, Box<Constant<H>>),
UnaryOp(UnOp, Box<Constant<H>>),
Projection(Box<ConstantProjection<H>>)
}
pub type ConstantProjection<H> = Projection<H,Constant<H>>;
pub enum Literal<H:HIR> {
Item { def_id: H::DefId, substs: H::Substs },
Projection { projection: H::Projection },
Int { bits: IntegralBits, value: i64 },
Uint { bits: IntegralBits, value: u64 },
Float { bits: FloatBits, value: f64 },
Char { c: char },
Bool { value: bool },
Bytes { value: H::Bytes },
String { value: H::InternedString },
}
pub enum IntegralBits {
B8, B16, B32, B64, BSize
}
pub enum FloatBits {
F32, F64
}
When constants are converted
If we have the option of embedding references to constants into types and not eagerly evaluating them, then we can easily translate types in the collect phase without forcing constant evaluation to occur immediately. We then defer the question of whether two types are equal to the type when we are comparing them. I see this as very similar to the lazy normalization that I described in an earlier discuss thread (and which @jroesch has been working on making a reality). Basically when we find two constants that must be equal, we can try to normalize them then – this may result in an actual integer or float, but it may also yield something like Foo::SIZE
(which presumably can’t be evaluated until trans). We can clearly still conclude that Foo::SIZE
is equal to Foo::SIZE
, and we may be able to integer where clauses (eventually) to conclude that X::Y
and Foo::BAR
are equal as well.
Where constants are used
I envision that just as every fn
is lowered to a MIR
representation in terms of basic blocks and so forth, every const
would be lowered to a MIR
representation in terms of Constant
.
When we do constant evaluation, that is basically applying the
reductions described earlier. Constants that reference one another
would be initially encoded just as a Item
reference, and could be
inlined as part of reduction (we’ll have to detect cycles, obviously).
Constant propoagation on the MIR
I imagine we will implementation constant propagation as a pass that
walks over the MIR and tracks which values have known
values. (Naturally it will have to be careful about overflow.) This
can make the code faster of course (or at least faster to compile) but
it can also, in some cases effect the type system. This is because
taking some constants (like []
) can be promoted to static memory and
hence have their lifetime extended beyond the usual temporary lifetime
bounds. Right now we only do this for []
, but we’ve talked about
extending it to other cases, like closures with no free variables.
(Overflow introduces an interesting wrinkle, since something like
2+2
could be eligible, but INT_MAX+1
presumably not? But that’s a
problem for the RFC to tackle.)
The fate of check_const
Today we have this pass check_const
that walks the AST and
identifies what can be constant and note. I see this going away. It
lives on, to some extent, as the constant propagation pass I just
described, and of course also as a part of the MIR lowering process,
which will check that constant expressions fall into the correct
subset.