[Pre-RFC] Integer/Float literal types


#1

This is a slightly more formal version of the idea sketched here. cc @scottmcm


Summary

This RFC introduces two new “types”, i__ and f__, the names of which capture the idea of “int/float, but with unspecified width.” (Mind, I don’t like these names, but ilit and flit are worse, imo. Please bikeshed better names.) These types represent an untyped integer/float literal, such as 0, -42, and 1.3e-4. Currently, these expressions do not have a nameable type; error messages render it as {integer} and {float} (mind, I don’t know enough about compiler internals to know if this is how it is actually represented, but it seems like the most reasonable implementation). For example:

error[E0308]: mismatched types
 --> src/main.rs:3:17
  |
3 |     let _: () = 0;
  |                 ^ expected (), found integral variable
  |
  = note: expected type `()`
             found type `{integer}`

This RFC proposes to give these types a restricted status, similar to that of ! on stable: they can only be used as the type annotation of a const binding:

const MASK: i__ = 0b0101_0101;
const PI: f__ = 3.14159265358979323846264338327950288419716939937510;
// forbidden:
const NONE: Option<i__> = None; 
type IntLit = i__;
impl Sync for i__ {}

These types have no methods defined and implement no traits for now; instead, a constant of one of these types coerces to a sized integer or float at each usage site, exactly like a literal does. Imagine the following behavior:

use std::mem::size_of_val;

const MASK: i__ = 0b0101_1010;
let foo = MASK; // type is inferred as i32
assert_eq!(size_of_val(&foo), 4);

// desugaring
macro_rules! MASK { () => {0b0101_1010} }
let foo = MASK!(); // type is inferred as i32
assert_eq!(size_of_val(&foo), 4);
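For what it's worth, the macro half of the desugaring above already runs on stable Rust. Here is a minimal sketch of it; the helper names mask_u8 and mask_u64 are made up for illustration. It shows the same literal being inferred at a different width at each use site, which is the behavior a const of type i__ would presumably have:

```rust
// Stand-in for the proposed `const MASK: i__ = 0b0101_1010;`.
macro_rules! MASK {
    () => {
        0b0101_1010
    };
}

/// Masks a byte; the literal behind `MASK!()` is inferred as `u8` here.
fn mask_u8(x: u8) -> u8 {
    x & MASK!()
}

/// Masks a 64-bit word; the same literal is inferred as `u64` here.
fn mask_u64(x: u64) -> u64 {
    x & MASK!()
}

fn main() {
    // With no other constraint, the literal falls back to i32.
    let foo = MASK!();
    assert_eq!(std::mem::size_of_val(&foo), 4);

    assert_eq!(mask_u8(0xFF), 0x5A);
    assert_eq!(mask_u64(u64::MAX), 0x5A);
}
```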

Motivation

Numeric compile-time constants in some other low-level languages are always untyped:

#define MASK 0b01010101 /* C */
const kMask = 0b01010101 // Go

In fact, such behavior is currently achievable in Rust with macros, as described in the above desugaring. However, this is an ugly and unergonomic solution. This proposal provides a way to opt into this behavior.

In general, untyped numeric literals are a bad idea, since they can confuse the typechecker, causing it to infer exciting, unexpected things, like calling transmute without a turbofish. However, they do make some code less painful to write, and they are the natural type for bit masks, for which casting is line noise more than anything else (this neatly solves some of the problems my above post describes).
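To make the "line noise" concrete, here is a small sketch of what a typed mask looks like on stable today. The function names are hypothetical, and the choice of u32 for the const is arbitrary, which is rather the point:

```rust
// A typed constant has to commit to one width at the definition site.
const MASK: u32 = 0b0101_0101;

/// Using the mask on a narrower type needs a cast.
fn low_bits_u8(x: u8) -> u8 {
    x & (MASK as u8)
}

/// Using it on a wider type needs a different cast.
fn low_bits_u64(x: u64) -> u64 {
    x & (MASK as u64)
}

fn main() {
    assert_eq!(low_bits_u8(0xFF), 0x55);
    assert_eq!(low_bits_u64(u64::MAX), 0x55);
}
```

With an untyped constant, both casts would presumably disappear, since the literal would take on the type of the other operand.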

If, in the future, we support using this type as a parameter in a const fn, it would be the natural type for implementing custom literals, like our friend operator "" foo from C++. While this is beyond the scope of this RFC, I like to imagine traits like

// mod core::ops
trait FromInt {
    const fn from_int(literal: i__) -> Self;
}
trait FromFloat {
    const fn from_float(literal: f__) -> Self;
}

I could also imagine allowing struct Foo<const N: i__> { .. } as a natural extension of the current syntax, with the same coercion behavior.
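A rough approximation of the custom-literal idea that compiles today, with loud caveats: i128 stands in for i__, the method is a plain fn rather than a const fn, and the Meters type is invented for the example. None of this is proposed API.

```rust
/// Hypothetical stand-in for the imagined `FromInt`; `i128` replaces `i__`.
trait FromInt {
    fn from_int(literal: i128) -> Self;
}

/// An example user type that wants to be built from an integer literal.
#[derive(Debug, PartialEq)]
struct Meters(f64);

impl FromInt for Meters {
    fn from_int(literal: i128) -> Self {
        Meters(literal as f64)
    }
}

fn main() {
    // Roughly what a custom literal for meters might expand to.
    let distance = Meters::from_int(42);
    assert_eq!(distance, Meters(42.0));
}
```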

Drawbacks

I don’t hack on rustc, so I’m not sure how much messing about with typeck will be needed to move the literal coercion rules to a nameable type. This also opens us up to the exciting bugs that arise in C and Go from having typeless constants, though they would be opt-in, and users would be encouraged to use them sparingly. Custom literals are a questionable feature which, to my knowledge, has never been discussed.

Prior Art

Scala’s dotty compiler has explicit literal types: the type of 1 is 1.type, which is a subtype of Int (corresponding to the JVM int type). In addition, String literals also have types: "foo".type, but this is beyond the scope of this proposal. These types are mostly intended to be used in generics. I don’t know of any language that uses a single type for all int/float literals, but I haven’t done any research.

As pointed out, many languages have untyped constants, but this is often opt-out, if opt-out-able at all. I think my proposed opt-in mechanism for untyped constants is not the enormous footgun that typeless-by-default is.

Some languages have custom literals, but custom literals are beyond the scope of this proposal outside of future extension.

Unresolved Questions

  • Should we decide on a representation for these types, or defer it for later? Right now I imagine the compiler should internally represent them as arbitrary-size numbers in your favorite scheme.
  • Should we consider a more granular approach, like Scala’s?
  • What should such constants look like in FFI? How should they appear in compiled traits?

Note: this is my first RFC. Let me know if I’ve done anything I should improve! I’ve omitted some sections that I think should be filled out by whatever discussion happens here.


#2

I’m not sure how rustc handles constants, because IIRC, Go just substitutes the constant value into wherever it appears in the code. From what I can tell, Rust does handle this differently, requiring every top level value to have a type.


#3

In fact! Rust does this too: a const binding’s value is always inlined at compile-time. The fact that they’re typed isn’t a problem, unless we want literal inference to happen at the call-site, rather than at the definition-site. In that case, you’d use one of these literal types.


#4

This looks cool!

I think this is the trickiest part to nail down. I’d like, for example, PI * PI to produce another f__ so that consts can all just do the right thing and we never need discussions like https://github.com/rust-lang/rust/pull/48622 for derived constants.

But any such “when do we switch to runtime types” has observable behaviour, so would need to be nailed down early, or at least narrowed to allow doing so later. And those rules seem hard to define…


#5

I actually explicitly don’t want this… this sort of turns f__ into some kind of built-in bignum, and seems to give the impression that this code just politely asks the FPU to politely do the needful… when it’s really secretly going to run some lang item function or whatever. PI * PI should type as f64, but

let PI_SQUARED: f32 = PI * PI;

should be allowed, too. If in the future we have implicit coercion from literals (not a fan of this, after all of my Scala experience) you might instead write let PI_SQ: BigNum = PI * PI; as shorthand for let PI_SQ: BigNum = BigNum::from_float(PI) * BigNum::from_float(PI);.

Edit: wait, you’re saying that PI * PI should be const-evaluated… hmm, I’m not sure how we’d do that cleanly, unless we introduce something like

trait ConstMul<RHS> {
      type Output;
      const fn const_mul(self, rhs: RHS) -> Self::Output;
}
impl<T,U> Mul<U> for T where T: ConstMul<U> { /* the obvious */ }

and somehow magically implement it for f__ which is !Sized?


#6

Let me elaborate a particular use I have in mind, and maybe you can find a way that’s different from what I said in my last post :slightly_smiling_face:

One of my goals for const generics is to have something like Integer<N, M>, where Integer<A, B> + Integer<C, D> => Integer<A+C, B+D>. But what type should that const parameter have? A compile-time-only “built-in bignum” would be perfect, since neither i128 nor u128 are right.

(As for PI * PI and traits, today 2 * 2 is const-evaluatable even though there’s no ConstMul, so I’m not too concerned. And further down the road I expect we’ll get something like impl const Mul for Foo to avoid multiplying out all the trait hierarchies.)
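A quick sketch of the point about const evaluation, as it works on stable today: builtin arithmetic on primitives is evaluated in const contexts without any trait involved. The Meters type and its times method are hypothetical, standing in for what an impl const Mul might eventually allow.

```rust
// Builtin `*` on primitives is const-evaluated; no `ConstMul` trait is needed.
const FOUR: usize = 2 * 2;
const BUF: [u8; FOUR] = [0; 2 * 2];

#[derive(Clone, Copy)]
struct Meters(u32);

impl Meters {
    // A `const fn` inherent method is the closest stable substitute
    // for a const trait impl on a user-defined type.
    const fn times(self, k: u32) -> Meters {
        Meters(self.0 * k)
    }
}

const SIX_METERS: Meters = Meters(3).times(2);

fn main() {
    assert_eq!(FOUR, 4);
    assert_eq!(BUF.len(), 4);
    assert_eq!(SIX_METERS.0, 6);
}
```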


#7

Ok, that makes much more sense! I totally misread your first post, thinking that PI * PI was in expr position.

Yeah, I think a compile-time bignum is a good thing to have. After thinking it over a bit, PI * PI typing as f__ is fine, because the * there is a magic builtin and not the Mul trait. I also think that having a u__ would be good too for a generic context.

I also thought a bit about whether to make i__ and friends true DSTs. For i__ it’s easy: &i__ is just a &[u8] whose contents are the value in the platform’s endianness. I’m not sure what &f__ should be though… a triple pointer with lengths for the exponent and mantissa? Also, would things like

fn to_u128(ptr: &u__) -> u128 { *ptr }
// desugars to 
fn to_u128(ptr: &u__) -> u128 { unsafe {
    // if ptr.len() is too short we'd mask off the high bits but 
    // you get the point
    (ptr as &[u8]).as_ptr().cast::<u128>().read_unaligned()
} }

be kosher?


#8

I was picturing that as just fn foo<const N: i__>() where N >= 0, since even for signed things I suspect bounds like -128 <= N && N < 128 to be common. But it certainly wouldn’t hurt, and might even help for things like 0_u__ - 1 + 2 that would “work” in i__.
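To illustrate the 0_u__ - 1 + 2 case with today's fixed-width types as stand-ins (u128 for u__, i128 for i__; the function names are made up): an unsigned type rejects the intermediate -1, while a signed one carries on to the final answer.

```rust
/// `0 - 1 + 2` in an unsigned type: the intermediate `-1` is not representable.
fn unsigned_steps() -> Option<u128> {
    0u128.checked_sub(1)?.checked_add(2)
}

/// The same computation in a signed type goes through.
fn signed_steps() -> Option<i128> {
    0i128.checked_sub(1)?.checked_add(2)
}

fn main() {
    assert_eq!(unsigned_steps(), None);
    assert_eq!(signed_steps(), Some(1));
}
```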

Actually being DSTs kinda scares me, since that means unsized rvalues or Box<i__> all over the place. I was thinking Vec-like so they were Sized.

Maybe just an i__ mantissa and exponent? So 1.001 is returned as 1001e-3.
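A sketch of what that mantissa-and-exponent representation could look like, assuming a base-10 exponent and using i128 as a stand-in for i__. The DecimalLit name and its methods are invented for illustration. Multiplication stays exact, and rounding only happens at the final conversion to a runtime type:

```rust
/// Represents `mantissa * 10^exponent`, e.g. 1.001 as 1001e-3.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DecimalLit {
    mantissa: i128,
    exponent: i32,
}

impl DecimalLit {
    /// Exact product: multiply mantissas, add exponents.
    fn times(self, rhs: DecimalLit) -> DecimalLit {
        DecimalLit {
            mantissa: self.mantissa * rhs.mantissa,
            exponent: self.exponent + rhs.exponent,
        }
    }

    /// Rounds once, at the end, by reusing the standard library's
    /// decimal-to-float parser.
    fn to_f64(self) -> f64 {
        format!("{}e{}", self.mantissa, self.exponent)
            .parse()
            .expect("digits followed by an exponent always parse")
    }
}

fn main() {
    let lit = DecimalLit { mantissa: 1001, exponent: -3 };
    assert_eq!(lit.to_f64(), 1.001);
    assert_eq!(lit.times(lit), DecimalLit { mantissa: 1_002_001, exponent: -6 });
}
```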

Depends if compile-time irrationals are needed. One more-than-slightly-crazy option is an expression template, so that it could compute the value of the expression on-demand to the precision needed for the target type, rather than having to pick one ahead of time :stuck_out_tongue_closed_eyes:


#9

Golang just uses big rationals for f__ and it seems to work out fine. I’d prefer that to having to deal with any kind of rounding issues.
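For comparison, a toy version of the rational approach. This is a sketch only: i128 stands in for an arbitrary-precision integer, and the Ratio type is invented here rather than taken from Go or any crate. Sums that pick up rounding error in f64 stay exact:

```rust
/// A fraction kept in lowest terms with a positive denominator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Ratio {
    num: i128,
    den: i128,
}

/// Euclid's algorithm on absolute values.
fn gcd(a: i128, b: i128) -> i128 {
    let (mut a, mut b) = (a.abs(), b.abs());
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

impl Ratio {
    fn new(num: i128, den: i128) -> Ratio {
        assert!(den != 0, "denominator must be nonzero");
        let g = gcd(num, den);
        let sign = if den < 0 { -1 } else { 1 };
        Ratio { num: sign * num / g, den: sign * den / g }
    }

    /// Exact addition over a common denominator.
    fn plus(self, rhs: Ratio) -> Ratio {
        Ratio::new(self.num * rhs.den + rhs.num * self.den, self.den * rhs.den)
    }

    /// The only place rounding happens.
    fn to_f64(self) -> f64 {
        self.num as f64 / self.den as f64
    }
}

fn main() {
    let sum = Ratio::new(1, 10).plus(Ratio::new(2, 10));
    assert_eq!(sum, Ratio::new(3, 10));
    assert_eq!(sum.to_f64(), 0.3);
    // The same sum in f64 is off by one ulp.
    assert!(0.1_f64 + 0.2_f64 != 0.3_f64);
}
```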


#10

So imagine something like

// quadruple pointer o_o
struct &'a f__ {
    num: &'a i__,
    den: &'a i__,
}

let ptr: &f__ = &1.0;
let x: f64 = *ptr;
// desugar
let x: f64 = *ptr.num as f64 / *ptr.den as f64;

maybe? The runtime fdiv is gross though. Using an IEEE-like repr means that dereferencing is just shifts and masks. And, to be honest, if you’re using floating point numbers and get bitten by rounding issues to the point that you notice, that’s probably your fault for not using fixed-point IMO. (See: finance firms that unironically use floats for monetary amounts. The horror.)
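As an illustration of the "shifts and masks" claim, here is how the fields of an ordinary f64 come apart. This sketch only handles normal numbers (no subnormals, infinities, or NaN), and decode_f64 is a made-up helper; an IEEE-like f__ would presumably have wider fields but the same shape.

```rust
/// Splits a normal f64 into (sign, unbiased exponent, 52-bit fraction field).
fn decode_f64(x: f64) -> (bool, i32, u64) {
    let bits = x.to_bits();
    let sign = (bits >> 63) != 0;
    let biased_exponent = ((bits >> 52) & 0x7FF) as i32;
    let fraction = bits & ((1u64 << 52) - 1);
    (sign, biased_exponent - 1023, fraction)
}

fn main() {
    // 1.0 = +1.0 * 2^0, with an all-zero fraction field.
    assert_eq!(decode_f64(1.0), (false, 0, 0));
    // -2.5 = -1.25 * 2^1, and 0.25 is the second-highest fraction bit.
    assert_eq!(decode_f64(-2.5), (true, 1, 1u64 << 50));
}
```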


#11

As I mentioned, it depends if you need 2.0.sqrt() at compile-time (or sin or …).

There are things one can do, they’re just uncommon, even if they’re nearly a half-century old now: http://home.pipeline.com/~hbaker1/hakmem/cf.html#item101b
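Going by the URL, that link points into HAKMEM's continued-fractions section. As a small taste of why that representation suits on-demand precision, here is a sketch (my own, not taken from the linked text) that turns continued-fraction terms into successively better rational approximations. For sqrt(2) the terms are 1 followed by 2 repeated forever, so you can take as many as the target type needs:

```rust
/// Returns the convergents h/k of the continued fraction with the given terms,
/// using the standard recurrence h_n = a_n * h_{n-1} + h_{n-2} (same for k).
fn convergents(terms: &[u64]) -> Vec<(u64, u64)> {
    let (mut h_prev, mut h) = (0u64, 1u64);
    let (mut k_prev, mut k) = (1u64, 0u64);
    let mut out = Vec::new();
    for &a in terms {
        let h_next = a * h + h_prev;
        let k_next = a * k + k_prev;
        h_prev = h;
        h = h_next;
        k_prev = k;
        k = k_next;
        out.push((h, k));
    }
    out
}

fn main() {
    // The first four terms of sqrt(2) = [1; 2, 2, 2, ...].
    let approximations = convergents(&[1, 2, 2, 2]);
    assert_eq!(approximations, vec![(1, 1), (3, 2), (7, 5), (17, 12)]);
}
```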