[MIR] constant evaluation

It’s just a matter of writing a program that generates source code and this is already used in build.rs scripts in the Rust ecosystem (and those can lead to the exact same integration problems).

I expect build.rs to be treated as a hack and tightly controlled, with integrators not being afraid of significantly modifying it. OTOH, having host-runnable code tightly mixed with Rust code would be much more of a problem, especially if it gets to be idiomatic.

I have no problem with it, however it means that even in the “full VM” case (or smart macros, or whatever) there is still a restriction that the user will have to learn and thus only a subset of data types will be usable (especially as those non-usable data-types are viral: they contaminate any data-types that embed them).

Thus, in any of the solution discussed, only a subset of the data-types functions is available at compile-time; the only difference is how many.

Unfortunately, this is quite error-prone (compared to calling the function) and I am afraid for forward compatibility. It means that the macro is tied to the current version of the compiler: add a new CPU/OS/… that specialize a data-type the macro used to generate, and this macro cannot be used to cross-compile to that new target.

By contrast, the set of const functions is backward compatible, and when a new platform is added that requires some to be specialized, they are. Thus, for the developers, const functions will present a more stable interface.

I think you still do not understand me. to recap:

  1. There is no VM.
  2. There is no restriction on types. a macro can use threads in its implementation. Simply, the macro has a separate lifetime. It is exactly equivalent lifetime rules that prevent you from returning a reference to a local variable defined in a function.
  3. There is no more tight coupling with macros than there are with const fn or any other regular old runtime code that is platform dependent. The std contains plenty of exactly this. I’d argue that the abstraction layer for platform dependent stuff should mostly be the Rust std and macros should use the rust std provided abstractions. Otherwise it is a design smell. In your example, what is the justification to use libc::tm instead of a rust type in std?

I don’t really have a horse in this race, let alone thought very deeply about the topic, but I have to say I don’t see how @yigal100’s scheme solves the portability problems. I mean, it sounds like a nicer model for the programmer than a restricted sub-language, but I see nothing that makes portability easier. Let’s take a concrete example to reduce the risk of talking past each other:

Let’s suppose I have a command line tool written in Rust and it accesses a couple of files on startup and for whatever reason the logic for where these files are is slightly complicated so I decide to put it in a constant expression. I cross-compile this application for a Raspberry Pi running Debian, and I happen to like VisualRust so I cross-compile from Windows. Thus what I need to end up with is constant like this:

const INIT_PATHS: Vec<PathBuf> = ...;

Now, this is a deliberately tricky constant, since it is composed of a vector (i.e., a memory allocation) and paths (whose representation is platform-dependent). Note however that this is a perfectly idiomatic type one would use in a function like get_init_paths(), and people will want to use such types, and thus we either need to handle them or reject them gracefully. Simply running said functions and pasting some representation of its result into the source code will not do. Some problems (few if any are specific to the multi-staged compilation thingy):

  • @yigal100 has previously suggested container literals. However, since there is no life before main, the .data segment of the final executable has to somehow end up containing a pointer and two integers, and the pointer has to point to memory that contains the elements. I’d be interested to hear how this should work. Whatever it is, it needs to work for arbitrary user-defined types since Vec is just that.
  • As for the elements, how do we represent them? The internal representation of PathBuf is platform-dependent even though it’s a pure Rust type. Furthermore, on Windows as well as Unix ultimately boils down to a Vec<u8> but with several layers of completely different types in between. Same issues as with Vec, with the added fun that we actually need to fill in a value of a different type. The two types have different invariants: Windows-PathBuf may contain lone surrogate code points but is otherwise UTF-8, Unix-PathBuf can contain arbitrary non-0 bytes. Simply transmuting the representation of one to the other is plainly incorrect, even if this is unlikely to cause many problems in practice.
  • Note that it’s not enough to write out the expression PathBuf::new(<the host's PathBuf as string>). PathBuf::new would require computation, not to mention allocation, on the target machine. Also there’s no life before main, yadda yadda.

In the light of these problems, it’s tempting to say: “Computing paths for a different platform at compile time is absurd! Just don’t do it!” I am actually inclined to agree, but in that case I would like to hear how the compiler would know that this type is OK and that one isn’t. It’s not as simple as looking for pointers or #[repr(C)] annotations, since some perfectly platform-independent types are repr(C) (e.g., anything which is primarily a bunch of fixed-size integers) or contain pointers (case in point: Vec, which we all want to see supported).

@rkruppe: I think that nothing short of life-before-main can work for your example because of that allocation inside of PathBuf::new(). But @yigal100’s proposal would work in many other scenarios especially in combination with restricted const functions (the latter would be required to initialize types with private members).

@all: BTW, speaking of life-before-main: would it still be bad if such functions were banned from accessing mutable global state (including immutable types with interior mutability, such as Mutex-es) - to prevent ordering dependencies?

I’m sorry for being late to this discussion, given that @nikomatsakis and I have gone over this in slightly more detail that is available here.

I mostly agree with @matthieum in that allocations are more useful than full-blown IO, but there is a deeper issue: purity.

To be able to use const fn with the typesystem (i.e. [T; ctfe()]), const fn functions must be pure, in the sense that repeated calls with identical arguments must produce identical results.

While it may be possible to design a system where impurity cannot affect type safety, it’s a non-goal for Rust 1.x.

As for claims of complexity, a MIR interpreter capable of evaluating everything but impure/IO code should be simpler than the MIR -> LLVM IR translator and a LLVM JIT/interpreter, and cleaner than the current constant folder (which has been getting better, thanks to @ker).

@scott has started work on an out-of-tree prototype named miri, which reassures me that MIR is great for tooling and that we can do a lot with an order of magnitude less code than, say, the name resolution pass.

Without further ado:

Rust/CTFE Abstract Machine

A proper subset of the Rust Abstract Machine, restricted to deterministic reproducible evaluation.

That is, only const fn functions may be called, which are required to always return structurally equivalent values and produce the same writes to memory reachable by arguments, given structurally equivalent arguments.

C functions are never const fn, as the above property (also referred to as purity) cannot be enforced, although a C-to-Rust transpiler could fill in that gap.

Writes to static (global) memory and inline assembly are also prohibited, and compile-time pointers do not have observable addresses.

The CTFE capabilities offered are similar to those found in C++14 constexpr.

Data layout

The machine works with octets as its fundamental unit of measurement (for sizes, alignments and offsets), which will henceforth be referred to as bytes.
It’s parametrized by the target architecture’s data layout, which consists of:

  • endianness (the order of bytes in a multi-byte integer value)
  • size of pointers and usize/isize
  • alignments of all primitive types

Based on the machine data layout, the layout (size, alignement and field offsets) of aggregate types can be determined.
The layout of non-#[repr(C)] aggregates is implementation-dependent.

Primitive types

Boolean, C-like enum, character codepoint, integer and floating-point values will be represented as exactly one valid value in their respective representable range.

Addresses (pointers, references, all non-local lvalues) will be one of:

  • integer addresses: x: usize in (x as *T).
    Cannot be read/written through, but other operations do work (on the integer value), e.g.:
&(*(0 as *const Struct)).field as *const _ as usize
  • abstract locations: (region, x: usize) in region.offset(x) with x < region.size.
    Can be read/written through, except for writes to static (global) memory.
    Can only be compared with another abstract location from the same abstract memory region (the offsets are compared).
    No bits of a pointer holding an abstract location may be observed as any other type than a pointer, for which the whole value must be used.

Abstract memory region

A contiguous block of memory with a fixed size, consisting of size constant bytes and a set of non-overlapping special values (undef bytes and relocations).

Undef bytes can be found in uninitialized memory (e.g. mem::uninitialized()) and are also produced by writing an aggregate, at positions not covered by leaf fields (i.e. padding bytes).

Reading undef bytes is prohibited, as code that does this is likely to exhibit undefined behavior at runtime.
Aggregate reads side-step this by only reading bytes covered by leaf fields.

Primitives are read/written at the byte level, and the combined bit pattern must represent a valid value (transmute is no different).

For non-pointer primitives, none of the bytes read may overlap with a relocation.
For pointers, either:

  • None of the bytes read overlap with a relocation, producing an integer address.
  • The entire pointer aligns perfectly with one relocation, producing an abstract location, where the constant bytes gives the offset, and the relocation gives the abstract memory region it’s pointing into.

Writing over a relocation is only permitted if the entire relocation is written over in the same operation, replacing it (e.g. a 32-bit pointer relocation can be written over with *mut T, u32 or (u16, u16) but not two u16 writes).

To give an example, the following two abstract memory regions would be produced to hold &[(0u8, 1u16), (2, 3)][1..], for a little-endian target with 32-bit pointers:

Region A:
Data:     04  00  00  00  01  00  00  00
Special:  └──Reloc(B)──┘

Region B:
Data:     00  00  01  00  02  00  03  00
Special:     Undef           Undef

Dynamic memory allocation (optional)

An implementation can provide intrinsics to allocate new abstract memory regions, deallocate them (invalidating dangling references and preventing further uses) and resize them (which can always be done “in place”).

Special care must be taken to ensure that values of types that administer such memory (Box, Vec, String, etc.) do not escape into runtime code while holding a pointer to an abstract memory region.

The implementation can generate runtime allocations from abstract memory regions when a const item “owns” the allocation, although this alone is not enough to get e.g. a &'static str from String CTFE.

EDIT: discussion resulted in the addition of the paragraphs below:

All abstract memory regions reachable from a &'static T can be placed in static memory, with the same restrictions as borrowing in a const item (no UnsafeCell), e.g.:

const WORKS: &'static str = &5.to_string();
const WORKS2: &'static str =
    Box::leak(5.to_string().into_boxed_str());

const FAILS_OLD: &'static AtomicBool =
    &AtomicBool::new(false);
// Arc uses 
const FAILS: &'static Arc<i32> = &Arc::new(0);

No runtime allocations shall be used, only read-only static allocations. static items could also be extended to allow thread-safe shared data, e.g.:

static FOO: Arc<Foo> = Arc::new(Foo {...});

One could only increment and decrement the refcount of FOO, but it would never go below 1, so the allocation being in .data would not be an issue.

The UnsafeCell is more subtle, though, as the following also needs to be forbidden:

static GOOD: Arc<Mutex<String>> =
    Arc::new(Mutex::new(String::new()));
static BAD: Arc<Mutex<String>> =
    Arc::new(Mutex::new(String::from("foo")));
...
mem::replace(BAD.lock(), String::new())

Now you have a runtime String pointing to .rodata.
The root of the problem is a “dynamic” abstract memory region being pointed to from memory owned by the UnsafeCell<String> instance.

I believe we can detect that by finding “mutable ranges” based on the types used to access a memory region.

That does mean, though, that unsafe abstraction providing const fn facilities has to be extra careful: keeping track of an allocation with *mut u8 instead of *mut T may prevent the compiler from detecting UnsafeCell.

We can warn about pointers not reflecting the full size of the allocation they point to (but this breaks Vec, unless that starts using *mut [T] instead, which we might want to do anyway, e.g. to support Vec<DST>), and only point to .rodata in those cases, which would result in page faults instead of unsafety.

The other danger I can see here is querying jemalloc for some information about the allocation of a String and failing (Servo does something like this, cc @Manishearth).

8 Likes

@vadimcn Vec also has a memory allocation, yet everyone’s assuming we’ll figure out something to have const vectors. Ignoring several layers of platform-dependent newtypes around it, a PathBuf is literally a Vec<u8>.

1 Like

That is much more detailed than anything I had in mind, thanks for fleshing this out.

Could you explain the restriction:

The implementation can generate runtime allocations from abstract memory regions, although this alone is not enough to get e.g. a &'static str from String CTFE.

I don’t quite see why it would not work; is it because addresses should not be observable?

Doesn’t this line involve dereferencing a null pointer, which should be illegal?

Well, if you unconditionally translated any String escaping from const fn into a runtime allocation at the point of use, it could technically be used to create &'static str, but every use would be a different allocation and you’d be quite leaky.

To put it another way, if Box::leak is not special-cased, Box::leak(string.into_boxed_str()) would have the same result in a const fn as it does in an fn.

The special-casing might not have to be hardcoded to Box::leak, but I’m not sure how to analyze the pointers and determine that deduplication is fine.

I guess “only way to access the memory region is through a shared 'static reference with no interior mutability” is somewhat actionable.
It would break down if you, say, produced a Chars<'static> which has only raw pointers.

Unless you somehow found a “chokepoint” at which you only had the shared reference (i.e. just after calling Box::leak) and therefore deduplication is ensured no matter what.

This might result in analysis capable of detecting cases where you attempt to mutate through a raw pointer derived from &T where T doesn’t contain UnsafeCell, as that is UB anyway.
cc @ubsan

I would expect that every const is evaluated once to a value.

If/when we get generic constants, allocating a string for every constant would of course create lots of duplicate strings, but that sounds like “don’t do it”. Maybe introduce an

    extern "rust-intrinsic" const fn intern<T: ?Sized>(s: &'a T) -> &'static T;

that does deduplication (it should panic/halt compilation when given something with an UnsafeCell, or when called outside of a compile-time context).

We should also have a garbage collection pass that does not copy anything into .data that is not pointed to by anything, but you were probably assuming that.

I also assumed deduplication of read-only data, which we do today.
The problem I was trying to describe was runtime-allocating const (String::from("foo")) vs read-only data ("foo"), when that "foo" was obtained by mutating a String at compile-time.

I am somewhat more confident about the Box::leak handling strategy now that I’ve thought about it more.

It would only require raw pointers to also keep a “referential access mode”, i.e. *mut T -> &const T -> *mut T would turn a read-write raw pointer (created by allocating “heap” memory at compile-time) into a read-only raw pointer which can be deduplicated (if T doesn’t contain UnsafeCell).

Creating an &'static T pointer to something and then mutating it is UB anyway.

Indeed, I was just not sure at first how I would use that to detect intentional leaks from compile-time into “shared” read-only data.

Also, returning &'static Cell<T> from a const fn, pointing to .data should be safe, as long as it doesn’t get deduplicated (every call should get its own copy) or end up in a const, which would allow cross-thread sharing.

A &'static AtomicBool could be returned from a const fn and placed in a static, or even maybe a const (although I’m not 100% sure on the latter).

It also may be nice to somehow warn against uses of such const fns based on Box::leak at runtime, as they would result in new allocations.

@rkruppe as @vadimcn already answered above, the questions you raised are not specific to my suggested model.

@eddyb I don’t see what is the semantic difference between [T; ctfe()] and [T; macro!()]. both look equivalent to me. Also, regarding purity, what about:

const f: i32 = random!(1, 6);

The only issue I see with the above is the integration concerns made by @arielb1 but again, nothing prevents us from doing the same today in current Rust. I’d say the responsibility for this is the end user’s to audit which macros they use or write themselves (grep for “macro”…)

@arielb1 regarding integration concerns you raised - except “bad” cases such as above, i think that idiomatic use of macros can actually lessen problems with integration. as I said above, it is easily grep-able and the relevant definitions can be audited if not trusted (and thus equivalent to treatment of unsafe code). This is just a generalization - instead of having a single source transformation which we want to be reproducible we now have a series of source transformations that we want to be reproducible.

Regarding life-before-main: I think that allowing something like allocation to cross between the lifetime of the macro to the lifetime of the macro caller is problematic as I see evidence for it above. Now we are discussing de-duplication and adding a “compile-time” GC. It’s not just a simple GC, once someone writes some compile-time wizardry it’ll affect compilation time. Basically, I see this as recreating the same run-time environment a piece at a time.

Regarding the practical question of const v : Vec<u8> = ... ; I think that either the macro should generate a compile time value or generate the init expression that’ll be run by the caller. so that we get two options: const v = [1, 2, 3]; or static v = Vec::new(...);. The first will be put in the executable memory and the second will run on first access.

The MIR interpreter linked above, can it be used for a rust repl? Can it improve upon the work done in rusti?

One big difference between CTFE and syntax extensions is that syntax extensions run entirely on the host side, using host types and host libraries, while the current version of CTFE runs entirely on the target side, using target types and target libraries.

CTFE using the target libraries means that e.g. its stuct stat will have the target-correct members. Managing 2 sets of types/libraries will make things much more complicated.

Regarding life-before-main

We do not want life before main. We DO NOT WANT life before main. The evaluation of each constant/static/const-expr is supposed to be pure, at least in the Haskell-style lazy-evaluation sense.

When we translate the resulting constant-values, we allocate space in .data/.rodata for each object, but we may deduplicate .rodata objects under some conditions that we have not yet decided (I am not so sure about deduplicating the results of Box::leak because it breaks pointer equality, but deduplicating literals/interns would totally be fine). This is not really “garbage collection”.

OTOH, as you say, we may want to have garbage-collection in the middle of const-eval. As the outputs of const-eval are immutable, a simple generational GC should suffice.

Note that we do not recreate the host environment, but rather a “symbolic” version of the target environment. I imagine that someone might expose the “symbolic” interpreter in some way.

Regarding the practical question of const v : Vec = … ; I think that either the macro should generate a compile time value or generate the init expression that’ll be run by the caller. so that we get two options: const v = [1, 2, 3]; or static v = Vec::new(…);. The first will be put in the executable memory and the second will run on first access.

This is a bit annoying, because you can’t call free on stuff in .data. I imagine that you would have to use a special allocator/copy of Vec in static data.

OTOH, “run on first access” is a kind of life-before-main, which we do not want to support.

Macros don’t have access to types, constants do.

How do you handle type Foo<T> = [u8; rand(0, size_of::<T>())];?
Can Foo<u8> in two different contexts produce different results?

And that’s just the tip of the iceberg.
How does it integrate with traits?
How does cross-crate coherence even keep working?

Our type system is hard enough as it is to prove properties on.
How do you expect us to integrate side-effects into constant evaluation in the type system without breaking anything?

I don’t think it’s impossible, I just don’t expect you to be able to convince anyone on the Rust team to even touch this idea in 1.x, as it is, IMO, harder to get right than dependent typing, in a safe-by-default language.

EDIT: It’s actually impossible (breaks coherence), see my other responsible below.

There’s a previous discussion on the matter you might also want to look over.

hmm… I totally agree that we do not want life-before-main. Running on first access is supposed to be as you said: [quote=“arielb1, post:41, topic:3143”] at least in the Haskell-style lazy-evaluation sense [/quote]

Edit: regarding the different types between host and target - I don’t think it’s that much more difficult. A macro needs to handle two contexts, everything it executes internally is indeed for the host arch. but anything it returns will be for the target arch. Given an ergonomic quasi-quoting syntax it is clear which is used were. I also do not see any valid use-case whatsoever to pass such data between the two contexts. Yes, stat is different on each platform but it contains FS info. given that you cross compile from host A to target B, how can it ever be useful to have FS info that is specifically tied to A’s FS when executing code on B which has a separate FS?

I’ve asked this question several times in this discussion and no-one as of yet had provided any example where this is useful.

For your example, size_of::<u8>() should be exactly the same on all platforms - one byte, but I understand your more general point. I do wish that macros will have access to types too but I understand that it’s off the table atm.

The lazy evaluation I was talking about occurs solely at compile-time - it is how I think of non-recursive const-const dependencies being allowed.

All statics are evaluated eagerly once at translation time, and never at any other time.