[MIR] constant evaluation

Also, when discussing this, consider that a constant propagator is desirable regardless, because one wants to (ab)use it to enable various non-trivial transformations during dataflow analysis that might not otherwise be possible.

I’d personally consider a propagator and a VM/interpreter/evaluator to be two distinct features. The propagator is purely an optimisation detail (which also happens to be a form of eager constant evaluation), while the VM/interpreter would be more of a full feature distinct from MIR (though it might be implemented on top of it); it could perhaps even be implemented as a compiler plugin!
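
As a concrete illustration of the propagator side, it just folds statically known values through the program; a minimal sketch in surface syntax (the real pass would operate on MIR, not source code):

// Before propagation:
let x = 2 + 3;
let y = x * 4;

// After propagation, both operations are folded away:
let y = 20;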


I totally agree with the above :slight_smile:

The const propagator is indeed a useful optimization pass, and it is an internal implementation detail of the compiler.

The VM/interpreter/evaluator, however, is what I’m arguing against, as I see it as something that should already be covered by a more general macro feature; const fn simply provides functionality overlapping with a subset of the capabilities of macros.

For people who want memory allocation in constants, how would that actually work? Would a chunk of static memory in the executable be allocated, and, since these are immutable constants that never get dropped, would there be no worry about trying to free that “allocation”?

This is indeed exactly what I was thinking about.

A constant can already refer to other constants, so I was thinking that each pointer/reference would lead to another (internal/unnamed) symbol properly initialized.

And indeed since there is “no life before/after main” it would mean that neither constructors nor destructors are executed for those const items.
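
As a rough sketch of what “each reference leads to another symbol” can look like (this particular case is already accepted by Rust today):

// The inner value can be emitted as a pre-initialized (possibly unnamed)
// symbol in read-only data; OUTER then just stores a pointer to it.
const INNER: [u8; 3] = [1, 2, 3];
const OUTER: &'static [u8; 3] = &INNER;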

I don't have much experience with Java/C# and no experience with LISP/Nemerle so I may be missing something obvious... but I still don't understand how cross-compilation works.

I propose that we stop talking in the abstract, and instead use this simple argument:

const Known: Vec<libc::tm> = read_from_file(".");

fn read_from_file(file_name: Path) -> Vec<libc::tm>;

Now, we want to compile this on host x86_64 Linux and target x86 Windows. I can certainly compile read_from_file for the host, but:

  • I get a Vec representation suitable for 64 bits even though the target is 32 bits
  • libc::tm is platform specific: there are some common members, but each implementation is allowed to add platform specific members (for example to handle timezones)

So I do not see how you go about reconciling the 64-bit Linux representation with the 32-bit Windows representation. Scaling down 64-bit pointers/integers to 32 bits and eliminating Linux-specific data members is feasible; finding out which values the Windows-specific members should have, however...

Could you explain how your multi-stage idea would solve the issue here?

Ok, let’s talk about your concrete example then. First we need to know what the file actually contains and what we are trying to move to a previous stage. I’d say that using a Vec<libc::tm> is bad design anyway because, as you said, it is platform-dependent.

So our stages here are:

  1. Read the file and de-serialize its contents from some known format.
  2. Create a const expression containing those contents.

macro.rs (in pseudo-code):

macro read_from_file(file_name: Path) -> Expr {
    // Stage 1: ordinary "run-time" code, executed on the host at expansion time.
    let lines: Vec<String> = read_file(file_name);
    // Stage 2: build the initialization expression to be spliced into the caller.
    let mut expr = ArrayExpression::new();
    for line in lines {
        expr.add(line.to_expr());
    }
    expr
}

my_module.rs:

const Known = read_from_file!(".");

What I’m aiming for here is that the macro generates the initialization expression, so that the above will be transformed into something like:

const Known = [ libc::tm {year: 1999, month: 1, day: 1, ...}, ... ];

This is portable and works with cross-compilation. There are a few points of note here:

  1. What is the type of Known? I’d argue that Vec is the wrong type, since it allocates on the heap. The macro should generate a [T; n].

  2. The macro seems complex, right? Well, there are two parts to it: one part re-uses regular “run-time” code (to read the file, for example), while the other part, which I wrote against some imagined API, generates the expression. In Nemerle there is built-in splicing syntax that is very ergonomic for this latter part:

    macro (…) { <[ print ("first"); ]> print ("second"); }

Everything inside <[ ]> is output tokens (the return type of the macro, an Expr in our example) and everything outside is executed regularly. Using such quasi-quoting syntax makes the above macro much shorter and much simpler to understand. 

  3. What if I want a different data structure, say a map? Rust would need something like the new C++ initialization syntax, which allows one to write (see also the sketch just after this list):
 
    std::vector<int> v1{1, 2, 3};
    std::map<int, int> v2{{1, 2}, {2, 3}};

Or in Rust-ish:

let v: std::Vec<Foo> = [ {..}, ..]; // notice there is no vec! macro call here
let v: map<K, V> = {{k1, v1}, {k2, v2}, ..};
  4. The macro itself can execute any code; in our example it does I/O, memory allocation, etc. But there is a strict separation of concerns: it cannot allocate the memory to be used by the “run-time” code, which runs in a separate stage. Instead, it returns an expression so that the next stage has a static value it can manipulate. It generates new source code.
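
As an aside on point 3: the map case can already be approximated today without new syntax, by collecting pairs (a sketch using std; make_map is just an illustrative name):

use std::collections::HashMap;

fn make_map() -> HashMap<i32, i32> {
    // No dedicated literal syntax, but the same effect via collect():
    [(1, 2), (2, 3)].iter().cloned().collect()
}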

Most of the above exists in some form already in Rust. We just need to fill some holes to make it ergonomic and fun.

I am not sure how much we want to execute code that can access the host OS API during compilation - this kind of thing gives integration people nightmares. I would prefer for all host OS API access to be constrained to compiler plugins (and build-scripts) - if we assume that integrators can skip lints and that compiler plugins are rare, this makes the problem much easier.

OTOH, I am not opposed to memory allocation in constexprs - of course, this should call an intrinsic that eventually allocates memory in .data rather than calling jemalloc or whatever.

Usually people raise two main concerns regarding this:

  1. security concerns
  2. integration/reproducible-build concerns, as you raised above.

Both are IMO invalid, for similar reasons:

  1. The feature as I described it can be viewed as just syntax sugar. After all, nothing prevents me from implementing said scheme in any Turing-complete language, even assembly. It’s just a matter of writing a program that generates source code, and this is already done in build.rs scripts in the Rust ecosystem (which can lead to the exact same integration problems). So this is a known, valid and useful technique.
  2. The user already needs to trust the compiler for both security and integration reasons anyway.
  3. If the user uses 3rd-party code or executables, they’ll need to trust them anyway or audit the source code. This is regardless of the timing of execution.
  4. Bottom line, the responsibility ultimately rests in the user’s hands anyway. They need to decide what level of macros to use where, and which macros they trust. The benefit of this strategy is that the easy and ergonomic way is the default and is easily discoverable. The user can grep their code and audit usage of this technique.
  5. Not supporting this in the language doesn’t actually remove the capability, so this “limitation” is in the same realm of thought as “security through obscurity” and rests on the same underlying flawed logic.

Actually, the following is valid Rust (to initialize a non-constant):

let v = { let mut v: Vec<_> = Vec::new(); v.push(..); v.push(..); v.push(..); v };

so this stage is not an issue (this is actually akin to what vec!(..) expands to). We could introduce a "list-initializer" syntax, but ultimately it would just call functions too.
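
For reference, a simplified sketch of that expansion (the exact form is an implementation detail of the standard library):

// vec![1, 2, 3] roughly builds a boxed array, unsizes it to a boxed
// slice, and converts that into a Vec:
let v: Vec<i32> = <[_]>::into_vec(Box::new([1, 2, 3]));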

Actually, as I mentioned, it's not portable.

I picked tm specifically because its list of attributes is platform-dependent. I could have picked Mutex or other kernel interfaces, which may differ too. Some low-level structs are inherently platform-specific. Your procedural macro would need to know the specific representation of the struct on each and every target platform...

There's also the concern that on various platforms a particular function might return a different result (maybe it depends on the bit-width of pointers?), so your procedural macro would have to guess, for each platform, which constant value to use in the initialization.

There's also the concern that a number of data-types cannot meaningfully be encoded in const:

  • you cannot guess in advance the Process ID
  • you cannot guess in advance the Environment in which the program is executed
  • you cannot guess in advance the Arguments passed to the program
  • you cannot guess in advance the Thread ID
  • you cannot guess in advance the File Handle

Not only are those platform-dependent, they also vary at each invocation of the program. In a language that lets you run code before main starts, you could plausibly use them, but Rust is not such a language.


I agree that all the macro has to do is emit an initialization expression; this has never been the issue.

The issue is that:

  • either the macro has to be very complicated to emit a target-tailored initialization expression
  • or the expression evaluator has to be very complicated to transform a target-agnostic initialization expression into a target specific memory pattern

And the difficult bit is the "target-tailored" bit.

Yes, as mentioned the malloc/calloc/realloc/free interface would have to be marked as const and be intercepted by the evaluator.

I picked tm specifically because its list of attributes is platform-dependent.

My macro assumed that the difference was just the accuracy in number of bits, not that the struct might have different attributes altogether. If there can be different data members on different platforms, then the user needs to handle this just the same as in the run-time case; e.g. the Rust std lib handles this by using cfg attributes or by separating the code into per-platform modules. There's no reason it would be any different for our "compile-time" code.
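
Concretely, the usual cfg-based pattern looks like this (TimeInfo is a hypothetical type, used only for illustration):

// Platform-specific members are selected at compile time:
#[cfg(target_os = "linux")]
pub struct TimeInfo { pub seconds: i64, pub gmt_offset: i64 }

#[cfg(target_os = "windows")]
pub struct TimeInfo { pub seconds: i64 }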

Nothing here is magical: if the code depends on bit width, then it needs to contain logic to handle bit width. The macro is executed in some compiler context, so it should have access to an additional implicit parameter supplied by the compiler with said context info. The user can thus query this context parameter to learn the target architecture and decide what expression to generate based on that. No guessing involved.
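
The implicit context parameter is hypothetical, but today's analogue of such a query already exists as cfg; a minimal sketch of target-dependent constant generation:

// The emitted constant differs per target pointer width:
#[cfg(target_pointer_width = "64")]
const WORD_BITS: u32 = 64;
#[cfg(target_pointer_width = "32")]
const WORD_BITS: u32 = 32;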

There's also the concern that a number of data-types cannot meaningfully be encoded in const:

What is the problem with that? I see no use case for it. It makes sense to use threads (for example) as an implementation detail of a macro, and it makes sense to generate an expression from said macro such as static foo = thread::new(...), but it makes no sense to try to make const threads.

It's just a matter of writing a program that generates source code and this is already used in build.rs scripts in the Rust ecosystem (and those can lead to the exact same integration problems).

I expect build.rs to be treated as a hack and tightly controlled, with integrators not being afraid of significantly modifying it. OTOH, having host-runnable code tightly mixed with Rust code would be much more of a problem, especially if it gets to be idiomatic.

I have no problem with it; however, it means that even in the "full VM" case (or smart macros, or whatever) there is still a restriction that the user will have to learn, and thus only a subset of data types will be usable (especially as those unusable data types are viral: they contaminate any data types that embed them).

Thus, in any of the solutions discussed, only a subset of data types and functions is available at compile time; the only difference is how large that subset is.

Unfortunately, this is quite error-prone (compared to calling the function) and I am afraid for forward compatibility. It means that the macro is tied to the current version of the compiler: add a new CPU/OS/... that specializes a data type the macro used to generate, and that macro can no longer be used to cross-compile to the new target.

By contrast, the set of const functions is backward compatible, and when a new platform is added that requires some of them to be specialized, they are. Thus, for developers, const functions present a more stable interface.
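
To illustrate the stability argument: a const fn can hide the per-platform specialization behind one signature that callers never have to change (a sketch; word_bytes is a hypothetical function):

#[cfg(target_pointer_width = "64")]
const fn word_bytes() -> usize { 8 }
#[cfg(target_pointer_width = "32")]
const fn word_bytes() -> usize { 4 }

// Adding a new platform only means specializing the body above;
// call sites like this one stay unchanged:
const BUF_LEN: usize = 16 * word_bytes();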

I think you still do not understand me. To recap:

  1. There is no VM.
  2. There is no restriction on types. A macro can use threads in its implementation; the macro simply has a separate lifetime. This is exactly analogous to the lifetime rules that prevent you from returning a reference to a local variable defined in a function.
  3. There is no tighter coupling with macros than there is with const fn or any other regular old runtime code that is platform-dependent. The std contains plenty of exactly this. I’d argue that the abstraction layer for platform-dependent stuff should mostly be the Rust std, and macros should use the std-provided abstractions; otherwise it is a design smell. In your example, what is the justification for using libc::tm instead of a Rust type from std?

I don’t really have a horse in this race, let alone thought very deeply about the topic, but I have to say I don’t see how @yigal100’s scheme solves the portability problems. I mean, it sounds like a nicer model for the programmer than a restricted sub-language, but I see nothing that makes portability easier. Let’s take a concrete example to reduce the risk of talking past each other:

Let’s suppose I have a command-line tool written in Rust which accesses a couple of files on startup, and for whatever reason the logic for where these files are is slightly complicated, so I decide to put it in a constant expression. I cross-compile this application for a Raspberry Pi running Debian, and I happen to like VisualRust so I cross-compile from Windows. Thus what I need to end up with is a constant like this:

const INIT_PATHS: Vec<PathBuf> = ...;

Now, this is a deliberately tricky constant, since it is composed of a vector (i.e., a memory allocation) and paths (whose representation is platform-dependent). Note however that this is a perfectly idiomatic type one would use in a function like get_init_paths(), and people will want to use such types, and thus we either need to handle them or reject them gracefully. Simply running said functions and pasting some representation of its result into the source code will not do. Some problems (few if any are specific to the multi-staged compilation thingy):

  • @yigal100 has previously suggested container literals. However, since there is no life before main, the .data segment of the final executable has to somehow end up containing a pointer and two integers, and the pointer has to point to memory that contains the elements. I’d be interested to hear how this should work. Whatever it is, it needs to work for arbitrary user-defined types since Vec is just that.
  • As for the elements, how do we represent them? The internal representation of PathBuf is platform-dependent even though it’s a pure Rust type. Furthermore, on Windows as well as Unix it ultimately boils down to a Vec<u8>, but with several layers of completely different types in between. Same issues as with Vec, with the added fun that we actually need to fill in a value of a different type. The two types have different invariants: a Windows PathBuf may contain lone surrogate code points but is otherwise UTF-8, while a Unix PathBuf can contain arbitrary non-zero bytes. Simply transmuting the representation of one to the other is plainly incorrect, even if this is unlikely to cause many problems in practice.
  • Note that it’s not enough to write out the expression PathBuf::new(<the host's PathBuf as string>). PathBuf::new would require computation, not to mention allocation, on the target machine. Also there’s no life before main, yadda yadda.

In the light of these problems, it’s tempting to say: “Computing paths for a different platform at compile time is absurd! Just don’t do it!” I am actually inclined to agree, but in that case I would like to hear how the compiler would know that this type is OK and that one isn’t. It’s not as simple as looking for pointers or #[repr(C)] annotations, since some perfectly platform-independent types are repr(C) (e.g., anything which is primarily a bunch of fixed-size integers) or contain pointers (case in point: Vec, which we all want to see supported).

@hanna-kruppe: I think that nothing short of life-before-main can work for your example because of that allocation inside of PathBuf::new(). But @yigal100’s proposal would work in many other scenarios especially in combination with restricted const functions (the latter would be required to initialize types with private members).

@all: BTW, speaking of life-before-main: would it still be bad if such functions were banned from accessing mutable global state (including immutable types with interior mutability, such as Mutex-es) - to prevent ordering dependencies?

I’m sorry for being late to this discussion, given that @nikomatsakis and I have gone over this in slightly more detail than is available here.

I mostly agree with @matthieum in that allocations are more useful than full-blown IO, but there is a deeper issue: purity.

To be able to use const fn with the type system (i.e. [T; ctfe()]), const fn functions must be pure, in the sense that repeated calls with identical arguments must produce identical results.

While it may be possible to design a system where impurity cannot affect type safety, it’s a non-goal for Rust 1.x.
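
A small example of why purity matters for type safety (len is a hypothetical const fn):

const fn len() -> usize { 4 }

// Both mentions of [u8; len()] must denote the same type, which is
// only coherent if len() returns the same value on every evaluation.
fn take(_buf: [u8; len()]) {}

fn main() {
    take([0u8; len()]);
}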

As for claims of complexity, a MIR interpreter capable of evaluating everything but impure/IO code should be simpler than the MIR -> LLVM IR translator and a LLVM JIT/interpreter, and cleaner than the current constant folder (which has been getting better, thanks to @ker).

@scott has started work on an out-of-tree prototype named miri, which reassures me that MIR is great for tooling and that we can do a lot with an order of magnitude less code than, say, the name resolution pass.

Without further ado:

Rust/CTFE Abstract Machine

A proper subset of the Rust Abstract Machine, restricted to deterministic reproducible evaluation.

That is, only const fn functions may be called, which are required to always return structurally equivalent values and produce the same writes to memory reachable by arguments, given structurally equivalent arguments.

C functions are never const fn, as the above property (also referred to as purity) cannot be enforced, although a C-to-Rust transpiler could fill in that gap.

Writes to static (global) memory and inline assembly are also prohibited, and compile-time pointers do not have observable addresses.

The CTFE capabilities offered are similar to those found in C++14 constexpr.

Data layout

The machine works with octets as its fundamental unit of measurement (for sizes, alignments and offsets), which will henceforth be referred to as bytes.
It’s parametrized by the target architecture’s data layout, which consists of:

  • endianness (the order of bytes in a multi-byte integer value)
  • size of pointers and usize/isize
  • alignments of all primitive types

Based on the machine data layout, the layout (size, alignment and field offsets) of aggregate types can be determined.
The layout of non-#[repr(C)] aggregates is implementation-dependent.
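
For example, assuming 32-bit pointers and natural alignments for the primitives, the machine would compute the following layout for a #[repr(C)] struct:

#[repr(C)]
struct S {
    a: u8,        // offset 0
    b: u16,       // offset 2 (one padding byte after `a`)
    p: *const u8, // offset 4 (pointer alignment is 4 on this target)
}
// size = 8 bytes, alignment = 4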

Primitive types

Boolean, C-like enum, character codepoint, integer and floating-point values will be represented as exactly one valid value in their respective representable range.

Addresses (pointers, references, all non-local lvalues) will be one of:

  • integer addresses: x: usize in (x as *T).
    Cannot be read/written through, but other operations do work (on the integer value), e.g.:
&(*(0 as *const Struct)).field as *const _ as usize
  • abstract locations: (region, x: usize) in region.offset(x) with x < region.size.
    Can be read/written through, except for writes to static (global) memory.
    Can only be compared with another abstract location from the same abstract memory region (the offsets are compared).
    No bits of a pointer holding an abstract location may be observed as any other type than a pointer, for which the whole value must be used.
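
For instance, the “no observable addresses” rule means a cast like the following could not be evaluated (a sketch of code the machine would reject):

const X: i32 = 5;

// Rejected: the bits of a compile-time pointer (an abstract location)
// may not be observed as an integer.
const P: usize = &X as *const i32 as usize;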

Abstract memory region

A contiguous block of memory with a fixed size, consisting of size constant bytes and a set of non-overlapping special values (undef bytes and relocations).

Undef bytes can be found in uninitialized memory (e.g. mem::uninitialized()) and are also produced by writing an aggregate, at positions not covered by leaf fields (i.e. padding bytes).

Reading undef bytes is prohibited, as code that does this is likely to exhibit undefined behavior at runtime.
Aggregate reads side-step this by only reading bytes covered by leaf fields.

Primitives are read/written at the byte level, and the combined bit pattern must represent a valid value (transmute is no different).
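
As an example of the byte-level rule: transmuting four initialized bytes to a u32 is fine, because the combined bit pattern is a valid value, whereas transmuting a struct with padding to an integer would read undef bytes and be prohibited (example function is illustrative only):

fn example() -> u32 {
    // All four bytes are initialized and form a valid u32 bit pattern.
    let x: u32 = unsafe { std::mem::transmute([1u8, 0, 0, 0]) };
    x // == 1 on a little-endian target
}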

For non-pointer primitives, none of the bytes read may overlap with a relocation.
For pointers, either:

  • None of the bytes read overlap with a relocation, producing an integer address.
  • The entire pointer aligns perfectly with one relocation, producing an abstract location, where the constant bytes give the offset and the relocation gives the abstract memory region it’s pointing into.

Writing over a relocation is only permitted if the entire relocation is written over in the same operation, replacing it (e.g. a 32-bit pointer relocation can be written over with *mut T, u32 or (u16, u16) but not two u16 writes).

To give an example, the following two abstract memory regions would be produced to hold &[(0u8, 1u16), (2, 3)][1..], for a little-endian target with 32-bit pointers. Region A is the slice’s fat pointer: a pointer relocation into Region B at offset 4, followed by the length 1; the undef bytes in Region B are the tuples’ padding.

Region A:
Data:     04  00  00  00  01  00  00  00
Special:  └──Reloc(B)──┘

Region B:
Data:     00  00  01  00  02  00  03  00
Special:     Undef           Undef

Dynamic memory allocation (optional)

An implementation can provide intrinsics to allocate new abstract memory regions, deallocate them (invalidating dangling references and preventing further uses) and resize them (which can always be done “in place”).

Special care must be taken to ensure that values of types that administer such memory (Box, Vec, String, etc.) do not escape into runtime code while holding a pointer to an abstract memory region.

The implementation can generate runtime allocations from abstract memory regions when a const item “owns” the allocation, although this alone is not enough to get e.g. a &'static str from String CTFE.

EDIT: discussion resulted in the addition of the paragraphs below:

All abstract memory regions reachable from a &'static T can be placed in static memory, with the same restrictions as borrowing in a const item (no UnsafeCell), e.g.:

const WORKS: &'static str = &5.to_string();
const WORKS2: &'static str =
    Box::leak(5.to_string().into_boxed_str());

const FAILS_OLD: &'static AtomicBool =
    &AtomicBool::new(false);
// Arc uses atomics (interior mutability) internally.
const FAILS: &'static Arc<i32> = &Arc::new(0);

No runtime allocations shall be used, only read-only static allocations. static items could also be extended to allow thread-safe shared data, e.g.:

static FOO: Arc<Foo> = Arc::new(Foo {...});

One could only increment and decrement the refcount of FOO, but it would never go below 1, so the allocation being in .data would not be an issue.

The UnsafeCell is more subtle, though, as the following also needs to be forbidden:

static GOOD: Arc<Mutex<String>> =
    Arc::new(Mutex::new(String::new()));
static BAD: Arc<Mutex<String>> =
    Arc::new(Mutex::new(String::from("foo")));
...
mem::replace(&mut *BAD.lock().unwrap(), String::new())

Now you have a runtime String pointing to .rodata.
The root of the problem is a “dynamic” abstract memory region being pointed to from memory owned by the UnsafeCell<String> instance.

I believe we can detect that by finding “mutable ranges” based on the types used to access a memory region.

That does mean, though, that unsafe abstractions providing const fn facilities have to be extra careful: keeping track of an allocation with *mut u8 instead of *mut T may prevent the compiler from detecting the UnsafeCell.

We can warn about pointers not reflecting the full size of the allocation they point to (but this breaks Vec, unless that starts using *mut [T] instead, which we might want to do anyway, e.g. to support Vec<DST>), and only point to .rodata in those cases, which would result in page faults instead of unsafety.

The other danger I can see here is querying jemalloc for some information about the allocation of a String and failing (Servo does something like this, cc @Manishearth).


@vadimcn Vec also has a memory allocation, yet everyone’s assuming we’ll figure out something to have const vectors. Ignoring several layers of platform-dependent newtypes around it, a PathBuf is literally a Vec<u8>.


That is much more detailed than anything I had in mind, thanks for fleshing this out.

Could you explain the restriction:

The implementation can generate runtime allocations from abstract memory regions, although this alone is not enough to get e.g. a &'static str from String CTFE.

I don't quite see why it would not work; is it because addresses should not be observable?

Doesn't this line involve dereferencing a null pointer, which should be illegal?