[MIR] constant evaluation

BAD was specifically bad because you had a mutable String initially pointing at non-runtime memory; the Arc is fine (hence the example with String::new(), which does not point to anything).
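To make that concrete, here is a minimal sketch; the BAD line uses a hypothetical const-evaluable String::from (which does not exist today), and that is exactly what would embed a heap pointer into compile-time memory:

    use std::sync::Mutex;

    // Fine: String::new() performs no allocation, so the static never
    // points at compile-time memory; the first runtime push allocates
    // through the normal runtime allocator.
    static OK: Mutex<String> = Mutex::new(String::new());

    // BAD (hypothetical - String::from is not const-evaluable): the buffer
    // holding "hi" would live in compile-time memory, yet a later runtime
    // push or drop would ask the runtime allocator to reallocate or free
    // memory it never handed out.
    // static BAD: Mutex<String> = Mutex::new(String::from("hi"));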

That said, static support for “compile-time dynamic allocations” is entirely optional and will probably remain feature-gated longer than the rest.

Probably yeah? We only use this for memory reporting so perf is not a major concern.

What I am afraid of is introducing a performance overhead on all calls just so the allocator hooks work even for “compile-time dynamic allocations”. As @eddyb said though, for now this seems difficult to pull off so maybe we can just punt on it.

FWIW, I never considered allocator hooks for preventing issues with CTFE “allocations” as an option, although the checks could be integrated into a complex allocator somehow.
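For what that could look like, here is a rough sketch of “checks integrated into a complex allocator”; the CTFE_REGION bounds are made-up placeholders, not a real API:

    use std::alloc::{GlobalAlloc, Layout, System};

    // Hypothetical bounds of memory produced by compile-time allocations.
    const CTFE_REGION: std::ops::Range<usize> = 0x1000..0x2000;

    struct CtfeAwareAlloc;

    unsafe impl GlobalAlloc for CtfeAwareAlloc {
        unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
            System.alloc(layout)
        }

        unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
            // Freeing a compile-time allocation at runtime would be bogus;
            // skip it instead of handing the pointer to the system allocator.
            if CTFE_REGION.contains(&(ptr as usize)) {
                return;
            }
            System.dealloc(ptr, layout);
        }
    }

    #[global_allocator]
    static ALLOC: CtfeAwareAlloc = CtfeAwareAlloc;

    fn main() {
        // Every heap allocation now pays for the extra range check on free -
        // the overhead being discussed above.
        let _v = vec![1, 2, 3];
    }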

FYI - a few papers that explain the model I was advocating above.

  1. http://www.cs.rice.edu/~taha/publications/journal/dspg04a.pdf
  2. http://www.cs.rice.edu/~taha/publications/journal/tcs00.pdf

Rust has a few issues with that model:

  • host vs. target: ML has no notion of a host vs. target; Rust does.
  • dynamic allocation: as you can understand from what eddyb and I were discussing, allocating freeable memory at compile time is nontrivial. This is not an issue in “dynamically-loaded” languages like ML.
  • trait system: ML has no trait system; Rust’s trait system creates implicit between-stage dependencies.

Basically, Rust compilation has 3 stages:

  • all types are computed. constants may also be computed, but their memory placement is not
  • all statics and constants are computed along with their memory placement
  • run-time

  • host vs. target: ML has no notion of a host vs. target, Rust has.

True. Note though that this is just one example of implementing this model. I thought that a formal academic paper would be able to explain it better than I can. A more modern example to look at would be Nemerle.

Regarding cross-compilation, it indeed adds a complication. It means that ~.<libc::tm>. != libc::tm (hopefully I used the operators in the paper correctly). That is, if I evaluate the deferred expression libc::tm on a different target machine, it doesn’t have to be identical to libc::tm on the host machine. The responsibility to account for this is on the user, but I do not believe it matters for the vast majority of code people put in macros.

  • dynamic allocation: as you can understand from what me and eddyb were talking about, allocating free-able memory at compile-time is nontrivial. This is not an issue in “dynamically-loaded” languages like ML.

I agree with this as well. This is why I suggest employing the Nemerle model: each stage is compiled separately. This does not allow an allocation to cross between stages; in other words, it does not change the status quo we already have in Rust today.

Doing it manually requires the user to do the following actions:

  1. Compile a macro definition into a shared object
  2. Compile the source code into an executable, specifying the resulting shared object (the “macro.dll”) as a plugin to the compiler.

I don’t see any benefit to supporting compile-time allocations as discussed, other than that it would be more convenient to use a const Vec<T> than a const [T; n]. That indicates we need to improve the ergonomics of the latter rather than add non-trivial support for the former.
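For reference, a small sketch of the comparison (the Vec line is the hypothetical part):

    // Works today: a fixed-size constant array.
    const PRIMES: [u32; 4] = [2, 3, 5, 7];

    // Would require compile-time dynamic allocation, since vec! allocates
    // on the heap (hypothetical; does not compile today):
    // const PRIMES_VEC: Vec<u32> = vec![2, 3, 5, 7];

    fn main() {
        assert_eq!(PRIMES.len(), 4);
    }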

  • trait-system: ML has no trait system, which creates implicit between-stage dependencies.

Basically, Rust compilation has 3 stages:

  • all types are computed. constants may also be computed, but their memory placement is not
  • all statics and constants are computed along with their memory placement
  • run-time

well, there’s another stage before the above - macro expansion.

First, rustc const-eval (the original topic of this thread) explicitly runs on a virtual version of the target. That’s why it is nice to use - on an abstract level, it doesn’t matter when it is executed.

Host code mixed with target code is at the very least a severe portability hazard - not only libc::tm, but even usize can vary between the host and target. Similarly, code running on the host will see the libraries on the host - which are of course different from the ones on the target. One of the reasons syntax extensions are ugly is because they must work with this distinction.
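A minimal sketch of that hazard: the same expression gives different answers depending on which machine evaluates it. If host-run code computed this while cross-compiling from a 64-bit host to a 32-bit target, it would bake in the wrong value; const-eval avoids this by evaluating on a virtual version of the target.

    use std::mem::size_of;

    // 8 bytes on a 64-bit target, 4 bytes on a 32-bit target. Because
    // const-eval runs on a virtual version of the *target*, this constant
    // is always consistent with the code that uses it at runtime.
    const PTR_BYTES: usize = size_of::<usize>();

    fn main() {
        println!("usize is {} bytes", PTR_BYTES);
    }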

Compile-time allocations are useful for creating data structures at compile time whose size is calculated dynamically - imagine having to guess the final size of a hash table.

well, there’s another stage before the above - macro expansion.

macro expansion is a second-class citizen. Macros can’t depend on types and for a good reason.

Host code mixed with target code is at the very least a severe portability hazard

This claim is repeatedly made without any substantiation. What is a real-life use case where this theoretical problem (which I agree exists!) hampers or complicates the code? Where does this affect a user in practice? What is the use case where there is a need to use platform-specific code at compile time?

Compile-time allocations are useful for creating data structures at compile time whose size is calculated dynamically - imagine having to guess the final size of a hash table.

Again, same question as above: what is a real-life use case for such compile-time data structures?

I’d argue that if we’re trying to optimize away even the allocation of the data structure itself by moving it to compile time (storing the data in the program’s memory), then why not optimize away the data structure as well? After all, we already have complete knowledge of it ahead of time. Instead of allocating a hash table, generate a static array [T; n] and replace accesses to the hash table with direct accesses to that array.
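A sketch of that replacement, with made-up keys: a sorted static array plus binary search instead of a heap-allocated hash table.

    // The table is fully known ahead of time, so store it as a sorted
    // static array instead of building a HashMap at startup (or at
    // compile time).
    static KEYWORDS: [(&str, u32); 3] = [
        ("const", 0),
        ("fn", 1),
        ("static", 2),
    ];

    fn lookup(key: &str) -> Option<u32> {
        KEYWORDS
            .binary_search_by(|&(k, _)| k.cmp(key))
            .ok()
            .map(|i| KEYWORDS[i].1)
    }

    fn main() {
        assert_eq!(lookup("fn"), Some(1));
        assert_eq!(lookup("let"), None);
    }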

macro expansion is a second-class citizen. Macros can’t depend on types and for a good reason.

macro expansion should not, IMO, be a second-class citizen. While I understand the complexity of allowing macros to depend on type info, I don’t think it is a good enough reason to cripple macros so much. Plenty of real-life uses for this are demonstrated in other languages.

For example, automatically generating the intrinsic -> signature table in rustc_platform_intrinsics.

I think we are coming at this from different points. Compile-time dynamic allocation is equivalent to generating an array whose size is unknown at parse time (which is what you referred to with [T; n]). const_eval is not a form of code generation - it uses exactly the same types as the outer Rust, so it can't really generate arrays of non-fixed sizes, Rust not being dependently typed and all that.

Types are allowed to depend on macros in a very flexible way. Allowing macros to depend on (target) types in a similarly flexible way will create a very annoying circular dependency.

You can see that the papers you linked don't allow compile-time code to declare types.

Actually, it might be the case that we don't precisely understand one another.

First, const-eval can run in the middle of type-checking. When we get integer generics, there will be type-checking -> const-eval -> type-checking cycles; const-eval is not separable from type-checking.
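A small illustration of that coupling, using a plain constant in an array length (integer generics would extend the same cycle to generic parameters):

    // Type-checking the static below requires const-evaluating `SIZE * 2`
    // to know the array length, and const-evaluating that expression
    // requires type information about SIZE: a type-check -> const-eval ->
    // type-check cycle.
    const SIZE: usize = 16;
    static BUFFER: [u8; SIZE * 2] = [0u8; SIZE * 2];

    fn main() {
        assert_eq!(BUFFER.len(), 32);
    }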

However, in most cases what you want is a kind of a "monomorphization hack": to run after types are inferred, and execute code that generates code for a function/static according to these types. I don't see that much theoretical problem with that idea (especially now that we have specialization).

Still, if that code runs on the host, it will have to deal with both host and target types. Because of the complexity of integrating a typed AST with Rust, it will probably have to generate an untyped AST, with all the annoyances that implies. There's a good chance that the final design for MIR plugins will allow you to do this.

Of course, your other point is that you want to have plugins in the same source-code as the code you compile. I don't like this, because plugins run in a different execution environment from normal code. Basically the entire pipeline of rustc has to run twice (or more) - once for the plugins, once for the target. If you are doing this, why not move the handling to a higher level?

I think this thread got severely derailed.

Originally, this thread was about const_eval, which is how we get “no life before main” - constants are perfectly ordinary target Rust code, it’s just that they are pure, so that they can be evaluated during compilation.
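A minimal example of that property (square is just an illustrative name): the same pure function is evaluated during compilation and callable at runtime.

    // Perfectly ordinary target Rust code; because it is pure, the
    // compiler can evaluate it during compilation.
    const fn square(x: u32) -> u32 {
        x * x
    }

    const AREA: u32 = square(6); // evaluated at compile time

    fn main() {
        assert_eq!(square(6), AREA); // the same function, called at runtime
    }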

Then, this thread diverged to:

  • creating type-safe Vecs in constants
  • mixing plugins with normal code
  • adding new plugin hook-points

These are quite distinct issues from const_eval, and I think that each of them basically deserves its own thread.


Off topic, but note that LALRPOP has exactly this already. (Though I’ve never benchmarked it or anything.)

(I’m currently not up to date in this thread, so I apologize if this was already mentioned.)

When skimming, I read a highlighted “not proposing a VM” and thought: wouldn’t it be awesome if const fn's were evaluated in a VM?! And then: aren’t procedural macros and const fn's quite similar?

  1. I mean, in the end const fn's are code run (evaluated) at compile time which produces a const's or static's value. Procedural macros do the same but can also produce all kinds of “other” things; they are also allowed to do “more” when evaluated/run.

  2. Except for the capabilities, the difference is how const fn's are written compared to procedural macros.

  3. It should be possible to produce the same effect as any const fn with a procedural macro; it's just that const fn's are more intuitive and can also be called at runtime.

  4. Uh, we might want to sandbox any “stable” procedural macro system, e.g. by running it in a VM. (It is bad if injecting code into a library is used as an attack vector against production systems; it is terrible if it additionally infects the developer's system, potentially injecting code into anything maintained by them… :astonished:)

  5. Just an idea, but maybe we can unify constant evaluation and “stable” procedural macro evaluation by transpiling const fn's in a constant context to anonymous/inline procedural macros, because this…

    1. … might (probably) make maintenance of both easier, as it is uniform
    2. … makes it easier to add features like allocating static memory at compile time
    3. … means there is just one system to “secure” for compile-time “evaluation” (“running of code”)
    4. … could make it easier to write parts of macros as const fn's, which can sometimes be more intuitive

Note: I know that everyone should check any code they compile anyway, as there can be security vulnerabilities in the compiler etc. Also, I'm aware that after compilation you will most likely run the code's tests, so a code injection into a widely used library is always a big (and probably underestimated) problem. I'm just against adding an additional attack vector. (And maybe we would want to add a (per-user/global config) option to automatically run tests in a sandbox and/or VM, though that is pretty off topic and would not solve all problems either.)

