Thought: saving users from themselves in macros that explode


#1

You do NOT want to attempt to compile the following snippet:

macro_rules! blowup {
  ([$($t:tt)+]) => { blowup!{[ $($t:tt)+ ]} }; // note the :tt !
}

fn main() {
  blowup!([a]);
}

(note: the brackets are only there to work around a bug, fixed in #52145, that slows down the parsing of $($t:tt)+)

If you try to build this on Linux with a reasonably sized swap partition, you are very likely to send yourself into complete and utter turmoil. You’ve probably experienced this before while doing something else similarly dumb:

  • The program eventually requires so much memory (and is using it so actively) that memory for other absolutely essential things (like your window manager) is paged out to swap, causing your system to stop responding.
  • Soon, the program requires so much memory that not even all of its own memory can fit in physical memory. The program’s execution is slowed down so much by constant page faults that it can’t even reach the point where it would OOM.
  • …so basically you find yourself with a very warm brick for a few hours (or until you shut it off).

The prevailing philosophy in most circles is:

Well, then don’t do that!

but you know as well as I know that these things do not usually happen on purpose!


So maybe the compiler can help. Mind, we don’t have to solve the halting problem; all we need here is bang for the buck: we want to head off common deadly mistakes, like an extra :tt in an incremental muncher. Rust already has #![recursion_limit] to help combat some forms of rogue macros, but it doesn’t help much with the above example.
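For contrast, here is a minimal sketch (not from the original example) of the same macro with the stray :tt removed from the transcriber. The broken version re-emits every matched token as $t : tt, so its input roughly triples on each call; the version below never grows its input, so the compiler stops it with an ordinary recursion-limit error instead of eating all available memory:

macro_rules! no_blowup {
  // Same shape as blowup!, but the transcriber writes $($t)+ with no :tt,
  // so the token stream handed to the next call is identical to the input.
  ([$($t:tt)+]) => { no_blowup!{[ $($t)+ ]} };
}

fn main() {
  no_blowup!([a]); // fails quickly and cleanly at the recursion limit
}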

A strawman proposal

@limira in this comment on #51754 suggests the following (a rough code sketch of the check follows the list):

  1. Define a constant (like recursion_limit); maybe its name is max_macro_grow_factor = 1024.
  2. Record the initial number of input tokens: n.
  3. At each recursive call, check the new number of tokens: m.
  4. If m > n * max_macro_grow_factor, then
  • stop the macro-expansion process,
  • report an error to the user,
  • suggest that they check the rules involved, and
  • suggest that they increase max_macro_grow_factor to 2048 if they are sure there is no problem with their macro.
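Here is a rough sketch of what that check might look like, written as ordinary Rust rather than actual rustc internals; every name in it (MacroGrowthGuard, max_grow_factor, GrowthCheck) is made up for illustration:

// Hypothetical sketch of the proposed check; none of this is real rustc code.
struct MacroGrowthGuard {
  initial_tokens: usize,  // n: token count of the original invocation
  max_grow_factor: usize, // configurable, like recursion_limit; e.g. 1024
}

enum GrowthCheck {
  Ok,
  TooBig { limit: usize, actual: usize },
}

impl MacroGrowthGuard {
  fn new(initial_tokens: usize, max_grow_factor: usize) -> Self {
    // Treat an empty input as one token so the computed limit is never zero.
    Self { initial_tokens: initial_tokens.max(1), max_grow_factor }
  }

  // Called before each recursive expansion with the new token count m.
  fn check(&self, current_tokens: usize) -> GrowthCheck {
    let limit = self.initial_tokens.saturating_mul(self.max_grow_factor);
    if current_tokens > limit {
      GrowthCheck::TooBig { limit, actual: current_tokens }
    } else {
      GrowthCheck::Ok
    }
  }
}

fn main() {
  let guard = MacroGrowthGuard::new(1, 1024);  // blowup!([a]) starts with 1 token
  match guard.check(3_u32.pow(9) as usize) {   // token count after ~9 expansions
    GrowthCheck::Ok => println!("keep expanding"),
    GrowthCheck::TooBig { limit, actual } => eprintln!(
      "error: macro input grew to {actual} tokens (limit {limit}); check the \
       rules involved, or raise max_macro_grow_factor if the macro is fine"
    ),
  }
}

Clamping the initial count to at least one token is one way to sidestep the “macro with no input tokens” question raised below.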

I’m going to push this forward as a starter for discussion. I don’t think this exact scheme is the way to go, as there are subtleties like:

  • what if a macro has no tokens as input?
  • it is common for macros that take small inputs to expand to very large outputs composed largely of fixed tokens
  • what about backwards compatibility? Is there a reasonable default factor we can pick such that no existing legitimately-working macro is liable to exceed it?

so I’m opening this thread as a place to toss around ideas. Any thoughts?


#2

Set a memory limit?


#3

It would be nice if the limit were program-dependent rather than platform-dependent. That’s my problem with memory limits.

In general, I believe NFA regex parsers can explore an exponential number of paths. Depending on the order in which paths are explored, I think we could end up using exponential space in our macro parser (it’s basically a breadth-first search).

We could switch to something like depth-first search. Then you would only use linear memory and would just run into the recursion limit, right? On the other hand, we have the normal problems with DFS…
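To make the memory comparison concrete, here is a toy sketch; it has nothing to do with the real macro parser, it just walks a binary tree of choice points of depth 16 both ways and reports the peak number of states held at once:

use std::collections::VecDeque;

// Toy model: each "state" just records its depth; every state above depth 16
// branches into two successors, mimicking an ambiguous choice point.
fn bfs_peak_frontier(depth: u32) -> usize {
  let mut frontier = VecDeque::new();
  frontier.push_back(0u32);
  let mut peak = 1;
  while let Some(d) = frontier.pop_front() {
    if d < depth {
      frontier.push_back(d + 1);
      frontier.push_back(d + 1);
    }
    peak = peak.max(frontier.len());
  }
  peak
}

fn dfs_peak_stack(depth: u32) -> usize {
  let mut stack = vec![0u32];
  let mut peak = 1;
  while let Some(d) = stack.pop() {
    if d < depth {
      stack.push(d + 1);
      stack.push(d + 1);
    }
    peak = peak.max(stack.len());
  }
  peak
}

fn main() {
  println!("BFS peak frontier: {}", bfs_peak_frontier(16)); // 65536: whole level in memory
  println!("DFS peak stack:    {}", dfs_peak_stack(16));    // 17: one path plus local branching
}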


#4

This, however, is not a problem of exponential solution space. Each time blowup! expands, it takes time and memory linear in its input; but the input grows by a constant factor on every call, so it becomes exponential in the number of calls.
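As a back-of-the-envelope sketch of that growth (assuming, as above, that every matched token is re-emitted as $t : tt, i.e. the input triples per call):

fn main() {
  // blowup!([a]) starts with a single token inside the brackets.
  let mut tokens: u64 = 1;
  for call in 1..=30 {
    tokens *= 3;
    println!("after call {call:2}: {tokens} tokens");
  }
  // After 30 calls that is already about 2 * 10^14 tokens, which exhausts
  // memory long before the default recursion_limit of 128 is reached.
}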

(but geeze, I wonder… are there pathological macros out there that the macro parser generates an exponential number of paths for? ISTM that the parser is too averse to ambiguities for this to actually occur)


#5

Would this be prevented by setting a maximum length for the macro input? One of the recursive calls would probably fail then. I don’t really see why we need to compute a growth rate, etc., instead of having a static limit.


#6

I would still rather the limit be program-dependent.