Semantics of StorageLive/StorageDead in MIR

The behavior described in StatementKind suggests that the pair (StorageLive / StorageDead) determines the lifetime of a local. Looking at this GitHub issue, I find the pair is related to llvm.lifetime.start / llvm.lifetime.end (into which the pair is translated directly when rustc lowers MIR to LLVM IR, perhaps?).

Besides their semantics, I wonder: what was the original design goal of the pair? One reason I can think of is reusing LLVM's llvm.lifetime.start / llvm.lifetime.end to reduce overall stack usage within a function. Are there any other reasons?
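To make the stack-reuse motivation concrete, here is a small sketch (the function name and values are made up): `a` and `b` are never live at the same time, so their StorageLive/StorageDead ranges are disjoint, and the backend may place them in the same stack slot via the lifetime intrinsics. Whether the slots are actually merged depends on the optimizer.

```rust
// Hypothetical illustration: `a` and `b` have disjoint storage ranges,
// so the backend (guided by llvm.lifetime.start/end) may overlap their
// stack slots instead of reserving 2 * 1024 bytes.
fn disjoint_locals(flag: bool) -> u64 {
    if flag {
        let a = [1u64; 128]; // StorageLive(a) ... StorageDead(a)
        a[0] + a[127]
    } else {
        let b = [2u64; 128]; // StorageLive(b) ... StorageDead(b)
        b[0] + b[127]
    }
}

fn main() {
    assert_eq!(disjoint_locals(true), 2);
    assert_eq!(disjoint_locals(false), 4);
    println!("ok");
}
```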

I'd like to hear about the latest status of StorageLive / StorageDead.
I'd also like to help, e.g., by writing up their design and implementation somewhere, since some valuable content is outdated or scattered. Reading the rustc source code is one option, but I don't quite know where to start and can easily get lost. I'd appreciate it if someone could provide some mentoring or discussion.


They are used both to lower to the LLVM intrinsics to reduce stack space, and used by mir borrowck to determine when borrows get invalidated.
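A small sketch of the borrowck side (the function name is made up): once `StorageDead(x)` runs at the end of the inner block, any remaining borrow of `x` is invalidated, so using the borrow after the block is rejected.

```rust
// Borrowck ties borrow validity to storage ranges: the borrow through
// `r` must end before StorageDead(x) at the close of the inner block.
fn inner_scope_borrow() -> i32 {
    let r;
    let v;
    {
        let x = 42;  // StorageLive(x)
        r = &x;
        v = *r;      // last use of the borrow
    }                // StorageDead(x): the borrow must have ended by here
    // let _w = *r;  // error[E0597]: `x` does not live long enough
    v
}

fn main() {
    assert_eq!(inner_scope_borrow(), 42);
    println!("ok");
}
```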

... and used by mir borrowck to determine when borrows get invalidated.

I wonder whether they are the input or the output of mir borrowck.

They are part of the input to mir borrowck. They are inserted as part of MIR construction.

Then I guess their placement is determined only by lexical scope (inspired by this example) and does not exploit any data-flow analysis.

The current borrow checker (NLL) is non-lexical and is more powerful. So perhaps the placement of StorageLive / StorageDead could be optimized after borrowck? In the above example, the interval between the StorageLive / StorageDead of a1 and a3 could be shorter. But I am not sure whether the optimization would be beneficial and worthwhile.
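To illustrate the gap between non-lexical borrow regions and lexical storage ranges, here is a small sketch (the function name is made up): under NLL the borrow ends at its last use, well before the referent's StorageDead.

```rust
// Under NLL a borrow region can be much shorter than the referent's
// storage range: `r`'s borrow of `x` ends at its last use, so mutating
// `x` afterwards is allowed even though both locals are still in their
// lexical scope (and their storage ranges).
fn nll_demo() -> (i32, i32) {
    let mut x = 1;
    let r = &x;       // borrow starts
    let first = *r;   // last use: the borrow ends here, non-lexically
    x += 1;           // OK: no live borrow, although StorageDead(x)
                      // only comes at the end of the function
    (first, x)
}

fn main() {
    assert_eq!(nll_demo(), (1, 2));
    println!("ok");
}
```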

Found a related open issue in the unsafe-code-guidelines repo here.

I noticed that there is no StorageLive/StorageDead for temporaries to which the enum discriminant is assigned. Is this behaviour documented somewhere?

For example, for the following snippet:

fn test() {
    let a = Some(5u32);
    let _x = match a {
        Some(y) => y,
        None => 4,
    };
}

enums.test.-------.renumber.0.mir would be:

// MIR for `test` 0 renumber

fn test() -> () {
    let mut _0: ();                      // return place in scope 0 at enums.rs:29:11: 29:11
    let _1: std::option::Option<u32>;    // in scope 0 at enums.rs:30:9: 30:10
    let mut _3: isize;                   // in scope 0 at enums.rs:32:9: 32:16
    scope 1 {
        debug a => _1;                   // in scope 1 at enums.rs:30:9: 30:10
        let _2: u32;                     // in scope 1 at enums.rs:31:9: 31:11
        let _4: u32;                     // in scope 1 at enums.rs:32:14: 32:15
        scope 2 {
            debug _x => _2;              // in scope 2 at enums.rs:31:9: 31:11
        }
        scope 3 {
            debug y => _4;               // in scope 3 at enums.rs:32:14: 32:15
        }
    }

    bb0: {
        StorageLive(_1);                 // scope 0 at enums.rs:30:9: 30:10
        _1 = std::option::Option::<u32>::Some(const 5_u32); // scope 0 at enums.rs:30:13: 30:23
        FakeRead(ForLet(None), _1);      // scope 0 at enums.rs:30:9: 30:10
        StorageLive(_2);                 // scope 1 at enums.rs:31:9: 31:11
        FakeRead(ForMatchedPlace(None), _1); // scope 1 at enums.rs:31:20: 31:21
        _3 = discriminant(_1);           // scope 1 at enums.rs:31:20: 31:21
        switchInt(move _3) -> [0_isize: bb1, 1_isize: bb2, otherwise: bb3]; // scope 1 at enums.rs:31:14: 31:21
    }

    bb1: {
        _2 = const 4_u32;                // scope 1 at enums.rs:33:17: 33:18
        goto -> bb5;                     // scope 1 at enums.rs:33:17: 33:18
    }

    bb2: {
        falseEdge -> [real: bb4, imaginary: bb1]; // scope 1 at enums.rs:32:9: 32:16
    }

    bb3: {
        unreachable;                     // scope 1 at enums.rs:31:20: 31:21
    }

    bb4: {
        StorageLive(_4);                 // scope 1 at enums.rs:32:14: 32:15
        _4 = ((_1 as Some).0: u32);      // scope 1 at enums.rs:32:14: 32:15
        _2 = _4;                         // scope 3 at enums.rs:32:20: 32:21
        StorageDead(_4);                 // scope 1 at enums.rs:32:20: 32:21
        goto -> bb5;                     // scope 1 at enums.rs:32:20: 32:21
    }

    bb5: {
        FakeRead(ForLet(None), _2);      // scope 1 at enums.rs:31:9: 31:11
        _0 = const ();                   // scope 0 at enums.rs:29:11: 35:2
        StorageDead(_2);                 // scope 1 at enums.rs:35:1: 35:2
        StorageDead(_1);                 // scope 0 at enums.rs:35:1: 35:2
        return;                          // scope 0 at enums.rs:35:2: 35:2
    }
}

As you can see, there is no StorageLive(_3) even though the local is assigned: _3 = discriminant(_1);.

Based on your example, here is another example:

fn test(a: Option<u32>) -> u32 {
    let x = match a {
        Some(y) => y,
        None => 4,
    };
    x + 1
}

The generated MIR:

fn test(_1: Option<u32>) -> u32 {
    debug a => _1;                       // in scope 0 at src/main.rs:5:9: 5:10
    let mut _0: u32;                     // return place in scope 0 at src/main.rs:5:28: 5:31
    let _2: u32;                         // in scope 0 at src/main.rs:6:9: 6:10
    let mut _3: isize;                   // in scope 0 at src/main.rs:7:9: 7:16
    let _4: u32;                         // in scope 0 at src/main.rs:7:14: 7:15
    let mut _5: u32;                     // in scope 0 at src/main.rs:10:5: 10:6
    scope 1 {
        debug x => _2;                   // in scope 1 at src/main.rs:6:9: 6:10
    }
    scope 2 {
        debug y => _4;                   // in scope 2 at src/main.rs:7:14: 7:15
    }

    bb0: {
        StorageLive(_2);                 // scope 0 at src/main.rs:6:9: 6:10
        _3 = discriminant(_1);           // scope 0 at src/main.rs:6:19: 6:20
        switchInt(move _3) -> [0_isize: bb1, 1_isize: bb3, otherwise: bb2]; // scope 0 at src/main.rs:6:13: 6:20
    }

    bb1: {
        _2 = const 4_u32;                // scope 0 at src/main.rs:8:17: 8:18
        goto -> bb4;                     // scope 0 at src/main.rs:8:17: 8:18
    }

    bb2: {
        unreachable;                     // scope 0 at src/main.rs:6:19: 6:20
    }

    bb3: {
        StorageLive(_4);                 // scope 0 at src/main.rs:7:14: 7:15
        _4 = ((_1 as Some).0: u32);      // scope 0 at src/main.rs:7:14: 7:15
        _2 = _4;                         // scope 2 at src/main.rs:7:20: 7:21
        StorageDead(_4);                 // scope 0 at src/main.rs:7:20: 7:21
        goto -> bb4;                     // scope 0 at src/main.rs:7:20: 7:21
    }

    bb4: {
        StorageLive(_5);                 // scope 1 at src/main.rs:10:5: 10:6
        _5 = _2;                         // scope 1 at src/main.rs:10:5: 10:6
        _0 = Add(move _5, const 1_u32);  // scope 1 at src/main.rs:10:5: 10:10
        StorageDead(_5);                 // scope 1 at src/main.rs:10:9: 10:10
        StorageDead(_2);                 // scope 0 at src/main.rs:11:1: 11:2
        return;                          // scope 0 at src/main.rs:11:2: 11:2
    }
}

There is no StorageLive(_0) either. Since _0 is the return place, I guess the reason might be that the semantics/effects of StorageLive/StorageDead are implicit for _0. Another possibility is that StorageLive/StorageDead are optional for a local.

That would be my guess. The _0 slot is "owned" by the caller, who made it live before calling the function, and you can't really mark it dead.

That makes sense.

By taking a look at a refined example's LLVM IR and the LangRef, I find that llvm.lifetime.start / llvm.lifetime.end apply to stack memory objects allocated by alloca, not to general virtual registers.

The simplified LLVM IR:

; playground::main
; Function Attrs: nonlazybind uwtable
define internal void @_ZN10playground4main17hef0c7f6bf3b3bda7E() unnamed_addr #2 {
start:
  ... relevant to printf! ...
 
  %r2 = alloca i32, align 4
  %r1 = alloca i32, align 4
  %0 = bitcast i32* %r1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0)
; call playground::test
  %1 = tail call fastcc i32 @_ZN10playground4test17ha00e641b5cdd1a4fE(i32 0, i32 undef)
  store i32 %1, i32* %r1, align 4
  %2 = bitcast i32* %r2 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %2)
; call playground::test
  %3 = tail call fastcc i32 @_ZN10playground4test17ha00e641b5cdd1a4fE(i32 1, i32 %1)
  store i32 %3, i32* %r2, align 4

  ... relevant to printf! ...

  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %2)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0)
  ret void
}

; playground::test
define internal fastcc i32 @_ZN10playground4test17ha00e641b5cdd1a4fE(i32 %0, i32 %1) unnamed_addr #3 {
start:
  %switch = icmp eq i32 %0, 0
  %phi.bo = add i32 %1, 1
  %x.0 = select i1 %switch, i32 5, i32 %phi.bo
  ret i32 %x.0
}

Instruction flow of %r1 in main:

  %r1 = alloca i32, align 4
  %0 = bitcast i32* %r1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0)
  ; call playground::test
  %1 = tail call fastcc i32 @_ZN10playground4test17ha00e641b5cdd1a4fE(i32 0, i32 undef)
  store i32 %1, i32* %r1, align 4
  ...
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0)

Now I am starting to wonder about the difference between the stack models of MIR and LLVM IR, for example, locals in MIR versus virtual registers in LLVM IR.

Function parameters and the return place are live for the entire function body, so I do not find it surprising that there is no StorageLive/StorageDead for them. However, temporaries used to store discriminants are live only in certain blocks of the function, so it is surprising that there is no StorageLive/StorageDead for them.

I see your point.

All MIR locals are codegenned to allocas in LLVM. Then mem2reg in LLVM converts them to LLVM's virtual registers where possible.
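A small sketch of the difference (function names are made up; you can inspect the IR with `rustc --emit=llvm-ir`): both locals start out as allocas in unoptimized LLVM IR, but whether mem2reg promotes one to an SSA value depends on the optimizer, and a local whose address escapes typically has to stay in memory.

```rust
// Both locals begin as allocas in unoptimized LLVM IR. With
// optimizations, mem2reg can typically promote `a` to a virtual
// register, while `b` usually stays an alloca because its address
// escapes through the reference.
fn promotable() -> i32 {
    let a = 10; // no address taken: promotable to an SSA value
    a + 1
}

fn addressed(f: fn(&i32) -> i32) -> i32 {
    let b = 20; // address taken: typically remains an alloca
    f(&b)
}

fn main() {
    assert_eq!(promotable(), 11);
    assert_eq!(addressed(|r| *r * 2), 40);
    println!("ok");
}
```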


Got it. Thanks!

Besides that one (and the issues it references), here are some other related issues:
