Semantics of StorageLive/StorageDead in MIR

The behavior described in StatementKind suggests that the pair (StorageLive / StorageDead) determines the lifetime of a local. Looking at this GitHub issue, I find the pair is related to llvm.lifetime.start / llvm.lifetime.end (into which the pair is translated directly when rustc lowers MIR to LLVM IR, perhaps?).

Besides their semantics, I wonder: what was the original design goal of the pair? One reason I can think of is reusing LLVM's llvm.lifetime.start / llvm.lifetime.end to reduce overall stack usage within a function. Are there any other reasons?
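To make the stack-reuse motivation concrete, here is a small sketch (the function name and values are made up): `a` and `b` are never live at the same time, so their StorageLive/StorageDead ranges are disjoint, and the backend may place them in the same stack slot via the lifetime intrinsics. Whether the slots are actually merged depends on the optimizer.

```rust
// Hypothetical illustration: `a` and `b` have disjoint storage ranges,
// so the backend (guided by llvm.lifetime.start/end) may overlap their
// stack slots instead of reserving 2 * 1024 bytes.
fn disjoint_locals(flag: bool) -> u64 {
    if flag {
        let a = [1u64; 128]; // StorageLive(a) ... StorageDead(a)
        a[0] + a[127]
    } else {
        let b = [2u64; 128]; // StorageLive(b) ... StorageDead(b)
        b[0] + b[127]
    }
}

fn main() {
    assert_eq!(disjoint_locals(true), 2);
    assert_eq!(disjoint_locals(false), 4);
    println!("ok");
}
```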

I'd like to hear about the latest status of StorageLive / StorageDead.
I'd also like to help, e.g., by writing up their design and implementation somewhere, since some valuable content is outdated or scattered. Reading the rustc source code is one option, but I don't quite know where to start and can easily get lost. I'd appreciate it if someone could provide some mentoring or discussion.


They are used both to lower to the LLVM intrinsics to reduce stack space, and used by mir borrowck to determine when borrows get invalidated.
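A small sketch of the borrowck side (the function name is made up): once `StorageDead(x)` runs at the end of the inner block, any remaining borrow of `x` is invalidated, so using the borrow after the block is rejected.

```rust
// Borrowck ties borrow validity to storage ranges: the borrow through
// `r` must end before StorageDead(x) at the close of the inner block.
fn inner_scope_borrow() -> i32 {
    let r;
    let v;
    {
        let x = 42;  // StorageLive(x)
        r = &x;
        v = *r;      // last use of the borrow
    }                // StorageDead(x): the borrow must have ended by here
    // let _w = *r;  // error[E0597]: `x` does not live long enough
    v
}

fn main() {
    assert_eq!(inner_scope_borrow(), 42);
    println!("ok");
}
```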

... and used by mir borrowck to determine when borrows get invalidated.

I wonder whether they are the input or the output of mir borrowck.

They are part of the input to mir borrowck. They are inserted as part of MIR construction.

Then I guess their placement is determined only by lexical scope (inspired by this example) and does not exploit any data-flow analysis.

The current borrow checker (NLL) is non-lexical and is more powerful. So perhaps the placement of StorageLive / StorageDead could be optimized after borrowck? In the above example, the interval between the StorageLive / StorageDead of a1 and a3 could be shorter. But I am not sure whether the optimization would be beneficial and worthwhile.
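To illustrate the gap between non-lexical borrow regions and lexical storage ranges, here is a small sketch (the function name is made up): under NLL the borrow ends at its last use, well before the referent's StorageDead.

```rust
// Under NLL a borrow region can be much shorter than the referent's
// storage range: `r`'s borrow of `x` ends at its last use, so mutating
// `x` afterwards is allowed even though both locals are still in their
// lexical scope (and their storage ranges).
fn nll_demo() -> (i32, i32) {
    let mut x = 1;
    let r = &x;       // borrow starts
    let first = *r;   // last use: the borrow ends here, non-lexically
    x += 1;           // OK: no live borrow, although StorageDead(x)
                      // only comes at the end of the function
    (first, x)
}

fn main() {
    assert_eq!(nll_demo(), (1, 2));
    println!("ok");
}
```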

Found a related open issue in the unsafe-code-guidelines repo here.

I noticed that there is no StorageLive/StorageDead for temporaries to which the enum discriminant is assigned. Is this behaviour documented somewhere?

For example, for the following snippet:

fn test() {
    let a = Some(5u32);
    let _x = match a {
        Some(y) => y,
        None => 4,
    };
}

enums.test.-------.renumber.0.mir would be:

// MIR for `test` 0 renumber

fn test() -> () {
    let mut _0: ();                      // return place in scope 0 at enums.rs:29:11: 29:11
    let _1: std::option::Option<u32>;    // in scope 0 at enums.rs:30:9: 30:10
    let mut _3: isize;                   // in scope 0 at enums.rs:32:9: 32:16
    scope 1 {
        debug a => _1;                   // in scope 1 at enums.rs:30:9: 30:10
        let _2: u32;                     // in scope 1 at enums.rs:31:9: 31:11
        let _4: u32;                     // in scope 1 at enums.rs:32:14: 32:15
        scope 2 {
            debug _x => _2;              // in scope 2 at enums.rs:31:9: 31:11
        }
        scope 3 {
            debug y => _4;               // in scope 3 at enums.rs:32:14: 32:15
        }
    }

    bb0: {
        StorageLive(_1);                 // scope 0 at enums.rs:30:9: 30:10
        _1 = std::option::Option::<u32>::Some(const 5_u32); // scope 0 at enums.rs:30:13: 30:23
        FakeRead(ForLet(None), _1);      // scope 0 at enums.rs:30:9: 30:10
        StorageLive(_2);                 // scope 1 at enums.rs:31:9: 31:11
        FakeRead(ForMatchedPlace(None), _1); // scope 1 at enums.rs:31:20: 31:21
        _3 = discriminant(_1);           // scope 1 at enums.rs:31:20: 31:21
        switchInt(move _3) -> [0_isize: bb1, 1_isize: bb2, otherwise: bb3]; // scope 1 at enums.rs:31:14: 31:21
    }

    bb1: {
        _2 = const 4_u32;                // scope 1 at enums.rs:33:17: 33:18
        goto -> bb5;                     // scope 1 at enums.rs:33:17: 33:18
    }

    bb2: {
        falseEdge -> [real: bb4, imaginary: bb1]; // scope 1 at enums.rs:32:9: 32:16
    }

    bb3: {
        unreachable;                     // scope 1 at enums.rs:31:20: 31:21
    }

    bb4: {
        StorageLive(_4);                 // scope 1 at enums.rs:32:14: 32:15
        _4 = ((_1 as Some).0: u32);      // scope 1 at enums.rs:32:14: 32:15
        _2 = _4;                         // scope 3 at enums.rs:32:20: 32:21
        StorageDead(_4);                 // scope 1 at enums.rs:32:20: 32:21
        goto -> bb5;                     // scope 1 at enums.rs:32:20: 32:21
    }

    bb5: {
        FakeRead(ForLet(None), _2);      // scope 1 at enums.rs:31:9: 31:11
        _0 = const ();                   // scope 0 at enums.rs:29:11: 35:2
        StorageDead(_2);                 // scope 1 at enums.rs:35:1: 35:2
        StorageDead(_1);                 // scope 0 at enums.rs:35:1: 35:2
        return;                          // scope 0 at enums.rs:35:2: 35:2
    }
}

As you can see, there is no StorageLive(_3) even though the local is assigned: _3 = discriminant(_1);.

Based on your example, here is another example:

fn test(a: Option<u32>) -> u32 {
    let x = match a {
        Some(y) => y,
        None => 4,
    };
    x + 1
}

The generated MIR:

fn test(_1: Option<u32>) -> u32 {
    debug a => _1;                       // in scope 0 at src/main.rs:5:9: 5:10
    let mut _0: u32;                     // return place in scope 0 at src/main.rs:5:28: 5:31
    let _2: u32;                         // in scope 0 at src/main.rs:6:9: 6:10
    let mut _3: isize;                   // in scope 0 at src/main.rs:7:9: 7:16
    let _4: u32;                         // in scope 0 at src/main.rs:7:14: 7:15
    let mut _5: u32;                     // in scope 0 at src/main.rs:10:5: 10:6
    scope 1 {
        debug x => _2;                   // in scope 1 at src/main.rs:6:9: 6:10
    }
    scope 2 {
        debug y => _4;                   // in scope 2 at src/main.rs:7:14: 7:15
    }

    bb0: {
        StorageLive(_2);                 // scope 0 at src/main.rs:6:9: 6:10
        _3 = discriminant(_1);           // scope 0 at src/main.rs:6:19: 6:20
        switchInt(move _3) -> [0_isize: bb1, 1_isize: bb3, otherwise: bb2]; // scope 0 at src/main.rs:6:13: 6:20
    }

    bb1: {
        _2 = const 4_u32;                // scope 0 at src/main.rs:8:17: 8:18
        goto -> bb4;                     // scope 0 at src/main.rs:8:17: 8:18
    }

    bb2: {
        unreachable;                     // scope 0 at src/main.rs:6:19: 6:20
    }

    bb3: {
        StorageLive(_4);                 // scope 0 at src/main.rs:7:14: 7:15
        _4 = ((_1 as Some).0: u32);      // scope 0 at src/main.rs:7:14: 7:15
        _2 = _4;                         // scope 2 at src/main.rs:7:20: 7:21
        StorageDead(_4);                 // scope 0 at src/main.rs:7:20: 7:21
        goto -> bb4;                     // scope 0 at src/main.rs:7:20: 7:21
    }

    bb4: {
        StorageLive(_5);                 // scope 1 at src/main.rs:10:5: 10:6
        _5 = _2;                         // scope 1 at src/main.rs:10:5: 10:6
        _0 = Add(move _5, const 1_u32);  // scope 1 at src/main.rs:10:5: 10:10
        StorageDead(_5);                 // scope 1 at src/main.rs:10:9: 10:10
        StorageDead(_2);                 // scope 0 at src/main.rs:11:1: 11:2
        return;                          // scope 0 at src/main.rs:11:2: 11:2
    }
}

There is no StorageLive(_0) either. Since _0 is the return place, I guess the reason might be that the semantics/effects of StorageLive/StorageDead are implicit for _0. Another possibility is that StorageLive/StorageDead are optional for a local.

That would be my guess. The _0 slot is "owned" by the caller, who made it live before calling the function, and you can't really mark it dead.

That makes sense.

By taking a look at a refined example's LLVM IR and the LangRef, I find that llvm.lifetime.start / llvm.lifetime.end apply to stack memory objects allocated by alloca, not to general virtual registers.

The simplified LLVM IR:

; playground::main
; Function Attrs: nonlazybind uwtable
define internal void @_ZN10playground4main17hef0c7f6bf3b3bda7E() unnamed_addr #2 {
start:
  ... relevant to printf! ...
 
  %r2 = alloca i32, align 4
  %r1 = alloca i32, align 4
  %0 = bitcast i32* %r1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0)
; call playground::test
  %1 = tail call fastcc i32 @_ZN10playground4test17ha00e641b5cdd1a4fE(i32 0, i32 undef)
  store i32 %1, i32* %r1, align 4
  %2 = bitcast i32* %r2 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %2)
; call playground::test
  %3 = tail call fastcc i32 @_ZN10playground4test17ha00e641b5cdd1a4fE(i32 1, i32 %1)
  store i32 %3, i32* %r2, align 4

  ... relevant to printf! ...

  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %2)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0)
  ret void
}

; playground::test
define internal fastcc i32 @_ZN10playground4test17ha00e641b5cdd1a4fE(i32 %0, i32 %1) unnamed_addr #3 {
start:
  %switch = icmp eq i32 %0, 0
  %phi.bo = add i32 %1, 1
  %x.0 = select i1 %switch, i32 5, i32 %phi.bo
  ret i32 %x.0
}

Instruction flow of %r1 in main:

  %r1 = alloca i32, align 4
  %0 = bitcast i32* %r1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0)
  ; call playground::test
  %1 = tail call fastcc i32 @_ZN10playground4test17ha00e641b5cdd1a4fE(i32 0, i32 undef)
  store i32 %1, i32* %r1, align 4
  ...
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0)

Now I am starting to wonder about the difference between the stack models of MIR and LLVM IR, for example, locals in MIR versus virtual registers in LLVM IR.

Function parameters and the return place are live for the entire function body, so I do not find it surprising that there is no StorageLive/StorageDead for them. However, temporaries used to store discriminants are live only in certain blocks of the function, so it is surprising that there is no StorageLive/StorageDead for them.

I see your point.

All MIR locals are codegenned to allocas in LLVM. Then mem2reg in LLVM converts them to LLVM's virtual registers where possible.
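A small sketch of the difference (function names are made up; you can inspect the IR with `rustc --emit=llvm-ir`): both locals start out as allocas in unoptimized LLVM IR, but whether mem2reg promotes one to an SSA value depends on the optimizer, and a local whose address escapes typically has to stay in memory.

```rust
// Both locals begin as allocas in unoptimized LLVM IR. With
// optimizations, mem2reg can typically promote `a` to a virtual
// register, while `b` usually stays an alloca because its address
// escapes through the reference.
fn promotable() -> i32 {
    let a = 10; // no address taken: promotable to an SSA value
    a + 1
}

fn addressed(f: fn(&i32) -> i32) -> i32 {
    let b = 20; // address taken: typically remains an alloca
    f(&b)
}

fn main() {
    assert_eq!(promotable(), 11);
    assert_eq!(addressed(|r| *r * 2), 40);
    println!("ok");
}
```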


Got it. Thanks!

Besides that one (and the issues it references), here are some other related issues:
