I think we got this discussion totally wrong anyway. Here’s my model of the optimization:
The optimization is to transform
S1:
tmp = SRC
...
S2:
DEST = tmp
to
S1:
DEST = SRC
I think this is best split into 4 steps. I don’t think we ever want to do the steps separately, but this clarifies the analysis rules.
Step 0 (original)
S1:
tmp = SRC
...
S2:
DEST = tmp
Step 1 - add additional dead write
S1:
tmp = SRC
DEST = tmp
...
S2:
DEST = tmp
This requires that DEST is dead at S1, and that it can be evaluated there (e.g. it is not a dereference of a pointer that is uninitialized there). Nothing else matters.
Step 2: common subexpression introduction
S1:
tmp = SRC
DEST = tmp
...
S2:
tmp2 = DEST
DEST = tmp2
This requires that the newly-added read links with the write of tmp - basically a “memdep” analysis.
This also requires that the address of DEST does not change, which is non-trivial because DEST can be the dereference of a pointer.
Step 3: remove write-of-read
S1:
tmp = SRC
DEST = tmp
...
S2:
NOP
After step 2, S2 is obviously a NOP and can be removed.
Step 4: remove tmp
S1:
DEST = SRC
...
S2:
NOP
This is purely a local operation, but may require some sophistication if e.g. SRC is a function call. I think we are always justified doing it, but we should make sure it is OK.