Generic `Atomic<T>`

I would describe it as something like a CAS version of a SIMD masked load on a `u8` vector (I think we only need byte-level precision): we never even load the other bytes, so it doesn't matter if they are uninitialized. That avoids having to talk about freeze.

However, even then, a loop-based implementation does not have the same liveness properties as the Abstract Machine (AM) semantics: if one thread attempts such a CAS while another thread constantly performs atomic writes of random values to the same memory, then in the AM the first thread is guaranteed to terminate. A loop-based implementation (even if we only introduce the loop in machine IR, entirely avoiding all optimization-related questions) could make the first thread loop forever.
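To make the loop-based lowering concrete, here is a minimal sketch of what it might look like. The name `masked_cas_loop` and its signature are hypothetical, not an existing API. It works on an `AtomicU32` with a bit mask, so every byte is initialized; that is a simplification relative to the uninitialized-bytes case discussed above. Keeping the ignored bits of the current value on success is also just a choice made for this sketch.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical helper: a CAS that compares only the bits selected by `mask`.
/// On success it stores the masked bits of `new` and keeps the ignored bits
/// of the current value (an assumption of this sketch).
fn masked_cas_loop(x: &AtomicU32, expected: u32, new: u32, mask: u32) -> Result<u32, u32> {
    let mut cur = x.load(Ordering::Relaxed);
    loop {
        // Only the masked bits take part in the comparison.
        if (cur & mask) != (expected & mask) {
            return Err(cur);
        }
        let desired = (new & mask) | (cur & !mask);
        // The hardware CAS compares the whole word, ignored bits included.
        match x.compare_exchange_weak(cur, desired, Ordering::AcqRel, Ordering::Acquire) {
            Ok(prev) => return Ok(prev),
            // The word changed (or the weak CAS failed spuriously):
            // re-check using the value that was observed and try again.
            Err(actual) => cur = actual,
        }
    }
}
```

Nothing in this loop bounds the number of iterations: as long as another thread keeps changing the word between the load and the full-width CAS while the masked bits still match, the loop retries. That is the liveness gap relative to a single atomic step in the AM.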

In other words, this pseudocode always terminates under AM semantics (assuming a fair scheduler), but the produced binary could fail to terminate:

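```rust
// Pseudocode: `X` and `FLAG` are shared atomic locations.
static X;
static FLAG;

thread::scope(|s| {
  // Thread 1: perform one masked CAS, then signal completion.
  s.spawn(|| {
    do_a_masked_cas(&X);
    set_flag(&FLAG);
  });
  // Thread 2: keep overwriting `X` until thread 1 is done.
  s.spawn(|| {
    while !get_flag(&FLAG) {
      write_random(&X);
    }
  });
});
```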

So no, this does not solve the liveness concerns.

Also, a static mask indicating which bytes to compare and which to ignore would not suffice to permit CAS on enums with fields, because for such enums the padding mask depends on the discriminant.
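As a small illustration of that last point, consider the following sketch. The enum `E` and the function `compare_mask` are hypothetical, and the byte offsets are worked out by hand under two assumptions: the `#[repr(C, u8)]` layout (a `u8` tag followed by a union of the variant payloads) and a `u32` alignment of 4.

```rust
use std::mem::size_of;

// Assumed layout: tag at byte 0, bytes 1..4 padding, payload union at bytes 4..8.
#[allow(dead_code)]
#[repr(C, u8)]
enum E {
    Small(u8), // payload occupies byte 4 only; bytes 5..8 are padding
    Wide(u32), // payload occupies bytes 4..8
}

// Fails to compile on a target where the layout assumption does not hold.
const _: () = assert!(size_of::<E>() == 8);

/// Hypothetical per-value mask: `true` marks bytes that hold data,
/// `false` marks padding. The offsets are hand-derived, not computed.
fn compare_mask(e: &E) -> [bool; 8] {
    match e {
        E::Small(_) => [true, false, false, false, true, false, false, false],
        E::Wide(_) => [true, false, false, false, true, true, true, true],
    }
}
```

Bytes 5..8 are padding for `Small` but data for `Wide`, so no single mask fixed per type describes both variants.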