I wonder if PMUs provide some counters that are as stable as "number of retired instructions" (not affected by other processes touching caches), but more precise, like "number of retired (uop * static_weight_in_cycles(uop))s". Need to check.
Update: There's only the unweighted UOPS_RETIRED.ALL (and a bunch of finer-grained counters for specific kinds of instructions).
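In principle the weighted metric could be approximated in software from the finer-grained counters. A minimal sketch of that idea, assuming the per-class counts have already been read somehow; the class names and the weights below are made up for illustration and are not real PMU event names or measured latencies, and real finer-grained counters may not cleanly partition all retired uops:

```python
# Hypothetical static cost per uop class, in cycles. Illustrative values only.
STATIC_WEIGHT_CYCLES = {
    "alu": 1.0,
    "load": 4.0,
    "store": 1.0,
    "branch": 1.0,
    "div": 20.0,
}


def weighted_uop_cost(retired_by_class, weights=STATIC_WEIGHT_CYCLES):
    """Return sum(count * static weight) over all uop classes.

    retired_by_class maps a (hypothetical) uop class name to the number of
    retired uops of that class, as read from per-class counters.
    """
    unknown = set(retired_by_class) - set(weights)
    if unknown:
        # Refuse to silently ignore classes we have no weight for.
        raise KeyError(f"no static weight for classes: {sorted(unknown)}")
    return sum(count * weights[cls] for cls, count in retired_by_class.items())


# Example with invented counts: 1000*1 + 200*4 + 3*20
sample = {"alu": 1000, "load": 200, "div": 3}
print(weighted_uop_cost(sample))  # 1860.0
```

Like the instruction count, this number depends only on what was executed and not on cache state, but it is only as good as the static weights.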
(Still feels wrong to measure that though; it would be preferable to use cycles and try to eliminate "other processes" on the perf machine instead.)
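One small step toward reducing interference is pinning the benchmark to a single CPU and taking the best of several runs. A rough sketch, assuming Linux (`os.sched_setaffinity` is not available elsewhere) and using wall-clock time as a stand-in for cycles; `time_pinned` is a hypothetical helper name. Pinning alone does not keep other processes off that CPU, so real isolation would still need something like a dedicated or shielded core:

```python
import os
import time


def time_pinned(fn, repeats=5):
    """Run fn() `repeats` times pinned to one CPU; return the best wall time in seconds."""
    can_pin = hasattr(os, "sched_setaffinity")  # Linux only
    if can_pin:
        original = os.sched_getaffinity(0)
        # Pin to one of the CPUs we are already allowed to run on.
        os.sched_setaffinity(0, {min(original)})
    try:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - start)
        return best
    finally:
        if can_pin:
            # Restore the affinity mask we started with.
            os.sched_setaffinity(0, original)


elapsed = time_pinned(lambda: sum(range(100_000)))
print(f"best of 5: {elapsed:.6f} s")
```

Taking the minimum rather than the mean is a deliberate choice here: interference from other processes can only make a run slower, so the fastest run is the least disturbed one.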