Any time you're measuring perf, absolute measurements don't really matter; what matters is the relative change between measurements.
(instcount measurements flatten this rule somewhat, but I think it still applies.)
So if your local measurements have the same "shape" as the reference measurements (that is, relative measurements between benchmarks are roughly the same), I wouldn't worry about the absolute measurements being different locally. Instead, capture your reference baseline locally, then compare against that.
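A rough sketch of that workflow, in case it helps: check that the local baseline has the same shape as the reference, then report each benchmark's change relative to the local baseline. Everything in it is an assumption made for illustration: the benchmark names, the instruction counts, the helper names, and the 10% tolerance are all made up, and none of this is how perf.rust-lang.org itself compares results.

```python
from statistics import median


def relative_change(baseline: int, patched: int) -> float:
    """Fractional change of `patched` relative to `baseline` (0.01 means +1%)."""
    return (patched - baseline) / baseline


def same_shape(local: dict[str, int], reference: dict[str, int],
               tolerance: float = 0.10) -> bool:
    """True if the local/reference ratio is roughly constant across benchmarks.

    The 10% default tolerance is an arbitrary illustrative choice.
    """
    ratios = [local[name] / reference[name] for name in reference]
    mid = median(ratios)
    return all(abs(ratio / mid - 1.0) <= tolerance for ratio in ratios)


# Hypothetical instruction counts; not real measurements.
reference = {"bench_a": 1_000_000, "bench_b": 4_000_000, "bench_c": 2_000_000}
local_baseline = {"bench_a": 1_200_000, "bench_b": 4_900_000, "bench_c": 2_350_000}
local_patched = {"bench_a": 1_188_000, "bench_b": 4_900_000, "bench_c": 2_397_000}

# Absolute numbers differ from the reference, but the ratios are similar.
shape_ok = same_shape(local_baseline, reference)

# Compare the patched build against the *local* baseline, not the reference.
changes = {
    name: relative_change(local_baseline[name], local_patched[name])
    for name in local_baseline
}

print(f"same shape as reference: {shape_ok}")
for name, change in changes.items():
    print(f"{name}: {change:+.2%}")
```

The point of comparing against `local_baseline` rather than `reference` is that whatever makes the local build slower or faster overall applies to both sides of the comparison.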
perf.rust-lang.org benchmarks the builds produced by rustc's CI, specifically for the x86_64-unknown-linux-gnu target. For that target, our CI currently does several things that a local build likely won't match by default; I think this is a mostly complete list:
- PGO for rustc
- ThinLTO + PGO for LLVM (if you use download-ci-llvm = true on x86_64-unknown-linux-gnu, you likely get most of the benefits here)
- std is built with codegen-units=1
All of these will make perf's instruction counts and other absolute numbers differ from what you see locally. I wouldn't try to reproduce the above locally -- local benchmarking, particularly with cachegrind (which is less sensitive to environmental differences and noise), should give a fairly decent proxy for the relative change you'll see on perf. This is not always true -- for example, PGO can mean that your loop or condition reordering was already applied by LLVM -- but in the general case you should be able to reproduce results fairly well locally. If you can't, we may not be able to do anything, but we'd like to hear about it -- feel free to drop by #t-compiler/performance on Zulip and ask questions if something isn't working as you expect.