`leading_zeros()` has UB ... again?

According to the doc of x86 instruction bsr:

If the content source operand is 0, the content of the destination operand is undefined.

Let's try the following Rust code:

fn foo1(a: u32) -> u32 {
    a.leading_zeros()
}

fn foo2(a: u32) -> u32 {
    if a == 0 {
        return 32;
    }
    a.leading_zeros()
}

Both of the two functions compiles to the following assembly codes (in release mode, playground):

	mov	eax, 63
	bsr	eax, edi
	xor	eax, 31
	ret

It's obviously that when edi is zero, the value of eax is undefined after the bsr instruction, and thus the return value is undefined.

And I also tried this C code:

unsigned int foo(unsigned int a) {
    if (a == 0) {
        return 32;
    }
    return __builtin_clz(a);
}

And by GCC, the code is compiled to the following assembly code (with -O3, Compiler explorer):

foo:
        bsr     eax, edi
        mov     edx, 32
        xor     eax, 31
        test    edi, edi
        cmove   eax, edx
        ret

So GCC handles the corner case without any UB.

Unfortunately, clang also has this UB. So this seems related to LLVM. However, I'm not so familiar with LLVM, so could anyone look into this and find out what's wrong?

P.S. I found that there is a similar post in 2017, and what they said implies that, maybe there was no UB at that time?

3 Likes

Compiler explorer shows it having changed since clang 20.1.0.

Looking around the llvm-project PRs I found this: [X86] Handle BSF/BSR "zero-input pass through" behaviour, which was merged in llvmorg-20.1.0-rc1, and suggests that the behaviour was actually never undefined on x86-64 and Intel and AMD have documentation stating so.

7 Likes

Thank you for pointing out! It turns out that I refer to an outdated documentation😂

It is also worth nothing that "undefined" on the hardware level usually does not mean the same as "UB" in Rust. See for instance these docs.

6 Likes

My understanding is that the situation historically was (and I've verified this against older AMD and Intel documentation):

  • AMD documents BSR of 0 to leave the destination register unchanged;
  • Intel documents BSR of 0 to leave an undefined value in the destination register (and your link matches this because it's based on Intel documentation), but (on all current processors) actually implements it as leaving the destination register unchanged;
  • A very large number of existing programs rely on the AMD behavior, so if Intel decided to change the behavior, it would probably break a lot of existing programs and cause a lot of bad publicity.

As such, there was little risk in practice that an x86-64 processor would implement the instruction in any way other than the way specified by AMD, and the AMD specification of the instruction is fully defined (albeit weird), so there is in practice no undefined behavior involved.

Intel have since updated their documentation to make their specification more defined: it is now defined by Intel such that BSR of 0 leaves the bottom 32 bits of the destination register unchanged, and (depending on processor model and the width of the instruction) either leaves the remaining bits unchanged or zeroes them. (Given that the existing programs are normally trying to use the behaviour to specify a fallback value for BSR when the input is 0, and it would be unlikely for a value that was meant to be a number of bits to be outside the u32 range, the "unchanged or zeroed" choice is likely to be almost unobservable in practice unless you're specifically looking for it, which may be why nobody doing this noticed earlier.) I'm assuming that this quirk of old processors is the reason why Intel originally documented the behavior as undefined, but given that this construct is frequently used in practice, decided to just document the full behavior. (You can observe that the assembly code you listed works regardless of whether BSR of 0 leaves the top 32 bits of the destination unchanged or zeroes them.)

9 Likes

I didn't bisect all the way to the exact rust version, but if you check out https://rust.godbolt.org/z/snfr4PTW7 you'll see that LLVM used to emit the branch here, and thus older versions of rust did too.

And as an unimportant aside, the better way to write that is

fn foo2(a: u32) -> u32 {
    if let Some(a) = NonZero::new(a) {
        a.leading_zeros()
    } else {
        32
    }
}

which, admittedly, doesn't actually do anything different in this narrow example, but {leading|trailing}_zeros on NonZero get to tell LLVM that the value is never zero, allowing them to avoid worrying about this kind of "what happens on zero again?" issue when generating code.

7 Likes