Diverging function call optimization

Since a diverging function never returns, a call to such should emit a jump instead of callq.

See Playground.

Such an optimization happens in simple circumstances when the diverging function loop {}-s. However, the optimization should also be applied when eg. the diverging function recurses. Generally, every call to diverging function should just be a jump.

1 Like

A diverging function can still unwind. I’m not entirely sure how exactly unwinding works on a technical level, but I’m certain you’ll need the stack for it, including the back-pointers from a call instruction.

I couldn’t figure out how to achieve this at all, do you mind sharing the code that did it for you? In particular, how would you achieve that the called function isn’t simply inlined in which case the jump could be part of the loop {} itself?


That won’t work due to unwinding, as explained above; however it could be an option for functions for which the compiler figures out that they never unwind. I’m not sure if this is something that rustc itself can reasonably implement, perhaps an optimization like this would have to be implemented in LLVM instead. Also it’s questionable how relevant this “optimization” is after all. As far as I can tell, we’re talking about a call instruction that will only be executed once in the entire runtime of the program; seems not really worth any optimization in terms of speed; and saving one back pointer worth of stack space, well…, it’s something bit it isn’t much.

7 Likes

#[inline(always)] might be all you need.

What do you mean, could you elaborate?

Isn't this just a special case of tail call optimization?

1 Like

They're already marked as tail call in the LLVM we emit: https://rust.godbolt.org/z/dT71db7GW

But even with -Z trap-unreachable=no it's still just emitting a call.

1 Like

You're right, on #[inline(never)] emits callq.

I'm coming to this from the standpoint of making a kernel. There, diverging funtions are used for many tasks, and I found myself wanting to (cyclically) call such functions representing control flow of the kernel. Since unwinding isn't an option anyway in such environment, I thought it'd be neat to have these calls be simple jumps.

1 Like

Well, every call to a diverging function is a tail call, so yes (but here, compiler doesn't even need to be smart about dead code).

Hmm, that IR fed to llc --tailcallopt does work:

        .type   _ZN7example1b17ha57f813186beeb63E,@function # -- Begin function _ZN7example1b17ha57f813186beeb63E
_ZN7example1b17ha57f813186beeb63E:      # @_ZN7example1b17ha57f813186beeb63E
.Lfunc_begin1:
        .loc    1 7 0                           # /app/example.rs:7:0
        .cfi_startproc
# %bb.0:
        pushq   %rax
        .cfi_def_cfa_offset 16
.Ltmp2:
        .loc    1 8 5 prologue_end              # /app/example.rs:8:5
        popq    %rax
        .cfi_def_cfa_offset 8
        jmp     _ZN7example1b17ha57f813186beeb63E # TAILCALL
.Ltmp3:
.Lfunc_end1:
        .size   _ZN7example1b17ha57f813186beeb63E, .Lfunc_end1-_ZN7example1b17ha57f813186beeb63E
        .cfi_endproc
                                        # -- End function
1 Like

That's only true if there's no drop glue for unwinding -- either nothing to drop or -Cpanic=abort.

1 Like
pub fn a() -> ! {
    b()
}
#[inline(always)]
fn b() -> ! {
    println!("do something.");
    c()
}
fn c() -> ! {
    c()
}

in assembly, there is no callq to b(b even does not exists)

The problem is that, the infinite tail recursion could not be optimized.

pub fn a() -> ! {
    b()
}
#[inline(always)]
fn b() -> ! {
    println!("do something.");
    loop{}
}

Hopefully one day we'll get opt-in guaranteed tail calls for Rust -- that's why the become keyword is reserved. But right now doing loops as recursion is usually a poor idea because it's not optimized in debug mode even when it'd work fine in release.

So if I understand correctly...

In a panic = "abort" environment, the IR generated correctly contains tail call.

However, the LLVM implementation rustc uses emits call nonetheless. On the other hand, feeding the IR to llc --tailcallopt produces jumps (but this is not used by rustc).