[Pre-RFC #2]: Inline assembly

gnzlbg · December 5, 2019, 4:09pm

We already have an intrinsic for this, so inline assembly is not necessary to achieve that. Do you have any other use cases?

"the contents of the string literal must be provided to the underlying platform verbatim[...]"

Instead of verbatim, we probably need to say something about interpolation here; e.g., we don't want to pass "mov eax, {}" to the assembler, but "mov eax, eax" or whatever {} gets interpolated to.
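To make the interpolation point concrete, here is a minimal sketch using the asm! syntax that was later stabilized in Rust, which is not necessarily the draft syntax discussed in this thread. The helper name copy_via_asm is made up for illustration, and the assembly path assumes an x86_64 target; other targets fall back to plain Rust.

```rust
use std::arch::asm;

/// Hypothetical helper: copies `x` through a register-to-register move.
#[cfg(target_arch = "x86_64")]
pub fn copy_via_asm(x: u64) -> u64 {
    let y: u64;
    // SAFETY: the block only moves one register into another.
    unsafe {
        asm!(
            // The assembler never sees `{dst}` or `{src}`; the compiler first
            // substitutes the registers it picked, e.g. `mov rax, rcx`.
            "mov {dst}, {src}",
            dst = out(reg) y,
            src = in(reg) x,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    y
}

/// Fallback for targets where the x86_64 template above does not apply.
#[cfg(not(target_arch = "x86_64"))]
pub fn copy_via_asm(x: u64) -> u64 {
    x
}
```

Calling copy_via_asm(5) returns 5; the interesting part is that the template string is not what reaches the assembler.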

"[...]upon control reaching the assembly block, the underlying platform must be instructed to reach that verbatim string"

Do we also need to mention how the Rust program continues its execution if the assembly snippet finishes?

mcy · December 5, 2019, 4:20pm

Do we? I think trying to optimize empty assembly out by default is still kind of a violation of expectations.

I mean, probably, but I think trying to work out what all the standardsese reads like right now is kind of overkill. There's a lot of terms I used in that stanza that need defining anyway.

gnzlbg · December 5, 2019, 5:41pm

github.com/rust-lang/rust

Tracking issue for `std::hint::black_box`

opened 08:44PM - 02 Sep 19 UTC

closed 03:11PM - 07 Dec 22 UTC

Centril

B-RFC-approved T-libs-api T-compiler C-tracking-issue disposition-merge finished-final-comment-period Libs-Tracked

This is a tracking issue for the RFC `std::hint::black_box`.

Original RFC: [R…FC 2360](https://rust-lang.github.io/rfcs/2360-bench-black-box.html)

**Public API:**

```rust
// std::hint
pub fn black_box<T>(dummy: T) -> T;
```

**Steps:**

- [x] Implementation
- [x] FCP
- [ ] Stabilization PR

**Unresolved questions:**

- [ ] `const fn`: it is unclear whether `bench_black_box` should be a `const fn`. If it were, that would hint that it cannot have any side-effects, or that it cannot do anything that `const fn`s cannot do.
- [ ] Naming: during the RFC discussion it was unclear whether `black_box` is the right name for this primitive but we settled on `bench_black_box` for the time being. We should resolve the naming before stabilization.

Also, we might want to add other benchmarking hints in the future, like `bench_input` and `bench_output`, so we might want to put all of this into a `bench` sub-module within the `core::hint` module. That might be a good place to explain how the benchmarking hints should be used holistically.

Some arguments in favor or against using "black box" are that:

* pro: [black box] is a common term in computer programming, that conveys that nothing can be assumed about it except for its inputs and outputs.
* con: [black box] often hints that the function has no side-effects, but this is not something that can be assumed about this API.
* con: `_box` has nothing to do with `Box` or `box`-syntax, which might be confusing

Alternative names suggested: `pessimize`, `unoptimize`, `unprocessed`, `unknown`, `do_not_optimize` (Google Benchmark).

I think trying to optimize empty assembly out by default is still kind of a violation of expectations.

I don't disagree, just wondering if there are more use cases that support this.
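For readers who have not used the function tracked in the linked issue, here is a small sketch of std::hint::black_box as it exists in stable Rust today. The function opaque_sum is a made-up example, and note that the standard library documents black_box as a best-effort hint rather than a guarantee.

```rust
use std::hint::black_box;

/// Hypothetical example: sums 0..n while hiding the input and every
/// intermediate value from the optimizer, so the loop is less likely to be
/// folded into a closed-form expression in a benchmark.
pub fn opaque_sum(n: u64) -> u64 {
    // Treat the input as unknown to the optimizer.
    let n = black_box(n);
    let mut total = 0u64;
    for i in 0..n {
        // Treat each partial sum as "used" so it is not optimized away.
        total = black_box(total + i);
    }
    total
}
```

black_box returns its argument unchanged, so opaque_sum(5) is still 10; only the optimizer's view of the values changes.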

Amanieu · December 5, 2019, 5:44pm

An empty inline asm actually has well defined semantics and cannot be optimized away (unless you explicitly allow it with pure).

The semantics are that at some point in the program's execution, the program will have all the registers declared as inputs containing the required values, and all globals that may be accessed by the asm will contain their proper values (unless nomem is used).
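A rough sketch of those semantics in the asm! syntax that was later stabilized (not the draft syntax in this thread). The helper name observe is hypothetical. The template contains only an assembler comment, so no instruction is emitted, but the operand still has to be mentioned in the template because the macro rejects unused operands.

```rust
use std::arch::asm;

/// Hypothetical helper: an "empty" asm block with one register input.
pub fn observe(x: usize) -> usize {
    // SAFETY: the template is just an assembler comment; nothing executes.
    unsafe {
        asm!(
            // `{0}` is referenced inside a comment so the operand counts as
            // used. At this point `x` must actually be in a register.
            "/* {0} */",
            in(reg) x,
            // No `nomem` here: the compiler has to assume the block may read
            // or write memory, so values in memory must be up to date.
            options(nostack, preserves_flags),
        );
    }
    x
}
```

Adding nomem to the options would drop the memory requirement and leave only the register input as a constraint.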

Amanieu · December 11, 2019, 8:00pm

I would like to thank everyone here for their constructive feedback!

Discussion on this RFC will now be moved to the inline assembly project group, which also holds the latest RFC draft. You are all welcome to join us on the Rust Zulip channel to work on improving the RFC or discuss any issues related to inline assembly.

luojia · December 13, 2019, 3:32am

Here, for example, x86 has 'reg' for integer registers and 'reg_abcd' for four specific ones. Would it be possible to define these register class names ('reg_abcd' and 'reg') as symbols in the standard library for each architecture? That way we could define symbols such as std::arch::x86_64::reg_abcd (which may also save the two quote marks "" when used in macros), available only under the matching target config, for example x86_64. Additional register classes may only be available for certain target configs: for example, vector register classes for RISC-V with the vector (V) extension, or the r8-r15 registers on 64-bit x86_64 but not on 32-bit x86. Users could define aliases for reg_abcd for convenience and look up documentation easily in the standard library docs. (Just a random thought; it may still be fine to hard-code class names as strings in the compiler or macro syntax. Not yet worked out: coordinating the lower parts of registers, e.g. ax, eax, rax.)

Some considerations on syntax:

For registers provided ('linked' externally) by the compiler, use extern:

decl_register_class! { pub extern rax; }
decl_register_class! { pub extern x5; }

To group several registers into a class, we could for example use []:

decl_register_class! { pub reg_abcd = [rax, rbx, rcx, rdx]; }
decl_register_class! { pub reg_8_bit = [al, ah, bl, bh, cl, ch, dl, dh]; }

Declare aliases for a register class. This way we save code in the compiler and retain the ability to extend:

decl_register_class! { pub four = reg_abcd; } // frequently used in x86
decl_register_class! { pub t0 = x5; } // temporary register 0, RISC-V register

Use #[cfg(...)] to provide symbols for certain configurations esp. targets:

// for one target only, not for another one
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))] // available for 32 & 64 bits
decl_register_class! { pub extern xmm7; }
#[cfg(target_arch = "x86_64")] // for 64-bit only, not available in 32-bit
decl_register_class! { pub extern xmm31; }

// use same name `reg` for all integers in different platforms
// so that users may remember register class names easily
decl_register_class! {
    #[cfg(target_arch = "x86_64")]
    pub reg = [rax, rbx, rcx, rdx, /* omitted */, r14, r15];
    #[cfg(any(target_arch = "riscv32", target_arch = "riscv64"))]
    pub reg = [x0, x1, x2, x3, /* omitted */, x30, x31];
}

// In RISC-V, applications running on top of an OS must not change these
// registers. However, we can modify them in bare-metal embedded systems
// in a `no_std` context or when we are developing the underlying OS.
#[cfg(no_std)] 
decl_register_class! { 
    pub gp = x3; 
    pub tp = x4;
}

Metadata are allowed for declarations:

decl_register_class! {
    #[doc = "RISC-V x5 register"] // metadata are allowed
    pub extern x5;
}

Eventually we could reach this:

// or other keywords/types instead of this example
pub static reg_abcd: Reg<64> = [rax, rbx, rcx, rdx];
pub static a: Reg<[64, 32, 16, 8]> = [rax, eax, ax, ah, al];

The compiler may pick a register when core::mem::size_of of the operand's type equals one of the register widths. For example, if we pass an i32, where size_of::<i32>() == 4 bytes (32 bits), and give it a as the register class, the compiler would look through [64, 32, 16, 8], choose 32, and pick eax. Or, given my own type Option<NonNull<T>>, it would choose 64 and pick rax. This might also adapt well to the 80-bit-wide floating-point registers found in some architectures.
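The size-based selection described above can be modelled in ordinary Rust. This is only a toy sketch of the idea, not how any compiler does it: CLASS_A and pick_register are hypothetical names, and since ah and al are both 8 bits wide, the table keeps only al.

```rust
/// Hypothetical model of the `a` register family: (width in bits, name).
pub const CLASS_A: [(usize, &str); 4] = [(64, "rax"), (32, "eax"), (16, "ax"), (8, "al")];

/// Returns the register in `class` whose width equals the size of `T`,
/// or `None` if no register in the class has that width.
pub fn pick_register<T>(class: &[(usize, &'static str)]) -> Option<&'static str> {
    let bits = core::mem::size_of::<T>() * 8;
    class
        .iter()
        .find(|(width, _)| *width == bits)
        .map(|(_, name)| *name)
}
```

With this table, pick_register::<i32>(&CLASS_A) selects eax and pick_register::<u64>(&CLASS_A) selects rax, while a 3-byte type such as [u8; 3] matches nothing.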

Maybe my approach could cooperate with this idea (we may change the declaration of with_clobbers, and use MaybeUninit::uninit for value cpuid before calling assembly code):

197g:

// This is just a builder/descriptor. Real magic happens when this is 
// used as a const parameter to the intrinsic, see below. 
const ASM: Assembly = Assembly::new() 
    // Target-arch specific set of register clobbers, available like SIMD 
    .with_clobbers(&[Reg::EAX, Reg::EBX, Reg::ECX, Reg::EDX])
    // Request one input register, referenced by index 0. Adds memory to 
    // clobbers due to mutable reference? Maybe a worthwhile idea. 
    .with_input::<&mut Cpuid>(0) 
    // Request the compiler internal assembler. 
    .from_source(CPUID_SOURCE);

fn cpuid() -> Cpuid {
    let mut cpuid = Cpuid::default();
    intrinsic::call_asm::<{ASM}>(&mut cpuid);
    cpuid
}

gnzlbg · December 13, 2019, 12:08pm

How does asm!("", nomem, preserves_flags); constrain what the compiler does around the inline assembly block? To me it looks like it can just be removed, even though it isn't pure.

Amanieu · December 13, 2019, 3:15pm

You're right that this asm code imposes no constraints on the compiler, except for the fact that the compiler (currently) does not look at the asm string itself and does not "know" that it is empty.

Amanieu · December 13, 2019, 3:28pm

luojia:

Here, for example, x86 has 'reg' for integer registers and 'reg_abcd' for four specific ones. Would it be possible to define these register class names ('reg_abcd' and 'reg') as symbols in the standard library for each architecture? That way we could define symbols such as std::arch::x86_64::reg_abcd (which may also save the two quote marks "" when used in macros), available only under the matching target config, for example x86_64. Additional register classes may only be available for certain target configs: for example, vector register classes for RISC-V with the vector (V) extension, or the r8-r15 registers on 64-bit x86_64 but not on 32-bit x86. Users could define aliases for reg_abcd for convenience and look up documentation easily in the standard library docs. (Just a random thought; it may still be fine to hard-code class names as strings in the compiler or macro syntax. Not yet worked out: coordinating the lower parts of registers, e.g. ax, eax, rax.)

Unfortunately we are pretty constrained by the fact that in the end, we need to lower to LLVM's internal inline assembly syntax, which only supports a few hardcoded register classes. This means that we can't really support users specifying their own custom register classes.

As such, I don't see much benefit in declaring register classes as symbols, and it will significantly increase implementation complexity.
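For reference, this is roughly how a built-in register class and the sub-register question (al, ah, ax, ...) look in the asm! syntax that was later stabilized; the draft in this thread may differ. The helper name dup_low_byte is made up, and the assembly path assumes x86_64, with a plain-Rust fallback elsewhere.

```rust
use std::arch::asm;

/// Hypothetical helper: copies the low byte of `x` into its high byte.
#[cfg(target_arch = "x86_64")]
pub fn dup_low_byte(x: u16) -> u16 {
    let mut x = x;
    // SAFETY: only moves the low byte of one register into its high byte.
    unsafe {
        asm!(
            // `reg_abcd` restricts allocation to the a/b/c/d registers, the
            // only ones with an addressable high byte. The `:h` and `:l`
            // modifiers select that high/low byte (e.g. `ah` and `al`).
            "mov {0:h}, {0:l}",
            inout(reg_abcd) x,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    x
}

/// Fallback with the same result for other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn dup_low_byte(x: u16) -> u16 {
    let lo = x & 0xff;
    lo | (lo << 8)
}
```

Here the class name is a fixed identifier understood by the macro rather than a user-declared symbol, and sub-registers are chosen with template modifiers instead of separate classes.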

comex · December 14, 2019, 1:23am

Indeed. Without inspecting the asm string, which I believe the compiler should not be allowed to do, it cannot remove that asm block, because it could have some sort of side effect... albeit one that does not modify any memory, registers, or flags. For example, some architectures have dedicated instructions to write to I/O ports. Alternately, storing to addresses corresponding to MMIO registers should be okay even with nomem, because they're not really "memory" in a sense that the compiler cares about. (In particular, all source-level loads and stores to those addresses should be volatile anyway, so the compiler can never remove or alter them based on its analysis of what does or does not touch memory.)

Edit: Another example of an allowed side effect would be trapping.

But it doesn't constrain the compiler per se. There's no particular state that the compiler has to flush or set up – not registers, not even global variables, because nomem implies the assembly shouldn't access global variables.

josh · December 15, 2019, 4:22pm

If it isn't excessively difficult, we may need to provide this in the initial version. People will need labels in inline assembly, and if we have this mechanism, we can encourage writing such labels correctly from the start.

EDIT: No, we don't need to provide this in the initial version; as @Amanieu points out below, people can and should use local labels instead.

josh · December 15, 2019, 4:30pm

We need to document handling of the alignstack mechanism within LLVM.

As with the safe-by-default handling of out vs lateout and preserves_flags, I would suggest that we always pass the alignstack flag to LLVM, and that we optionally provide a noalignstack option to say "this assembly doesn't need an aligned stack (e.g. because it doesn't call other functions and doesn't use SSE operations that require alignment)".

I also don't know to what degree we really need such a flag, and to what extent that allows LLVM to make optimizations that it otherwise couldn't. We may choose to not provide such a flag in the initial version. But at a minimum, we need to document that we set the alignstack option in LLVM by default.

Amanieu · December 15, 2019, 5:41pm

Actually that's not strictly needed since you can use local labels which don't need symbol names.
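As an illustration of local labels, here is a sketch in the asm! syntax that was later stabilized (an assumption relative to the draft in this thread). Numeric labels such as 2: are referenced with a b (backward) or f (forward) suffix and never become symbol names. The function sum_to is a made-up example; the assembly path assumes x86_64, and labels consisting only of the digits 0 and 1 are avoided because the Rust documentation warns they can be misparsed with Intel syntax.

```rust
use std::arch::asm;

/// Hypothetical example: computes n + (n - 1) + ... + 1 with a small loop.
#[cfg(target_arch = "x86_64")]
pub fn sum_to(n: u64) -> u64 {
    let mut acc: u64 = 0;
    // SAFETY: the block only touches the two registers declared below.
    unsafe {
        asm!(
            "test {i}, {i}",   // skip the loop entirely when n == 0
            "jz 3f",           // `3f`: nearest label `3` looking forward
            "2:",
            "add {acc}, {i}",
            "dec {i}",
            "jnz 2b",          // `2b`: nearest label `2` looking backward
            "3:",
            i = inout(reg) n => _,
            acc = inout(reg) acc,
            options(nomem, nostack),
        );
    }
    acc
}

/// Fallback with the same result for other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn sum_to(n: u64) -> u64 {
    (1..=n).sum()
}
```

Because the labels are local, the block can be inlined or duplicated by the compiler without producing duplicate symbol definitions.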

I'm somewhat torn about this since this feature is only supported by LLVM, not GCC, which could cause issues if we decided to add GCC as a backend (which someone is already trying to do).

However without this feature it is impossible to call a function from inline asm. It's not just about stack alignment: on x86_64 leaf functions don't need to adjust the stack pointer on entry if their stack usage fits in the 128-byte stack red zone. However if some inline asm calls an external function, the contents of the red zone will be corrupted by the call instruction and any other stack space used by the called function.

josh · December 15, 2019, 9:15pm

Good point; I keep forgetting that we can guarantee the same assembly syntax on different targets for the same architecture, thanks to LLVM's built-in assembler. I've had to write assembly that can't make that assumption in the past.

As far as I know, GCC supports making calls from inline assembly; I'd find it quite surprising if it did not.

I would propose that we default to supporting calls and SSE, and allow assembly blocks to opt out of that if they want the additional optimization. That seems safer.
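For comparison, a sketch of a call from inline assembly in the asm! syntax that was eventually stabilized, which is not the draft under discussion here. In that design the opt-out is spelled nostack: when it is omitted, the Rust reference says the stack pointer is suitably aligned for a function call, and clobber_abi("C") marks the registers the C calling convention does not preserve. The names double_it and call_from_asm are hypothetical, and the assembly path assumes x86_64 with the System V calling convention (first argument in rdi), so it is disabled on Windows.

```rust
use std::arch::asm;

/// Hypothetical callee using the C calling convention.
extern "C" fn double_it(arg: i32) -> i32 {
    arg * 2
}

/// Hypothetical example: calls `double_it` from inside an asm block.
#[cfg(all(target_arch = "x86_64", not(windows)))]
pub fn call_from_asm(arg: i32) -> i32 {
    let f = double_it as extern "C" fn(i32) -> i32;
    let result: i32;
    // SAFETY: `f` is a valid function pointer, and every register the C ABI
    // allows it to clobber is declared via `clobber_abi`.
    unsafe {
        asm!(
            "call {f}",
            f = in(reg) f,      // function pointer in any free register
            in("rdi") arg,      // first integer argument (System V)
            out("rax") result,  // integer return value
            clobber_abi("C"),   // caller-saved registers are clobbered
            // `nostack` is deliberately omitted because `call` pushes a
            // return address and the callee may use the stack.
        );
    }
    result
}

/// Fallback for other targets: an ordinary call.
#[cfg(not(all(target_arch = "x86_64", not(windows))))]
pub fn call_from_asm(arg: i32) -> i32 {
    double_it(arg)
}
```

With these defaults, call_from_asm(21) returns 42 without any manual stack adjustment in the template.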

CAD97 · December 15, 2019, 9:49pm

This Q/A uses gcc inline assembly.

Some pertinent quotes:

Both Michael and I have listed a number of reasons doing call in inline asm is difficult.

Handling all the registers that may be clobbered by the function call's ABI.

Handling red-zone.

Handling alignment.

Memory clobber.

If the goal here is 'learning,' then feel free to experiment. But I don't know that I would ever feel comfortable doing this in production code.

-- @David Wohlferd

But there are other considerations as well, such as the red-zone for 64bit code. This means that push/pop (the traditional method of 'restoring' registers) is more complicated than usual. And even though rax isn't explicitly mentioned in this code, it is subject to change by printf or any of its children, so it must be 'clobbered' as well. In addition to r8, r10, etc. (Safely) calling functions from inline asm is hard, and is usually a bad idea.

-- @David Wohlferd

I concur with @DavidWohlferd: calling functions from inline assembler requires a fair amount of knowledge. I wrote an answer that wasn't very trivial recently that involved 64-bit code/inline assembler/calling a function. On top of what David said GCC itself requires the stack to be aligned to a 16-byte boundary at the point a CALL is made. So not only do you need to deal with the redzone and clobbers, you need to deal with stack alignment before the call.

-- @Michael Petch

My reading of the linked discussion is that while GCC technically, probably, supports CALL from inline ASM, getting it right is next to impossible and mostly undocumented.

At a minimum, you'd need alignstack, clobbers(all temporary registers), clobbers(memory), clobbers(flags), and clobbers(red zone).

Here is a simple example of calling printf twice with inline asm by Michael Petch:

#include <stdio.h>

int main(void)
{
    const char* test = "test\n";
    long dummyreg; /* dummyreg used to allow GCC to pick available register */

    __asm__ __volatile__ (
        "add $-128, %%rsp\n\t"   /* Skip the current redzone */
        "mov %%rsp, %[temp]\n\t" /* Copy RSP to available register */
        "and $-16, %%rsp\n\t"    /* Align stack to 16-byte boundary */
        "mov %[test], %%rdi\n\t" /* RDI is address of string */
        "xor %%eax, %%eax\n\t"   /* Variadic function set AL. This case 0 */
        "call printf\n\t"
        "mov %[test], %%rdi\n\t" /* RDI is address of string again */
        "xor %%eax, %%eax\n\t"   /* Variadic function set AL. This case 0 */
        "call printf\n\t"
        "mov %[temp], %%rsp\n\t" /* Restore RSP */
        "sub $-128, %%rsp\n\t"   /* Add 128 to RSP to restore to orig */
        :  [temp]"=&r"(dummyreg) /* Allow GCC to pick available output register. Modified
                                    before all inputs consumed so use & for early clobber*/
        :  [test]"r"(test),      /* Choose available register as input operand */
           "m"(test)             /* Dummy constraint to make sure test array
                                    is fully realized in memory before inline
                                    assembly is executed */
        : "rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11",
          "xmm0","xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8","xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
          "mm0","mm1", "mm2", "mm3", "mm4", "mm5", "mm6", "mm7",
          "st", "st(1)", "st(2)", "st(3)", "st(4)", "st(5)", "st(6)", "st(7)"
        );

    return 0;
}

This is not a simple problem, and honestly, I feel like trying to make it appear simpler is going to miss some edge case and cause more problems by giving an appearance of it "just working". With the ceremony required to do this, a solution that moves a function pointer into a variable and then calls it with normal surface syntax is almost certainly going to win out as it can omit all (most) of this ceremony.

We should definitely make our defaults as safe as possible, but I feel fully OK with saying that a call from inline ASM is out of scope of the initial specification, because it's just that thorny of an issue.

But this does bring up the important question of the stack. Upon entry to the inline ASM, where do we guarantee the stack pointer to be? Is the inline asm allowed to use the red zone as scratch space? Is it allowed to adjust the stack pointer to grab more stack space so long as it's popped back by the exit?

CAD97 · December 15, 2019, 10:03pm

Following the docs trail to GCC Basic Asm docs:

Safely accessing C data and calling functions from basic asm is more complex than it may appear. To access C data, it is better to use extended asm.

Following the trail to GCC Extended Asm docs:

Accessing data from C programs without using input/output operands (such as by using global symbols directly from the assembler template) may not work as expected. Similarly, calling functions directly from an assembler template requires a detailed understanding of the target assembler and ABI.

No other instruction about calling functions from inline asm is provided in these two documents.

I should probably also link Don't Use Inline Asm.

josh · December 16, 2019, 12:08am

As far as I know, everything in those two quotes refers to the idea of trying to reference a function or variable symbol directly from inline assembly, rather than passing a value in via input/output operands. That doesn't directly relate to safely making the call itself, just to naming the thing you want to call.

josh · December 16, 2019, 12:41am

All very good questions!

I don't think we can make any precise guarantee about where the stack pointer lives upon entry, because the surrounding code may have moved the stack pointer.

(In the future, if we offer memory operands and use them to reference things on the local stack, the compiler needs to ensure that those memory operands work upon entry, but if you change the stack it might invalidate the memory operands, depending on what they offset from.)

Using the red zone seems dangerous, as the compiler might also have used the red zone. As far as I can tell, I don't see any obvious way to specify a clobber of the red zone (unless perhaps alignstack has that as a side effect, but that isn't documented).

One way or another we should specify this. Ideally it should be possible to use the red zone for scratch space if enabled, but on the other hand this doesn't come up especially often in inline assembly, and it isn't obvious to what degree using it would cause enough overhead in the compiler to make it no longer worthwhile.

(Also, the wildest and most difficult-to-reproduce bug I ever debugged involved code that briefly used the red zone running in a context that didn't preserve the red zone on interrupts. So at the very least, I would argue that any code using the red zone should have to very loudly declare that it does so, and then the compiler can error if compiling such code with -mno-red-zone.)

I've seen a lot of inline assembly code that pushes and pops, but much of it occurs in projects that use -mno-red-zone. So I don't know whether doing that in code compiled with the red zone would work correctly or not, and I haven't seen any documentation specifying that interaction for either LLVM or GCC.

Digging into the source of LLVM, it looks like alignstack might have the desired side effect, but I haven't seen any documentation of that.

CAD97 · December 16, 2019, 1:06am

Note that I was talking about where it points, not where the actual pointer lives (though that's a good point as well).

If the stack pointer isn't guaranteed to be in the stack pointer register, well, the inline asm just straight up can't find the stack, let alone use it.

If the stack pointer is guaranteed to be beyond any locals (iow, the red zone is not in use), then the inline asm can safely push/pop (well, modulo alignment issues).

If the stack pointer is guaranteed to exist but there may be stack items beyond the stack pointer (iow, push would clobber the stack) (iow, the red zone is in use), then the inline asm can do the "skip red zone and align" dance reproduced above to use the stack.

The second (roughly alignstack iiuc) is the "safest" option, as push/pop will "just work". The first is obviously the most freeing to the compiler. The third seems the most likely for it to end up being if not otherwise specified.
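To show what the "push/pop just works" option looks like in practice, here is a sketch in the asm! syntax that was later stabilized, where (as an outcome not decided in this thread) a block may push to the stack unless it opts out with nostack. The name roundtrip_via_stack is hypothetical and the assembly path assumes x86_64.

```rust
use std::arch::asm;

/// Hypothetical example: moves a value through the stack with push/pop.
#[cfg(target_arch = "x86_64")]
pub fn roundtrip_via_stack(x: u64) -> u64 {
    let y: u64;
    // SAFETY: the push is balanced by the pop, so the stack pointer has its
    // original value again when the block exits.
    unsafe {
        asm!(
            "push {x}",
            "pop {y}",
            x = in(reg) x,
            y = out(reg) y,
            // No `nostack` and no `nomem`: the block writes below the
            // incoming stack pointer, which `nostack` would forbid.
            options(preserves_flags),
        );
    }
    y
}

/// Fallback for other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn roundtrip_via_stack(x: u64) -> u64 {
    x
}
```

If the block were marked nostack instead, the same template would be invalid, since the option promises that nothing is pushed and the red zone is not written.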

(I made a tracking issue for the project group.)

Amanieu · December 16, 2019, 1:23am

Thanks for creating an issue, let's continue this discussion there.
