Inline assembly syntax

I’m currently playing with the idea of dropping the string literals from inline assembly and replacing them with regular Rust tokens, which a syntax extension would then stringify. This has three main benefits:

  • It’s more lightweight and looks more integrated.
  • Input operands can be declared and used in one place.
  • It’s easier to write macros with inline assembly depending on arguments to the macro.

Here’s an example of what it could look like:

asm! {
	cli;
	swapgs;
	mov ds, {a = user_segment};
	mov {r = &thread.registers.rip};
	iretq;
}

Note that the semicolons will expand to newlines in the generated assembly.
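
To make the expansion concrete, here is a rough sketch of how the stringified text might be lowered. It splits on semicolons to get one instruction per line and pulls each `{constraint = expr}` escape out into an operand list. The names `Operand`, `lower` and `matching_brace` are hypothetical, the `$0`-style placeholders are only one possible target format, and the sketch assumes that `;` never appears inside an escaped expression. It only handles the input form shown above.

```rust
/// One `{constraint = expr}` escape found in the assembly text.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
struct Operand {
    constraint: String,
    expr: String,
}

/// Find the index of the `}` that closes the `{` at byte offset `open`.
fn matching_brace(s: &str, open: usize) -> Option<usize> {
    let mut depth = 0usize;
    for (i, c) in s[open..].char_indices() {
        match c {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(open + i);
                }
            }
            _ => {}
        }
    }
    None
}

/// Turn the stringified macro body into a newline-separated template
/// plus the list of operands it refers to.
/// Assumption: `;` only ever separates instructions.
fn lower(src: &str) -> (String, Vec<Operand>) {
    let mut operands: Vec<Operand> = Vec::new();
    let mut lines: Vec<String> = Vec::new();

    for stmt in src.split(';') {
        let mut rest = stmt.trim();
        if rest.is_empty() {
            continue;
        }
        let mut line = String::new();
        while let Some(open) = rest.find('{') {
            let close = match matching_brace(rest, open) {
                Some(i) => i,
                None => break, // unclosed brace: copy the remainder verbatim
            };
            line.push_str(&rest[..open]);
            let inner = &rest[open + 1..close];
            let (constraint, expr) = match inner.split_once('=') {
                Some((c, e)) => (c.trim(), e.trim()),
                None => (inner.trim(), ""),
            };
            // Replace the escape with a numbered placeholder.
            line.push_str(&format!("${}", operands.len()));
            operands.push(Operand {
                constraint: constraint.to_string(),
                expr: expr.to_string(),
            });
            rest = &rest[close + 1..];
        }
        line.push_str(rest);
        lines.push(line);
    }
    (lines.join("\n"), operands)
}
```

For example, `lower("cli; mov ds, {a = user_segment}; iretq;")` yields the three-line template `cli`, `mov ds, $0`, `iretq` and a single operand with constraint `a` and expression `user_segment`.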

I also suggest that an asm! macro could return its outputs instead of mutating existing variables.

let ret = asm! {
	inw {a=>}, {dN = port};
};

I’m not sure what syntax would be desirable for all the options, or how they should be escaped in the middle of the assembly. I’m looking for input on that. However, to make parsing easier, I suggest placing the options in matched brackets right after the opening brace.

let mut result: u32;

asm! {
	[b = arg, b -> r2, -eax, :name:r, alignstack, pure, intel]

	jmp label;
label:
	mov :name, {r = 1 + 2};
	mov {r -> result}, :name;
}

A possibility is also to put outputs at the end using the escape syntax:

let (eax, ebx) = asm! {
	{a = 1, b = 3}

	add eax, 1;
	add ebx, 1;

	{= (a, b)}
};

I think we should support this syntax for the major architectures (ARM and x86) and still allow string literals in case Rust’s lexer or our escape method isn’t compatible with some assembly dialect. Note that ARM will require an escape syntax other than curly braces.

Inline assembly should have implicit side effects by default, which you can opt out of using the pure attribute. Intel syntax should also be the default on x86. The syntax to use on x86 should be selectable with a crate attribute, since people using AT&T syntax aren’t extinct yet.

For reference:


The idea is certainly interesting, and quite expressive. I have a few questions though:

  • Does this now require the Rust compiler to understand assembly language tokens?
  • Would there be benefits in instead removing inline assembly and allowing raw IR to be emitted, which could perhaps reference the relevant variables? (this is a genuine thing I am curious about - my understanding of the IR and this phase of Rust compilation is not good enough to say whether this would even work).
  • What stops the compiler from removing inline assembly marked pure without side effects? Or to put it differently, if there are no side effects, what purpose does the assembly statement actually serve?

  • It does not require the Rust compiler to understand assembly language, it just reuses Rust tokens.
  • LLVM IR uses a similar syntax to the existing one, so there would be no gain there.
  • Inline assembly marked pure must have outputs or mutate references given. Such inline assembly could be considered purely an optimization. If the compiler finds that the mutated references or outputs are unused, it can safely remove the assembly too.

Inline assembly previously reminded me of closures. The last example is even closer to the closure syntax.

let inc2 = asm! { |a, b|
	add eax, 1;
	add ebx, 1;

	{= (a, b)}
};
let (eax, ebx) = inc2(1, 3);

I completely agree on the pure attribute, AT&T syntax and escaping.

it just reuses Rust tokens.

Observe that we're borrowing concepts from both stringify! and format! extensions.

looks more integrated.

This might be a disadvantage. It isn't actually integrated the way D's inline assembly is.

Is there any chance that inline assembly needs to have unbalanced delimiters (([{)? If there’s even an edge case where that’s required, that would prevent this from working.

Nowhere that I know of. Invalid string literals are more likely.
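
The concern comes from the fact that a Rust macro receives its input as token trees, so every `(`, `[` and `{` in the body has to be closed by its partner. As a rough illustration, the check below approximates that rule on plain text. The function name `delimiters_balanced` is hypothetical, and the sketch ignores string literals and comments, which the real lexer would treat differently.

```rust
/// Returns true if every `(`, `[` and `{` in `src` is closed by the
/// matching delimiter in the right order.
/// Simplification: string literals and comments are not skipped.
fn delimiters_balanced(src: &str) -> bool {
    let mut stack: Vec<char> = Vec::new();
    for c in src.chars() {
        match c {
            '(' | '[' | '{' => stack.push(c),
            ')' | ']' | '}' => {
                let expected = match c {
                    ')' => '(',
                    ']' => '[',
                    _ => '{',
                };
                // A closer with no opener, or the wrong opener, is unbalanced.
                if stack.pop() != Some(expected) {
                    return false;
                }
            }
            _ => {}
        }
    }
    // Anything left on the stack was never closed.
    stack.is_empty()
}
```

A memory operand such as `mov eax, [ebx + 4]` passes this check, while a body with a stray `[` would be rejected before the syntax extension ever saw it.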

ARM uses curly braces. Some other delimiters wouldn’t require escaping. Angle brackets aren’t used in inline assembly at all.

I don’t want to use angle brackets, since you’d need another escape for Rust expressions then.

I’ve played a bit with a potential syntax:

// Inline labels
asm! {
loop:
	add eax, 1;
	jmp loop;
}

// Input constraint
asm! {
	[val => %a]

	add eax, 1;
}

// Output constraint
asm! {
	[%a => val]

	inw eax, 0x40;
}

// Input and output constraint
asm! {
	[val <=> %a]

	add eax, 1;
}

// Input then output constraint
asm! {
	[input => %a => output]

	add eax, 1;
}

// Named parameter
asm! {
	[let name: %r]

	add name, 1;
}

// Named parameter with input
asm! {
	[4 => let name: %r]

	add name, 1;
}

// Named parameter with input then output
asm! {
	[4 => let name: %r => val]

	add name, 1;
}

// Named parameter with input and output
asm! {
	[val <=> let name: %r]

	add name, 1;
}

// Inline input
asm! {
	add eax, {val => %r}, 1;
}

// Inline output
asm! {
	inw {%r => val}, 0x40;
}

// Inline input and output
asm! {
	shr {%r <=> val}, 3;
}

// Inline input then output
asm! {
	shr {input => %r => output}, 3;
}

// Clobbers
asm! {
	[use eax, use memory]

	xor eax, eax;
}

// Options
asm! {
	[mod alignstack, mod attsyntax, mod pure]

	xor eax, eax;
}

// Bound outputs
let val = asm! {
	[%a => return]
	mov rax, cr2;
};

// Inline bound outputs
let val = asm! {
	mov {%a => return}, cr2;
};

// Multiple bound outputs
let (rbx, rax) = asm! {
	[%b => return]
	xor rbx, rbx;
	mov {%a => return}, cr2;
};
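
One way to read the arrow forms above is as four kinds of binding: input, output, input and output, and input then output. The sketch below classifies a single spec string accordingly. The `Binding` enum and `parse_binding` function are hypothetical names, the sketch assumes register classes always start with `%`, and it does not handle the `let name: %r` named-parameter form, the `use` clobbers or the `mod` options.

```rust
/// How a Rust expression or place is tied to a register class.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum Binding {
    Input { expr: String, reg: String },
    Output { reg: String, place: String },
    InOut { reg: String, place: String },
    InThenOut { expr: String, reg: String, place: String },
}

/// Assumption: anything starting with `%` is a register class.
fn is_reg(s: &str) -> bool {
    s.starts_with('%')
}

/// Classify one spec such as `val => %a` or `input => %a => output`.
/// Returns None for forms this sketch does not understand.
fn parse_binding(spec: &str) -> Option<Binding> {
    // `<=>` contains `=>`, so it has to be checked first.
    if let Some((l, r)) = spec.split_once("<=>") {
        let (l, r) = (l.trim(), r.trim());
        return match (is_reg(l), is_reg(r)) {
            (true, false) => Some(Binding::InOut { reg: l.into(), place: r.into() }),
            (false, true) => Some(Binding::InOut { reg: r.into(), place: l.into() }),
            _ => None,
        };
    }
    let parts: Vec<&str> = spec.split("=>").map(str::trim).collect();
    match parts.as_slice() {
        &[e, r] if !is_reg(e) && is_reg(r) => Some(Binding::Input {
            expr: e.into(),
            reg: r.into(),
        }),
        &[r, p] if is_reg(r) && !is_reg(p) => Some(Binding::Output {
            reg: r.into(),
            place: p.into(),
        }),
        &[e, r, p] if !is_reg(e) && is_reg(r) && !is_reg(p) => Some(Binding::InThenOut {
            expr: e.into(),
            reg: r.into(),
            place: p.into(),
        }),
        _ => None,
    }
}
```

Under this reading, `%a => return` is simply an output binding whose place is the macro’s return value.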