First of all, I'm not sure whether this feature already exists. I tried to find out by combing through the generated assembly and LLVM IR and didn't find it, although my x86_64 assembly is not very good and my LLVM IR knowledge is even worse.
I was doing software architecture for a framework I'm writing and may have discovered an opportunity for performance improvements in Rust.
It seems to be a common pattern in Rust to use enums instead of dynamic trait objects when returning a value, and then to match on the enum afterwards (often at the very next use), destructuring it to get at the contents and do something with them.
Rust Example
#![allow(dead_code)]
#![feature(bench_black_box)]
use std::hint::black_box;

enum Variants {
    Usize(usize),
    U64(u64),
    U32(u32),
    U16(u16),
    U8(u8),
}

enum InVariant {
    Usize,
    U64,
    U32,
    U16,
    U8,
}

#[inline(always)]
fn mapping_in_to_out(input: InVariant) -> Variants {
    match input {
        InVariant::Usize => {
            Variants::Usize(usize::MAX)
        }
        InVariant::U64 => Variants::U64(u64::MAX),
        InVariant::U32 => Variants::U32(u32::MAX),
        InVariant::U16 => Variants::U16(u16::MAX),
        InVariant::U8 => Variants::U8(u8::MAX),
    }
}

fn perform_complete(input: InVariant) {
    let variants = mapping_in_to_out(input);
    unsafe { calculating_out(variants); }
}

#[inline(always)]
unsafe fn calculating_out(input: Variants) {
    match input {
        Variants::Usize(d) => {
            let e = d - 10;
            println!("{}", e);
        }
        Variants::U64(d) => {
            let e = d - 10;
            println!("{}", e);
        }
        Variants::U32(d) => {
            let e = d - 10;
            println!("{}", e);
        }
        Variants::U16(d) => {
            let e = d - 10;
            println!("{}", e);
        }
        Variants::U8(d) => {
            let e = d - 10;
            println!("{}", e);
        }
    }
}

fn main() {
    let in_variant = InVariant::Usize;
    perform_complete(black_box(in_variant));
}
Possible optimization opportunity
From the code sample one can clearly see that it's not necessary to construct the intermediate Variants enum: the second branch can be avoided altogether by jumping directly into the respective match arm.
That's it.
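To make the idea concrete, here is a rough sketch of what the fused result could look like if it were written by hand in Rust. The function names two_step and fused are made up for illustration and are not part of the example above, and the functions return a String instead of printing only so that the two versions can be compared. This shows the intended effect, not how a compiler pass would actually implement it.

```rust
// Hypothetical names throughout; only the enum shapes follow the example above.
#[derive(Clone, Copy)]
enum InVariant {
    Usize,
    U64,
    U32,
    U16,
    U8,
}

enum Variants {
    Usize(usize),
    U64(u64),
    U32(u32),
    U16(u16),
    U8(u8),
}

// Two-step version: build the intermediate enum, then branch on it again.
fn two_step(input: InVariant) -> String {
    let variants = match input {
        InVariant::Usize => Variants::Usize(usize::MAX),
        InVariant::U64 => Variants::U64(u64::MAX),
        InVariant::U32 => Variants::U32(u32::MAX),
        InVariant::U16 => Variants::U16(u16::MAX),
        InVariant::U8 => Variants::U8(u8::MAX),
    };
    match variants {
        Variants::Usize(d) => (d - 10).to_string(),
        Variants::U64(d) => (d - 10).to_string(),
        Variants::U32(d) => (d - 10).to_string(),
        Variants::U16(d) => (d - 10).to_string(),
        Variants::U8(d) => (d - 10).to_string(),
    }
}

// Fused version: each input arm continues straight into the matching
// computation, so there is no intermediate enum and no second branch.
fn fused(input: InVariant) -> String {
    match input {
        InVariant::Usize => (usize::MAX - 10).to_string(),
        InVariant::U64 => (u64::MAX - 10).to_string(),
        InVariant::U32 => (u32::MAX - 10).to_string(),
        InVariant::U16 => (u16::MAX - 10).to_string(),
        InVariant::U8 => (u8::MAX - 10).to_string(),
    }
}

fn main() {
    let all = [
        InVariant::Usize,
        InVariant::U64,
        InVariant::U32,
        InVariant::U16,
        InVariant::U8,
    ];
    // Both versions must agree for every input variant.
    for input in all {
        assert_eq!(two_step(input), fused(input));
    }
    println!("{}", fused(InVariant::U8)); // prints "245"
}
```

The question of this post is whether the compiler can get from two_step to fused on its own once both matches end up in the same function after inlining.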
Sample / Benchmark C implementation
I've rewritten the Rust sample in C and employed goto to perform the proposed optimization.
Whole Benchmark in the Compiler Explorer
Here is the relevant code from the C implementation:
void constructVariantASM(enum NumInVariantEnum input) {
    uint8_t uint8_variant_val;
    uint16_t uint16_variant_val;
    uint32_t uint32_variant_val;
    uint64_t uint64_variant_val;
    struct NumVariant ret;
    switch (input) {
        case u8In: {
            uint8_variant_val = UINT8_MAX;
            goto uint8_variant;
            ret.type = u8;
            ret.data.u8 = UINT8_MAX;
            break;
        }
        case u16In: {
            uint16_variant_val = UINT16_MAX;
            goto uint16_variant;
            ret.type = u16;
            ret.data.u16 = UINT16_MAX;
            break;
        }
        case u32In: {
            uint32_variant_val = UINT32_MAX;
            goto uint32_variant;
            ret.type = u32;
            ret.data.u32 = UINT32_MAX;
            break;
        }
        case u64In: {
            uint64_variant_val = UINT64_MAX;
            goto uint64_variant;
            ret.type = u64;
            ret.data.u64 = UINT64_MAX;
        }
    }
    switch (ret.type) {
        case u8: {
            uint8_t structData = ret.data.u8;
        uint8_variant:
            structData = uint8_variant_val;
            structData -= uint8_one;
#ifdef PRINT_OUPUT
            printf("%i\n", structData);
#endif
            break;
        }
        case u16: {
            uint16_t structData = ret.data.u16;
        uint16_variant:
            structData = uint16_variant_val;
            structData -= uint16_one;
#ifdef PRINT_OUPUT
            printf("%i\n", structData);
#endif
            break;
        }
        case u32: {
            uint32_t structData = ret.data.u32;
        uint32_variant:
            structData = uint32_variant_val;
            structData -= uint32_one;
#ifdef PRINT_OUPUT
            printf("%u\n", structData);
#endif
            break;
        }
        case u64: {
            uint64_t structData = ret.data.u64;
        uint64_variant:
            structData = uint64_variant_val;
            structData -= uint64_one;
#ifdef PRINT_OUPUT
            printf("%llu\n", structData);
#endif
            break;
        }
    }
}
Bench Results
I've run the benchmark and got the following results:
Unoptimized took 1.477011 seconds
Optimized took 1.443288 seconds
Granted, the benefit is really small (roughly 2% here), but it might be bigger in more complex code.
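For anyone who wants to try a similar measurement on the Rust side, below is a minimal timing sketch. It is not the benchmark that produced the numbers above: the reduced two-variant enums, the function names two_step, fused and time_secs, and the iteration count are all assumptions made for illustration. It relies on std::hint::black_box being available on the toolchain in use.

```rust
use std::hint::black_box;
use std::time::Instant;

// Reduced, hypothetical stand-ins for the enums in the Rust example.
#[derive(Clone, Copy)]
enum In {
    U64,
    U32,
}

enum Out {
    U64(u64),
    U32(u32),
}

// Builds the intermediate enum and then matches on it again.
#[inline(never)]
fn two_step(input: In) -> u64 {
    let out = match input {
        In::U64 => Out::U64(u64::MAX),
        In::U32 => Out::U32(u32::MAX),
    };
    match out {
        Out::U64(d) => d - 10,
        Out::U32(d) => u64::from(d - 10),
    }
}

// Hand-fused equivalent with a single branch.
#[inline(never)]
fn fused(input: In) -> u64 {
    match input {
        In::U64 => u64::MAX - 10,
        In::U32 => u64::from(u32::MAX - 10),
    }
}

// Calls `f` `iterations` times with alternating inputs and returns the
// elapsed wall-clock time in seconds. black_box keeps the optimizer from
// constant-folding the input or discarding the result.
fn time_secs(iterations: u32, f: fn(In) -> u64) -> f64 {
    let start = Instant::now();
    for i in 0..iterations {
        let input = if i % 2 == 0 { In::U64 } else { In::U32 };
        black_box(f(black_box(input)));
    }
    start.elapsed().as_secs_f64()
}

fn main() {
    let n = 10_000_000; // arbitrary iteration count
    println!("Unoptimized took {:.6} seconds", time_secs(n, two_step));
    println!("Optimized took {:.6} seconds", time_secs(n, fused));
}
```

With optimizations enabled, the compiler may well emit the same machine code for both functions, in which case the two timings would be indistinguishable. Comparing the generated assembly is probably more informative than the wall-clock numbers.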
Summary
I just thought about this opportunity and would be interested in comments from people with more expertise and knowledge in Rust and compiler design. Sadly, I don't have the time, resources or knowledge to push any efforts here.
Benchmark machine
OS: macOS 12.3.1 21E258 x86_64
Host: MacBookAir7,2
CPU: Intel i5-5350U (4) @ 1.80 GHz
GPU: Intel HD Graphics 6000
Memory: 8192 MiB
Thanks for your time and interest.