[Pre-RFC] `.lo()` and `.hi()` methods for splitting integers into their halves

Summary

Add two methods, .lo() and .hi(), to all unsigned integer types except u8 and usize. They return the least significant (lo) or most significant (hi) half of the integer.

Motivation

Low-level code that interacts closely with hardware often needs to manipulate integers in specific ways. One of the most common operations is getting the low or high half of an integer. Currently, this is done by shifting the integer and then truncating it with an as cast (which acts as the mask):

let my_int: u32 = 0xABCDEF55;

let high: u16 = (my_int >> 16) as u16;
let low: u16 = my_int as u16;

To make this common operation shorter and easier to use, we can instead provide two simple methods on integers which do the same thing:

let my_int: u32 = 0xABCDEF55;

let high: u16 = my_int.hi();
let low: u16 = my_int.lo();

Detailed design

Two methods are added to u16, u32, u64 and u128. u8 is omitted because it can’t be split into smaller integers. usize is omitted because its size isn’t uniform on all architectures, so the result of the split would also be different on each architecture, possibly causing Rust code to compile on one architecture, but fail to compile on another.
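To make the usize concern concrete, here is a minimal sketch of what a split on usize would have to look like. The names HalfUsize, usize_lo and usize_hi are hypothetical and not part of this proposal; the point is only that the return type changes with the target.

```rust
// Hypothetical: the "half" of usize depends on the target's pointer width.
#[cfg(target_pointer_width = "64")]
type HalfUsize = u32;
#[cfg(target_pointer_width = "32")]
type HalfUsize = u16;
#[cfg(target_pointer_width = "16")]
type HalfUsize = u8;

/// Low half of a usize (hypothetical helper).
fn usize_lo(x: usize) -> HalfUsize {
    x as HalfUsize
}

/// High half of a usize (hypothetical helper).
fn usize_hi(x: usize) -> HalfUsize {
    (x >> (usize::BITS / 2)) as HalfUsize
}

// A caller that writes `let h: u32 = usize_hi(x);` compiles on a 64-bit
// target but is a type error on a 32-bit one, where the half is u16.
```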

The implementation in libcore could look like this:

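// Sketch only: inherent impls on primitive types can only live inside core.
// The methods are pub and take self by value, like the other integer methods.
impl u16 {
    pub fn lo(self) -> u8 { self as u8 }          // truncating cast keeps the low 8 bits
    pub fn hi(self) -> u8 { (self >> 8) as u8 }   // shift the high 8 bits down, then truncate
}

impl u32 {
    pub fn lo(self) -> u16 { self as u16 }
    pub fn hi(self) -> u16 { (self >> 16) as u16 }
}

impl u64 {
    pub fn lo(self) -> u32 { self as u32 }
    pub fn hi(self) -> u32 { (self >> 32) as u32 }
}

impl u128 {
    pub fn lo(self) -> u64 { self as u64 }
    pub fn hi(self) -> u64 { (self >> 64) as u64 }
}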

Here is an implementation on the Rust playground using a LoHi trait instead of an inherent impl.
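For readers without the playground link, a trait-based version might look roughly like the following. This is a sketch, not the linked code: only the name LoHi comes from the post, while the associated type Half and the macro impl_lo_hi are assumptions made for illustration.

```rust
/// Hypothetical trait; only the name `LoHi` is taken from the post.
pub trait LoHi {
    /// The unsigned integer type with half the bit width of `Self`.
    type Half;
    /// Returns the least significant half.
    fn lo(self) -> Self::Half;
    /// Returns the most significant half.
    fn hi(self) -> Self::Half;
}

// Generates one impl per line: full type => half type, bits in one half.
macro_rules! impl_lo_hi {
    ($($full:ty => $half:ty, $shift:expr;)*) => {$(
        impl LoHi for $full {
            type Half = $half;
            #[inline]
            fn lo(self) -> $half { self as $half }
            #[inline]
            fn hi(self) -> $half { (self >> $shift) as $half }
        }
    )*};
}

impl_lo_hi! {
    u16 => u8, 8;
    u32 => u16, 16;
    u64 => u32, 32;
    u128 => u64, 64;
}
```

With the trait in scope, 0xABCDEF55u32 splits into hi() == 0xABCD and lo() == 0xEF55, matching the example in the Motivation section.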

Drawbacks

The methods add a bit of API surface to core, which needs to be maintained and documented.

Alternatives

  • Continue the bit-fiddling like before

Unresolved questions

  • Should signed integers implement the methods, too?
  • Should usize also get a lo and hi method?
  • Should the methods be named low and high instead of lo and hi?
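The signed-integer question is less obvious than it looks, because it is unclear which half should carry the sign. The sketch below shows one possible choice (signed high half, unsigned low half); split_i32 and join_i32 are hypothetical names, and nothing here is part of the proposal.

```rust
/// One hypothetical choice: the high half keeps the sign, the low half is raw bits.
fn split_i32(x: i32) -> (i16, u16) {
    // `>>` on a signed integer is an arithmetic shift, so the sign is preserved.
    ((x >> 16) as i16, x as u16)
}

/// Inverse of `split_i32` under the same choice.
fn join_i32(hi: i16, lo: u16) -> i32 {
    // `lo as i32` zero-extends because `lo` is unsigned.
    ((hi as i32) << 16) | (lo as i32)
}

/// What goes wrong if the low half were signed as well: casting it back
/// sign-extends and clobbers the high half.
fn join_i32_signed_lo(hi: i16, lo: i16) -> i32 {
    ((hi as i32) << 16) | (lo as i32)
}
```

For x = 0x0001_8000 the first pair round-trips, while the all-signed variant yields a negative number, since the low half 0x8000 reads as -32768 when treated as i16.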
9 Likes

Why can’t this just be implemented in a crate?

7 Likes

For reference, here are a few places where this would’ve been useful in an emulator I wrote.

It can, of course. But importing an external crate and a trait (which you currently have to do) is more work than just calling .lo(), and considering how simple the bit-shift version is, I doubt that many people would import a crate just for this. If the methods are in core, however, there isn't really any excuse not to use them 🙂 (I also believe that these methods pull their weight for many users - see the links I posted above - so I think they deserve to be in core.)

2 Likes

But those arguments are broadly true for all small methods.

For something to go in std (let alone core), it should be something that’s either impossible to do at a higher layer, or is so broadly useful that it’s worth freezing in stone forever and having to be maintained until at least the next major release.

If this is really so useful, then you should be able to put it into a crate, and then point at the long list of reverse dependencies, and impressive download stats. If nothing else, this gives you actual evidence that the functionality is desired.

I mean, I figured people would love conv. It’s the sort of thing that should be in std! …but almost no one uses it, which demonstrates rather tragically that most people don’t care. Functionality that can be provided by an external crate, and which most people don’t care about, doesn’t deserve to be in std; it’d just be a burden on the core devs (for maintenance) and users (larger downloads).

12 Likes

Why is next_power_of_two() in the stdlib? Why is trailing_zeros() in the stdlib? Are they really broadly used?

I really don’t understand this level of stubbornness with external crates. Would adding hi() and lo() really be a maintenance burden? Are the additional hundreds of octets of download really relevant?

6 Likes

What I do know, however, is that if I needed hi() and lo() in a project, I’d put them in a utils.rs file that I’ll never touch again in my life. Maintenance burden: zero.

This is exactly what I did earlier today with the lerp() function. I put it alongside the clamp() function that I’ve been using as well.

Because at some point, when it takes me more time to search for a crate that contains a function than to write it myself, I don’t even bother. If you add up the time it takes to find the function’s documentation and to add the crate to your Cargo.toml and your main.rs, you could have written it ten times.

9 Likes

An argument for putting something in std is if something is so trivial that no one will bother to import an external crate for it. This definitely fits that bill. If I need the high 16 bits of a u32, I’m not going to bother looking for an external crate that will give me that functionality; I’ll just use (x >> 16) as u16. But using .hi() might be slightly nicer.

3 Likes

Which is also why I think removing the functions that mapped &T -> &[T; 1] and &mut T -> &mut [T; 1] was such a bad idea. Especially since they require unsafe to implement.

4 Likes

Because they were added before 1.0, when the standard library was less picky, and somehow survived the great batteries removal 🙂 It's unfortunate that libstd is now in maintenance, stabilization and polishing mode and effectively frozen for new stuff. I suspect this correlates with the number of people on the libs team who are not super busy with other work (i.e. 0).

4 Likes

For trailing_zeros the answer probably is “because it is implemented as an LLVM intrinsic”.
Not sure about next_power_of_two, but I’d guess it’s because it is used by stdlib collections (to figure out the next bigger size).
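As an illustration of that guess (and not a copy of what any std collection actually does), capacity rounding with these two methods could be sketched like this. The helper names rounded_capacity and capacity_log2 are hypothetical; next_power_of_two, is_power_of_two and trailing_zeros are the real integer methods.

```rust
/// Hypothetical helper: round a requested capacity up to a power of two.
fn rounded_capacity(requested: usize) -> usize {
    requested.next_power_of_two()
}

/// For a power-of-two capacity, the count of trailing zero bits is its log2.
fn capacity_log2(capacity: usize) -> u32 {
    debug_assert!(capacity.is_power_of_two());
    capacity.trailing_zeros()
}
```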

7 Likes

TIL about conv !

2 Likes

I have two bitwise manipulation crates (both unfinished).

  • bitintr is supposed to be in std someday, because otherwise it won’t ever be usable in stable (it uses LLVM intrinsics directly, target feature, extern-intrinsic), just like the SIMD crate.

  • bitwise implements “higher level” bit manipulation algorithms, it will depend on bitintr for the low-level primitives.

FWIW since hi and lo aren’t “low level” (as in they don’t map to a single asm instruction on any platform) and don’t depend on rustc/LLVM intrinsics, I would put them in bitwise, so I guess that means I think they belong outside of the standard library.

1 Like

They do on x86 in many cases, with effectively zero instructions: given a value you already have loaded into a register, access a smaller register component of that register and use that in the appropriate following instruction.

5 Likes

I like the general concept, and I don’t want to let the perfect be the enemy of the good here, but this feels like something better addressed via vector-ish/SIMD types.

I’d love to see Rust support types like “vector of 4 16-bit values stored in a 64-bit value”, or “vector of 8 16-bit values stored in a 128-bit value”. And given such support, many operations will make sense, including access to the individual components (or combinations of components).

3 Likes

“with effectively zero instructions”

TIL, thanks for pointing this out.

There are some algorithms like umul that map to umux, which for 64-bit unsigned integer multiplication returns the high-order and low-order bits in two different registers, but for accessing the low-order or high-order bits of a normal register I just thought that the common approach was to use a bitmask.
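The widening multiplication mentioned here can be sketched in portable Rust by going through u128 and then splitting the result into its halves. The function name widening_mul is an assumption for illustration, and whether the compiler turns this into a single multiply instruction depends on the target.

```rust
/// Full 64x64 -> 128 bit product, returned as (high half, low half).
/// Hypothetical helper name; uses only shifts and `as` casts.
fn widening_mul(a: u64, b: u64) -> (u64, u64) {
    // The product of two u64 values always fits in a u128, so this cannot overflow.
    let wide = (a as u128) * (b as u128);
    ((wide >> 64) as u64, wide as u64)
}
```

For example, u64::MAX * 2 is 2^65 - 2, which splits into a high half of 1 and a low half of u64::MAX - 1.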

To implement this functionality in a nice way we need type level integers. This would allow adding nice bit-vectors to bitwise, and a better interface to the SIMD crate. I don't know whether it is worth it to move anything like that into std before that. I can see the advantages in that we could be using functionality like this right now with a less nice interface, and that adding a nicer interface later on is not a breaking change.

A problem with such utility functions in a crate is that people have to be aware of the crate, and the time they might need to look for the crate is about the time they need to implement the function themselves. Also, a crate which bundles “a bunch of bit operations” might feel like quite a heavy import just for this hi/lo. Though I don’t think this in itself is a reason to include it in core. It might be interesting to open some kind of poll to check how many people actually use this functionality in a project. (I would guess, at least, all kinds of serialization, emulation, binary-format and embedded-system crates.)

Independent of whether or not adding the functionality is a good idea, I don’t like the names hi, lo. The names high, low or maybe something else would be preferable I think.

EDIT: Uh, why wasn’t I aware of conv before… 😅 Thanks for mentioning it.

I feel a trait is actually more useful than inherent impls for this, since it extends the operations to code that is generic over integer size. For example, I implemented a Halveable trait to allow splitting and recombining generic integers, which means roaring-rs can support bitmaps of anywhere from u16 to u64 (including usize on 32/64-bit machines). This comes with no runtime cost (presumably, liberal inlining and basic arithmetic optimizations should remove all the extra code) and no additional code (other than the basic trait impl, and the horribleness the genericity requires in some constraints).
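To show what "generic over integer size" buys, here is a rough sketch of such a trait. It is not the roaring-rs code: only the name Halveable comes from this post, and the split/join methods, the Half associated type and the swap_halves function are assumptions made for illustration.

```rust
/// Hypothetical shape for a split-and-recombine trait.
pub trait Halveable: Sized {
    /// The integer type with half the bit width of `Self`.
    type Half;
    /// Splits into (high half, low half).
    fn split(self) -> (Self::Half, Self::Half);
    /// Recombines two halves into the full-width integer.
    fn join(hi: Self::Half, lo: Self::Half) -> Self;
}

impl Halveable for u32 {
    type Half = u16;
    fn split(self) -> (u16, u16) { ((self >> 16) as u16, self as u16) }
    fn join(hi: u16, lo: u16) -> u32 { ((hi as u32) << 16) | (lo as u32) }
}

impl Halveable for u64 {
    type Half = u32;
    fn split(self) -> (u32, u32) { ((self >> 32) as u32, self as u32) }
    fn join(hi: u32, lo: u32) -> u64 { ((hi as u64) << 32) | (lo as u64) }
}

/// Generic code written once for every implementing width.
pub fn swap_halves<T: Halveable>(x: T) -> T {
    let (hi, lo) = x.split();
    T::join(lo, hi)
}
```

Inherent .lo()/.hi() methods alone would not allow swap_halves to be written generically, which is the limitation described above.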

Similarly, despite {trailing,leading}_zeros being implemented in std, I can’t use them as they’re only inherent impls. I have to include the num crate to have a trait that allows access to the functions.

BTW, num is a good contender for somewhere that might accept a feature like this if std doesn’t.

next_power_of_two: used by some collections.

trailing_zeros: maps to a machine instruction / LLVM intrinsic => not feasible to implement in a stable crate.


I feel like a better API (though it does not consider BE vs LE) would be one which returns a [T; 2] (where T has halved bitwidth), or something that would be similar and would be no-op to convert into (i.e. has the same representation as the value being split up itself).
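A sketch of the two flavours this could take, using u32 as the example. Both function names are hypothetical. The first fixes the order as [high, low] regardless of platform; the second returns the halves in memory order, which is the variant that has the same representation as the original value and is where the BE vs LE question shows up.

```rust
/// Halves in a fixed, platform-independent order: [high, low].
fn halves_hi_lo(x: u32) -> [u16; 2] {
    [(x >> 16) as u16, x as u16]
}

/// Halves in memory order, i.e. the same byte layout as `x` itself.
/// On little-endian targets the low half comes first, on big-endian the high half.
fn halves_native(x: u32) -> [u16; 2] {
    let b = x.to_ne_bytes();
    [
        u16::from_ne_bytes([b[0], b[1]]),
        u16::from_ne_bytes([b[2], b[3]]),
    ]
}
```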

1 Like