`BitSlice` or a sound way to implement one

The basic idea is this: if &[u8] is a fat pointer to a byte slice which consists of two parts: a *const u8 pointer and what's effectively a usize containing a length in bytes, could you have an equivalent fat pointer type which stores the length in bits instead, where the number of bits may be up to 7 fewer than the equivalently sized byte slice, and also support this as a reference-to-reference conversion such that the type itself is used as &BitSlice as opposed to BitSlice<'a> such that it works with Borrow/ToOwned/Cow patterns?

The goal would be to get the same &[u8] back out but also get those 3 additional length bits which give you the sub-byte bit-level length granularity, all in a reference type that fits in the same space as any other fat pointer, where we can recompute the original byte-level length of the byte slice as div_ceil(8).

This matters to me in particular because ASN.1 BIT STRING just so happens to be one of the most core fundamental data structures for cryptography formats, as it's the type used to represent public keys in X.509 SubjectPublicKeyInfo (a.k.a. SPKI), which in addition to its use in X.509 is used in other cryptographic formats like PKCS#8 for private key storage (see OneAsymmetricKey). It would be really nice to be able to implement a zero copy type for representing BIT STRING which itself is a fat pointer that can be Cow-d.

Such a type exists today in third party libraries: bitvec::slice::BitSlice. I have also attempted to implement the same pattern: a trick which involves constructing [Zst] using slice::from_raw_parts which carries the original pointer and lifetime for the &[u8] but carries the bit length rather than the byte length as the length component.

It seems this approach has been somewhat problematic from a soundness perspective though, with code being accidentally deleted in release builds as of LLVM 16 and a suggested workaround being unsound under stacked borrows (but accepted by -Zmiri-tree-borrows).

With code that's considered unsound under any model of the language making a particular approach seem scary for use in cryptographic applications (and particularly ones dealing with serialization formats for untrusted data at the network's edge), I had an idea that would make me a lot more confident: what if the BitSlice type were simply built into the language? Then its implementation just needs to be as sound as whatever the current model of soundness du jour happens to be, can leverage internal compiler magic, etc.

Alternatively, if there were unsafe APIs that could take apart a fat pointer and put it back together while preserving provenance, but allow the caller to carry along some additional data in the length field, kind of like the addr/with_addr/map_addr APIs for tagged pointers work, that could also do the trick.

3 Likes

Why is bitvec implemented with a reference instead of a raw pointer like *const u8 or *const () (alongside a separate length field, and PhantomData for the lifetime)? I'll try to keep looking through the bitvec repo to figure out the motivation, but a quick look through its Issues didn't give me any information.

Edit: even *const [u8] ( + PhantomData for the lifetime) would work.

I feel like "raw pointers instead of references" is precisely this API, unless there's some critical optimization that isn't applied with that solution.

Edit again: ahhhhh, the problem is Borrow and friends.

1 Like

I feel like this is a “we need ergonomic support for more lifetime HKTs beyond &'_ T” problem more than a “we need to add hacks that let &'_ T support more kinds of lifetime HKTs” problem.

2 Likes

For anyone not already aware of roughly what the problem is: Rust basically has a few stable ways of supporting “higher-kinded types” (HKTs), which are families of types, like Vec (without a <T>).

Here are the three stable mechanisms for HKTs with one lifetime parameter:

  • You can denote the HKT family &'varying T with by just T, and feed the HKT a 'varying parameter to get &'varying T.
  • You can use a generic associated type (GAT) like trait GatHKT { type Apply<'varying>; }, perhaps to make a GatDeref equivalent of std’s Deref.
  • You can use higher-kinded trait bounds (like for<'varying> Trait<'varying>) to do something insane like my variance-family crate — which is thousands of lines of code long — to try to get much more thorough support for lifetime-parameterized HKTs.

Each of these three options has problems. The first isn’t fully general. The second interacts poorly with higher-kinded trait bounds. The third involves several unsafe traits and presumably isn’t fit for std.

std uses the first option, which means its traits aren’t as general as possible.

Yeah, we do also have our own separate GAT-friendly traits for doing these bidirectional owned/reference conversions: RefToOwned and OwnedToRef which work better in these cases where we want to carry along some structured data in addition to a reference, which isn't otherwise possible with Borrow and ToOwned.

The particular BitSlice use case though happens to fall more into the territory of a custom DST, or one which is doing things which push the edge of what Rust is currently capable of expressing.

I guess you might also want another 3 bits to indicate the starting bit position?

Could be useful for others for full slice-like functionality (although not needed for my case)

The problem is, it isn't done, so we have nothing in the end :sweat_smile:

You could probably use an extern type for that (except they're unstable).

3 Likes

I would really like to just have a u1 type where Vec<u1> and [u1] behave as you'd expect. For this to be possible without breaking everything you'd need at least:

  • A new BitSized built-in trait where Sized: BitSized.
  • For u1 to not implement Sized (because it doesn't have a size expressible in bytes).
  • To relax the bounds on Vec<T> and [T] to just be T: BitSized.

I don't know how feasible this is on the codegen side of things though, eg. if you have two &mut [u1] that overlap the same byte.

I think we can permit forming two &mut u1 references to neighboring bits of the same byte in single-threaded scenarios, but I could be missing something. However, if you have something like ([Mutex<()>; 2], UnsafeCell<[u1; 2]>), is it sound to concurrently mutate each u1 value on different threads? I believe the answer to that is no, which makes things… complicated. However, I don’t think it can break existing generic code.

Here’s my attempt:

struct DoubleMutex<T: ?Sized> {
    one: Mutex<()>,
    two: Mutex<()>,
    // not possible.

    // And `T: Sized` wouldn’t break from `u1`.
    // packed: UnsafeCell<[T; 2]>,

    // uses `size_of_val` to pack two `T`s on the heap.
    // However, that function has byte granularity;
    // if `u1: MetaSized`, we’d presumably end up
    // adding 7 bits of padding between the `u1`s here.
    packed2: NonNull<()>,

    // other fields used to get `packed2` to work
    …,
}

Edit: wait, doesn’t this imply that &mut u1: Send cannot be permitted to hold, so u1: !Send?

That’s a massive pain for a POD type.

The painful thing here is that the way auto traits work is that this also would prevent the whole struct or array from being Send by way of the auto trait mechanism if it contains a u1. And it would be rather painful to have to do manual unsafe impls all over the code.

This is definitely only possible with language support. At the assembly level you can overwrite individual bits at a memory address with AND/ORD. But at the library level you're faced with the problem that you can't overwrite neighbouring bits you don't own, and you can't read+update them because they may be uninitialized or be being written concurrently.

It sucks that seemingly no low-level language lets you manipulate data at bit-level granularity the same way you can at byte-level. (I did look around at some others recently because doing this was so frustrating in rust). Is there actually a hard reason for it? I can only think of bad reasons like:

  • "Pointers are byte level" - You could use the top three bits for a bit-offset, or use fat pointers for sub-byte-aligned data.
  • "Reading/writing individual bits requires more instructions" - That'd usually be a small price to pay for better cache utilization, and it's something people would opt-in to by using sub-byte sized types.

What's the good case against it, assuming you were making a rust-like language from scratch and weren't worried about additional complexity or backwards-compatibility?

Yeah, it's an interesting idea for a greenfield language (and maybe greenfield CPU). Treating bits as non-independent parts of a byte doesn't really feel different to me from treating bytes as non-independent parts of a word, and that was definitely a thing for a long time. And 64-bit address spaces are BIG, dividing the maximum size by 8 isn't really a problem. I guess that might be a downside for smaller address spaces, that you either need two kinds of pointers or a relatively low max memory limit to fit that extra addressing in.

C++’s vector<bool> famously tried to do this and it didn’t go well, but maybe Rust could fare better there.

2 Likes

Part of the issue with the way C++ did it was that it is a specialisation for a normal vector rather than a dedicated bitvector type. This means it behaves differently than all other vectors, and for new programmers it is a bit of a gotcha. I feel like most of the issues could have been avoided if it was a separate bitvector.

2 Likes

The problem is that even at the assembly level, there's a risk of overwriting neighbouring bits you don't own (unless you use atomic read-modify-write instructions, which are frequently much slower than their nonatomic equivalents) – this doesn't matter in a single-threaded context (because you are overwriting them with the value they already have) but might in a multithreaded context. I think most languages don't expose an easy way to do this because most processors don't expose an easy way to do it, and languages are normally designed around processor functionality.

Even in the single-threaded context, the time paid due to the extra instructions is likely going to be higher than the time saved due to the lower memory usage unless you have a lot of values that aren't a whole number of bytes. In practice, this would only normally come up when you have an array of them (e.g. with a map, the savings would be negligible compared to the size of the keys). As such, it probably makes more sense to have bit-array / bit-slice / bit-vector types specifically rather than bit-references in general.

3 Likes