Could you borrow a bit?

Could there be a &mut self method on e.g. core u* integers that lets you borrow a bit as &mut bool, or does the layout of &bool prevent that, and if so could there be some other way to achieve a similar effect? (possibly introducing a new type like bit and returning &mut bit)

If you're curious about my use case, I'm trying to implement the IndexMut trait on a bitstring reference type.

Borrowing a bit would be like partially borrowing a struct. You need a reference that's a pointer to a whole but only allowed access to a part.

More complications arise if you wanted simultaneous borrows of different bits in the same byte - basically, mutable access to those can't co-exist thread-safely.

The layout of &bool prevents having this kind of functionality.

Implementation-wise I can imagine either an approach like &bit<N> for borrowing the Nth bit in a byte for N known at compile time, or we need to extend the representation of what a reference to a bit contains, maybe a far pointer where the sub-byte position is given in the metadata (like length for slices).

Similarly, there are other parallels to slice references that a reference to a single bit would need to come with: you shouldn't be allowed to perform operations like mem::replace on it! It could be through being !Sized, or some comparable mechanism. That's because operations like mem::replace are generally implemented by considering only the size of the type - in bytes - and then copying that amount of bytes around. You don't want that, because it would result in all the neighboring bits in the same byte being replaced along with the one you were meaning to reference.

I wonder if the feature discussions in https://rust-lang.zulipchat.com/#topics/channel/522311-t-lang.2Fcustom-refs have any power of supporting references to less-than-byte (or otherwise intermixed in the encoding) amounts of data.[1]


  1. I have yet to catch up with all the details / ideas being discussed there more in-depth. ↩︎

1 Like

Without doing due diligence to search for past proposals, loosening Index[Mut] to return any type that implements Deref[Mut] rather than a hardcoded reference would cover this use case…but I suspect that's massively breaking in practice. Although maybe if you made it a new trait GeneralizedIndex, since people mostly only care about the subscript sugar…

2 Likes

I don't think so, you'll run in the same issue with DerefMut having to return a reference.

But you could return a type with a locally-stored bool that's assigned back on Drop, and you'd end up with nearly the same effect.

1 Like

The type family &mut T with T: Sized definitionally relies on the meaning of size_of::<T> for the underlying aliased memory region. So that's pretty much out or at the very least requires you to internally point to the type as UnsafeCell<u8> and so unfortunately the type will lose all noalias benefits.

What's disappointing here is that it does not compose well with other independent references, i.e. since shared references are used to represent a unique one none of the builtin projection interfaces would work well here.

Also, if we consider representing full bitfields with an additional width property instead of or mixed in with single bits note that for the dynamic variant all pointer metadata must implement Ord; but how do you properly sort usizeĂ—usize? Probably tuple sorting where the start is compared first and the length only on mismatch (it's weird though, as an analogue is there an ordering relationship between all pointers to slices? we even have lint for this ambiguous_wide_pointer_comparisons).

A particularly fun question for bitstring reference type libraries like what I'm writing is: is how do you implement split_at_mut?

So far my answer is: you don't. But if the language supported mutable borrow splitting at a sub-byte granularity, then it could be possible.

I’m now noticing that the ask in this thread has large overlap with parts of the discussion in your recent previous thread: `BitSlice` or a sound way to implement one

Regarding

specifically the last replies starting with

As far as I understand the reality of possible implementations, you’d have to be either restricting split mutable borrows to stay within the same thread (i.e. some degree of introducing !Sync or !Send with these types) or you’d need to implement modifications of individual bits in a byte with atomic operations.

This you can work out yourself.

There are usize::MAX possible places a bool could be, because align_of::<bool>() == 1.

&bool is non-null, so it has exactly usize::MAX possible value.

Thus all possible values are accounted for and there's no space to add bit information.

(Plus we wouldn't do that anyway, for other reasons.)

It would be nice if the language could do that for us so we can have a &mut u1 type that works like any other (possibly fat) reference. Ideally the compiler would also optimize away the atomic-ness whereever possible of course. This doesn't seem too different from the fact that working with individual bytes can be slower for the hardware than working with larger ints, but we don't force people away from using bytes.

And I'm not sure that &u1 references would have to be fat either, since no platform (I think?) uses the full address space. Of course CPU vendors are free to start using the top three bits whenever they please, but if a popular programming language started using those bits for sub-byte addressing perhaps the vendors could be persuaded to leave them reserved.

32-bit platforms regularly use the entire address space (which makes sense if you think about it – it's far from rare for a program to want 4 GiB of memory).

As for 64-bit platforms, they've been running low on spare bits recently: x86-64 and ARM64 can both (nowadays) be configured to leave some bits reserved for software and not use them for pointers or other hardware-specific metadata, but in this mode x86-64 has only 6 spare bits when using 5-level paging, and ARM64 has only 4 bits spare when using MTE. So there's theoretically enough room on the two most popular 64-bit platforms at the moment, but it's already pretty close to the limit and it's plausible that more bits will be needed for hardware features in the future (e.g. ARM64 used to have 8 before MTE was invented, and still does on non-MTE platforms).

2 Likes

IIUC on RISC-V you also can not freely use the higher address bits since the spec mandates that 64-bit addresses "must have bits 63-48 all equal to bit 47, or else a page-fault exception will occur." You probably could mask the bits on each access, but it's likely to be inefficient and not compatible with hypothetical CHERI-like hardware.

I think it should be possible by not restricting ourselves to relying on references.

As I wrote in the other discussion, a quasi-reference type like this could work:

struct BitSliceMut<'a> {
    origin_ptr: *mut u8,
    bit_offset: usize,
    bit_len: usize,
    _pd: PhantomData<&'a mut u8>,
}

By keeping it non-Send it should be possible to construct a safe API for creating (e.g. with split_at_mut) and manipulating such type.

Yes, there is a number of unfortunate ergonomic issues with such custom reference types, but IMO it's the most practical option right now.

This looks…confused to me. Either use u8 (or, better, bit patterns in a single u8 for bit_offset and bit_len or the PhantomData needs to keep a reference to a [u8] slice. In any case, this only works for slices into bitfields up to USIZE_MAX / 4 bytes in length and starting at a bit in the first USIZE_MAX bits.

It's just a rough draft. Sure, you can keep the byte length as usize and pack head and tail bit offsets into u8 or use some other layout, it does not change the idea.

You are unlikely to encounter in practice allocations larger than usize::MAX / 4 bits, especially considering that allocations in Rust can not be bigger than isize::MAX bytes, so we can just panic in such pathological scenarios.

AFAIK there is absolutely no difference between PhantomData<&'a mut u8> and PhantomData<&'a mut [u8]>. PhantomData is used to track the lifetime and variance, so the concrete type does not matter here.