However that case (and the one where it's ABI-equivalent to (u8, NonZeroU8)
) are not that interesting because they have an obvious niche.
In that case you would just add a bytes: [u8; SIZE]
variant to the union so that you actually have a variant you can write those zeroes through, and completely suppress any happens-to-be-aligned niches between the other variants.
Huge from me. No idea what form this should take, given that I've tried to propose it before and overlooked an obvious issue.
That said, using padding in struct
s/enum
s is something that can be done today, presumably without a ton of effort. While it may seem like this would negative impact performance, this isn't the case from benchmarks I have run (a few compiler releases back admittedly). On the other side of things is ranged integers that inherently have significant niche spaces. That's something I plan on writing an RFC for eventually (anyone can message me if they want to; it'll be a bit before I get to it). Both of these can be done without requiring the user to explicitly declare niches. The former would provide many niche values for what I presume is most structs.
I'm wondering if this should only be guaranteed to work on repr(c)
structs ?
I just want to come back and note that #[repr(no_niche)]
exists and is used for UnsafeCell
. If we decide to exploit union
niches, MaybeUninit
probably needs to be marked #[repr(no_niche)]
. (It currently isn't.)
I think making #[repr(C)] union
bag-of-bits where any bit pattern is valid, and exploiting never-valid niche values for #[repr(Rust)] union
(where internal padding is of course undefined and not a valid niche spot) seems reasonable, though I am of course wary of any change making less unsafe
code valid.
If we do make the change, it should be over an edition with a rustfix migration to mark all existing union
as #[repr(C)]
or to add a __phantom: ()
variant to try to keep code from silently getting miscompiled. (But I'm also afraid that the code would update the edition without running rustfix
; there's the secondary promise that warning-clean code upgrades cleanly, which we weakened to "if edition migration lints are on," but I fear[1]. The worst possible previous edition upgrade would've been disjoint closure captures changing when e.g. a lock was dropped, which can be safety critical, but is a bit less directly tied to user code soundness concerns.)
-
I fear the worst, but I think this is still reasonable with a prominent notice in the edition release notes that "hey, you do really need to run
rustfix
before updating your edition." âŠī¸
I don't think so because it does have a unit variant which means that its entire contents are allowed to be uninitialized (padding), and there is no niche.
The potential problem there is that ()
is still zero sized. So the padding involved isn't internal padding. Or IOW, &mut ()
doesn't write to the rest of the union
, so ()
doesn't care what the rest of the union
is.
Yes, initialization of the union
with the ()
variant doesn't address the "external" padding being valid. Obviously, if such niches were to be exploited, then that'd have to be changed. But also, it doesn't cause problems, either, for anything that isn't a MaybeUninit
-alike.
Ultimately, I think both can be true:
- External padding is considered as undefined for a variant and thus not valid niche space
-
MaybeUninit
should be marked#[repr(no_niche)]
for clarity
I'm not necessarily saying that
union U {
a: u8,
b: (u8, NonZeroU8),
}
should have a niche, but that it's a valid choice to make.
Here's where the validity is reset for union
s, BTW
@eddyb 's comment here was where I last saw that union deliberately has opaque-bag-of-bits semantics
It's not internal to the ()
, but it is internal to the MaybeUninit<T>
! Moving a value of any type can happen via memcpy of all of its bits, which includes all the uninitialized ones, so to be able to have a niche, it would have to be initialized during construction of the union
. Initializing a struct can happen by starting with uninitialized memory behind raw pointers, then writing each of the fields (that needs initialization). If the same kind of logic applies to unions (i.e. you can start out with uninitialized memory and just need to initialize one of the variant-fields completely to get it into a valid initial state), then a MaybeUninit
-like union
(without any additional no_niche
or similar) can already be consisting of uninitialized memory entirely, no place for any niches.
Perhaps under more restrictive conditions (i.e. a union
must always be constructed using its constructor) there could be some form of initialization of niches hidden in the padding happening.
As indicated above, it could cause problems for unsafe code that skips the union
's constructor and assumes just initializing one union-field is enough. Also I feel like if constructing a union
automatically initializes some padding data to a value in order to rule out misinitializing some potentially used niches, that's runtime cost one might not expect.
On top of what was said above, trying to require unions to always be in one of their variants is very hard to actually define precisely. Rust validity invariants and other UB rules are already tricky enough, I think we shouldn't make things even harder -- in particular for union
which will basically always have some unsafe code around.
IMO the entire concept of the "active variant of a union
" is fundamentally flawed. Instead, just think of union
as convenient sugar for transmuting the underlying raw bytes to/from the types you are using for the access. Sadly that makes the name union
a rather bad choice, but this is not the only bad terminology we inherited from C (I am looking at you, "undefined behavior").
Big
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.