U32::to_be() considered harmful, or how to encourage safe endian handling

Something like this: ...

Oh, I see. Basically replacing explicit gets/sets on individual members with a serialize/deserialize for the whole structure (in which case you don't really need the #[repr(C)] any more; the serialize functions can handle the packing as well). Since my last post, I spotted packed_struct, which does more or less that.

It's a viable approach but has some drawbacks:

  • You still have the explicit conversion, which is a little clunky
  • It will always unpack the entire structure, even if you just want one field
  • You're moving the thing around as the unpacked structure, which makes the cache-footprint aware low-level programmer in me twitchy
  • For some shared memory protocols (particularly, e.g. with DMA hardware), you can't always safely read or write the entire record

[A minor point: I also think having both _be and _le variants on the same type is a bad idea - it's rare that the same structure needs to be accessed in both le and be variants. It does happen (despite being terrible, no good, very bad interface design) but it's rare enough that having to work harder for it is ok]

  1. C doesn't have this attribute. #[repr(packed)] and #[repr(align(...))] do exist.

No, it doesn't, and a multitude of endianness bugs exist as a result.

I may be misunderstanding something. What does your proposal look like?

Ah, right. So what I had in mind is that the base semantics are defined in terms of:

struct S {
    #[repr(le)]
    x: u32,
    #[repr(be)]
    y: i64,
}

So that you can define a mixed-endian structure. It's not common but it does happen (e.g. using a combined header structure for several layers of network protocol with different endianness). I've also seen some weird hardware devices with mixed endian registers.

I'd then envisage being able to put the attribute on a structure as a shorthand for putting the same tag on all the integer fields (and recursively on any substructures). Ideally you could still override individual fields or substructures in that case.

I'm not sure if there are any cases where it could make sense on a bare variable rather than a structure field.

I find that using attributes on each field is similar to bit-fields. Rust doesn't support bit-fields either, and there have been many proposals. Some of the reasoning given there probably also applies to this proposal.

Heh, that is a point. Bit-fields in C are a pain (I almost always avoid them) precisely because they don't pin down the in-memory representation well enough.

Actually, thinking about it, I guess #[repr] on individual fields is kinda nasty, because it doesn't have an obvious place to get attached. #[repr] on the struct gets attached to the type definition, but fields within it would also need to be attached to the surrounding type, rather than the thing they're actually next to.

That said, only allowing it on structs would still handle the vast majority of cases (same endianness for all fields).

You could just do:

struct S {
    x: LE<u32>,
    y: BE<i64>,
}

given a crate implementing LE and BE.

Language support seems unnecessary; at most it could take the more generic form of user-defined automatic coercions, if you want to save the x.into() call that this code needs to convert.
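
To make that concrete, here is a minimal sketch of what such a wrapper might look like (the LE/BE names and this exact API are illustrative, not any particular crate's):

#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct LE<T>(T); // inner value kept in little-endian byte order

impl LE<u32> {
    pub fn new(native: u32) -> Self {
        LE(native.to_le()) // swap on store (no-op on LE hosts)
    }
    pub fn get(self) -> u32 {
        u32::from_le(self.0) // swap on load (no-op on LE hosts)
    }
}

impl From<LE<u32>> for u32 {
    fn from(v: LE<u32>) -> u32 {
        v.get()
    }
}

A BE<T> twin would use to_be()/from_be() instead; #[repr(transparent)] keeps the wrapper layout-identical to the wrapped integer.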

4 Likes

My understanding of the network vs host byte ordering problem has always been that you want to immediately convert the network bytes to host order integers, and do all your work in host order integers. In particular, I was always under the impression that there simply is no use case for a "big endian u32" type or a "little endian u32" type, as opposed to (de)serialization functions that convert between a network-order [u8; 4] and a host-order u32 or whatever. AFAIK all awareness of big/little endianness is hidden in the platform-specific implementation details of converting between network and host order, and wouldn't benefit from any dedicated types.
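
(For reference, the functions-only approach is exactly what the standard library provides; a tiny self-contained example, with the buffer contents invented for illustration:)

fn main() {
    // Network-order (big-endian) bytes <-> host-order integer.
    let wire: [u8; 4] = [0x00, 0x00, 0x00, 0x2a];
    let host = u32::from_be_bytes(wire); // 42 on any host
    assert_eq!(host.to_be_bytes(), wire); // round-trips losslessly
}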

Put yet another way, I'm suggesting that most of this paragraph is the correct way to do things in principle, not just in C, and questioning the implication at the end that there is a better way.


If there are genuine use cases for big/little endian integer types, then we can start talking about whether they should be BigEndian<u32> or u32be or #[repr(BigEndian)] or whatever, and whether warnings or other changes would be appropriate. But I don't think it makes much sense to try and start that conversation until we've gotten much clearer about the intended use cases. Details like how attributes interact with generics simply aren't relevant yet when we don't even know if there's any motivation for new core lang types or new layout categories.

So, to refocus on what I think is relevant, I'll ask some stupid questions about things I have zero experience with:

This seems like the closest thing in the thread so far to an attempt to describe use cases for big/little endian types in the core language rather than (de)serialization functions, but are there any systems where syscalls or C libraries use be/le instead of host order? I was under the impression there tautologically were not, since that's what "host order" means.

These sound like potentially compelling use cases for le/be types, but these also sound like use cases I'd expect to be quarantined to the fringes of any codebase, and only exposed to higher-level code as host-order integers or maybe as byte arrays, in which case handcrafted library types ought to be fine (and handcrafting might be mandatory anyway, if the hardware's weird enough). Is that not the case?

Since the OP mentioned working on QEMU, one example which immediately comes to my mind is sharing of memory between emulated hardware and its host. But that is admittedly a niche use case.

3 Likes

This use case can be generalised to what I'll call 'sparsely-accessed data structures'.

Imagine there's a data structure with a fixed layout, in which you sporadically need to read and/or modify a couple of fields. The serialisation approach would have you define 'native' Rust data structures and go back-and-forth between those and byte buffers when accessing such 'foreign' data. The serialisation process necessarily converts the whole structure, including the data you don't actually need at the moment, and requires you to separately allocate 'native' structures from which you can extract data to manipulate. By treating endianness as a memory-representation problem, you are able to only pay the cost of converting the data you actually need while leaving the rest alone. Plus, if the 'foreign' memory representation happens to agree with the native ABI, you can even avoid adding any additional cost to those accesses at all; they can just compile to ordinary memory accesses.
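
As a minimal sketch of that style of access, suppose a fixed-layout record sits in a byte buffer and a single big-endian u32 field lives at a known offset (the offset and field name are invented for illustration):

const COUNTER_OFFSET: usize = 4; // hypothetical field location

fn read_counter(buf: &[u8]) -> u32 {
    // Convert only the four bytes we actually need; the rest of
    // the record is never touched or copied.
    let bytes: [u8; 4] = buf[COUNTER_OFFSET..COUNTER_OFFSET + 4]
        .try_into()
        .expect("record too short");
    u32::from_be_bytes(bytes)
}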

And indeed, emulators are where this comes up quite frequently. As a (somewhat extreme) example, take DOSBox, which not only emulates an x86 CPU but also contains an implementation of the DOS kernel on the host; as such, it needs to be able to access DOS-specific data structures kept in guest memory (PSP, CDS, MCBs, FCBs, device driver headers, ioctl buffers, the list of lists) to implement DOS system calls and other ABI. If one were to Rewrite It In Rust™, fixed-endianness types would be a huge help here.

That said, I'm not sure if this necessarily needs to be a core language feature, or even a standard library feature. A well-designed crate could be just as expedient.

5 Likes

My understanding of the network vs host byte ordering problem has always been that you want to immediately convert the network bytes to host order integers, and do all your work in host order integers. In particular, I was always under the impression that there simply is no use case for a "big endian u32" type or a "little endian u32" type, as opposed to (de)serialization functions that convert between a network-order [u8; 4] and a host-order u32 or whatever. AFAIK all awareness of big/little endianness is hidden in the platform-specific implementation details of converting between network and host order, and wouldn't benefit from any dedicated types.

Right, if you're pulling things in from a stream (e.g. network) that's the way to go (although even there reading into fixed representation structures is a common (anti)pattern because it's so easy to do compared to having a bunch of deserialization boilerplate - maybe not so much in Rust as it is in C).

But the case I'm most familiar with is the shared memory structure, because that's what I've encountered (and debugged) a lot in the past.

Put yet another way, I'm suggesting that most of this paragraph is the correct way to do things in principle, not just in C, and questioning the implication at the end that there is a better way.

So, I'm not trying to say there's a better way than what I described; what I'm trying to say is that Rust is in a much better position to encourage or enforce the good way of doing things.

This seems like the closest thing in the thread so far to an attempt to describe use cases for big/little endian types in the core language rather than (de)serialization functions, but are there any systems where syscalls or C libraries use be/le instead of host order? I was under the impression there tautologically were not, since that's what "host order" means.

Examples I can think of:

  • Emulation of one endianness system on another (as noted elsewhere, I'm very familiar with this, being a QEMU developer)
  • Hardware devices: most hardware devices have LE registers, so on BE systems you need swaps when accessing them. A few hardware devices have BE registers, so you need the same on LE systems. A very few devices have different registers in different endianness (usually this comes about because the device has some sort of internal bridge or layering where different components were built by different teams)
  • On POWER servers, we've now mostly changed over to LE, though it was traditionally BE. But many firmware interfaces and some hardware devices remain BE because of that history.
  • There's no inherent reason on many CPUs you couldn't run BE userspace programs on an LE kernel, or vice versa, though I don't know offhand of any cases that support this now.
  • In-memory data structures defined by cross-platform specifications to have a particular byte order. e.g. the flattened device tree used on many embedded systems is always BE because history, but needs to be read by LE kernels and other software. I don't know if ACPI tables are always LE or are host endian [Note: for a cross platform spec declaring things "host endian" is nearly always a mistake]

That said, I'm not sure if this necessarily needs to be a core language feature, or even a standard library feature. A well-designed crate could be just as expedient.

It's not a question of making it possible; it's a question of making the obvious way to do things a good way. The standard library already includes u32::to_be_bytes() etc., which are fine for streaming but awkward for the shared-structure case. byteorder expands on the streaming stuff, but doesn't really do anything extra for the shared-structure case. There are several crates that do the endian-specific types thing; several are mentioned in posts above.

But none of them are particularly obvious to find. In the meantime, u32::to_be() looks like an obvious choice: it matches the way this is usually done in C - the dangerous, bug-prone way it's usually done in C.

Well.. that, and the fact that we have a whole system for controlling the in-memory representation of types which can control so little about the in-memory representation of types.

4 Likes

Imagine there's a data structure with a fixed layout, in which you sporadically need to read and/or modify a couple of fields. The serialisation approach would have you define 'native' Rust data structures and go back-and-forth between those and byte buffers when accessing such 'foreign' data. The serialisation process necessarily converts the whole structure, including the data you don't actually need at the moment, and requires you to separately allocate 'native' structures from which you can extract data to manipulate. By treating endianness as a memory-representation problem, you are able to only pay the cost of converting the data you actually need while leaving the rest alone. Plus, if the 'foreign' memory representation happens to agree with the native ABI, you can even avoid adding any additional cost to those accesses at all; they can just compile to ordinary memory accesses.

Right, that's a good way of putting it. I tend to dislike in general the pattern of converting an externally specified representation into an internal one, unless you have a really compelling reason to do so (like a totally different organization of the data). That's not so much for the performance impact (which is often insignificant), but simply because I find it conceptually clearer if there's a uniform representation of something across the project - so if you need an externally specified representation anywhere, you might as well use it everywhere.

2 Likes

The alternative to reparsing and serializing the entirety of an in-memory structure is to define accessors for specific fields that operate directly on the serialized memory (probably very simply via to_be_bytes and from_be_bytes on each access). This is similar to how data formats such as Cap'n Proto use the same data layout for a serialized message and in-memory modification of such a message.
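
A sketch of that accessor style over an in-place buffer (the layout and field name are invented; formats like Cap'n Proto generate such accessors):

/// View over a serialized record whose first four bytes are a
/// big-endian id field.
struct RecordView<'a> {
    buf: &'a mut [u8],
}

impl RecordView<'_> {
    fn id(&self) -> u32 {
        // Decode just this field, directly from the serialized bytes.
        u32::from_be_bytes(self.buf[0..4].try_into().unwrap())
    }
    fn set_id(&mut self, v: u32) {
        // Encode in place; no separate 'native' structure needed.
        self.buf[0..4].copy_from_slice(&v.to_be_bytes());
    }
}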

1 Like

A small note on history: the u32::to_be() family of methods was stabilised alongside the 1.0 release of Rust and might not have seen more discussion than "we need a way to deal with endianness concerns and that's what C does". The u32::to_be_bytes() family is a much later addition (the docs say 1.32) which IIRC was introduced because of concerns similar to those under discussion here.

Did you know that it is possible to formally deprecate these methods? Because that is what you seem to be advocating for. And that looks like a much more achievable goal than adding new #[repr] attributes to the language, especially ones that look like newtype wrappers in disguise.

(about external crates)

Yes, the issue w.r.t. discoverability on crates.io has been known for a long time. It is a hard problem, and not specific to this topic.

If you are talking about the #[repr] family of attributes, I am not sure you quite understand their purpose (though I could be mistaken myself). So far, all of these attributes apply to compound types and change the way their overall representation is computed from the individual parts. But endianness is a property of a primitive type. That is why many are advising you to use newtype wrappers: to make your own endian-aware primitive types.

3 Likes

u32 is a type with the native endian, by definition. Adding an attribute that makes it something else doesn't seem right to me.

You don't use #[repr(signed)] u32 to make an i32; you have a separate i32 type. So if you want little/big-endian types, make u32le/u32be types.

If there's a mistake here, it's the existence of #[repr(packed)], which changes the alignment of types, effectively making new types that are incompatible with the originals (due to &struct.field being unsafe).

7 Likes

This part is making a lot more sense to me now. I'd definitely support deprecating to_be() et al. in favor of to_be_bytes() et al.

Assuming we word the deprecation notice well, that ought to be enough of a speedbump to get people interested in "the shared structure use case" to go look at crates.io.

That is a very helpful list. I don't think these examples quite produce an argument for adding new types in std (assuming we deprecate to_be() et al.), but some of them look like great candidates for a "de facto standard" solution. I feel like there ought to be a WG to ping for this, but the Embedded WG doesn't seem quite right for any of it.

1 Like

I can't see this happening. repr(C) is mostly for FFI, where forcing the endianness to be specified doesn't make things any safer on your platform, and is just wrong if you ever want to run on a machine with the opposite endianness.

:+1:

This is another example of why Rust needs computed properties.

...I know, I know. But think about it. It makes total sense to have a separate type; &foo.field cannot be an &u32. Yet, if you write:

let mut x = foo.field;

What happens if x gets the inferred type u32le? At best, it's annoying: nothing works with u32le, so you have to explicitly convert to regular u32 somewhere. At worst, there might be some performance overhead. Suppose that u32le decides to implement AddAssign<u32> for convenience's sake, so that this compiles:

x += 1;

It would be convenient, but it would have a hidden cost: the implementation would have to first convert to native endian, then do the addition, then convert back to fixed endian.
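
A sketch of what that convenience impl entails, with u32le written out as a hypothetical newtype:

use std::ops::AddAssign;

#[allow(non_camel_case_types)]
#[derive(Clone, Copy)]
struct u32le(u32); // inner value kept in little-endian byte order

impl AddAssign<u32> for u32le {
    fn add_assign(&mut self, rhs: u32) {
        // The hidden cost: decode to native endian, add, re-encode.
        self.0 = (u32::from_le(self.0) + rhs).to_le();
    }
}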

In any case, it doesn't make sense for x to be u32le. Semantically, foo.field is a u32 value that is stored as little-endian. There's no reason to think that a copy of that value in a local variable would also want to be stored as little-endian!

So the typical alternative (as suggested by @Nemo157) is to use getters and setters. Either foo.field() and foo.set_field(val), or foo.field.get() and foo.field.set(val). They could accept and return plain u32, so the little-endian aspect doesn't leak into surrounding code. Works great, but there are two problems:

  • They're ugly.

  • They don't work with patterns; you can't say e.g. let Foo { field } = get_foo();.

That's not the end of the world. If it were just fixed-endian integers that had these problems, I'd say "just deal with it". But there's also:

  • Cell and atomics

  • Bit fields

  • Custom data structure layouts in general (of which bit fields are a special case)

One thing these all have in common, along with fixed-endian integers (if compared to bincode), is that they can be used as performance optimizations. They're not things Rust should be penalizing in terms of ergonomics. But so far they are, because computed properties are considered too scary.

4 Likes

struct.other_endian_field += 1 would be converting back and forth anyway (unless you have a bi-endian CPU and the compiler is very clever?). It would even make sense to do so if the struct is used to edit a memory-mapped file.

OTOH if you're going to copy the data out of the struct and process it elsewhere, then that sounds more like deserialization. Perhaps a better solution for that would be a bincode-like Serde library for C structs with an endian?

Ah, that's very interesting.

I did not know that, that sounds like a good idea.

Aaaahhh... I think I've got it now. So actually, I don't think the distinction between compound and simple types is particularly relevant here. But what I hadn't fully made sense of is that #[repr] is a property of a type, not a variable (and a structure field mostly behaves like a variable). So yes, I'm convinced that newtype wrappers are the right way to handle this.

So, you'll see above that I've accepted your conclusion that newtypes are the way to do this. But I'm going to be a bit pedantic about that statement. In most cases a u32 is an abstract concept; for example a u32 that never leaves registers has no endianness. The representation of the u32 when it is stored in memory is something defined by the per-platform ABI, with "native endian" being the convention. That's a little tautological though, since what constitutes "native endian" is really defined by the ABI as well.

Again, I accept the overall conclusion, but this is a false analogy. Signed versus unsigned is much more fundamental than representation - it affects what abstract values you can store in the type, and how operations like addition and multiplication work on it. Endianness does not.

Performance considerations w.r.t. endianness are generally misplaced. Assuming you always do the conversions at load/store time (a.k.a. the right way), the byte-swizzling is almost always of negligible cost compared to the load or store itself on even remotely modern hardware, and it can usually be done concurrently with another load or store. In some cases it is truly zero cost: e.g. POWER has byte-reversing load/store instructions which cost no more than the regular versions.

It doesn't need a particularly clever compiler. Remember that values in registers do not have endian, only values in memory. So struct.other_endian_field += 1 is simply

  1. Other-endian load
  2. Increment in the usual way
  3. Other-endian store

For some CPUs those "other endian load/store" things become a single instruction, for the rest they're a regular load/store and some very cheap shuffling.
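
In code, the whole read-modify-write is just (a sketch, storing the big-endian bytes in a plain u32 field for illustration):

fn increment_be_field(field: &mut u32) {
    let native = u32::from_be(*field); // 1. other-endian load
    let bumped = native + 1;           // 2. increment in the usual way
    *field = bumped.to_be();           // 3. other-endian store
}

On hosts whose byte order already matches, from_be/to_be compile to nothing; elsewhere they become a cheap byte swap, or fuse into byte-reversing load/store instructions like the ones mentioned above where the CPU has them.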

The packed_struct crate mentioned above is roughly this.

Ok.. cool. So, new proposal:

  1. We try to get u32::to_be() and the like deprecated. What's the first step in that process?
  2. Try to add suitable endian newtype wrappers to byteorder (that being the de facto standard endian handling crate). I'll try to find time to write and post some patches for the crate.
2 Likes

I'm glad to hear it. Thought of a couple more:

  • Shared buffers for high-performance RDMA-style network libraries (e.g. MPI) need an agreed-upon endianness to support heterogeneous clusters (admittedly MPI things will usually be dealing mostly with FP, not integers).

And for a moderately common case of mixed endian in the same buffer:

  • It's pretty common for DMA buffers for NICs to have a prefix header with parameters for the NIC itself, followed by the "on the wire" data with ethernet and IP headers. The NIC header is usually little endian (because most current hardware works in LE, because x86 hegemony) but the network headers are mostly big endian (because ARPAnet history). There are cases (again mostly for high-performance / low-latency cluster-internal communication) where it's useful and/or convenient to have one big structure covering the whole lot, as sketched below.
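
A sketch of such a combined descriptor, reusing hypothetical LE/BE wrappers like the ones shown earlier in the thread (all field names invented):

#[repr(C)]
struct TxBuffer {
    // NIC-defined prefix header: little-endian.
    nic_flags: LE<u32>,
    nic_len: LE<u32>,
    // Start of the on-the-wire data: big-endian network headers.
    ethertype: BE<u16>,
    ip_total_len: BE<u16>,
    // ... rest of the frame ...
}
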
2 Likes

The idea is that you're just copying the value out and then trying to do some arithmetic on it. There is no reason that should involve any endian conversions past the point of copying, but if your local variable gets inferred to a fixed endian-type then you can get it anyway.

Not as efficient.

By the way, there are also file formats, such as most executable formats, that have both big-endian and little-endian variants. If you want to support both in the same program, you won't know the endian until runtime. So it may also be useful to consider adding "unknown endian" newtype wrappers that can convert to native from a runtime endian value.

You could implement this using fixed-endian types plus generics – something like:

if is_be {
    foo::<u32be>();
} else {
    foo::<u32le>();
}

However, I prefer to avoid this, as the duplicated monomorphized code results in binary bloat.
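
One non-generic alternative is to dispatch on the endianness at runtime instead, trading a branch per access for a single copy of the code (a sketch):

#[derive(Clone, Copy)]
enum Endian {
    Big,
    Little,
}

impl Endian {
    /// Decode a u32 whose byte order is only known at runtime.
    fn read_u32(self, bytes: [u8; 4]) -> u32 {
        match self {
            Endian::Big => u32::from_be_bytes(bytes),
            Endian::Little => u32::from_le_bytes(bytes),
        }
    }
}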