My understanding of the network vs host byte ordering problem has always been that you want to immediately convert the network bytes to host order integers, and do all your work in host order integers. In particular, I was always under the impression that there simply is no use case for a "big endian u32" type or a "little endian u32" type, as opposed to (de)serialization functions that convert between a network-order [u8; 4] and a host-order u32 or whatever. AFAIK all awareness of big/little endian-ness is hidden in the platform-specific implementation details of converting between network and host order, and wouldn't benefit from any dedicated types.
Right, if you're pulling things in from a stream (e.g. network) that's the way to go. Even there, though, reading into fixed representation structures is a common (anti)pattern, because it's so easy to do compared to having a bunch of deserialization boilerplate - maybe not so much in Rust as it is in C.
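For the streaming case, here is a minimal sketch of what I mean by converting at the boundary. The function names read_u32_be and write_u32_be are made up for illustration; only u32::from_be_bytes and u32::to_be_bytes come from the standard library.

```rust
/// Hypothetical helper: read a network-order (big-endian) u32 from the
/// front of a byte buffer. Returns None if fewer than 4 bytes are available.
pub fn read_u32_be(buf: &[u8]) -> Option<u32> {
    let b = buf.get(..4)?;
    // From here on the value is an ordinary host-order integer.
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical helper: append a host-order u32 to an output buffer in
/// network (big-endian) byte order.
pub fn write_u32_be(value: u32, out: &mut Vec<u8>) {
    out.extend_from_slice(&value.to_be_bytes());
}
```

Everything between the read and the write works on plain host-order u32 values, so byte order never leaks into the rest of the program.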
But the case I'm most familiar with is the shared memory structure, because that's what I've encountered (and debugged) a lot in the past.
Put yet another way, I'm suggesting that most of this paragraph is the correct way to do things in principle, not just in C, and questioning the implication at the end that there is a better way.
So, I'm not trying to say there's a better way than what I described, what I'm trying to say is that Rust is in a much better position to encourage or enforce the good way of doing things.
This seems like the closest thing in the thread so far to an attempt to describe use cases for big/little endian types in the core language rather than (de)serialization functions, but are there any systems where syscalls or C libraries use be/le instead of host order? I was under the impression there tautologically were not, since that's what "host order" means.
Examples I can think of:
- Emulation of one endianness system on another (as noted elsewhere, I'm very familiar with this being a QEMU developer)
- Hardware devices: most hardware devices have LE registers, so on BE systems you need swaps when accessing them. A few hardware devices have BE registers, so you need the same on LE systems. A very few devices have different registers in different endianness (usually this comes about because the device has some sort of internal bridge or layering where different components were built by different teams)
- On POWER servers, we've now mostly changed over to LE, though it was traditionally BE. But many firmware interfaces and some hardware devices remain BE because of that history.
- There's no inherent reason on many CPUs you couldn't run BE userspace programs on an LE kernel, or vice versa, though I don't know offhand of any cases that support this now.
- In-memory data structures defined by cross-platform specifications to have a particular byte order, e.g. the flattened device tree used on many embedded systems is always BE because history, but needs to be read by LE kernels and other software. I don't know if ACPI tables are always LE or are host endian [Note: for a cross-platform spec declaring things "host endian" is nearly always a mistake]
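To make the shared structure case concrete, here is a rough sketch of an endian-specific type used for a structure whose layout is fixed by a spec. Be32, Header and parse_header are hypothetical names, and the two-field header is a simplified illustration loosely modelled on a flattened device tree header, not the real layout.

```rust
/// Hypothetical wrapper: a u32 that is always stored big-endian in memory,
/// whatever the host byte order is.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct Be32([u8; 4]);

impl Be32 {
    /// Store a host-order value in big-endian form.
    pub fn new(host: u32) -> Self {
        Be32(host.to_be_bytes())
    }

    /// Read the stored value back as a host-order integer.
    pub fn get(self) -> u32 {
        u32::from_be_bytes(self.0)
    }
}

/// Simplified, illustrative header with big-endian fields (an assumption
/// for this sketch, not a complete or exact spec layout).
#[repr(C)]
pub struct Header {
    pub magic: Be32,
    pub total_size: Be32,
}

/// Hypothetical helper: build a Header from the first 8 bytes of a blob.
pub fn parse_header(bytes: &[u8; 8]) -> Header {
    Header {
        magic: Be32([bytes[0], bytes[1], bytes[2], bytes[3]]),
        total_size: Be32([bytes[4], bytes[5], bytes[6], bytes[7]]),
    }
}
```

The point of the design is that a Be32 field cannot be used in arithmetic or compared against a plain u32 without going through get(), so a forgotten swap becomes a type error rather than a bug that only shows up on one kind of host.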
That said, I'm not sure if this necessarily needs to be a core language feature, or even a standard library feature. A well-designed crate could be just as expedient.
It's not a question of making it possible, it's a question of making the obvious way to do things a good way. The standard library already includes u32::to_be_bytes() etc. which are fine for streaming, but awkward for the shared structure case. byteorder expands on the streaming stuff, but doesn't really do anything extra for the shared structure case. There are several crates that do the endian specific types thing, several are mentioned in posts above.
But none of them are particularly obvious to find. In the meantime, u32::to_be() looks like an obvious choice: it matches the way this is usually done in C - the dangerous, bug prone way it's usually done in C.
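A small sketch of why I call that style bug prone. Both functions below are hypothetical; they take a u32 that is supposed to hold a big-endian value, as you'd get from u32::to_be(). The compiler accepts both, because a swapped u32 and a host-order u32 are the same type.

```rust
/// Bug-prone: adds one to a value that is actually in big-endian form,
/// without converting first. Compiles without complaint, but gives the
/// wrong answer on little-endian hosts.
pub fn increment_unchecked(stored_be: u32) -> u32 {
    stored_be.wrapping_add(1)
}

/// Correct: convert to host order, do the arithmetic, convert back.
pub fn increment_converted(stored_be: u32) -> u32 {
    u32::from_be(stored_be).wrapping_add(1).to_be()
}
```

On a big-endian host the two behave identically, which is exactly why this kind of mistake tends to survive testing on one platform and break on another.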
Well... that and the fact that we have a whole system for controlling the in-memory representation of types which can control so little about the in-memory representation of types.