u32::to_be() considered harmful, or how to encourage safe endian handling

dwg · October 13, 2019, 9:14am

My understanding of the network vs host byte ordering problem has always been that you want to immediately convert the network bytes to host order integers, and do all your work in host order integers. In particular, I was always under the impression that there simply is no use case for a "big endian u32" type or a "little endian u32" type, as opposed to (de)serialization functions that convert between a network-order [u8; 4] and a host-order u32 or whatever. AFAIK all awareness of big/little endian-ness is hidden in the platform-specific implementation details of converting between network and host order, and wouldn't benefit from any dedicated types.

Right, if you're pulling things in from a stream (e.g. network) that's the way to go (although even there reading into fixed representation structures is a common (anti)pattern because it's so easy to do compared to having a bunch of deserialization boilerplate - maybe not so much in Rust as it is in C).
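For the streaming case, a minimal sketch of converting at the boundary could look like the following. It only uses the standard u32::from_be_bytes and u32::to_be_bytes; the helper names read_u32_be and write_u32_be are hypothetical and not taken from any crate.

```rust
/// Hypothetical helper: decode a network-order (big-endian) u32 from the
/// start of `buf`, or return None if fewer than 4 bytes are available.
fn read_u32_be(buf: &[u8]) -> Option<u32> {
    let b = buf.get(..4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical helper: encode a host-order u32 as network-order bytes.
fn write_u32_be(value: u32) -> [u8; 4] {
    value.to_be_bytes()
}
```

Everything past these two functions works with ordinary host-order u32 values, so no endianness leaks into the rest of the program.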

But the case I'm most familiar with is the shared memory structure, because that's what I've encountered (and debugged) a lot in the past.

Put yet another way, I'm suggesting that most of this paragraph is the correct way to do things in principle, not just in C, and questioning the implication at the end that there is a better way.

So, I'm not trying to say there's a better way than what I described, what I'm trying to say is that Rust is in a much better position to encourage or enforce the good way of doing things.

This seems like the closest thing in the thread so far to an attempt to describe use cases for big/little endian types in the core language rather than (de)serialization functions, but are there any systems where syscalls or C libraries use be/le instead of host order? I was under the impression there tautologically were not, since that's what "host order" means.

Examples I can think of:

Emulation of one endianness system on another (as noted elsewhere, I'm very familiar with this being a QEMU developer)
Hardware devices: most hardware devices have LE registers, so on BE systems you need swaps when accessing them. A few hardware devices have BE registers, so you need the same on LE systems. A very few devices have different registers in different endianness (usually this comes about because the device has some sort of internal bridge or layering where different components were built by different teams)
On POWER servers, we've now mostly changed over to LE, though it was traditionally BE. But many firmware interfaces and some hardware devices remain BE because of that history.
There's no inherent reason on many CPUs you couldn't run BE userspace programs on an LE kernel, or vice versa, though I don't know offhand of any cases that support this now.
In-memory data structures defined by cross-platform specifications to have a particular byte order. e.g. the flattened device tree used on many embedded systems is always BE because history, but needs to be read by LE kernels and other software. I don't know if ACPI tables are always LE or are host endian [Note: for a cross platform spec declaring things "host endian" is nearly always a mistake]

That said, I'm not sure if this necessarily needs to be a core language feature, or even a standard library feature. A well-designed crate could be just as expedient.

It's not a question of making it possible, it's a question of making the obvious way to do things a good way. The standard library already includes u32::to_be_bytes() etc. which are fine for streaming, but awkward for the shared structure case. byteorder expands on the streaming stuff, but doesn't really do anything extra for the shared structure case. There are several crates that do the endian specific types thing, several are mentioned in posts above.

But none of them are particularly obvious to find. In the meantime, u32::to_be() looks like an obvious choice: it matches the way this is usually done in C - the dangerous, bug prone way it's usually done in C.
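To illustrate why the to_be() style is bug prone, here is a small sketch. The result of to_be() is still a plain u32, so nothing in the types distinguishes a converted value from a host-order one. All function names below are hypothetical.

```rust
/// Hypothetical field writer: the caller is expected to pass a value that is
/// already in big-endian representation, but the signature cannot say so.
fn store_be_field(slot: &mut u32, already_be: u32) {
    *slot = already_be;
}

/// Hypothetical field reader: converts the stored big-endian representation
/// back into a host-order value.
fn load_be_field(slot: &u32) -> u32 {
    u32::from_be(*slot)
}

/// A bug the type system does not catch: the conversion is applied twice.
/// On a little-endian host the two swaps cancel out, so the slot ends up
/// holding the host-order value instead of the big-endian one.
fn store_twice_converted(slot: &mut u32, host_value: u32) {
    store_be_field(slot, host_value.to_be().to_be());
}
```

On a big-endian host both conversions are no-ops, so a bug like this can go unnoticed until the code runs on a little-endian machine, or the other way around.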

Well.. that and the fact that we have a whole system for controlling in memory representation of types which can control so little about the in memory representation of types.

dwg · October 13, 2019, 10:02am

Imagine there's a data structure with a fixed layout, in which you sporadically need to read and/or modify a couple of fields. The serialisation approach would have you define 'native' Rust data structures and go back-and-forth between those and byte buffers when accessing such 'foreign' data. The serialisation process necessarily converts the whole structure, including the data you don't actually need at the moment, and requires you to separately allocate 'native' structures from which you can extract data to manipulate. By treating endianness as a memory-representation problem, you are able to only pay the cost of converting the data you actually need while leaving the rest alone. Plus, if the 'foreign' memory representation happens to agree with the native ABI, you can even avoid adding any additional cost to those accesses at all; they can just compile to ordinary memory accesses.

Right, that's a good way of putting it. I tend to dislike in general the pattern of converting an externally specified representation into an internal one, unless you have a really compelling reason to do so (like a totally different organization of the data). That's not so much for the performance impact (which is often insignificant), but simply because I find it conceptually clearer if there's a uniform representation of something across the project - so if you need an externally specified representation anywhere, you might as well use it everywhere.

Nemo157 · October 13, 2019, 10:37am

The alternative to reparsing and serializing the entirety of an in-memory structure is to define accessors for specific fields that operate directly on the serialized memory (probably very simply via to_be_bytes and from_be_bytes on each access). This is similar to how data formats such as Cap'n Proto use the same data layout for a serialized message and in-memory modification of such a message.
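A rough sketch of such accessors, assuming a made-up record layout (a big-endian u32 id followed by a big-endian u16 flags field). RecordView and its methods are hypothetical names chosen for illustration.

```rust
/// Hypothetical view over a serialized record laid out as:
///   bytes 0..4 = id    (big-endian u32)
///   bytes 4..6 = flags (big-endian u16)
struct RecordView<'a> {
    bytes: &'a mut [u8],
}

impl<'a> RecordView<'a> {
    const LEN: usize = 6;

    /// Wraps the buffer, checking the length once so the accessors
    /// below cannot index out of bounds.
    fn new(bytes: &'a mut [u8]) -> Option<Self> {
        if bytes.len() >= Self::LEN {
            Some(RecordView { bytes })
        } else {
            None
        }
    }

    /// Reads only the id field, converting it to host order.
    fn id(&self) -> u32 {
        let b = &*self.bytes;
        u32::from_be_bytes([b[0], b[1], b[2], b[3]])
    }

    /// Writes only the id field, leaving the rest of the buffer alone.
    fn set_id(&mut self, value: u32) {
        self.bytes[0..4].copy_from_slice(&value.to_be_bytes());
    }

    /// Reads only the flags field.
    fn flags(&self) -> u16 {
        u16::from_be_bytes([self.bytes[4], self.bytes[5]])
    }
}
```

Only the fields that are actually touched get converted; the buffer itself stays in its serialized form throughout.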

gThorondorsen · October 13, 2019, 11:55am

A small note on history: the u32::to_be() family of methods was stabilised alongside the 1.0 release of Rust and might not have seen more discussion than "we need a way to deal with endianness concerns and that's what C does". The u32::to_be_bytes() family is a much later addition (the docs say 1.32) which IIRC was introduced because of concerns similar to those under discussion here.

Do you know that it is possible to formally deprecate these methods? Because that is what it seems you are advocating for. And that looks like a much more achievable goal than adding new #[repr] attributes to the language, especially ones that look like newtype wrappers in disguise.

(about external crates)

Yes, the issue w.r.t. discoverability on crates.io has been known for a long time. It is a hard problem, and not specific to this topic.

If you are talking about the #[repr] family of attributes, I am not sure you quite understand their purpose (though I could be mistaken myself). So far, all of these attributes apply to compound types and change the way their overall representation is computed from the individual parts. But endianness is a property of a primitive type. That is why many are advising you to use newtype wrappers: to make your own endian-aware primitive types.
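A minimal sketch of what such newtype wrappers might look like. The names U32Le and U32Be and their methods are hypothetical; existing crates may well choose a different API.

```rust
/// Hypothetical newtype: a u32 whose in-memory representation is always
/// little-endian, regardless of the host. The field is private, so outside
/// this module the raw representation cannot be mistaken for a host value.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
#[repr(transparent)]
struct U32Le(u32);

/// Hypothetical newtype: same idea, but always big-endian in memory.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
#[repr(transparent)]
struct U32Be(u32);

impl U32Le {
    fn new(host: u32) -> Self { U32Le(host.to_le()) }
    fn get(self) -> u32 { u32::from_le(self.0) }
    /// The bytes exactly as they sit in memory.
    fn to_raw_bytes(self) -> [u8; 4] { self.0.to_ne_bytes() }
}

impl U32Be {
    fn new(host: u32) -> Self { U32Be(host.to_be()) }
    fn get(self) -> u32 { u32::from_be(self.0) }
    /// The bytes exactly as they sit in memory.
    fn to_raw_bytes(self) -> [u8; 4] { self.0.to_ne_bytes() }
}
```

Here the to_be() and from_be() calls are confined to one small module, and the rest of the code only ever sees either a wrapper or a host-order u32.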

kornel · October 13, 2019, 2:28pm

u32 is a type with the native endian, by definition. Adding an attribute that makes it something else doesn't seem right to me.

You don't use #[repr(signed)] u32 to make an i32, you have a separate i32 type. So if you want a little/big-endian types, make u32le/u32be types.

If there's a mistake here, it's the existence of #[repr(packed)] that changes alignment of types, effectively creating new types that are incompatible with the originals (due to &struct.field being unsafe).

Ixrec · October 13, 2019, 4:36pm

This part is making a lot more sense to me now. I'd definitely support deprecating to_be() et al. in favor of to_be_bytes() et al.

Assuming we word the deprecation notice well, that ought to be enough of a speedbump to get people interested in "the shared structure use case" to go look at crates.io.

dwg:

Examples I can think of:

Emulation of one endianness system on another (as noted elsewhere, I'm very familiar with this being a QEMU developer)

Hardware devices: most hardware devices have LE registers, so on BE systems you need swaps when accessing them. A few hardware devices have BE registers, so you need the same on LE systems. A very few devices have different registers in different endianness (usually this comes about because the device has some sort of internal bridge or layering where different components were built by different teams)

On POWER servers, we've now mostly changed over to LE, though it was traditionally BE. But many firmware interfaces and some hardware devices remain BE because of that history.

There's no inherent reason on many CPUs you couldn't run BE userspace programs on an LE kernel, or vice versa, though I don't know offhand of any cases that support this now.

In-memory data structures defined by cross-platform specifications to have a particular byte order. e.g. the flattened device tree used on many embedded systems is always BE because history, but needs to be read by LE kernels and other software. I don't know if ACPI tables are always LE or are host endian [Note: for a cross platform spec declaring things "host endian" is nearly always a mistake]

That is a very helpful list. I don't think these examples quite produce an argument for adding new types in std (assuming we deprecate to_be() et al.), but some of them look like great candidates for a "de facto standard" solution. I feel like there ought to be a WG to ping for this, but the Embedded WG doesn't seem quite right for any of it.

scottmcm · October 13, 2019, 9:16pm

I can't see this happening. repr(C) is mostly for FFI, where forcing the endianness to be specified doesn't make things any safer on your platform, and is just wrong if you ever want to run on a machine with the opposite endianness.

comex · October 13, 2019, 10:27pm

This is another example of why Rust needs computed properties.

...I know, I know. But think about it. It makes total sense to have a separate type; &foo.field cannot be an &u32. Yet, if you write:

let mut x = foo.field;

What happens if x gets the inferred type u32le? At best, it's annoying: nothing works with u32le, so you have to explicitly convert to regular u32 somewhere. At worst, there might be some performance overhead. Suppose that u32le decides to implement AddAssign<u32> for convenience's sake, so that this compiles:

x += 1;

It would be convenient, but it would have a hidden cost: the implementation would have to first convert to native endian, then do the addition, then convert back to fixed endian.

In any case, it doesn't make sense for x to be u32le. Semantically, foo.field is a u32 value that is stored as little-endian. There's no reason to think that a copy of that value in a local variable would also want to be stored as little-endian!
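The hypothetical AddAssign<u32> implementation described above might be sketched like this. U32Le is an illustrative type, not an existing one, and the use of wrapping_add is an assumption about the overflow policy.

```rust
use std::ops::AddAssign;

/// Hypothetical fixed-endian wrapper: always little-endian in memory.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct U32Le(u32);

impl U32Le {
    fn new(host: u32) -> Self { U32Le(host.to_le()) }
    fn get(self) -> u32 { u32::from_le(self.0) }
}

impl AddAssign<u32> for U32Le {
    fn add_assign(&mut self, rhs: u32) {
        // The hidden cost: convert to native, add, convert back,
        // on every single `+=`.
        let native = u32::from_le(self.0);
        self.0 = native.wrapping_add(rhs).to_le();
    }
}
```

Copying the value out once with get() into a plain u32 local avoids repeating the conversion on each arithmetic step, which is the behaviour one would want for `let mut x = foo.field;`.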

So the typical alternative (as suggested by @Nemo157) is to use getters and setters. Either foo.field() and foo.set_field(val), or foo.field.get() and foo.field.set(val). They could accept and return plain u32, so the little-endian aspect doesn't leak into surrounding code. Works great, but there are two problems:

They're ugly.
They don't work with patterns; you can't say e.g. let Foo { field } = get_foo();.

That's not the end of the world. If it were just fixed-endian integers that had these problems, I'd say "just deal with it". But there's also:

Cell and atomics
Bit fields
Custom data structure layouts in general (of which bit fields are a special case)

One thing these all have in common, along with fixed-endian integers (if compared to bincode), is that they can be used as performance optimizations. They're not things Rust should be penalizing in terms of ergonomics. But so far they are, because computed properties are considered too scary.

kornel · October 13, 2019, 11:44pm

struct.other_endian_field += 1 would be converting back and forth anyway (unless you have a bi-endian cpu and the compiler is very clever?). It would even make sense to do so if the struct is used to edit a memory-mapped file.

OTOH if you're going to copy the data out of the struct and process it elsewhere, then that sounds more like deserialization. Perhaps a better solution for that would be a bincode-like Serde library for C structs with an endian?

dwg · October 14, 2019, 1:14am

Ah, that's very interesting.

I did not know that, that sounds like a good idea.

Aaaahhh... I think I've got it now. So actually, I don't think the distinction between compound and simple types is particularly relevant here. But what I hadn't fully made sense of is that #[repr] is a property of a type, not a variable (and a structure field mostly behaves like a variable). So yes, I'm convinced that newtype wrappers are the right way to handle this.

So, you'll see above that I've accepted your conclusion that newtypes are the way to do this. But I'm going to be a bit pedantic about that statement. In most cases a u32 is an abstract concept; for example a u32 that never leaves registers has no endianness. The representation of the u32 when it is stored in memory is something defined by the per-platform ABI, with "native endian" being the convention. That's a little tautological though, since what constitutes "native endian" is really defined by the ABI as well.

Again, I accept the overall conclusion, but this is a false analogy. Signed versus unsigned is much more fundamental than representation - it affects what abstract values you can store in the type, and how operations like addition and multiplication work on it. Endianness does not.

comex:

Suppose that u32le decides to implement AddAssign<u32> for convenience's sake, so that this compiles:
x += 1;
It would be convenient, but it would have a hidden cost: the implementation would have to first convert to native endian, then do the addition, then convert back to fixed endian.

Performance considerations w.r.t. endianness are generally misplaced. Assuming you always do the conversions at load/store time (a.k.a. the right way), the byte-swizzling is almost always of negligible cost compared to the load or store itself on even remotely modern hardware, and it can usually be done concurrently with another load or store. In some cases it is truly zero cost: e.g. POWER has byte-reversing load/store instructions which cost no more than the regular versions.

It doesn't need a particularly clever compiler. Remember that values in registers do not have endian, only values in memory. So struct.other_endian_field += 1 is simply

Other-endian load
Increment in the usual way
Other-endian store

For some CPUs those "other endian load/store" things become a single instruction, for the rest they're a regular load/store and some very cheap shuffling.
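The three steps can be written out explicitly as a sketch. SharedRegs and its counter field are hypothetical, and the field is assumed to be specified as big-endian in memory.

```rust
/// Hypothetical shared structure whose `counter` field is specified as
/// big-endian in memory, stored here as raw bytes.
#[repr(C)]
struct SharedRegs {
    counter: [u8; 4],
}

/// Performs the equivalent of `regs.counter += 1` on the big-endian field.
fn increment_counter(regs: &mut SharedRegs) {
    let value = u32::from_be_bytes(regs.counter); // other-endian load
    let value = value.wrapping_add(1);            // increment in host order
    regs.counter = value.to_be_bytes();           // other-endian store
}
```

Whether the load and store each turn into a single byte-reversing instruction or a regular access plus a swap is up to the compiler and the target.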

The packed_struct crate mentioned above is roughly this.

Ok.. cool. So, new proposal

We try to get u32::to_be() and the like deprecated. What's the first step in that process?
Try to add suitable endian newtype wrappers to byteorder (that being the de facto standard endian handling crate). I'll try to find time to write and post some patches for the crate.

dwg · October 14, 2019, 1:24am

I'm glad to hear it. Thought of a couple more:

Shared buffers for high performance RDMA style network libraries (e.g. MPI) need an agreed upon endianness to support heterogeneous clusters (admittedly MPI things will usually be dealing mostly with FP, not integers).

And for a moderately common case of mixed endian in the same buffer:

It's pretty common for DMA buffers for NICs to have a prefix header with parameters for the NIC itself, followed by the "on the wire" data with ethernet and IP headers. The NIC header is usually little endian (because most current hardware works in LE, because x86 hegemony) but the network headers are mostly big endian (because ARPAnet history). There are cases (again mostly for high performance / low-latency cluster-internal type communication) where it's useful and/or convenient to have one big structure covering the whole lot.
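As a sketch of "one big structure covering the whole lot", the following uses byte-array-backed wrappers so the struct has no padding or alignment requirements. The layout and every field name are made up for illustration; a real NIC defines its own descriptor format, and a real Ethernet/IP header has many more fields.

```rust
/// Little-endian u32 stored as raw bytes (alignment 1).
#[derive(Clone, Copy)]
#[repr(transparent)]
struct U32Le([u8; 4]);

/// Big-endian u16 stored as raw bytes (alignment 1).
#[derive(Clone, Copy)]
#[repr(transparent)]
struct U16Be([u8; 2]);

impl U32Le {
    fn new(v: u32) -> Self { U32Le(v.to_le_bytes()) }
    fn get(self) -> u32 { u32::from_le_bytes(self.0) }
}

impl U16Be {
    fn new(v: u16) -> Self { U16Be(v.to_be_bytes()) }
    fn get(self) -> u16 { u16::from_be_bytes(self.0) }
}

/// Hypothetical DMA buffer prefix: device parameters in little-endian,
/// followed by a couple of wire-format fields in big-endian.
#[repr(C)]
struct TxBuffer {
    nic_frame_len: U32Le,   // read by the device, little-endian
    nic_flags: U32Le,       // read by the device, little-endian
    wire_ether_type: U16Be, // goes on the wire, big-endian
    wire_length: U16Be,     // goes on the wire, big-endian
}
```

Each field carries its own byte order in its type, so mixing the two conventions in one structure does not require remembering which fields need which swap.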

comex · October 14, 2019, 1:46am

The idea is that you're just copying the value out and then trying to do some arithmetic on it. There is no reason that should involve any endian conversions past the point of copying, but if your local variable gets inferred to a fixed-endian type then you can get it anyway.

Not as efficient.

By the way, there are also file formats, such as most executable formats, that have both big-endian and little-endian variants. If you want to support both in the same program, you won't know the endian until runtime. So it may also be useful to consider adding "unknown endian" newtype wrappers that can convert to native from a runtime endian value.

You could implement this using fixed-endian types plus generics – something like:

if is_be {
    foo::<u32be>();
} else {
    foo::<u32le>();
}

However, I prefer to avoid this, as the duplicated monomorphized code results in binary bloat.
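The "unknown endian" wrapper mentioned above might be sketched as follows, with the byte order passed in at runtime rather than encoded in the type. Endian, U32Unknown, MAGIC and detect_endian are all hypothetical names, and the magic number is invented for the example.

```rust
/// Byte order selected at runtime, e.g. after inspecting a file header.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Endian {
    Big,
    Little,
}

/// Hypothetical wrapper: four raw bytes whose byte order is only known
/// at runtime.
#[derive(Clone, Copy)]
#[repr(transparent)]
struct U32Unknown([u8; 4]);

impl U32Unknown {
    fn get(self, endian: Endian) -> u32 {
        match endian {
            Endian::Big => u32::from_be_bytes(self.0),
            Endian::Little => u32::from_le_bytes(self.0),
        }
    }

    fn set(&mut self, value: u32, endian: Endian) {
        self.0 = match endian {
            Endian::Big => value.to_be_bytes(),
            Endian::Little => value.to_le_bytes(),
        };
    }
}

/// Invented magic number for a hypothetical file format.
const MAGIC: u32 = 0x1234_ABCD;

/// Guesses the byte order of a file from its leading magic field.
fn detect_endian(magic: U32Unknown) -> Option<Endian> {
    if magic.get(Endian::Big) == MAGIC {
        Some(Endian::Big)
    } else if magic.get(Endian::Little) == MAGIC {
        Some(Endian::Little)
    } else {
        None
    }
}
```

This trades a runtime branch on each access for a single copy of the parsing code, instead of one monomorphized copy per byte order.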

HeroicKatora · October 14, 2019, 1:47am

It is maybe important to examine the reason for the existing complications in C++ (partially C), to understand why these languages do not have great solutions either and in what way Rust differs to enable better solutions. And also how Rust is faster in case you don't want to rely on implementation defined semantics. Let's show this with C++:

Casting a pointer to bytes/char into a pointer to any other type without using a placement-new constructor is totally UB in C++. You're allowed to memcpy bytes to initialize or copy the representation of very simple types (PODs and other trivially copyable structs) but you cannot simply pointer cast! C++20 will reiterate this point by adding std::bit_cast, which gives you an owned copy, but static_cast or reinterpret_cast of a pointer is definitely still UB. It is however allowed for a char* (and signed char* and unsigned char*) to point to and read the memory, but not for other integral types, even unsigned variants, and not the other way around¹.

const char* some_stream = ...;
// Will lead to UB in C++
const int* var = (const int*) &some_stream[valid_aligned_index];

In Rust however this is fine as long as the bytes are initialized.

use std::mem;
let stream: &[u8] = ...;
// Check alignment before reinterpreting; a misaligned &u32 would be UB.
assert_eq!(stream.as_ptr().align_offset(mem::align_of::<u32>()), 0);
let as_int: &u32 = unsafe { &*(stream[..4].as_ptr() as *const u32) };

Why is this important? Because C++ and C also forbid casting a pointer to one representational struct denoting a big-endian int into a pointer to another that denotes a little-endian int. You can only cast a pointer to a trivial struct's only field to a pointer to the struct, or back.

struct BigEndian { int member; };
struct LittleEndian { int member; };

const int* foo = ...;
// Permitted afaik.
const BigEndian* bar = (const BigEndian*)foo;
// Very bad. Dereferencing this is UB in C++.
const LittleEndian* goose = (const LittleEndian*)bar;

In Rust however these wrappers are totally fine as long as their layout is explicitly defined. That makes it possible to define them such that structs that want to do DMA with other platforms can compose them for an explicit effect. To avoid all unsafe I would still advise a helper library that does all these basic cast implementations for you, otherwise it is easy to forget a check. Since we're talking about network buffers you may want to choose a wrapper struct that removes alignment restrictions as well. At that point a real integer is intractable in either language. For example use:

#[repr(C)]
struct NetworkU32([u8; 4]);

impl NetworkU32 {
    fn read(&self) -> u32 {
        u32::from_be_bytes(self.0)
    }

    fn from_bytes(bytes: &[u8]) -> &Self {
        // An unsafe but non-UB implementation:
        unsafe { &*(bytes[..4].as_ptr() as *const [u8; 4] as *const Self) }
    }
}

luser · October 23, 2019, 2:16pm

As usual when this topic comes up I'd like to plug scroll, which solves this problem in a pretty usable way with a custom derive. The scroll docs aren't a great source of examples but I used it in my minidump crate with great success. Here's the minidump header struct definition, and here's the code that reads the minidump header (doing endian detection by looking at the magic number at the start of the file).

system · January 21, 2020, 2:31pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
