API to acquire arbitrarily initialized buffer?

AndreKR · October 15, 2020, 7:41pm

I just came across this RFC:

It's about the problem that, for example, the read method of the Read trait is

fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>

and you have to needlessly initialize buf before you can pass it to read().
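To make the cost concrete, here is a minimal sketch of the usual pattern. The helper name `read_once`, the 1024-byte size, and the use of a byte slice as the reader are illustrative assumptions, not something taken from the RFC.

```rust
use std::io::Read;

/// Reads once into a zero-filled buffer and returns the bytes received.
/// `read_once` is a hypothetical helper for illustration only.
fn read_once<R: Read>(mut reader: R) -> std::io::Result<Vec<u8>> {
    // The zeroing is wasted work: `read` overwrites the first `n` bytes
    // and this function never looks at the rest.
    let mut buf = [0u8; 1024];
    let n = reader.read(&mut buf)?;
    Ok(buf[..n].to_vec())
}

fn main() -> std::io::Result<()> {
    // `&[u8]` implements `Read`, so it stands in for a file or socket here.
    let data = read_once(&b"hello"[..])?;
    assert_eq!(data, b"hello");
    Ok(())
}
```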

From what I understand, the whole problem arises from the fact that the compiler treats uninitialized memory not as arbitrary values but as "undefined" values, and reading from such "undefined" memory is undefined behavior. For example, if u: i32 is such an undefined variable, the compiler is allowed to assume false for u >= 0 || u <= 0. I'm assuming there are very good reasons for this peculiar behavior, probably to do with optimization.

This is why you can't just initialize a buffer like

let mut buf: [u8; 1024] = unsafe { MaybeUninit::uninit().assume_init() };

If buf is not actually overwritten by the Read implementation, it is very unsafe to work with, because of the problematic compiler behavior mentioned above.

As proposed solutions I saw some rather complicated APIs with custom Read-like traits to track partially uninitialized memory and such.
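For readers who have not seen those proposals, the core idea can be sketched roughly like this. `TrackedBuf`, `append`, and `initialized` are hypothetical names made up for this illustration; the actual proposals differ in detail.

```rust
use std::mem::MaybeUninit;

/// Hypothetical buffer that remembers how many leading bytes are initialized.
struct TrackedBuf<const N: usize> {
    bytes: [MaybeUninit<u8>; N],
    filled: usize,
}

impl<const N: usize> TrackedBuf<N> {
    fn new() -> Self {
        // No byte is written here; every slot starts out uninitialized.
        TrackedBuf { bytes: [MaybeUninit::uninit(); N], filled: 0 }
    }

    /// Copies as much of `src` as fits and returns the number of bytes taken.
    fn append(&mut self, src: &[u8]) -> usize {
        let n = src.len().min(N - self.filled);
        for (slot, &b) in self.bytes[self.filled..self.filled + n].iter_mut().zip(src) {
            slot.write(b);
        }
        self.filled += n;
        n
    }

    /// Safe view of the initialized prefix only.
    fn initialized(&self) -> &[u8] {
        let init = &self.bytes[..self.filled];
        // SAFETY: `append` wrote every slot below `self.filled`, and
        // `MaybeUninit<u8>` has the same layout as `u8`.
        unsafe { std::slice::from_raw_parts(init.as_ptr() as *const u8, init.len()) }
    }
}

fn main() {
    let mut buf = TrackedBuf::<8>::new();
    assert_eq!(buf.append(b"hello"), 5);
    assert_eq!(buf.append(b"world"), 3); // only 3 slots were left
    assert_eq!(buf.initialized(), b"hellowor");
}
```

The bookkeeping is what makes the safe API possible: callers can never observe a slot that has not been written.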

But why? Why not simply solve the problem at hand by providing an API that, instead of providing an array of delicate "undefined" memory, provides an array of arbitrarily initialized memory?

matklad · October 15, 2020, 8:58pm

I think the two often-cited reasons are

MADV_FREE: https://www.man7.org/linux/man-pages/man2/madvise.2.html. On Linux, reading from non-initialized memory twice can legitimately return different values without any compiler hackery
Reading arbitrarily initialized memory can leak secrets. That was the essence of the Cloudbleed bug, for example.

EDIT: RFC links to this discussion of this alternative: https://paper.dropbox.com/doc/IO-Buffer-Initialization-MvytTgjIOTNpJAS6Mvw38#:uid=144819543418690591087193&h2=std::ptr::freeze

scottmcm · October 15, 2020, 11:01pm

In addition to the things given above, that wasn't possible in LLVM until recently -- still isn't in all the versions of LLVM that rustc supports. It would have been the same thing as providing an array of zeros.

atagunov · October 16, 2020, 12:12pm

Wow! How are they dealing with MADV_FREE on those versions that support it?

scottmcm · October 16, 2020, 5:40pm

I don't know. You can read about the new LLVM IR instruction here: LLVM Language Reference Manual — LLVM 18.0.0git documentation

RalfJung · October 17, 2020, 10:39am

This blog post has some more details.

This old RFC also lists some good reasons for not having such an operation in the language (it did not get accepted, but such an operation was not added so far either):

Even in the latest LLVM we only have freeze on IR values AFAIK, not on regions of memory. That doesn't really help for the read use case.

atagunov · October 17, 2020, 4:54pm

Hmm.. would you think the language might benefit from this new 8-bit type?

mu8 // maybe-uninit byte

Every time the value is assigned to a regular u8 or otherwise used the read is wrapped into freeze at LLVM level. Both in safe and unsafe code.

Further considerations: an unsafe transmute from mu8 to u8 and from &mu8 to &u8, etc. would probably be useful. mu8 could perhaps be used in some places where normal u8 is used now. For example you could have mu8-typed local variables on the stack. I imagine code using mu8 instead of u8 would benefit from fewer LLVM optimizations. However I also imagine that for the cases mu8 is intended for it wouldn't cause significant performance impact. MU variants for other integer types - and generally for types for which all bit patterns are valid - could be considered as well.

mjbshaw · October 17, 2020, 5:01pm

This is a situation where it would be nice if Rust had a way to specify write-only references. This would be useful for other write-only situations (e.g., writing to GPIO pin, writing to some GPU memory, etc.). But I think it's too late to introduce new data or reference types without a backwards compatible way to retrofit them onto existing APIs (like Read).

RalfJung · October 17, 2020, 5:22pm

Even with write-only references, things would be unsafe as we would have to ensure that when the function returns some n: usize, it actually initialized n bytes.
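A rough sketch of that remaining obligation, using `MaybeUninit<u8>` slots as a stand-in for a write-only reference (Rust has no such reference type). The names `fill` and `read_via` are hypothetical.

```rust
use std::mem::MaybeUninit;

/// Hypothetical reader: writes into `buf` and reports how many bytes it wrote.
/// Nothing in the signature forces the returned count to be truthful.
fn fill(buf: &mut [MaybeUninit<u8>]) -> usize {
    let src = b"abc";
    let n = src.len().min(buf.len());
    for (slot, &b) in buf.iter_mut().zip(src) {
        slot.write(b);
    }
    n
}

/// Calls `reader` on an uninitialized buffer and trusts its return value.
fn read_via(reader: fn(&mut [MaybeUninit<u8>]) -> usize) -> Vec<u8> {
    let mut buf = [MaybeUninit::<u8>::uninit(); 16];
    // Clamping avoids an out-of-bounds slice, but not the real problem below.
    let n = reader(&mut buf).min(buf.len());
    // SAFETY: sound only if `reader` really initialized the first `n` bytes.
    // A reader that reports more bytes than it wrote makes this UB, which is
    // why the caller still needs `unsafe` here.
    let init = unsafe { std::slice::from_raw_parts(buf.as_ptr() as *const u8, n) };
    init.to_vec()
}

fn main() {
    assert_eq!(read_via(fill), b"abc");
}
```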

That sounds pretty hard to do, Rust does not really have facilities to special-case a particular type when doing assignments. How do you imagine this to work when doing MIR optimizations on some generic type T where we don't know if there is a mu8 in there?

atagunov · October 17, 2020, 5:58pm

allow assigning mu8 to mu8 - no freeze
allow assigning mu8 to u8 - with a freeze - and only if both types are known
a generic T can only be assigned to a variable of type T exactly; if T is mu8 it remains mu8, no freeze, if T is a struct with mu8 inside then the type of that field remains mu8, no freeze

mu8 is a shorthand for MaybeUninit<u8>
mu8 -> u8 assignment is sugar for a fn freeze(mu : MaybeUninit<u8>) -> u8 intrinsic

scottmcm · October 18, 2020, 12:18am

Assignment always being just a memcpy is a rather fundamental thing in Rust. I don't think this is nearly important enough to be a place to change that.

Similarly, I don't think this meets the bar for a new type. One can always make ones own type mu8 = MaybeUninit<u8>; if it's something common.
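As a minimal sketch of that alias in use (the helper `write_then_read` is a hypothetical name, and the lint attribute is only there because the lower-case spelling from this thread would otherwise trigger a naming warning):

```rust
use std::mem::MaybeUninit;

// Lower-case to match the thread; the naming lint would suggest `Mu8`.
#[allow(non_camel_case_types)]
type mu8 = MaybeUninit<u8>;

/// Writes `v` into a fresh `mu8` and reads it back.
fn write_then_read(v: u8) -> u8 {
    let mut slot: mu8 = mu8::uninit();
    slot.write(v);
    // SAFETY: `slot` was initialized by the `write` call above.
    unsafe { slot.assume_init() }
}

fn main() {
    assert_eq!(write_then_read(7), 7);
}
```

Note that the alias adds no new behavior: reading still goes through `assume_init`, which is only sound for a slot that has actually been written.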

DDOtten · October 18, 2020, 8:45am

I agree that MaybeUninit<u8> seems too niche to get its own name and that much sugar. However, I really like the idea of such a freeze(mu : MaybeUninit<u8>) -> u8 function.

RalfJung · October 18, 2020, 9:20am

I was not asking about how to compile this, I was asking about optimizations and analyses that work on generic MIR. We do not know if T is mu8 when optimizing generic MIR. So with this we'd have to pessimize optimizations in all generic functions because there might be a mu8 there.

Oh and of course the concerns from this RFC that I already mentioned still apply:

atagunov · October 18, 2020, 6:07pm

Hmm.. is there really a problem though? When compiling a = b

if we know a : u8 and b : mu8 we insert freeze intrinsic into MIR
if we know a : T and b : U the code simply does not compile as T may differ from U
otherwise a : T and b : T and a memcpy is sufficient regardless of T being mu8 or not

In other words, the fn freeze(mu : MaybeUninit<u8>) -> u8 intrinsic is inserted before the MIR optimization pass, and no pessimisation seems to be necessary.

Maybe there is a misunderstanding? My suggestion as stated makes the language exactly as expressive as adding fn freeze(mu : MaybeUninit<u8>) -> u8 intrinsic would.

The reason I suggested this intrinsic and mu8 -> u8 assignment syntactic sugar in one package is because I feel the sugar is both vastly more ergonomic and also more mentally stimulating. Thinking of this in terms of mu8 -> u8 assignment in my view prompts a richer set of further generalizations than just freeze fn on its own.

RalfJung · October 18, 2020, 6:43pm

I am not talking about compiling it.

I am talking about doing program analysis for the purpose of optimizations. Like, given some let a = b, can we replace a later foo(a) by foo(b)? With mu8 we cannot because a is frozen but b might not be.

Your proposal goes way beyond that in forcing automatic freezing in a bunch of places. This needs to be taken into account in all stages of the compiler, in particular optimizations. That will be very non-trivial, so your proposal has a much higher cost than "just" adding explicit freeze. I am not convinced that extra cost is worth the added benefit.

(I am not even convinced that we want an explicit freeze but that is a separate story.)

atagunov · October 18, 2020, 9:08pm

I might be clueless but.. Compiling let a = b can end up with one of the following two results:

a : u8, b : mu8 => this ends up compiled into MIR as if it was a = freeze(b)
in all other cases same behavior as today treating mu8 say as a shorthand for MaybeUninit<u8>

The idea was all these places are detected during compilation and freeze intrinsic is inserted into MIR in each such case. As a first cut freeze can probably be treated as a sort of function by the optimizer/other places in the compiler and that hopefully may require little or no change at all.

Are there that many places? Assigning mu8 to u8 - it's effectively a new type conversion. Use of mu8 in all kinds of expressions, like a < b or a + 1, a & 0xF. That is all I guess? In all these cases it would be conducive to eliminating UB to insert a freeze.

The choice of traits that would be implemented for mu8 would need to be very carefully considered. I don't see much harm in initially implementing none. Though perhaps all/most could be implemented with such implementations using either an explicit invocation of freeze or an implicit one via an assignment to u8.

RalfJung · October 19, 2020, 10:30am

Oh, I somehow missed that a: u8 here. So in generic code I could not even enter the first case.

Basically, to use Rust terminology, you are suggesting to add a coercion from mu8 to u8 that performs a freeze. Sorry for misunderstanding.

In that case I do not see a reason for why that should be an implicit coercion, we might as well do

type mu8 = MaybeUninit<u8>;

impl mu8 {
  fn freeze(self) -> u8 { /* ... */ }
}

I do not think saving 9 characters for some very specific unsafe code is worth the extra coercion, and also this is a subtle operation that really should be performed explicitly rather than implicitly.

system · January 17, 2021, 10:30am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
