Uninitialized memory

Yes, this seems like a quite good way out of this quandary. Essentially, at some future date it will be possible for libraries to gain efficiency by swapping around which default method implementations they use, but nothing breaks in the process. The main question this leaves in my mind is naming: should we reserve read (the "main" method of the trait) for what you're calling read_buf? It's a bit hard to say, because the ideal endpoint will depend on what kind of "write only" buffer abstraction we wind up with (and whether it's easily usable in cases that are happy to use read).

So this is the other part of the question for me. In the initial RFC I talked about "exposing" uninit data to "safe code", but that's far from a formal rule. Should this have the status of a guideline (like our other API guidelines?) or actually be part of the contract for unsafe (as I originally proposed)? At the moment, I lean more toward API guideline, in part because I'm not confident we can formalize a crisp contract without unintended consequences. But maybe with enough iterations in this thread we can get there.

On the other hand, if it's just a guideline, can it ever be relied upon by security-sensitive code? Would we have to meticulously document all the cases where we are not providing access to uninit memory in safe code? Several people have argued in this thread and the RFC thread that if people can't rely on some kind of guarantee about uninit memory, they will be forced to zero various buffers themselves.

On the other hand, if it's just a guideline, can it ever be relied upon by security-sensitive code?

Secure code can't rely on anything; it must all be audited no matter what. The goal is to do as good a job as possible for "normal" uses. In Rust, it should be impossible to segfault without using an unsafe block. Obviously, bugs in the compiler, std, etc. can cause segfaults, but those are bugs. The same should be true of accessing uninitialized memory: if Rust code can read uninitialized memory without an unsafe block, that is a bug in a library the code is using, not in the code itself.

So, I feel like we can make a hard-and-fast rule for the standard library that it will not expose (return or pass to a callback) uninitialized values; therefore, we do not need to repeat this fact on every method. Code that really cares about security can trust that rule or choose to audit the standard library. Other code is not under our control, so whatever text we write about contracts is only partially relevant there anyhow; a security-sensitive library will have to audit that other code to see what rules it follows.

Regarding the unsafe contract, when we do write that up, it will have to at least talk somewhat about uninitialized memory. The job of the "unsafe contract" is basically to ensure that the things that the compiler permits safe code to do cannot lead to undefined behavior. Reading uninitialized memory -- which is something safe code can do -- can certainly provoke undefined behavior under some circumstances. So at minimum unsafe code must do whatever it has to do to ensure that the reads (in safe code) are considered "defined" behavior. I am not 100% certain of the details on this point and want to read up a bit more. I think in practice though what this will mean is that libraries should never return or expose uninitialized memory except via the return of an unsafe fn.

Does that make sense?

To be clear, this principle is exactly what's under debate, both in terms of how to formulate it and whether it is considered a guideline or part of the basic safety contract. As I've been trying to emphasize, my original formulation in terms of the basic contract around unsafe turned out to be problematic in various ways (as you can see in the original RFC thread). It's not clear to me whether you're advocating for a guideline or an extension to "undefined behavior" in Rust.

Yes, I think so. This suggests the following:

Guideline: code must ensure that uninitialized byte data is only read within the scope of an unsafe block. In practice, this means that code that produces uninitialized data cannot:

  1. Yield that data as a return value, unless the function itself is unsafe, or
  2. Pass that data to any unknown code (i.e. via generics or trait objects), unless the code being passed to is marked unsafe.

The standard library is required to follow this guideline, while other libraries are strongly encouraged to do so.

UPDATE: changed the guideline to say "data is only read" instead of "can only be read".

I would personally like to take a step back in this thread and spend a bit of time discussing whether we should consider this behavior unsafe at all. I think the strategy laid out by @nikomatsakis and others is reasonable if we decide not to expose uninitialized memory. It is worth considering because it gives us a contingency plan for recovering the performance losses in the future, as well as clear design guidelines for the standard library.

I do not personally agree that exposing an uninitialized byte array should be considered unsafe behavior. Note that today, in the case of read_to_end, it does not trigger undefined behavior in LLVM, so the code is still well defined; even if you read the memory, the behavior is defined (just with weird results).

  • I don’t feel like it’s feasible to impose on all of Rust the performance hit and API-design burden of requiring buffers like this to always be initialized, for the sake of protecting sensitive data. If memory contains sensitive data that you don’t want leaked to other parts of the process, then it should be the owner’s responsibility to zero it out before allowing it to be reallocated.
  • Exposing buffers like this should of course always be documented as having no guarantees about their contents; it really is just an error to read them. In that sense, if you start seeing Valgrind errors (which would be false positives), it is reasonable to expect the offending code to be modified to not read the memory.

Put another way, it seems like there is a de facto decision to strongly recommend against uninitialized buffers in safe Rust, and I just want to make sure that we settle on that point specifically.

This sounds good to me. I'm not sure whether it makes sense to include the caveat that other libraries are merely encouraged to follow it. If we do, I'd probably want to add something like: "Note that if this rule is not followed, caution and care are required to avoid undefined behavior." (But this of course does not address Alex's points, which I just saw.)

I VERY STRONGLY believe that there should be no way for safe Rust code to read from uninitialized memory. I would expect Rust to make a best effort to protect me from doing this.

Accidentally reading from uninitialized memory is very easy and can have very serious implications (see Heartbleed; this kind of memory disclosure is probably one of the main vulnerability vectors). If Rust decides that it is OK to leak uninitialized memory into safe Rust code, I would consider that a serious hit against the advantages that Rust provides.

First of all, it's not an and situation, it's an or situation: either Rust provides a buffer abstraction with essentially zero performance penalty, or it zeroes out memory, which has a small penalty. Also, we are specifically discussing functions like Read::read_to_end(), which are already not good choices for performance-sensitive situations. In a performance-critical situation, one would allocate a buffer once (zeroing it) and reuse it, making the overhead of zeroing memory moot. (Yes, you may say that a bug would expose contents previously written to this buffer, and that is true, but it will NOT expose memory that was never written to the buffer.)

Also, I would not consider a buffer abstraction "API overhead". It makes working with bytes easier and is a pretty well-established abstraction.

Now, the exact same argument could be made against the borrow checker: once one requires cycles, EITHER a performance hit is needed (Arc / Rc, or using a Vec and passing around indexes) OR unsafe code is needed. This is what we buy into when we use Rust.

This is definitely a strong burden. For example, such code should presumably not use Vec. (How would Vec know to zero out the old buffer when it reallocates?) Perhaps it is reasonable that security-sensitive code would wind up with its own set of abstractions. (In particular, I don't know how much of an additional burden this would be on top of whatever requirements security-sensitive code already has.)

I'm torn. You make a strong case but ultimately I have this feeling like this is the sort of footgun that Rust exists to prevent. It just doesn't feel like the stdlib should be in the business of exposing uninitialized memory without an unsafe keyword. Moreover, it feels like we can eventually have a more precise type here that clarifies the role of the buffer as something you write into vs something you read from, which seems like it must be better.

If a method of a different trait has the name of the newly added function, then name resolution will fail if an object implements both traits, both are in scope, and the function is called.

This is very true. A concrete example is mio. I plan on adding a read_buf function (see https://github.com/carllerche/mio/blob/mio-reform/src/io.rs#L40). If Rust adds Read::read_buf, it would definitely break any code that uses mio + Read (which would be most mio code).

Sorry, I should have been more clear. We've been meaning to write up an RFC/guidelines documenting what kinds of changes are permitted, and the plan was to permit adding methods to traits. There is some risk of ambiguity arising, yes, which can be resolved by clients in a trivial way using UFCS. This would be one of a small number of "permitted" compilation failures that can accompany a minor release. (Similar hazards arise in other languages as well.) Otherwise our ability to grow APIs would be very hampered.

I would personally like to keep concerns as concrete as possible. For example, I think it is unfair to liken this to Heartbleed, as I am under the impression that it resulted from an out-of-bounds access, not from reading uninitialized data (please correct me if I'm wrong, though).

You say that you strongly believe that Rust should protect you from doing this, but can you clarify why? What cases are you worried about?

I think this all hinges on your later statement that the buffer abstraction is not "API overhead". I personally disagree, because we do not have a write-only buffer, and it would take time to stabilize one and rationalize it with the existing read methods (aka what @aturon and @nikomatsakis have been talking about). That definitely seems like overhead to me, as it must be designed and considered.

I see the borrow checker as quite distinct from this. The borrow checker is preventing me from segfaulting. Initializing a byte buffer does not prevent me from segfaulting.

I have not personally written much security-sensitive code, but I would expect a ZeroVec<T> which is largely what Vec<T> is today except that Drop is implemented differently by dropping each individual element and then zeroing out the entire allocation to ensure all data is erased (or perhaps a ZeroBox<T> abstraction as well). Again though, I do not personally know if this is sufficient (and would be curious if others know whether it would be).

I personally find it hard to buy into this though without concrete evidence saying that we shouldn't (e.g. justifying the 20-25% loss in performance for read_to_end). I, too, am somewhat uneasy about it but I am unable to convince myself that we're wrong for exposing uninitialized byte buffers.

Heartbleed was an out-of-bounds buffer access, if I recall correctly. But if a Rust server had some kind of vulnerability involving uninitialized buffers, it could be just as bad.

I don’t believe it’s possible to do this kind of thing in Java, for example.

Right, that's what I was talking about. Probably security-sensitive code would want to practice security in depth, meaning that they would want to use ZeroVec and not have safe code being given access to uninitialized data.

The reason I compared it to Heartbleed is that you have probably heard of it, AND the outcome of reading from uninitialized memory is, for all intents and purposes, the same.

"Security-sensitive code" is not a small subset of code. Any bit of code that is potentially exposed to an attacker is security-sensitive. This includes a significant chunk of the code that gets written (and virtually all server code). A read-from-uninitialized-memory bug can be exploited to leak secret keys, etc.

For example, consider a server app that returns signed cookies: an attacker could exploit an uninitialized-memory-read bug to access the secret key and then gain full access to the system. The list goes on.

The buffer API can be punted past 1.0. Zeroing memory in the read_to_end / read_to_string functions is an acceptable trade-off for 1.0. The performance hit is not huge, and those two functions should not be used in performance-sensitive situations anyway.

Honestly, I have always thought that Rust would have protected me against reading from uninitialized memory, and the fact that this is in question is (still) very surprising to me given how bad the outcome of such bugs can be.

Seriously, people! I know that you have promised, but it’s not 1.0 yet. Just break backwards compatibility!

+1 for the buffer object. It allows for more optimizations than slices.

Is there any contention about the point that the problem can be solved performantly and safely with new abstractions like &out, or even just library types in the existing language as suggested by @carllerche and @reem? Because if that’s the case, then the choices appear to boil down to:

  1. Some use cases written in a certain way will be somewhat slower out of the gate from 1.0, up until we add those abstractions (which we can make our first priority post-1.0); after that point they can be both fast and safe.

  2. They will be fast right from the start, but Rust’s safety and security guarantees will be weaker forever.

In other words, a temporary performance hit versus a permanent safety/security hit. This doesn’t feel like a difficult decision to me.

(This is not about the meaning of the words “safe” and “secure” in a boolean sense, like where exactly you would draw the line - rather about stronger vs. weaker guarantees, on a continuum.)

I'm inclined to agree. Perhaps a compromise is simply to say that we intend to enforce the "no leaking access to uninitialized memory" guideline so long as we can, but we'll possibly revisit this question in the future as new APIs arise. In judicial terms, basically we set a narrow precedent here, since it is clear we can solve this case but perhaps there are other cases that will be less clear that we are not thinking of.

Well, the guideline would require that an API that leaks uninitialized memory be marked as unsafe.

I’m basically just restating what’s been already said in the thread, but -

The thing about reading uninitialized memory is that it only takes one such read to potentially get at any freed data, which breaks what is otherwise a promise that avoids a whole class of bugs in safe code: the promise that freeing data makes it go away.

For some especially sensitive pieces of data, such as passwords, it is advisable to zero out the memory after use anyway, as existing C applications do, because the kernel might somehow leak that memory to other processes. But it is impractical to do this for all data that might be valuable to an attacker (which in networked applications could be just about anything: any private data stored on behalf of users), and attacks against the kernel are usually local-only, so in general that attack scenario is far less scary than direct leaks by the application.

Heartbleed is quite relevant whether or not that specific bug was caused by reading from an uninitialized buffer: it is an example of how dangerous information disclosure can be even when it’s not in the traditional ‘arbitrary code execution’ vulnerability category.

I don’t think the overhead of stabilizing a buffer API post-1.0, and the temporary slowdown before that, is very significant compared to that risk. I agree that a narrow precedent would be best, but I can’t actually think of any scenario, not already highly unsafe, where uninitialized memory access is truly required to avoid a significant usability or performance overhead; can anyone else?

1 Like