Request for CVE post-mortems

gnzlbg · May 15, 2019, 3:45pm

The last two CVEs announcement boil down to: CVE was introduced here, fixed there, these are the affected versions.

This reads like the CVE was fixed, but were we to make the exact same errors for a different feature today we would end up with another new CVE tomorrow. For me, this does not feel fixed at all.

When something is stabilized, the stabilization PR is just the last small bit of incremental work that leads to something landing on stable Rust. Before that happens, there are many tiny incremental steps were tiny errors can accumulate. None of these tiny errors looks critical, so it is easy to let them slip one at a time, but their sum is what results in a CVE landing on stable Rust.

I wish that, after the announcement is done, the work would continue towards fixing these CVEs by writing a post-mortem that identifies all the process failures were errors slipped in, how they resulted in the CVE actually landing, and how could we change our process to make it impossible for another CVE to land in the language due to the same reasons.

jethrogb · May 15, 2019, 7:52pm

In particular, we’ve had two recent cases where something was stabilized and then there was basically an immediate issue with it that required a point release. Is there something we can change in our process to prevent this from happening again in the future?

glandium · May 16, 2019, 5:46am

Relatedly, in both cases, I’ve found the balloons and party poppers in “This week in rust” to be highly inappropriate. e.g. https://this-week-in-rust.org/blog/2019/05/14/this-week-in-rust-286/

nasa42 · May 16, 2019, 5:03pm

(I’m one of the editors at TWiR)

I’d argue that those emojis are to celebrate the fix not the bug, and also because it has been a long running convention to decorate all Rust release announcements regardless of the subject of release.

While I don’t find them inappropriate, but if community thinks otherwise, I’m happy to skip them from point/security releases.

newpavlov · May 16, 2019, 5:10pm

I think the main issue is insufficient attention and testing which most Nightly features get nowadays. This is one of the reasons why I've proposed semi-stabilization (a.k.a beta-features):

Centril · May 16, 2019, 5:27pm

We actually removed "The Rust team is happy to announce" in the blog post this point release because we didn't feel we were happy to announce at all.

RalfJung · May 16, 2019, 6:48pm

Which other case do you mean? I feel there have actually not been many point releases recently -- 1.34 is the first version this year to see any.

jethrogb · May 16, 2019, 7:27pm

I guess I was referring to 1.26.2, which was released about a year ago. Not sure if that qualifies as “recent” ¯\_(ツ)_/¯

vorner · May 16, 2019, 7:33pm

From my point of view, the overall quality of Rust ‒ including the low amount of bugs and needed point releases is very good. Furthermore, they seem to be dealt with quite fast and without pretending. I mean, would other software/language go so far as request a CVE number? They tend to say that if you wrote obviously wrong code (eg. claiming the type is something else than it is, as was the case here) and your program doesn’t work as intended, well, it’s your problem…

Nevertheless, I’d like to see some kind of post-mortem or a blog „how we got there“. Not because it’s CVE, but because it’s sharing of experience. I like to learn both from how things worked for someone but also how something didn’t work. If there’s an interesting bug, a feature that eventually didn’t land or similar (from technical or from process point) and someone knows how it happened, sharing that would be appreciated :-). Obviously, it can be the case that it’s a very boring case of „Nobody thought of that“ and that’s it. Bugs simply happen.

Centril · May 16, 2019, 8:54pm

My take on a TL;DR for this specific soundness hole as a post-mortem is:

Someone neglected to leave a comment on #[unstable(...)] #[doc(hidden)] + unsafe { ... }.
Someone thought it would be good to stabilize.
Questions were asked about why it was #[doc(hidden)] and we didn’t really know.

I think solving step 1. would have been a good step to take:

Always leave comments on unsafe { ... } and related places.
- No matter how trivial.

sfackler · May 16, 2019, 9:27pm

There is no unsafe block involved with Error::type_id. The API has been unsound since it was first added in 2015.

cuviper · May 16, 2019, 9:47pm

I suppose that statement is hedging on unsafe and related places – the unsafe code in downcast_ref and downcast_mut depends on is<T>, which assumes a well-behaved type_id.

Uther · May 16, 2019, 9:48pm

If I remember correcly, the abort for panic in FFI was reverted recently too.

sfackler · May 16, 2019, 9:51pm

Sure, but no one noticed that this was a problem until now! The initial PR had type_id flagged unstable because we were "unsure if we wanted to commit to the interface", not because "this is unsound and we're using stability gates to hack around that problem".

The FFI abort change was reverted because too much real world code was relying on the undefined behavior of unwinding through extern "C" functions.

pietroalbini · May 16, 2019, 10:05pm

If I remember correctly this was reverted before it reached stable though.

gnzlbg · May 16, 2019, 10:08pm

I also noticed the following:

not all check boxes were checked - this is kind of normal, and don’t know if more checkboxes would have helped here, probably not
no summary comment for stabilization (the lang team has these) - e.g. it is unclear to me which problem this API tries to solve, whether it is worth solving, and how, etc. The T-libs team doesn’t appear to have these. Might have helped.
the FCP was not mentioned in any TWiR issues - not that TWiR is part of the process (it isn’t), but some issues receive more eyes and reviewers after being “announced” in other channels, which did not happen for this one

None of these feel like major issues, but the cause of this might be the sum of many tiny issues.

Obviously, a comment explaining that the API was unstable and hidden because it was unsound would have prevented this, but one person adds an internal API, doesn’t add a comment / doesn’t realize it is unsound, another one makes it public+unstable, doesn’t realize it either, and 4 years later, we kind of end up here anyways. We can force people to write comments, but we can’t force the comments to be correct.

Centril · May 16, 2019, 10:18pm

I do think we should enforce a comment requirement with tidy on all unsafe fn and unsafe { ... } both in the standard library as well as in the compiler. We can't force the comments to be correct, but we can reduce chances with reviews.

bill_myers · May 16, 2019, 11:28pm

Why is there a CVE in the first place?

It seems like this is only a problem for maliciously written code, and malicious code can already use unsafe as well as opening /proc/self/mem and probably other soundness holes; in general Rust is not designed to protect against malicious code, just accidentally broken code.

cuviper · May 17, 2019, 12:40am

The point is that this allowed safe code to have unsafe effects, whether or not it's accidental.

cuviper · May 17, 2019, 12:56am

However, I am surprised that NIST scored it so high with CWE-119, “Improper Restriction of Operations within the Bounds of a Memory Buffer.”

For comparison, Red Hat Product Security marked it low with CWE-676, “Use of Potentially Dangerous Function,” which seems more fitting to me.

Topic		Replies	Views
Concerned about Rust 2018 stability	17	4735	March 25, 2019
Idea: Semi-stabilization language design	37	4537	July 7, 2019
1.27.1 prerelease testing announcements	6	1283	March 25, 2019
Announcing State Of Rust announcements	9	8760	March 25, 2019
Possible stabilizations for 2018 Edition Preview 2 announcements	9	5670	March 25, 2019

Request for CVE post-mortems

Related topics