The last two CVE announcements boil down to: the CVE was introduced here, it was fixed there, and these are the affected versions.
This reads as if the CVE were fixed, but if we made the exact same errors for a different feature today, we would end up with another new CVE tomorrow. To me, this does not feel fixed at all.
When something is stabilized, the stabilization PR is just the last small bit of incremental work that leads to something landing on stable Rust. Before that happens, there are many tiny incremental steps where tiny errors can accumulate. None of these tiny errors looks critical, so it is easy to let them slip one at a time, but their sum is what results in a CVE landing on stable Rust.
I wish that, after the announcement is done, the work would continue towards fixing these CVEs by writing a post-mortem that identifies all the process failures where errors slipped in, how they resulted in the CVE actually landing, and how we could change our process to make it impossible for another CVE to land in the language for the same reasons.
In particular, we've had two recent cases where something was stabilized and then there was basically an immediate issue with it that required a point release. Is there something we can change in our process to prevent this from happening again in the future?
I'd argue that those emojis are there to celebrate the fix, not the bug, and also that it has been a long-running convention to decorate all Rust release announcements regardless of the subject of the release.
While I don't find them inappropriate, if the community thinks otherwise, I'm happy to omit them from point/security releases.
I think the main issue is the insufficient attention and testing that most Nightly features get nowadays. This is one of the reasons why I've proposed semi-stabilization (a.k.a. beta features):
From my point of view, the overall quality of Rust, including the low number of bugs and needed point releases, is very good. Furthermore, issues seem to be dealt with quite fast and without pretense. I mean, would other software/languages go so far as to request a CVE number? They tend to say that if you wrote obviously wrong code (e.g. claiming the type is something other than it is, as was the case here) and your program doesn't work as intended, well, it's your problem…
Nevertheless, I'd like to see some kind of post-mortem or a "how we got here" blog post. Not because it's a CVE, but because it's a way of sharing experience. I like to learn both from how things worked for someone and from how something didn't work. If there's an interesting bug, a feature that eventually didn't land, or something similar (from a technical or a process point of view) and someone knows how it happened, sharing that would be appreciated :-). Obviously, it can turn out to be a very boring case of "nobody thought of that", and that's it. Bugs simply happen.
I suppose that statement is hedging on unsafe and related places: the unsafe code in downcast_ref and downcast_mut depends on is::<T>, which assumes a well-behaved type_id.
Sure, but no one noticed that this was a problem until now! The initial PR had type_id flagged unstable because we were "unsure if we wanted to commit to the interface", not because "this is unsound and we're using stability gates to hack around that problem".
The FFI abort change was reverted because too much real world code was relying on the undefined behavior of unwinding through extern "C" functions.
- Not all checkboxes were checked. This is kind of normal, and I don't know whether more checkboxes would have helped here; probably not.
- No summary comment for the stabilization (the lang team has these). For example, it is unclear to me which problem this API tries to solve, whether it is worth solving, and how. The T-libs team doesn't appear to have these. It might have helped.
- The FCP was not mentioned in any TWiR issue. Not that TWiR is part of the process (it isn't), but some issues receive more eyes and reviewers after being "announced" in other channels, which did not happen for this one.
None of these feel like major issues, but the cause of this might be the sum of many tiny issues.
Obviously, a comment explaining that the API was unstable and hidden because it was unsound would have prevented this. But one person adds an internal API and doesn't add a comment or doesn't realize it is unsound; another person makes it public but unstable and doesn't realize it either; and four years later we kind of end up here anyway. We can force people to write comments, but we can't force the comments to be correct.
I do think we should use tidy to enforce a comment requirement on every unsafe fn and unsafe { ... } block, in both the standard library and the compiler. We can't force the comments to be correct, but reviews can reduce the chances of them being wrong.
It seems like this is only a problem for maliciously written code, and malicious code can already use unsafe as well as opening /proc/self/mem and probably other soundness holes; in general Rust is not designed to protect against malicious code, just accidentally broken code.