At 10:55 PM UTC on 11/07/2016, the SSL certificate for crates.io expired, and the site was down for approximately one hour and 26 minutes. We're very sorry about the disruption.
What happened
We host crates.io with Heroku and use the Expedited SSL addon to manage the certificate. On October 19, @alexcrichton got an email asking us to renew the certificate, and he followed the steps to do so. We checked several times, as recently as last week, whether the certificate had been updated. It had not, and we then forgot to check again. This wasn't a hack, and all of the data is safe. This was purely an operational error.
This led to a general outage of crates.io, which broke people's cargo builds whenever they needed to fetch new crates. It also prevented our CI system from fetching crates, which caused all of the outstanding builds to fail.
Timeline of events
We noticed at 10:57 PM, two minutes after the certificate expired.
We filed an expedited support ticket, and Heroku responded within ten minutes. Unfortunately, because the problem was with an addon, they forwarded us to ExpeditedSSL's support and couldn't say when ExpeditedSSL would get back to us.
At 11:20 PM, after not hearing back, @aturon gave the go-ahead to simply buy another certificate, in the hope that this would get us back up faster. Nine minutes later, @alexcrichton completed the process with DigiCert and got this message:
Your CSR has been submitted. We will update the order as soon as possible and contact you if there is anything else we need to issue the certificate.
@alexcrichton immediately got on the phone with DigiCert support, but there was a problem: to get our new certificate, we had to be affiliated with a company. Mozilla is of course a company, but it already has an account with DigiCert, and because we aren't the people associated with that account, DigiCert would not let us use Mozilla.
The call was elevated to a supervisor, and @brson tried to get in touch with ExpeditedSSL.
In the meantime, @alexcrichton decided to re-issue the certificate in his own name, again in the hope that this would resolve things quickly. But at 11:49 PM, we hit another error:
You do not have permission to manage sni endpoints on crates-io.
After investigating and talking to support, we got this reply:
We received your renew cert but we cannot perform the install as it seems that you have a bad SSL Endpoint. Please remove the SSL Endpoint, log out of Heroku and wait about 5 minutes then go back in. Re-add the the SSL Endpoint and let us know so that we can retry the install. This may also require you to update your DNS settings as well.
In parallel, @brson was also trying to reach people at Mozilla to see whether we could get a certificate through their account.
At 12:21 AM, @alexcrichton managed to get the new certificate installed and updated the DNS settings. Everything was then working again, allowing for the time it can take DNS changes to propagate.
We'll continue to monitor the situation. Please let us know in this thread if you are still having problems.
Steps in the future
We will be working out how to prevent this in the future, but it's not yet clear what the right fix is. We had automated certificate renewal precisely to keep this kind of issue from cropping up, but the automation failed in this case. Switching providers may help here.