[pre-RFC] Using Sigstore for signing and verifying crates

Hi all!

I'd like to post this pre-RFC to get some more feedback. It is a proposal to use Sigstore (https://sigstore.dev) for signing crates on publish and verifying on download. The RFC is described in detail here: rust-rfcs/0000-sigstore-integration.md at sigstore-rfc · trustification/rust-rfcs · GitHub

Any feedback you can give here or directly is appreciated!

9 Likes

Hi there! Several years ago I opened this crates.io issue on a security model, and let me just say it's very exciting to see proposed Sigstore integrations.

You mention TUF, and separately posit this open question:

Should crates.io use the signing information to enforce that the signature of crate-to-be-published matches the publisher?

I think it'd be great to start working on the machinery of Sigstore for AuthN / crate signing (and the workflow looks great), but it really needs an associated AuthZ policy to be useful, as this blog post highlights:

https://blog.sigstore.dev/signatus-ergo-securus-who-can-sign-what-with-tuf-and-sigstore-ea4d3d84b8b6

As it were, there are some early discussions happening on the Cargo team right now on potentially using TUF for things like (sparse) index signing.

You do mention TUF repeatedly in the pre-RFC, including the index signing work and various potential integrations. I guess what I think is missing here is specifically a future-work statement which describes how TUF could be used for AuthZ in a system where Sigstore is used for AuthN, answering the open question you posed about this, e.g. how TUF's delegated-targets feature could provide AuthZ policies which specify which crates.io users (and their associated GitHub OIDC identities) are allowed to sign which particular crates.

6 Likes

The container people have basically the same problem.

Thanks for sharing this pre-RFC! I'm one of the maintainers of sigstore-python and I'm actively working with the Python community to get Sigstore integrated into PyPI, so I have some design feedback based on our experience there :slightly_smiling_face:

Identity/scope of verification

Anybody can sign for anything in Sigstore, so every Sigstore verification operation needs to be performed modulo a specific identity (or set of identities). In particular, it isn't enough to just verify that the signature and certificate are valid and consistent with the transparency log, since mallory@evil.com can sign for serde's crates just as easily as alice@good.com can.

I think this is alluded to in the pre-RFC, with this language:

  • Email of Owner - retrieved as part of identity/GitHub lookup with email scope

However, this probably won't be sufficient on its own: emails are only one form of identity in Sigstore (GitHub Actions and other CI providers can also produce ephemeral OpenID Connect identities, encoded as URLs). Moreover, just the identity string itself is not enough -- verification also requires establishing that the IdP attesting to the identity is the one expected (per If `--cert-email` is provided, `--cert-oidc-provider` should be required (verification) · Issue #1947 · sigstore/cosign · GitHub).

The latter problem might not be a significant issue for Rust, since it seems like GitHub is the intended MVP target and simply requiring all identities to come from https://github.com/login/oauth might be sufficient. But it's something to keep in mind for future expansion, and both should probably be addressed in the RFC to ensure that e.g. CI publishing workflows can also sign for crates without requiring manual signing operations :slightly_smiling_face:
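The point above can be sketched as a small policy check: even after the certificate, signature, and log entry are cryptographically valid, the client still has to pin both the identity *and* the issuer that attested to it. This is only an illustrative sketch; the claim names below are invented stand-ins for what a real verifier would extract from a Fulcio certificate.

```python
# Sketch of identity-scoped verification: a Sigstore signature is only
# meaningful modulo an expected (identity, issuer) pair. The flat `cert_claims`
# dict is an assumption for illustration; a real verifier reads these from
# Fulcio certificate extensions.

GITHUB_OIDC_ISSUER = "https://github.com/login/oauth"

def identity_policy_ok(cert_claims, allowed):
    """Accept only if (identity, issuer) is an expected pair.

    cert_claims: claims extracted from an already-validated certificate.
    allowed: set of (identity, issuer) pairs trusted to sign this crate.
    """
    pair = (cert_claims.get("identity"), cert_claims.get("issuer"))
    return pair in allowed

allowed = {("alice@good.com", GITHUB_OIDC_ISSUER)}

# mallory's signature can be cryptographically valid, but fails the policy:
mallory = {"identity": "mallory@evil.com", "issuer": GITHUB_OIDC_ISSUER}
alice = {"identity": "alice@good.com", "issuer": GITHUB_OIDC_ISSUER}
# an attacker running their own IdP can mint any identity string,
# which is why the issuer must be pinned too:
spoofed = {"identity": "alice@good.com", "issuer": "https://evil-idp.example"}

assert not identity_policy_ok(mallory, allowed)
assert identity_policy_ok(alice, allowed)
assert not identity_policy_ok(spoofed, allowed)
```

Pinning the issuer is what closes the gap described in cosign issue #1947: without it, any IdP accepted by Fulcio could assert any identity string.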

Offline verification/mirroring

At the moment, the pre-RFC specifies the following verification flow:

[verification-flow diagram from the pre-RFC]

In this flow, Rekor (the transparency log) is accessed on each verification operation. This isn't a problem in terms of Rekor itself (it was designed for this!), but it might be a problem for (1) privacy conscious users, who wouldn't like the transparency log to know roughly when they're fetching dependencies, and (2) corporate users who might have firewall or other network rules that rewrite crates.io to a local mirror and forbid other network traffic.

Sigstore has accommodations for these use cases, in the form of offline Rekor bundles and Sigstore bundles, the latter being a replacement for the former.

When using a bundle instead of an online lookup, the threat model changes slightly (verification entails checking a signed promise of inclusion by the transparency log, rather than a proof of inclusion), but with the benefit of requiring no connection to the log whatsoever.
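The proof-vs-promise distinction can be made concrete with a toy Merkle tree: an online check lets the client recompute the log's root hash itself from an inclusion proof, whereas an offline bundle only carries a signature the client trusts. This sketch deliberately omits Rekor's real encoding (RFC 6962 leaf/node hash prefixes and non-power-of-two trees) to keep the idea visible.

```python
import hashlib

# Toy Merkle tree illustrating what an inclusion *proof* buys you: the client
# recomputes the root from the leaf plus a sibling path. (Real Rekor trees use
# RFC 6962 hash prefixes and handle arbitrary sizes; this sketch assumes a
# power-of-two leaf count.)

def h(data):
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return all levels of the tree, leaves first, root last."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def inclusion_proof(levels, index):
    """Sibling hashes from leaf to root for the leaf at `index`."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, is-left-sibling)
        index //= 2
    return proof

def verify_inclusion(leaf, proof, root):
    """Recompute the root from the leaf and its sibling path."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

leaves = [b"entry-0", b"entry-1", b"entry-2", b"entry-3"]
levels = build_tree(leaves)
root = levels[-1][0]
proof = inclusion_proof(levels, 2)

assert verify_inclusion(b"entry-2", proof, root)
assert not verify_inclusion(b"tampered", proof, root)
```

With an offline bundle, none of the recomputation above happens; the client instead verifies the log's signature over a promise that such a path exists, which is the slight change in threat model mentioned above.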

It probably makes sense for cargo to support a variant of this flow, possibly with some kind of --offline flag (that's what sigstore verify uses). When passed, cargo should additionally retrieve the Rekor bundle from the index, and use that rather than performing an online transparency log check.

(More generally, the need to special-case bundle retrieval will be obviated once Sigstore bundles are stabilized, since the index will only have to host one additional file rather than separate files for the cert, signature, and Rekor bundle. Once that happens, --offline could be as simple as using the embedded Rekor promise.)

Verifying on the index as well?

This is more speculative, but something (IMO) worth thinking about: in addition to having cargo verify signatures on client endpoints during package retrieval/install, it might be worth performing signature verification on the package index itself (i.e. crates.io).

Doing so has a few advantages:

  1. Catching programmer/publisher errors earlier: if users can configure their packages as requiring signatures, the index and cargo publish can coordinate to reject uploads that don't include a signature. Similarly, if users can configure a trusted list of identities for their packages, the index and client can coordinate to reject uploads that are signed by the wrong identity.
  2. Visual identity: lots of developers check crates.io for a package's links and statistics before downloading. Having a little green checkmark or similar can serve as a visual indicator that the package is signed, which users can then make value judgments about.
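The coordination in point (1) could look something like the following server-side check. All field names here (`require_signature`, `trusted_identities`, and so on) are invented for illustration; nothing in the pre-RFC specifies this shape.

```python
# Hypothetical sketch of an index-side upload policy: reject uploads that
# violate a crate's signing configuration before they are ever published.

def check_upload(crate_config, upload):
    """Return (accepted, reason) for an incoming upload."""
    if crate_config.get("require_signature") and "signature" not in upload:
        return False, "crate requires signed uploads, none provided"
    trusted = crate_config.get("trusted_identities")
    if trusted is not None and "signature" in upload:
        if upload["signer_identity"] not in trusted:
            return False, "identity %r not trusted for this crate" % upload["signer_identity"]
    return True, "ok"

config = {"require_signature": True, "trusted_identities": ["alice@good.com"]}

assert check_upload(config, {})[0] is False                  # unsigned upload
assert check_upload(config, {"signature": "sig",
                             "signer_identity": "mallory@evil.com"})[0] is False
assert check_upload(config, {"signature": "sig",
                             "signer_identity": "alice@good.com"})[0] is True
```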

Once again: thanks so much for making this pre-RFC! I think Sigstore in Rust is extremely promising, and I'm excited to see what the community thinks :slightly_smiling_face:

9 Likes

Exciting to see this proposal outlined! Have you read through the RFC that we (GitHub) put together concerning the npm signing efforts? As @woodruffw suggests above, one of the design considerations we had there was around optimizing the system for publishing from trusted build systems where a CI system like GitHub Actions has OIDC claims about the workflow run because you can end up with signed claims as to the repository that authored and signed the package. We felt that those claims held more weight than just the email based claims.

I've flagged this for some of the folks from our team to have a look at the RFC as well. Thanks!

3 Likes

This is great feedback, thank you! I hadn't given enough thought to OIDC, given crates.io's existing use of GitHub, but I think you're right that it needs to be planned for. I will look into how this is done for other package managers.

The offline verification is a good point, and cargo already has an --offline flag that we could perhaps use for this. I need to play a bit more with Rekor bundles to see how they work.

I had not read through that, thanks for sharing the link! It does a very good job of outlining the reason for the change as well, and points out more things I hadn't thought about and where to go in the future with this work (build provenance looks very interesting). One question I have about the NPM RFC:

  • Is there a particular reason for opt-in signing? One concern I had when thinking about this was that only a few people will sign their crates if it is opt-in, but maybe it's better to start with opt-out?

Thanks for raising the point about GHA OIDC claims, will look at that, and for flagging it for others to look at.

How does this work with multiple emails on an account? How is it known that my work email is for crates A, B, and C but my personal one for X, Y, and Z? Would I declare that as part of my publish, or in some crates.io settings page?

Yeah, this is something that would need to be defined and included in the final RFC: Sigstore itself is flexible in terms of allowing multiple identities (or even threshold schemes, although that's probably not desirable in this case), so the integration point needs to decide how/whether it allows that.

For something like crates.io, I can see a few reasonable approaches:

  1. In crates.io's own account settings page(s), allow users to enable their verified email address as a signing identity. Supplementally, allow users to configure multiple verified email addresses, a la PyPI (although this might require larger changes to crates.io).
  2. Add a package/crate configuration view for each uploaded crate, where crate owners can configure accepted signing identities. This allows multiple owners to coordinate on valid signing identities.
  3. Develop a notion of "trusted metadata," e.g. for the package.repository and package.authors fields. In this case, these fields would be considered valid signing identities, under either a TOFU scheme (first upload to crates.io determines that they're trusted, subsequent changes need to be manually validated on the website) or a fully manual scheme (first upload to crates.io does not expose any associated signing materials until the user logs into the website and explicitly marks one or more identities as valid).
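The TOFU variant of option (3) can be sketched as a small state machine: the first upload pins the signing identities, and later uploads either match the pinned set or are held for manual approval on the website. Every name below is invented for illustration.

```python
# Illustrative sketch of trust-on-first-use for signing identities: the first
# upload of a crate pins its identity set; uploads signed by an unknown
# identity are held until a maintainer approves them on the website.

class CrateIdentityStore:
    def __init__(self):
        self.pinned = {}              # crate name -> frozenset of identities
        self.pending_approval = set() # (crate, identity) awaiting manual review

    def on_upload(self, crate, identity):
        if crate not in self.pinned:
            self.pinned[crate] = frozenset({identity})  # trust on first use
            return "pinned"
        if identity in self.pinned[crate]:
            return "accepted"
        self.pending_approval.add((crate, identity))
        return "held"

    def approve(self, crate, identity):
        """Manual validation on the website adds a new trusted identity."""
        self.pending_approval.discard((crate, identity))
        self.pinned[crate] = self.pinned[crate] | {identity}

store = CrateIdentityStore()
assert store.on_upload("serde", "alice@good.com") == "pinned"
assert store.on_upload("serde", "alice@good.com") == "accepted"
assert store.on_upload("serde", "mallory@evil.com") == "held"
store.approve("serde", "bob@good.com")
assert store.on_upload("serde", "bob@good.com") == "accepted"
```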

We're still working out the best choice (in terms of UX) for PyPI, so I expect that cargo and crates.io will be able to glean some decisions from what we end up doing :slightly_smiling_face:.

Related to the multiple identities consideration: integrating Sigstore also requires considerations around changes in repository URLs if GitHub Actions identities are allowed.

For example, say user foo moves crate pkg from github.com/foo/pkg.rs to github.com/pkg-org/pkg.rs. Do the signatures associated with GitHub Actions identities for foo/pkg.rs remain valid? The intuitive answer is "yes," but, if so, crates.io will need to keep additional time-range metadata for when an identity was considered trusted for a particular crate (or, perhaps more simply, copy the trusted metadata set to each release as it gets made). This is partially solved by the scheme described in option (3) in [pre-RFC] Using Sigstore for signing and verifying crates - #9 by woodruffw, since each release of each crate would have its own clone of the trusted metadata and new uploads would first have to be configured on crates.io to point to the new GitHub repository.

FWIW, crates.io currently models a GitHub user as having a single email address, which I assume is the primary email associated with your account.

For Sigstore purposes, it needs to be whatever is in the OIDC claims, where I would guess the primary email is also used.

For proper isolation between personal access and work access, I think it's generally a good practice to have two accounts, rather than a personal account with a secondary associated work email.

1 Like

It's perhaps not a threat vector worth addressing, but consider that a repository no longer being used by a crate could be later controlled by an unrelated actor. (e.g. the old individual or org account was deleted and then reclaimed after the holding period.)

It's certainly useful to ensure that a crate is published from the repository it claims to be, but any validation should first be predicated on the uploading identity being validated as one of the identities with publishing permission at the time of upload.

(This is of course guaranteed by the crates.io token, but the signature validation is a stronger guarantee and should not require trusting the weaker guarantee to be meaningful.)

crates.io supports giving publishing permission to organization identities, so this is, I think, a relatively simple thing: a) track historical changes to publishing permission, and b) only validate repository origin if the repository is controlled by an allowed publisher at the time of publish.
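Steps (a) and (b) amount to keeping an append-only permission timeline and querying it at the upload's timestamp. Here is one way that could look; the data shapes are invented.

```python
import bisect

# Sketch of historical permission tracking: record each change as
# (timestamp, set of allowed publishers), then validate a repository
# origin against the set that was authorized *at the time of publish*.

class PermissionHistory:
    def __init__(self):
        self.times = []      # sorted timestamps of changes
        self.snapshots = []  # allowed publisher set after each change

    def record(self, ts, publishers):
        self.times.append(ts)
        self.snapshots.append(frozenset(publishers))

    def allowed_at(self, ts):
        """Publishers that held permission at time `ts`."""
        i = bisect.bisect_right(self.times, ts) - 1
        return self.snapshots[i] if i >= 0 else frozenset()

hist = PermissionHistory()
hist.record(100, {"github.com/foo/pkg.rs"})
hist.record(200, {"github.com/pkg-org/pkg.rs"})  # crate moved orgs

# An upload at t=150 signed from the old repo is fine; at t=250 it is not.
assert "github.com/foo/pkg.rs" in hist.allowed_at(150)
assert "github.com/foo/pkg.rs" not in hist.allowed_at(250)
assert "github.com/pkg-org/pkg.rs" in hist.allowed_at(250)
```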

(Though this presumes this is only done for repository hosts which are also recognized identity providers for crates.io login, and I have no idea how this would extend to hosts that aren't.)

1 Like

Great question. I think long term, it's best if every package is signed, but we want to ensure that the technology has had sufficient time to mature before asking every developer to use it for every package upload.

Absolutely, thank you for highlighting this! This is something I've given some thought to with other OIDC integrations unrelated to Sigstore: GitHub's OIDC tokens include the underlying user/org and repository IDs as claims, which should change when a repository name or org name is "taken over" by a different entity.

Those claims aren't currently exposed by Sigstore (specifically Fulcio), but adding them shouldn't be difficult. Then, when added, they could be used as part of the scheme to prevent signatures that have the correct org/repo slug but with an underlying change in ownership.
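The check described would compare the stable numeric IDs rather than the human-readable slug, so a deleted-and-reclaimed org or repository with the same name fails verification. The `repository_id` / `repository_owner_id` claim names do exist in GitHub Actions OIDC tokens; their exposure through Fulcio is, as noted, hypothetical for now.

```python
# Sketch of slug-takeover protection: pin the numeric repository and owner IDs
# from the OIDC claims, and ignore the human-readable slug entirely.

def repo_unchanged(pinned, claims):
    return (claims.get("repository_id") == pinned["repository_id"]
            and claims.get("repository_owner_id") == pinned["repository_owner_id"])

pinned = {"repository_id": 1234, "repository_owner_id": 42}

# Same slug, but the org was deleted and re-registered by someone else:
hijacked = {"repository": "foo/pkg.rs", "repository_id": 9876, "repository_owner_id": 777}
genuine  = {"repository": "foo/pkg.rs", "repository_id": 1234, "repository_owner_id": 42}

assert repo_unchanged(pinned, genuine)
assert not repo_unchanged(pinned, hijacked)
```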

This is the route we're going with PyPI too, although one thing to think about is whether to allow "downgrades," i.e. should a crate that's started doing signed uploads be allowed to upload unsigned ones? If so, an attacker could simply strip the signature during upload and installing clients would have no recourse.
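One way to prevent the downgrade described above is a one-way ratchet: once a crate has published a signed release, unsigned uploads are refused from then on. This is only a sketch of the policy, with invented field names.

```python
# Sketch of a "no downgrades" ratchet: an attacker who has stolen an API token
# cannot simply strip the signature, because the crate's history shows that
# signed uploads have already happened.

def accept_release(history, new_release):
    ever_signed = any(r.get("signed") for r in history)
    if ever_signed and not new_release.get("signed"):
        return False  # downgrade attempt: signature stripped
    return True

history = [{"version": "1.0.0", "signed": False},
           {"version": "1.1.0", "signed": True}]

assert accept_release(history, {"version": "1.2.0", "signed": True})
assert not accept_release(history, {"version": "1.2.1", "signed": False})
assert accept_release([], {"version": "0.1.0", "signed": False})  # new crate may start unsigned
```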

GitHub ToS doesn't allow this.

  1. Sign the generated .crate file using the private key

This is sufficient as part of demonstrating the authenticity of the archive itself. I'm wondering if it could be valuable to also attest to a more concrete transaction: that some party intends to upload some specific binary to some specific registry. Transactions initiated by an authorized worker¹ intended for private registry A may be unsuitable for registry B. The signature for this doesn't need to be publicly available, but should be verified by crates.io if a policy was configured for it. The signature could, for instance, be specific to GitHub's OpenID token that identifies the exact CI run with which the packaging/upload was performed, so we know that job was authenticated rather than just the generic email. I believe that could feasibly be associated data in the CSR.


¹ See the prior comment about authorization. Sigstore only demonstrates that the worker is authentic. If the authentication is as specific as a single CI job, then the authorization problem can be correspondingly less generic. This could help defer some of the authorization problems. The ball would effectively be in the CI provider's court to provide some mechanism to authorize their worker jobs via a cert chain tracing back to the maintainer / to TUF. But I'm only speculating here and haven't thought the precise cryptography through.
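The transaction-binding idea above amounts to signing a statement over the artifact digest plus the intended registry (and optionally the CI run identity), rather than over the archive alone. The statement layout below is invented, and the signing step is deliberately abstracted away, since key handling depends on the Sigstore flow.

```python
import hashlib
import json

# Sketch of a transaction-bound attestation: the signed payload commits to a
# specific artifact AND a specific destination registry, so a signature minted
# for private registry A cannot be replayed against registry B.

def make_statement(crate_bytes, registry, ci_run):
    """Build the payload that would be signed (signing itself is elided)."""
    return json.dumps({
        "artifact_sha256": hashlib.sha256(crate_bytes).hexdigest(),
        "intended_registry": registry,
        "ci_run": ci_run,  # e.g. an identifier for the exact CI workflow run
    }, sort_keys=True)

def registry_accepts(statement, crate_bytes, my_registry):
    """The receiving registry checks the statement names it and the artifact."""
    s = json.loads(statement)
    return (s["artifact_sha256"] == hashlib.sha256(crate_bytes).hexdigest()
            and s["intended_registry"] == my_registry)

crate = b"...crate archive bytes..."
stmt = make_statement(crate, "https://crates.io",
                      "github.com/foo/pkg.rs/actions/runs/1")

assert registry_accepts(stmt, crate, "https://crates.io")
assert not registry_accepts(stmt, crate, "https://registry-b.example")  # wrong registry
assert not registry_accepts(stmt, b"tampered", "https://crates.io")     # wrong artifact
```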

2 Likes

I could be wrong, but I believe the implication of having a "work associated" GitHub account is that it's a paid account, in which case the free account limit does not apply.

We do FOSS for the most part, so paying for an account is really unnecessary (I get added to customer paid orgs at times, but that's no different than if I had a paid account AFAIK). I at least have no need for a paid account (other than this restriction).

You mean to say that GitHub doesn't allow multiple free accounts.

GitHub encourages this for enterprises and even offers first-class features for using enterprise identity systems for SSO with automatic user provisioning.

1 Like

For reference, there is also a conversation going on at https://rust-lang.zulipchat.com/#narrow/stream/246057-t-cargo/topic/Cargo.20and.20signing.2Fverification about this pre-RFC.

One of my first comments in that thread was:

Designs in this space will at least involve the Infra Team, the Cargo Team, and the crates.io Team. (As I have messed up before, excluding any relevant team in the design process can come back to bite you later.) Also none of these teams have an excess of design/implementation bandwidth. So this is probably an effort that can be measured in years.

I deeply appreciate the input from the maintainers of other repositories. Given that our deliberative process is likely to make us late adopters, we should learn the lessons from your lived experience.

I've been trying to articulate a more profound question about this RFC and I have not yet found the right words. But it is strongly related to @CAD97's excellent comment:

In some future ideal system where we are using TUF, Sigstore, Asymmetric Tokens, OIDC ... and eight more things that haven't been invented yet:

  • If the public transparency / code signing side lets anyone see that an upload was malicious, then crates.io should use that same information to prevent the upload in the first place.
  • The transparency log should explain why crates.io made any trust decision that affects the artifact I am about to download/use. I should not only be able to verify that the artifact I'm downloading was uploaded by a particular user, but also when the token they used was created, why we thought they were authorized to make the token, who authorized them to be an owner on the crate... all the way back to "the crates.io administrator granted special access" or "we trusted GitHub identity 1234 to publish this package because they were the first to the name".
  • Putting those two together: if crates.io definitely enforces the verification available in the transparency log, and the transparency log is sufficient to justify the enforcement that crates.io does, then there is no need for any other kind of authentication/authorization, as the two systems are mathematically identical.

Fundamentally, this is one of the key insights behind my design in Asymmetric Tokens. If signing the hash of the artifact I'm about to upload (if made public) would be enough for a third party to verify the authenticity of the upload, then let's just use that signature as the authentication token. To take things one step at a time, that RFC did not discuss making the tokens public, or for that matter how crates.io will implement the RFC at all. But it was definitely intended from the start! To quote the RFC:

After that an audit log of what tokens were used to publish on crates.io and why that token was trusted, would probably be a rich data source for identifying compromised accounts. As well as making it possible to do end to end signature verification. The crate file I downloaded matches the cksum in the index; the index matches the cksum in the audit log; the public key used in the audit log is the one I expected.

and

Furthermore, this RFC attempts to make a start on solving several problems at the same time. It may be that in time we discover these problems need to be solved separately. If we end up with a separate system for code signing and a separate system for authorization, then a simpler more direct method of authentication might have been a better choice.

Asymmetric Tokens have not stabilized. It would be unfortunate if in the future we end up stuck with two systems for signing every action, a required public transparency signature and a parallel Asymmetric Token that is only used for some crates.io checks. If Asymmetric Tokens need to change to avoid that duality, let's figure this out now.
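The end-to-end chain quoted from the Asymmetric Tokens RFC ("the crate file I downloaded matches the cksum in the index; the index matches the cksum in the audit log; the public key used in the audit log is the one I expected") reduces to three comparisons. The record shapes below are invented, and real signature verification is abstracted to a key comparison.

```python
import hashlib

# Sketch of the end-to-end verification chain: crate file -> index entry ->
# audit log entry -> trusted public key. Each link is a simple equality check.

def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()

def verify_chain(crate_bytes, index_entry, audit_entry, expected_pubkey):
    return (sha256_hex(crate_bytes) == index_entry["cksum"]   # file matches index
            and index_entry["cksum"] == audit_entry["cksum"]  # index matches audit log
            and audit_entry["pubkey"] == expected_pubkey)     # log used the expected key

crate = b"crate file contents"
index_entry = {"cksum": sha256_hex(crate)}
audit_entry = {"cksum": sha256_hex(crate), "pubkey": "ed25519:abc123"}

assert verify_chain(crate, index_entry, audit_entry, "ed25519:abc123")
assert not verify_chain(b"tampered", index_entry, audit_entry, "ed25519:abc123")
assert not verify_chain(crate, index_entry, audit_entry, "ed25519:other")
```

If a single signature over the artifact hash can drive all three links, the same signature can serve as the authentication token, which is the duality the paragraph above wants to avoid breaking.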

2 Likes