[pre-RFC] Using Sigstore for signing and verifying crates

Hi all!

I'd like to post this pre-RFC to get some more feedback. It is a proposal to use Sigstore (https://sigstore.dev) for signing crates on publish and verifying on download. The RFC is described in detail here: rust-rfcs/0000-sigstore-integration.md at sigstore-rfc · trustification/rust-rfcs · GitHub

Any feedback you can give here or directly is appreciated!

9 Likes

Hi there! Several years ago I opened this crates.io issue on a security model, and let me just say it's very exciting to see proposed Sigstore integrations.

You mention TUF, and separately posit this open question:

Should crates.io use the signing information to enforce that the signature of crate-to-be-published matches the publisher?

I think it'd be great to start working on the machinery of Sigstore for AuthN / crate signing (and the workflow looks great), but it really needs an associated AuthZ policy to be useful, as this blog post highlights:

https://blog.sigstore.dev/signatus-ergo-securus-who-can-sign-what-with-tuf-and-sigstore-ea4d3d84b8b6

As it were, there are some early discussions happening on the Cargo team right now on potentially using TUF for things like (sparse) index signing.

You do mention TUF repeatedly in the pre-RFC, including the index signing work and various potential integrations. I guess what I think is missing here is specifically a future-work statement which describes how TUF could be used for AuthZ in a system where Sigstore is used for AuthN, answering the open question you posed about this, e.g. how TUF's delegated-targets feature could provide AuthZ policies which specify which crates.io users (and their associated GitHub OIDC identities) are allowed to sign which particular crates.

6 Likes

The container people have basically the same problem.

Thanks for sharing this pre-RFC! I'm one of the maintainers of sigstore-python and I'm actively working with the Python community to get Sigstore integrated into PyPI, so I have some design feedback based on our experience there :slightly_smiling_face:

Identity/scope of verification

Anybody can sign for anything in Sigstore, so every Sigstore verification operation needs to be performed modulo a specific identity (or set of identities). In particular, it isn't enough to just verify that the signature and certificate are valid and consistent with the transparency log, since mallory@evil.com can sign for serde's crates just as easily as alice@good.com can.

I think this is alluded to in the pre-RFC, with this language:

  • Email of Owner - retrieved as part of identity/GitHub lookup with email scope

However, this probably won't be sufficient on its own: emails are only one form of identity in Sigstore (GitHub Actions and other CI providers can also produce ephemeral OpenID Connect identities, encoded as URLs). Moreover, just the identity string itself is not enough -- verification also requires establishing that the IdP attesting to the identity is the one expected (per If `--cert-email` is provided, `--cert-oidc-provider` should be required (verification) · Issue #1947 · sigstore/cosign · GitHub).

The latter problem might not be a significant issue for Rust, since it seems like GitHub is the intended MVP target and simply requiring all identities to come from https://github.com/login/oauth might be sufficient. But it's something to keep in mind for future expansion, and both should probably be addressed in the RFC to ensure that e.g. CI publishing workflows can also sign for crates without requiring manual signing operations :slightly_smiling_face:
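The point above can be sketched as a small policy check: even after the certificate, signature, and log entry are cryptographically valid, the client still has to pin both the identity *and* the issuer that attested to it. This is only an illustrative sketch; the claim names below are invented stand-ins for what a real verifier would extract from a Fulcio certificate.

```python
# Sketch of identity-scoped verification: a Sigstore signature is only
# meaningful modulo an expected (identity, issuer) pair. The flat `cert_claims`
# dict is an assumption for illustration; a real verifier reads these from
# Fulcio certificate extensions.

GITHUB_OIDC_ISSUER = "https://github.com/login/oauth"

def identity_policy_ok(cert_claims, allowed):
    """Accept only if (identity, issuer) is an expected pair.

    cert_claims: claims extracted from an already-validated certificate.
    allowed: set of (identity, issuer) pairs trusted to sign this crate.
    """
    pair = (cert_claims.get("identity"), cert_claims.get("issuer"))
    return pair in allowed

allowed = {("alice@good.com", GITHUB_OIDC_ISSUER)}

# mallory's signature can be cryptographically valid, but fails the policy:
mallory = {"identity": "mallory@evil.com", "issuer": GITHUB_OIDC_ISSUER}
alice = {"identity": "alice@good.com", "issuer": GITHUB_OIDC_ISSUER}
# an attacker running their own IdP can mint any identity string,
# which is why the issuer must be pinned too:
spoofed = {"identity": "alice@good.com", "issuer": "https://evil-idp.example"}

assert not identity_policy_ok(mallory, allowed)
assert identity_policy_ok(alice, allowed)
assert not identity_policy_ok(spoofed, allowed)
```

Pinning the issuer is what closes the gap described in cosign issue #1947: without it, any IdP accepted by Fulcio could assert any identity string.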

Offline verification/mirroring

At the moment, the pre-RFC specifies the following verification flow:

[verification-flow diagram from the pre-RFC]

In this flow, Rekor (the transparency log) is accessed on each verification operation. This isn't a problem in terms of Rekor itself (it was designed for this!), but it might be a problem for (1) privacy conscious users, who wouldn't like the transparency log to know roughly when they're fetching dependencies, and (2) corporate users who might have firewall or other network rules that rewrite crates.io to a local mirror and forbid other network traffic.

Sigstore has accommodations for these use cases, in the form of offline Rekor bundles and Sigstore bundles, the latter being a replacement for the former.

When using a bundle instead of an online lookup, the threat model changes slightly (verification entails checking a signed promise of inclusion by the transparency log, rather than a proof of inclusion), but with the benefit of requiring no connection to the log whatsoever.
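The proof-vs-promise distinction can be made concrete with a toy Merkle tree: an online check lets the client recompute the log's root hash itself from an inclusion proof, whereas an offline bundle only carries a signature the client trusts. This sketch deliberately omits Rekor's real encoding (RFC 6962 leaf/node hash prefixes and non-power-of-two trees) to keep the idea visible.

```python
import hashlib

# Toy Merkle tree illustrating what an inclusion *proof* buys you: the client
# recomputes the root from the leaf plus a sibling path. (Real Rekor trees use
# RFC 6962 hash prefixes and handle arbitrary sizes; this sketch assumes a
# power-of-two leaf count.)

def h(data):
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return all levels of the tree, leaves first, root last."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def inclusion_proof(levels, index):
    """Sibling hashes from leaf to root for the leaf at `index`."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, is-left-sibling)
        index //= 2
    return proof

def verify_inclusion(leaf, proof, root):
    """Recompute the root from the leaf and its sibling path."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

leaves = [b"entry-0", b"entry-1", b"entry-2", b"entry-3"]
levels = build_tree(leaves)
root = levels[-1][0]
proof = inclusion_proof(levels, 2)

assert verify_inclusion(b"entry-2", proof, root)
assert not verify_inclusion(b"tampered", proof, root)
```

With an offline bundle, none of the recomputation above happens; the client instead verifies the log's signature over a promise that such a path exists, which is the slight change in threat model mentioned above.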

It probably makes sense for cargo to support a variant of this flow, possibly with some kind of --offline flag (that's what sigstore verify uses). When passed, cargo should additionally retrieve the Rekor bundle from the index, and use that rather than performing an online transparency log check.

(More generally, the need to special-case bundle retrieval will be obviated once Sigstore bundles are stabilized, since the index will only have to host one additional file rather than separate files for the cert, signature, and Rekor bundle. Once that happens, --offline could be as simple as using the embedded Rekor promise.)

Verifying on the index as well?

This is more speculative, but something (IMO) worth thinking about: in addition to having cargo verify signatures on client endpoints during package retrieval/install, it might be worth performing signature verification on the package index itself (i.e. crates.io).

Doing so has a few advantages:

  1. Catching programmer/publisher errors earlier: if users can configure their packages as requiring signatures, the index and cargo publish can coordinate to reject uploads that don't include a signature. Similarly, if users can configure a trusted list of identities for their packages, the index and client can coordinate to reject uploads that are signed by the wrong identity.
  2. Visual identity: lots of developers check crates.io for a package's links and statistics before downloading. Having a little green checkmark or similar can serve as a visual indicator that the package is signed, which users can then make value judgments about.
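The coordination in point (1) could look something like the following server-side check. All field names here (`require_signature`, `trusted_identities`, and so on) are invented for illustration; nothing in the pre-RFC specifies this shape.

```python
# Hypothetical sketch of an index-side upload policy: reject uploads that
# violate a crate's signing configuration before they are ever published.

def check_upload(crate_config, upload):
    """Return (accepted, reason) for an incoming upload."""
    if crate_config.get("require_signature") and "signature" not in upload:
        return False, "crate requires signed uploads, none provided"
    trusted = crate_config.get("trusted_identities")
    if trusted is not None and "signature" in upload:
        if upload["signer_identity"] not in trusted:
            return False, "identity %r not trusted for this crate" % upload["signer_identity"]
    return True, "ok"

config = {"require_signature": True, "trusted_identities": ["alice@good.com"]}

assert check_upload(config, {})[0] is False                  # unsigned upload
assert check_upload(config, {"signature": "sig",
                             "signer_identity": "mallory@evil.com"})[0] is False
assert check_upload(config, {"signature": "sig",
                             "signer_identity": "alice@good.com"})[0] is True
```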

Once again: thanks so much for making this pre-RFC! I think Sigstore in Rust is extremely promising, and I'm excited to see what the community thinks :slightly_smiling_face:

9 Likes

Exciting to see this proposal outlined! Have you read through the RFC that we (GitHub) put together concerning the npm signing efforts? As @woodruffw suggests above, one of the design considerations we had there was around optimizing the system for publishing from trusted build systems where a CI system like GitHub Actions has OIDC claims about the workflow run because you can end up with signed claims as to the repository that authored and signed the package. We felt that those claims held more weight than just the email based claims.

I've flagged this for some of the folks from our team to have a look at the RFC as well. Thanks!

3 Likes

This is great feedback, thank you! I hadn't given enough thought to OIDC, given crates.io's existing use of GitHub, but I think you're right that it needs to be planned for. I will look into how this is done for other package managers.

The offline verification is a good point, and cargo already has an --offline flag that we could perhaps use for this. I need to play a bit more with Rekor bundles to see how they work.

I had not read through that, thanks for sharing the link! It does a very good job of outlining the reason for the change as well, and points out more things I hadn't thought about and where to go in the future with this work (build provenance looks very interesting). One question I have about the NPM RFC:

  • Is there a particular reason for opt-in signing? One concern I had when thinking about this was that only a few people will sign their crates if it is opt-in, but maybe it's better to start with opt-out?

Thanks for raising the point about GHA OIDC claims, will look at that, and for flagging it for others to look at.

How does this work with multiple emails on an account? How is it known that my work email is for crates A, B, and C but my personal one for X, Y, and Z? Would I declare that as part of my publish, or in some crates.io settings page?

Yeah, this is something that would need to be defined and included in the final RFC: Sigstore itself is flexible in terms of allowing multiple identities (or even threshold schemes, although that's probably not desirable in this case), so the integration point needs to decide how/whether it allows that.

For something like crates.io, I can see a few reasonable approaches:

  1. In crates.io's own account settings page(s), allow users to enable their verified email address as a signing identity. Supplementally, allow users to configure multiple verified email addresses, a la PyPI (although this might require larger changes to crates.io).
  2. Add a package/crate configuration view for each uploaded crate, where crate owners can configure accepted signing identities. This allows multiple owners to coordinate on valid signing identities.
  3. Develop a notion of "trusted metadata," e.g. for the package.repository and package.authors fields. In this case, these fields would be considered valid signing identities, under either a TOFU scheme (first upload to crates.io determines that they're trusted, subsequent changes need to be manually validated on the website) or a fully manual scheme (first upload to crates.io does not expose any associated signing materials until the user logs into the website and explicitly marks one or more identities as valid).
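The TOFU variant of option (3) can be sketched as a small state machine: the first upload pins the signing identities, and later uploads either match the pinned set or are held for manual approval on the website. Every name below is invented for illustration.

```python
# Illustrative sketch of trust-on-first-use for signing identities: the first
# upload of a crate pins its identity set; uploads signed by an unknown
# identity are held until a maintainer approves them on the website.

class CrateIdentityStore:
    def __init__(self):
        self.pinned = {}              # crate name -> frozenset of identities
        self.pending_approval = set() # (crate, identity) awaiting manual review

    def on_upload(self, crate, identity):
        if crate not in self.pinned:
            self.pinned[crate] = frozenset({identity})  # trust on first use
            return "pinned"
        if identity in self.pinned[crate]:
            return "accepted"
        self.pending_approval.add((crate, identity))
        return "held"

    def approve(self, crate, identity):
        """Manual validation on the website adds a new trusted identity."""
        self.pending_approval.discard((crate, identity))
        self.pinned[crate] = self.pinned[crate] | {identity}

store = CrateIdentityStore()
assert store.on_upload("serde", "alice@good.com") == "pinned"
assert store.on_upload("serde", "alice@good.com") == "accepted"
assert store.on_upload("serde", "mallory@evil.com") == "held"
store.approve("serde", "bob@good.com")
assert store.on_upload("serde", "bob@good.com") == "accepted"
```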

We're still working out the best choice (in terms of UX) for PyPI, so I expect that cargo and crates.io will be able to glean some decisions from what we end up doing :slightly_smiling_face:.

Related to the multiple identities consideration: integrating Sigstore also requires considerations around changes in repository URLs if GitHub Actions identities are allowed.

For example, say user foo moves crate pkg from github.com/foo/pkg.rs to github.com/pkg-org/pkg.rs. Do the signatures associated with GitHub Actions identities for foo/pkg.rs remain valid? The intuitive answer is "yes," but, if so, crates.io will need to keep additional time-range metadata for when an identity was considered trusted for a particular crate (or, perhaps more simply, copy the trusted metadata set to each release as it gets made). This is partially solved by the scheme described in option (3) in [pre-RFC] Using Sigstore for signing and verifying crates - #9 by woodruffw, since each release of each crate would have its own clone of the trusted metadata and new uploads would first have to be configured on crates.io to point to the new GitHub repository.

FWIW, crates.io currently models a GitHub user as having a single email address, which I assume is the primary email associated with your account.

For Sigstore purposes, it needs to be whatever is in the OIDC claims, where I would guess the primary email is also used.

For proper isolation between personal access and work access, I think it's generally a good practice to have two accounts, rather than a personal account with a secondary associated work email.

1 Like

It's perhaps not a threat vector worth addressing, but consider that a repository no longer being used by a crate could be later controlled by an unrelated actor. (e.g. the old individual or org account was deleted and then reclaimed after the holding period.)

It's certainly useful to ensure that a crate is published from the repository it claims to be, but any validation should first be predicated on the uploading identity being validated as one of the identities with publishing permission at the time of upload.

(This is of course guaranteed by the crates.io token, but the signature validation is a stronger guarantee and should not require trusting the weaker guarantee to be meaningful.)

crates.io supports giving publishing permission to organization identities, so this is, I think, a relatively simple thing: a) track historical changes to publishing permission, and b) only validate repository origin if the repository is controlled by an allowed publisher at the time of publish.
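Steps (a) and (b) amount to keeping an append-only permission timeline and querying it at the upload's timestamp. Here is one way that could look; the data shapes are invented.

```python
import bisect

# Sketch of historical permission tracking: record each change as
# (timestamp, set of allowed publishers), then validate a repository
# origin against the set that was authorized *at the time of publish*.

class PermissionHistory:
    def __init__(self):
        self.times = []      # sorted timestamps of changes
        self.snapshots = []  # allowed publisher set after each change

    def record(self, ts, publishers):
        self.times.append(ts)
        self.snapshots.append(frozenset(publishers))

    def allowed_at(self, ts):
        """Publishers that held permission at time `ts`."""
        i = bisect.bisect_right(self.times, ts) - 1
        return self.snapshots[i] if i >= 0 else frozenset()

hist = PermissionHistory()
hist.record(100, {"github.com/foo/pkg.rs"})
hist.record(200, {"github.com/pkg-org/pkg.rs"})  # crate moved orgs

# An upload at t=150 signed from the old repo is fine; at t=250 it is not.
assert "github.com/foo/pkg.rs" in hist.allowed_at(150)
assert "github.com/foo/pkg.rs" not in hist.allowed_at(250)
assert "github.com/pkg-org/pkg.rs" in hist.allowed_at(250)
```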

(Though this presumes this is only done for repository hosts which are also recognized identity providers for crates.io login, and I have no idea how this would extend to hosts that aren't.)

1 Like

Great question. I think long term, it's best if every package is signed, but we want to ensure that the technology has had sufficient time to mature before asking every developer to use it for every package upload.

Absolutely, thank you for highlighting this! This is something I've given some thought to with other OIDC integrations unrelated to Sigstore: GitHub's OIDC tokens include the underlying user/org and repository IDs as claims, which should change when a repository name or org name is "taken over" by a different entity.

Those claims aren't currently exposed by Sigstore (specifically Fulcio), but adding them shouldn't be difficult. Then, when added, they could be used as part of the scheme to prevent signatures that have the correct org/repo slug but with an underlying change in ownership.
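The check described would compare the stable numeric IDs rather than the human-readable slug, so a deleted-and-reclaimed org or repository with the same name fails verification. The `repository_id` / `repository_owner_id` claim names do exist in GitHub Actions OIDC tokens; their exposure through Fulcio is, as noted, hypothetical for now.

```python
# Sketch of slug-takeover protection: pin the numeric repository and owner IDs
# from the OIDC claims, and ignore the human-readable slug entirely.

def repo_unchanged(pinned, claims):
    return (claims.get("repository_id") == pinned["repository_id"]
            and claims.get("repository_owner_id") == pinned["repository_owner_id"])

pinned = {"repository_id": 1234, "repository_owner_id": 42}

# Same slug, but the org was deleted and re-registered by someone else:
hijacked = {"repository": "foo/pkg.rs", "repository_id": 9876, "repository_owner_id": 777}
genuine  = {"repository": "foo/pkg.rs", "repository_id": 1234, "repository_owner_id": 42}

assert repo_unchanged(pinned, genuine)
assert not repo_unchanged(pinned, hijacked)
```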

This is the route we're going with PyPI too, although one thing to think about is whether to allow "downgrades," i.e. should a crate that's started doing signed uploads be allowed to upload unsigned ones? If so, an attacker could simply strip the signature during upload and installing clients would have no recourse.
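One way to prevent the downgrade described above is a one-way ratchet: once a crate has published a signed release, unsigned uploads are refused from then on. This is only a sketch of the policy, with invented field names.

```python
# Sketch of a "no downgrades" ratchet: an attacker who has stolen an API token
# cannot simply strip the signature, because the crate's history shows that
# signed uploads have already happened.

def accept_release(history, new_release):
    ever_signed = any(r.get("signed") for r in history)
    if ever_signed and not new_release.get("signed"):
        return False  # downgrade attempt: signature stripped
    return True

history = [{"version": "1.0.0", "signed": False},
           {"version": "1.1.0", "signed": True}]

assert accept_release(history, {"version": "1.2.0", "signed": True})
assert not accept_release(history, {"version": "1.2.1", "signed": False})
assert accept_release([], {"version": "0.1.0", "signed": False})  # new crate may start unsigned
```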

GitHub ToS doesn't allow this.

  1. Sign the generated .crate file using the private key

This is sufficient as part of demonstrating the authenticity of the archive itself. I'm wondering if it could be valuable to also attest to a more concrete transaction: that some party intends to upload some specific binary to some specific registry. Transactions initiated by an authorized worker¹ intended for private registry A may be unsuitable for registry B. The signature for this doesn't need to be publicly available, but should be verified by crates.io if a policy was configured for it. The signature could, for instance, be specific to GitHub's OpenID token that identifies the exact CI run with which the packaging/upload was performed, so we know that job was authenticated rather than just the generic email. I believe that could feasibly be associated data in the CSR.


¹ See the prior comment about authorization. Sigstore only demonstrates that the worker is authentic. If the authentication is as specific as a single CI job, then the authorization problem can be correspondingly less generic. This could help defer some of the authorization problems. The ball would effectively be in the CI provider's court to provide some mechanism to authorize their worker jobs via a cert chain tracing back to the maintainer / to TUF. But I'm only speculating here and haven't thought the precise cryptography through.
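The transaction-binding idea above amounts to signing a statement over the artifact digest plus the intended registry (and optionally the CI run identity), rather than over the archive alone. The statement layout below is invented, and the signing step is deliberately abstracted away, since key handling depends on the Sigstore flow.

```python
import hashlib
import json

# Sketch of a transaction-bound attestation: the signed payload commits to a
# specific artifact AND a specific destination registry, so a signature minted
# for private registry A cannot be replayed against registry B.

def make_statement(crate_bytes, registry, ci_run):
    """Build the payload that would be signed (signing itself is elided)."""
    return json.dumps({
        "artifact_sha256": hashlib.sha256(crate_bytes).hexdigest(),
        "intended_registry": registry,
        "ci_run": ci_run,  # e.g. an identifier for the exact CI workflow run
    }, sort_keys=True)

def registry_accepts(statement, crate_bytes, my_registry):
    """The receiving registry checks the statement names it and the artifact."""
    s = json.loads(statement)
    return (s["artifact_sha256"] == hashlib.sha256(crate_bytes).hexdigest()
            and s["intended_registry"] == my_registry)

crate = b"...crate archive bytes..."
stmt = make_statement(crate, "https://crates.io",
                      "github.com/foo/pkg.rs/actions/runs/1")

assert registry_accepts(stmt, crate, "https://crates.io")
assert not registry_accepts(stmt, crate, "https://registry-b.example")  # wrong registry
assert not registry_accepts(stmt, b"tampered", "https://crates.io")     # wrong artifact
```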

2 Likes

I could be wrong, but I believe the implication of having a "work associated" GitHub account is that it's a paid account, in which case the free account limit does not apply.

We do FOSS for the most part, so paying for an account is really unnecessary (I get added to customer paid orgs at times, but that's no different than if I had a paid account AFAIK). I at least have no need for a paid account (other than this restriction).

You mean to say that GitHub doesn't allow multiple free accounts.

GitHub encourages this for enterprises and even offers first-class features for using enterprise identity systems for SSO with automatic user provisioning.

1 Like

For reference, there is also a conversation going on at https://rust-lang.zulipchat.com/#narrow/stream/246057-t-cargo/topic/Cargo.20and.20signing.2Fverification about this pre-RFC.

One of my first comments in that thread was:

Designs in this space will at least involve the Infra Team, the Cargo Team, and the crates.io Team. (As I have messed up before, excluding any relevant team in the design process can come back to bite you later.) Also none of these teams have an excess of design/implementation bandwidth. So this is probably an effort that can be measured in years.

I deeply appreciate the input from the maintainers of other repositories. Given that our deliberative process is likely to make us late adopters, we should learn the lessons from your lived experience.

I've been trying to articulate a more profound question about this RFC and I have not yet found the right words. But it is strongly related to @CAD97's excellent comment:

In some future ideal system where we are using TUF, Sigstore, Asymmetric Tokens, OIDC ... and eight more things that haven't been invented yet:

  • If the public transparency / code signing side lets anyone see that an upload was malicious, then crates.io should use that same information to prevent the upload in the first place.
  • The transparency log should explain why crates.io made any trust decision that affects the artifact I am about to download/use. I should not only be able to verify that the artifact I'm downloading was uploaded by a particular user, but also when the token they used was created, why we thought they were authorized to make the token, who authorized them to be an owner on the crate... all the way back to "the crates.io administrator granted special access" or "we trusted GitHub identity 1234 to publish this package because they were the first to the name".
  • Putting those two together: if crates.io definitely enforces the verification available in the transparency log, and the transparency log is sufficient to justify the enforcement that crates.io does, then there is no need for any other kind of authentication/authorization, as the two systems are mathematically identical.

Fundamentally, this is one of the key insights behind my design in Asymmetric Tokens. If signing the hash of the artifact I'm about to upload (if made public) would be enough for a third party to verify the authenticity of the upload, then let's just use that signature as the authentication token. To take things one step at a time, that RFC did not discuss making the tokens public, or for that matter how crates.io will implement the RFC at all. But it was definitely intended from the start! To quote the RFC:

After that an audit log of what tokens were used to publish on crates.io and why that token was trusted, would probably be a rich data source for identifying compromised accounts. As well as making it possible to do end to end signature verification. The crate file I downloaded matches the cksum in the index; the index matches the cksum in the audit log; the public key used in the audit log is the one I expected.

and

Furthermore, this RFC attempts to make a start on solving several problems at the same time. It may be that in time we discover these problems need to be solved separately. If we end up with a separate system for code signing and a separate system for authorization, then a simpler more direct method of authentication might have been a better choice.

Asymmetric Tokens have not stabilized. It would be unfortunate if in the future we end up stuck with two systems for signing every action, a required public transparency signature and a parallel Asymmetric Token that is only used for some crates.io checks. If Asymmetric Tokens need to change to avoid that duality, let's figure this out now.
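The end-to-end chain quoted from the Asymmetric Tokens RFC ("the crate file I downloaded matches the cksum in the index; the index matches the cksum in the audit log; the public key used in the audit log is the one I expected") reduces to three comparisons. The record shapes below are invented, and real signature verification is abstracted to a key comparison.

```python
import hashlib

# Sketch of the end-to-end verification chain: crate file -> index entry ->
# audit log entry -> trusted public key. Each link is a simple equality check.

def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()

def verify_chain(crate_bytes, index_entry, audit_entry, expected_pubkey):
    return (sha256_hex(crate_bytes) == index_entry["cksum"]   # file matches index
            and index_entry["cksum"] == audit_entry["cksum"]  # index matches audit log
            and audit_entry["pubkey"] == expected_pubkey)     # log used the expected key

crate = b"crate file contents"
index_entry = {"cksum": sha256_hex(crate)}
audit_entry = {"cksum": sha256_hex(crate), "pubkey": "ed25519:abc123"}

assert verify_chain(crate, index_entry, audit_entry, "ed25519:abc123")
assert not verify_chain(b"tampered", index_entry, audit_entry, "ed25519:abc123")
assert not verify_chain(crate, index_entry, audit_entry, "ed25519:other")
```

If a single signature over the artifact hash can drive all three links, the same signature can serve as the authentication token, which is the duality the paragraph above wants to avoid breaking.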

2 Likes