Crate capability lists


#1

In recent years it has become popular in programming language communities to share large amounts of code through package managers. Rust is no exception: Cargo makes it very easy to share your own code and to use code written by others. At first sight, this seems to be a good thing. As time passes, more and more packages will become available, eventually providing everything that someone may need in an everyday software project. However, as the Node.js community has shown, this solution is hardly perfect. If an average piece of software depends on thousands of third-party packages, auditing and trusting that software becomes a very tricky problem.

Due to the amount of code to be verified and the speed of development, manually auditing hundreds or thousands of packages on every update is simply not feasible. I believe this issue needs tooling support.

I propose the creation of a tool that processes a library crate and generates the following metadata:

  • list of all modules/functions used from the Rust standard library by the given crate
  • list of all extern function calls (all “C” functions)
  • list of all dependent crates and functions called from the dependencies of the given crate
  • whether the crate uses unsafe blocks
  • whether the crate is using a build script

This list becomes the list of capabilities needed to execute the code of the given crate. The list could also be available for each version on crates.io.

Every time someone compiles a Rust crate, the user can save the full capability requirements of all dependencies. This list can then be used to verify that, after updating a set of dependencies, none of them requires a new capability. In other words, the user can verify that a seemingly innocent base64 encoder is not sending all data to a shady web server.
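The update check described above could be sketched as a set difference between two saved capability lists. The `Capabilities` struct, its field names, and `new_capabilities` are hypothetical names chosen for illustration; no such format exists in Cargo or on crates.io.

```rust
use std::collections::BTreeSet;

/// Hypothetical capability record for one crate version.
/// The fields mirror the metadata list above; the names are assumptions.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Capabilities {
    /// Paths used from the standard library, e.g. "std::net::TcpStream".
    pub std_paths: BTreeSet<String>,
    /// Names of extern ("C") functions that are called.
    pub extern_fns: BTreeSet<String>,
    /// Functions called from dependencies.
    pub dep_fns: BTreeSet<String>,
    pub uses_unsafe: bool,
    pub has_build_script: bool,
}

/// Lists every capability present in `new` but absent from `old`.
/// An empty result means the update requires nothing new.
pub fn new_capabilities(old: &Capabilities, new: &Capabilities) -> Vec<String> {
    let mut added = Vec::new();
    for path in new.std_paths.difference(&old.std_paths) {
        added.push(format!("std: {}", path));
    }
    for name in new.extern_fns.difference(&old.extern_fns) {
        added.push(format!("extern: {}", name));
    }
    for name in new.dep_fns.difference(&old.dep_fns) {
        added.push(format!("dependency: {}", name));
    }
    if new.uses_unsafe && !old.uses_unsafe {
        added.push("unsafe blocks".to_string());
    }
    if new.has_build_script && !old.has_build_script {
        added.push("build script".to_string());
    }
    added
}
```

A base64 encoder whose new version suddenly lists `std::net::TcpStream` would show up as a non-empty result, which is the signal for a manual audit.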

If we squint hard enough, this problem is analogous to trusting an application on your mobile phone. A mobile application is not supposed to be able to do everything it wants to; the user can check the list of permissions the app requires before installing it. Later, when the application is updated, the user is given a chance to accept or block any changes in permissions. I believe something similar must be done for crates.

It is important to guarantee that if this tool does not complain about a dependency update, then that update is perfectly safe (or at least as safe as the previous one?). If the tool does complain, it is the user’s responsibility to verify the update and accept it.

Is this idea viable?

I intentionally ignored the safety of build scripts (i.e. will the build script steal my data?). I believe the trust issue with build scripts is even more complicated and cannot be solved using this mechanism.

Some of the related discussions:


[Pre-RFC] Cargo Safety Rails
#2

I’ve been thinking along similar lines, but specifically around the problem of restricting usage of unsafe. I posted some initial thoughts here:

https://groups.google.com/d/msg/cap-talk/t9al5hjN19U/XzHfR1peBAAJ

I’ve thought about posting a “Pre-Pre-RFC” about this, but I guess I can start by spitballing here.

Unsafe Features

Synopsis: Extend the existing idea of cargo features with a special notion of “unsafe features” which can be used to whitelist usage of unsafe in dependencies (and their transitive dependencies).

Goals:

  • Opt-in feature which has no effect on code that doesn’t make use of the feature explicitly
  • Cover all usages of unsafe, including std
  • Provide a path for retrofitting std with unsafe features, which could eventually be used to eliminate ambient authority from Rust and thereby provide a foundation for an object capability model

Non-Goals:

  • Full OCap semantics out of the box
  • Breaking changes of any kind

Cargo.toml changes

Let’s start with something which is perhaps worthy of a pre-pre-RFC in and of itself, an allow-unsafe option:

[dependencies]
foobar = { version = "0.2", allow-unsafe = false }

This would transitively disallow use of unsafe by foobar and all of foobar's transitive dependencies.

Ideally though, we could whitelist specific usages of unsafe in the foobar crate. Enter “unsafe features”.

In the Cargo.toml for the foobar crate, we could imagine something like this:

[features]
unsafe = ["fire_the_missiles"]

And in the code, something like this:

#[cfg(unsafe_feature = "fire_the_missiles")]
fn fire_the_missiles(...) {
   unsafe {
      // Use your imagination. How about some pointer arithmetic based on attacker-controlled data?
      [...]
   }
}

Now let’s imagine the baz crate wants to consume the foobar crate and use this feature. It will need to opt into using this unsafe behavior (I am imagining that unsafe features cannot be default features, and must be opted into explicitly).

In the Cargo.toml for the baz crate, we do the following:

[dependencies]
foobar = { version = "0.2", unsafe-features = ["fire_the_missiles"] }

Now the baz crate is able to make use of the fire_the_missiles function. We can imagine that all of the rest of the cargo feature behavior continues to work, for example foobar could be an optional dependency, pulled in via a regular (non-unsafe) cargo feature.

But for simplicity’s sake (and to illustrate an example), let’s suppose that baz always includes the foobar crate and makes use of the foobar/fire_the_missiles unsafe feature. Now what happens when the quux crate tries to include baz?

In the Cargo.toml for the quux crate, imagine we did this:

[dependencies]
baz = "0.1"

This is where things get a bit interesting. baz is making use of the foobar/fire_the_missiles feature, and quux has not explicitly authorized it. This is an error:

error: crate `quux` makes use of `unsafe-feature` not whitelisted in Cargo.toml: `foobar/fire_the_missiles`

…or thereabouts.

To correct this, we need to explicitly whitelist this relationship in the quux crate’s Cargo.toml:

[dependencies]
baz = { version = "0.1", unsafe-features = ["foobar/fire_the_missiles"] }

Explicitly whitelisting these features would be required at any level.

Imagine we’re in a completely different project which is including the quux crate. It would be required to do the following in its Cargo.toml:

[dependencies]
quux = { version = "0.0.1", unsafe-features = ["baz/foobar/fire_the_missiles"] }
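The whitelisting rule walked through above (each level must repeat the unsafe features its dependencies enable, prefixed with the dependency path) could be checked roughly as follows. This is only a sketch of one possible reading of the proposal: `Manifests`, `enabled`, and `check` are hypothetical names, and the model ignores versions and ordinary features.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One `[dependencies]` table: dependency name -> whitelisted unsafe features.
pub type Deps = BTreeMap<String, BTreeSet<String>>;

/// Hypothetical model of a workspace: crate name -> its dependency table.
pub type Manifests = BTreeMap<String, Deps>;

/// Unsafe features that `krate` enables through its dependencies, written as
/// the paths a consumer of `krate` would have to whitelist.
pub fn enabled(manifests: &Manifests, krate: &str) -> BTreeSet<String> {
    let mut out = BTreeSet::new();
    if let Some(deps) = manifests.get(krate) {
        for (dep, whitelist) in deps {
            for feature in whitelist {
                out.insert(format!("{}/{}", dep, feature));
            }
        }
    }
    out
}

/// Checks one level: every unsafe feature enabled by a dependency of `krate`
/// must appear in the whitelist `krate` declares for that dependency.
pub fn check(manifests: &Manifests, krate: &str) -> Vec<String> {
    let mut errors = Vec::new();
    if let Some(deps) = manifests.get(krate) {
        for (dep, whitelist) in deps {
            for needed in enabled(manifests, dep) {
                if !whitelist.contains(&needed) {
                    errors.push(format!(
                        "crate `{}` uses unsafe feature not whitelisted: `{}`",
                        krate, needed
                    ));
                }
            }
        }
    }
    errors
}
```

Running `check` on every crate in the graph would enforce the rule at each level; with the manifests above, quux whitelisting `foobar/fire_the_missiles` on baz makes `enabled` report `baz/foobar/fire_the_missiles` for whoever depends on quux.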

Tough questions

I stated one of the goals is “opt-in feature which has no effect on code that doesn’t make use of the feature explicitly”. In my spitball description, I’m suggesting whitelisting of unsafe usages “kicks in” when allow-unsafe = false or unsafe-features is added to a crate’s attributes in the [dependencies] section. But what about:

  • Q1: Usages of unsafe which aren’t tagged with #[cfg(unsafe_feature)]?
  • Q2: Usages of unsafe in std

Well, short answers:

  • A1: When unsafe-features is used, all usages of unsafe MUST be gated on a #[cfg(unsafe_feature)]. In that mode, it would probably make sense to make any ungated usage of unsafe a compile error.
  • A2: It should probably apply to std, possibly with an opt-out mechanism for truly side-effect and ambient authority-free things like mutating the interiors of strings.

This means to get the unsafe parts of std “back” in these crates and make them accessible, std itself would need to be retrofitted with unsafe-features.

A quick summary of the philosophy here:

  • Some crates export unsafe-features
  • Other crates consume them, explicitly whitelisting the ones they use
  • At each level of the dependency hierarchy, crates must opt into these features, or these uses of unsafe will be a compile error

An idea to mitigate attacks through malicious crates
#3

I think it’s an interesting idea to try to name the unsafe blocks, but my solution is not focused on them. They are part of the solution but not at the center. I simply want a quick (and automatic) way to describe what a crate is doing at a high level. If it is not using any library (nothing from std) and not using unsafe blocks, then I am sure that it will not steal my data or put evil stuff on my disk. If it is using std::fs or std::net, then I am a little more sceptical. If it is using unsafe and calling C functions, then all bets are off. I really need to check the code in that case (or delegate the audit to someone else).
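As a rough sketch, the three levels of suspicion described above could be expressed as a small classifier. The `Scrutiny` enum and `scrutiny` function are hypothetical, and two details are assumptions not stated in the post: either unsafe or extern calls alone triggers a full audit, and std usage outside std::fs and std::net is treated as low risk.

```rust
/// How closely a crate should be looked at, ordered from least to most.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Scrutiny {
    /// No unsafe, no extern calls, no file system or network access.
    Low,
    /// Touches std::fs or std::net.
    Elevated,
    /// Uses unsafe or calls C functions: the code must be audited.
    FullAudit,
}

/// Classifies a crate from its capability list.
/// `std_paths` holds the standard library paths the crate uses.
pub fn scrutiny(std_paths: &[&str], uses_unsafe: bool, calls_extern: bool) -> Scrutiny {
    // Assumption: either flag alone is enough to require a full audit.
    if uses_unsafe || calls_extern {
        return Scrutiny::FullAudit;
    }
    let sensitive = std_paths.iter().any(|p| {
        *p == "std::fs"
            || *p == "std::net"
            || p.starts_with("std::fs::")
            || p.starts_with("std::net::")
    });
    if sensitive {
        Scrutiny::Elevated
    } else {
        Scrutiny::Low
    }
}
```

Because the enum derives `Ord`, a tool could also flag any update where the level increases compared with the previously accepted version.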

By the way, we also need a method to decide whether a given unsafe block was changed. If I checked version 1.2 and found it safe, and this automatic checker is not complaining about new unsafe blocks, that doesn’t mean that the author did not put something evil in the already existing unsafe blocks.


#4

When considering capability lists et al., also think about const fn or other restrictions that can be made. If I know that a crate can only have const fns or const items, then it cannot possibly perform any side effects, which rules out networking and file I/O.

For some prior art there’s SafeHaskell: https://dl.acm.org/citation.cfm?id=2364524


#5

You can write perfectly safe code that defines a global malloc byte array containing machine code and have that be called instead of the real malloc.


#6

You cannot do that without using unsafe. If you are using unsafe then obviously anything is possible.


#7

You can do that without unsafe. static malloc: [u8; N] = ...;


#8

Oh, you are talking about tricking the linker into calling a buffer… Hmm, that must be detected by this tool, too.


#9

Correct me if I’m wrong, but “unsafe features”, if applied to std, would cover all of these cases?

I thought this was an interesting solution to that particular problem:


#10

You need #[no_mangle], but if I add that…

[1]    39916 segmentation fault  cargo run
Process 39946 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)

…fun times.


#11

Anyone happen to know why the symbol clash above doesn’t cause a linker error?

One way to deal with that in the context of my unsafe-features proposal: require #[cfg_attr(unsafe_feature = "x", no_mangle)] in order to disable name mangling (i.e. all no_mangles must be tied to a corresponding unsafe-feature)


#12

It’s not an error to have repeated symbols - the linker will just take the first one found, at least on unix.

Isn’t that static in a no-exec mapping though? I expect that’s the reason for your segfault, even if you did put valid machine code in the byte array.


#13

There’s probably an attribute to make it executable.


#14

I think that depends on a specific symbol resolution strategy, as I’ve certainly dealt with my fair share of “ld: duplicate symbol […]” errors in the past


#15

I suspect this doesn’t work in practice. My impression is that a very large portion of unsafe code is either for FFI (where turning it off isn’t even an option) or for optimization (which you would typically want on by default).

In particular, there’s a very strong risk of any audit framework/tooling like this unintentionally leading to crate authors being discouraged from using any unsafe code for optimization, even when the soundness of that unsafe code is uncontroversial. In past discussions this concern has sometimes been expressed as “demonising unsafe code”. Unfortunately I have no good ideas on how to prevent this, but I do think that we’d be causing more harm than good if we introduced a system that did have this problem in practice (if nothing else, it risks encouraging the idea that security is at odds with performance).

Pedantic but important: We’d want to check the entire module for changes, for every module containing at least one unsafe.
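A minimal sketch of that module-level check, assuming the tool has the source text of every module for both versions. The function names are hypothetical, and the `unsafe` detection is a naive token scan (it also matches the word inside comments and strings); a real tool would parse the code and store a stable cryptographic hash rather than `DefaultHasher`, whose output is not guaranteed across Rust releases.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Fingerprint of a module's source text (only comparable within one run).
pub fn fingerprint(source: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    hasher.finish()
}

/// Naive check: does the identifier-like token `unsafe` occur in the text?
pub fn mentions_unsafe(source: &str) -> bool {
    source
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .any(|token| token == "unsafe")
}

/// Returns the modules that need a fresh audit: those that contain `unsafe`
/// in either version and whose whole text differs (or is new).
/// Both maps go from module path to source text.
pub fn modules_to_reaudit(
    old: &BTreeMap<String, String>,
    new: &BTreeMap<String, String>,
) -> Vec<String> {
    let mut out = Vec::new();
    for (path, new_src) in new {
        let old_src = old.get(path);
        let has_unsafe =
            mentions_unsafe(new_src) || old_src.map_or(false, |s| mentions_unsafe(s));
        let unchanged =
            old_src.map_or(false, |s| fingerprint(s) == fingerprint(new_src));
        if has_unsafe && !unchanged {
            out.push(path.clone());
        }
    }
    out
}
```

Note that a change to safe code in a module that contains `unsafe` is flagged too, which is the point of checking the whole module rather than only the unsafe block.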


#16

Yes, I think you are right, but that approach has a granularity that is not practical in my opinion.

If the user has to whitelist all unsafe blocks transitively, which means that everything that uses Vec or String or similar safe abstractions is affected, then I think that would affect far too much code that cannot really cause harm. I think my approach is better because we only consider the top-level, public interface of std and we assume that the internal structure of std is irrelevant, safe and harmless.

If there is then we can check for that attribute in the checker. I think that’s okay.

Care to elaborate why you think the module is the right boundary? I thought about this and I think it depends on the unsafe block. Each unsafe block is supposed to be wrapped in a safe abstraction; it can be a module or a single function. I think the tool is supposed to check whether the code that provides the safe abstraction has changed. The safe abstraction is supposed to be written in a way that its users cannot use it incorrectly, so the initial audit has to check that this assumption is true; then the change checker ‘only’ has to check that all code that provides the safe abstraction is still unchanged. I don’t think it is possible to figure out the boundary of a safe abstraction automatically, so the initial auditor should probably provide this as the result of the audit.


#17

It is true that const fn cannot do IO (at the moment) but the same is true for a function that does not call any external function or IO functions from std. I don’t see what we gain with the const fn restriction in this situation.


#18

Oh, I thought this was an uncontroversial point the community already had a clear consensus on.

Let’s make it a bit more precise. The soundness of an unsafe code block is at most affected by the entire module it appears in, and there’s no way to automatically determine when something smaller like a single function happens to be sufficient.

So yes, we could have a feature where the initial auditor states there’s no need to re-audit the whole module. But I doubt it’d be worthwhile in practice because

  • unsafe code authors probably are and should be keeping the modules with unsafe code as small as feasible to make this stuff easier to reason about anyway,
  • the kind of unsafe code that is clearly sound without checking the whole module is typically a good candidate for moving into a separate crate like std or rayon or pin_utils or bytes or whatever, and
  • auditing a module with unsafe code against all possible future changes to the safe parts is a much bigger ask than auditing the current state of the module (is that even possible for any real-world examples?)

#19

To forbid code from doing IO via requiring const fn is a more bullet-proof and tractable analysis that won’t go wrong as compared to the complexity of forbidding IO by other means. This is almost as conservative as you get (save for preventing non-termination attacks) and lots of algorithms shouldn’t require more than const fn.
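As an illustration of the kind of algorithm that fits inside const fn, here is a small sketch of the 32-bit FNV-1a hash (chosen only as an example; it is not mentioned in the thread). A const fn may only call other const fns, so the compiler rejects any attempt to call functions such as std::fs::read or to open a socket from inside it.

```rust
/// 32-bit FNV-1a hash, written as a const fn so it can run at compile time.
/// Only const operations are allowed in the body: no file, network, or
/// other I/O call would compile here.
pub const fn fnv1a(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    let mut i = 0;
    // `for` loops are not available in const fn, so index manually.
    while i < bytes.len() {
        hash ^= bytes[i] as u32;
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
        i += 1;
    }
    hash
}

/// Evaluated entirely by the compiler; nothing runs at program start.
pub const HELLO_HASH: u32 = fnv1a(b"hello");
```

The same function can still be called at run time with non-constant input, so restricting a crate to const fn limits what its code can do without limiting where it can be used.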


#20

Compile-time IO should be a thing.