Rustc dependency policy

So, thanks to all the hard work on ./x.py, we have the ability to pull in crates from crates.io and elsewhere. This is wonderful. However, we currently impose some pretty minimal requirements on said crates. As far as I know, we just check that they have suitable licenses.

Recently, rayon encountered some problems due to its use of compiletest-rs, which turns out to be depending (on windows) on the dbghelp library. This is because of miri which uses backtrace. This strikes me as a kind of warning: our set of dependencies is rapidly growing and I don’t know that we are paying any sort of attention. This increases brittleness.

I am wondering if we should consider a whitelist on the transitive crates we employ. I feel like adding a new dependency to rustc should be a “more momentous” (but not necessarily very momentous) decision – perhaps one that we FCP.

Thoughts? Am I over-reacting?

UPDATE: @retep998 opened this issue describing the particular problem rayon hit. To be clear, though, I’m not saying that this problem itself is necessarily a problem, but it does seem to me like the set of dependencies is growing sort of rapidly. Not sure if this is necessarily a problem though.

6 Likes

The “dependency allowed to exist” part of this recent post seems relevant. At least in the sense that there is certainly a use-case for this level of control.

compiletest_rs is just a dev-dependency of rayon, and just intuitively, I don’t think it should be pulled in, in any way.

Thoughts? Am I over-reacting?

I don't think so. Every dependency might cause work in the future.

Trying to keep dependencies low certainly is a good idea, but perhaps even more important to ensure that they're maintained.

1 Like

yes, it’s a dev-dependency. The problem we were encountering was travis failures. We use it to check that things are failing to compile when they should. I’m interested in other solutions to that problem, but that’s kind of a separate question.

Mostly what I’m trying to ask here is whether we should try to at least throw up some roadblock to adding new dependencies, so that we have a chance to look at them as they go in. I think we don’t have to be sticklers about it, but I suspect many of them are kind of “accidental”.

In this case, for example, miri depends on backtrace, but that’s not really needed from within the compiler; I think it’s used for prettier errors. We could have potentially made it an optional feature.

(Should we have? Not sure.)

I would personally be a fan of trying out a whitelist for compiler dependencies (false positives through cargo/rustfmt/rls/etc shouldn’t affect this). The compiler dependencies end up being much trickier in practice I think than rls/cargo/etc because we continue to use them to generate programs, whereas cargo/rls are just standalone programs.

I’m also starting to get worried about our build times for the compiler itself. Whenever we put more dependencies in the compiler those are crates we have to build and it can take longer and longer to do that. I’ve just done some analysis which shows that we’re clearly regressing on how long it takes to bootstrap rustc over time, it’s just not clear to me what’s actually causing the problem. I don’t think pulling in tons of crates from crates.io help much, however.

Could we perhaps try an explicit whitelist for awhile of allowed crates? See how it goes? If it’s too onerous we can reconsider and otherwise I’d agree that having a double check for “are you sure you wanted to add this dependency?” would be a good idea.

5 Likes

I'm in favor of trying it out. Is it hard to do? =) I'd hate to add more onto your plate personally, I'm not really following who hacks around the build system though.

Hm it won’t be too hard but I think it will be nontrivial.

We’ve already got dependency checking which verifies license, so it should be mostly just adding an array next to EXCEPTIONS for a whitelist (and then checking the names of directories and such). The trick is that we only want to verify the dependencies of rustc itself, so we’d have to run cargo metadata to learn about the dependency graph. The build system already does that though so the logic could perhaps just be moved there.

Coming back to this: I think this is a huge security hole in the style of Reflections on Trusting Trust.

Would you be up for adding a whitelist, @mark-i-m? @alexcrichton laid out some directions here.

I guess it’s more of a question of vetting. I don’t think a whitelist is sufficient. Before we upgrade any dependency we would want to make sure nothing fishy was added. Moreover, we don’t want to transitively trust huge numbers of crates, so we would need to then vet dependencies of dependencies, and so on (i.e. the whole DAG)…

That sounds like a huge effort, but it does come with a lot of benefits:

  • we know exactly what we depend on and what we don’t
  • we might get ideas for where to trim some of the excess weight
  • it incentivizes not adding dependencies -> faster builds, smaller binaries

I’m happy to try to start this, but my bandwidth is a bit low ATM, so it might be kind of slow…

4 Likes

With respect, that sounds like a whitelist. =) Yes, the whitelist is not just the name of the crate, but the revision. And yes, it is transitive.

1 Like

Well, I guess – there is some debate about how to set it up :slight_smile:

But at minimum it does seem like it should include the revision

1 Like

Wow… we have a lot of dependencies:

* advapi32-sys 0.2.0
* aho-corasick 0.5.3
* aho-corasick 0.6.4
* alloc 0.0.0
* alloc_jemalloc 0.0.0
* alloc_system 0.0.0
* ansi_term 0.10.2
* ar 0.3.1
* arena 0.0.0
* atty 0.2.6
* backtrace 0.3.5
* backtrace-sys 0.1.16
* bin_lib 0.1.0
* bitflags 0.7.0
* bitflags 0.9.1
* bitflags 1.0.1
* bootstrap 0.0.0
* borrow_error 0.1.0
* bufstream 0.1.3
* build-manifest 0.1.0
* build_helper 0.1.0
* byteorder 1.2.1
* cargo 0.26.0
* cargo_metadata 0.2.3
* cargo_metadata 0.4.0
* cargotest 0.1.0
* cargotest2 0.1.0
* cc 1.0.4
* cfg-if 0.1.2
* chrono 0.4.0
* clap 2.29.0
* clippy 0.0.186
* clippy-mini-macro-test 0.2.0
* clippy_lints 0.0.186
* cmake 0.1.29
* coco 0.1.1
* commoncrypto 0.2.0
* commoncrypto-sys 0.2.0
* compiler_builtins 0.0.0
* compiletest 0.0.0
* compiletest_rs 0.3.6
* completion 0.1.0
* core 0.0.0
* core-foundation 0.4.6
* core-foundation-sys 0.4.6
* crates-io 0.15.0
* crossbeam 0.2.12
* crossbeam 0.3.2
* crypto-hash 0.3.0
* curl 0.4.11
* curl-sys 0.4.1
* deglob 0.1.0
* derive-new 0.5.0
* diff 0.1.11
* dlmalloc 0.0.0
* docopt 0.8.3
* dtoa 0.4.2
* duct 0.8.2
* either 1.4.0
* endian-type 0.1.2
* enum_primitive 0.1.1
* env_logger 0.3.5
* env_logger 0.4.3
* env_logger 0.5.3
* error-chain 0.11.0
* error-chain 0.8.1
* error_index_generator 0.0.0
* failure 0.1.1
* failure_derive 0.1.1
* features 0.1.0
* filetime 0.1.15
* find_all_refs_no_cfg_test 0.1.0
* find_impls 0.1.0
* flate2 1.0.1
* fmt_macros 0.0.0
* fnv 1.0.6
* foreign-types 0.3.2
* foreign-types-shared 0.1.1
* fs2 0.4.3
* fuchsia-zircon 0.3.3
* fuchsia-zircon-sys 0.3.3
* futures 0.1.17
* getopts 0.2.15
* git2 0.6.11
* git2-curl 0.7.0
* glob 0.2.11
* globset 0.2.1
* graphviz 0.0.0
* hamcrest 0.1.1
* handlebars 0.29.1
* hex 0.2.0
* hex 0.3.1
* home 0.3.0
* idna 0.1.4
* if_chain 0.1.2
* ignore 0.3.1
* infer_bin 0.1.0
* infer_custom_bin 0.1.0
* infer_lib 0.1.0
* installer 0.0.0
* is-match 0.1.0
* itertools 0.6.5
* itertools 0.7.6
* itoa 0.3.4
* jobserver 0.1.9
* json 0.11.12
* jsonrpc-core 8.0.1
* kernel32-sys 0.2.2
* languageserver-types 0.30.0
* lazy_static 0.2.11
* lazy_static 1.0.0
* lazycell 0.5.1
* libc 0.0.0
* libc 0.2.36
* libgit2-sys 0.6.19
* libssh2-sys 0.2.6
* libz-sys 1.0.18
* linkchecker 0.1.0
* log 0.3.9
* log 0.4.1
* log_settings 0.1.1
* lzma-sys 0.1.9
* matches 0.1.6
* mdbook 0.1.2
* memchr 0.1.11
* memchr 2.0.1
* miniz-sys 0.1.10
* miow 0.2.1
* miri 0.1.0
* multiple_bins 0.1.0
* net2 0.2.31
* nibble_vec 0.0.3
* nix 0.8.1
* num 0.1.41
* num-bigint 0.1.41
* num-complex 0.1.41
* num-integer 0.1.35
* num-iter 0.1.34
* num-rational 0.1.40
* num-traits 0.1.41
* num_cpus 1.8.0
* open 1.2.1
* openssl 0.9.23
* openssl-probe 0.1.2
* openssl-sys 0.9.24
* os_pipe 0.5.1
* owning_ref 0.3.3
* panic_abort 0.0.0
* panic_unwind 0.0.0
* parking_lot 0.5.3
* parking_lot_core 0.2.9
* percent-encoding 1.0.1
* pest 0.3.3
* pkg-config 0.3.9
* proc_macro 0.0.0
* profiler_builtins 0.0.0
* pulldown-cmark 0.0.15
* pulldown-cmark 0.1.0
* quick-error 1.2.1
* quine-mc_cluskey 0.2.4
* quote 0.3.15
* racer 2.0.12
* radix_trie 0.1.2
* rand 0.3.20
* rayon 0.9.0
* rayon-core 1.3.0
* redox_syscall 0.1.37
* redox_termios 0.1.1
* reformat 0.1.0
* reformat_with_range 0.1.0
* regex 0.1.80
* regex 0.2.5
* regex-syntax 0.3.9
* regex-syntax 0.4.2
* remote-test-client 0.1.0
* remote-test-server 0.1.0
* rls 0.125.0
* rls-analysis 0.11.0
* rls-blacklist 0.1.0
* rls-data 0.15.0
* rls-rustc 0.2.1
* rls-span 0.4.0
* rls-vfs 0.4.4
* rustbook 0.1.0
* rustc 0.0.0
* rustc-ap-rustc_cratesio_shim 29.0.0
* rustc-ap-rustc_data_structures 29.0.0
* rustc-ap-rustc_errors 29.0.0
* rustc-ap-serialize 29.0.0
* rustc-ap-syntax 29.0.0
* rustc-ap-syntax_pos 29.0.0
* rustc-demangle 0.1.5
* rustc-main 0.0.0
* rustc-serialize 0.3.24
* rustc_allocator 0.0.0
* rustc_apfloat 0.0.0
* rustc_asan 0.0.0
* rustc_back 0.0.0
* rustc_binaryen 0.0.0
* rustc_borrowck 0.0.0
* rustc_const_eval 0.0.0
* rustc_const_math 0.0.0
* rustc_cratesio_shim 0.0.0
* rustc_data_structures 0.0.0
* rustc_driver 0.0.0
* rustc_errors 0.0.0
* rustc_incremental 0.0.0
* rustc_lint 0.0.0
* rustc_llvm 0.0.0
* rustc_lsan 0.0.0
* rustc_metadata 0.0.0
* rustc_mir 0.0.0
* rustc_msan 0.0.0
* rustc_passes 0.0.0
* rustc_platform_intrinsics 0.0.0
* rustc_plugin 0.0.0
* rustc_privacy 0.0.0
* rustc_resolve 0.0.0
* rustc_save_analysis 0.0.0
* rustc_trans 0.0.0
* rustc_trans_utils 0.0.0
* rustc_tsan 0.0.0
* rustc_typeck 0.0.0
* rustdoc 0.0.0
* rustdoc-themes 0.1.0
* rustdoc-tool 0.0.0
* rustfmt-nightly 0.3.8
* same-file 0.1.3
* same-file 1.0.2
* schannel 0.1.10
* scoped-tls 0.1.0
* scopeguard 0.1.2
* scopeguard 0.3.3
* semver 0.6.0
* semver 0.8.0
* semver 0.9.0
* semver-parser 0.7.0
* serde 1.0.27
* serde_derive 1.0.27
* serde_derive_internals 0.19.0
* serde_ignored 0.0.4
* serde_json 1.0.9
* serialize 0.0.0
* shared_child 0.2.1
* shell-escape 0.1.3
* shlex 0.1.1
* smallvec 0.6.0
* socket2 0.3.0
* stable_deref_trait 1.0.0
* std 0.0.0
* std_unicode 0.0.0
* strsim 0.6.0
* syn 0.11.11
* synom 0.11.3
* synstructure 0.6.1
* syntax 0.0.0
* syntax_ext 0.0.0
* syntax_pos 0.0.0
* syntex_errors 0.52.0
* syntex_pos 0.52.0
* syntex_syntax 0.52.0
* tar 0.4.14
* tempdir 0.3.5
* term 0.0.0
* term 0.4.6
* termcolor 0.3.3
* termion 1.5.1
* test 0.0.0
* textwrap 0.9.0
* thread-id 2.0.0
* thread_local 0.2.7
* thread_local 0.3.5
* tidy 0.1.0
* time 0.1.39
* toml 0.2.1
* toml 0.4.5
* toml-query 0.6.0
* unicode-bidi 0.3.4
* unicode-normalization 0.1.5
* unicode-segmentation 1.2.0
* unicode-width 0.1.4
* unicode-xid 0.0.3
* unicode-xid 0.0.4
* unreachable 1.0.0
* unstable-book-gen 0.1.0
* unwind 0.0.0
* url 1.6.0
* url_serde 0.2.0
* userenv-sys 0.2.0
* utf8-ranges 0.1.3
* utf8-ranges 1.0.0
* vcpkg 0.2.2
* vec_map 0.8.0
* void 1.0.2
* walkdir 1.0.7
* walkdir 2.0.1
* winapi 0.2.8
* winapi 0.3.4
* winapi-build 0.1.1
* winapi-i686-pc-windows-gnu 0.4.0
* winapi-x86_64-pc-windows-gnu 0.4.0
* wincolor 0.1.4
* workspace_symbol 0.1.0
* ws2_32-sys 0.2.1
* xattr 0.1.11
* xz2 0.1.4
* yaml-rust 0.3.5
1 Like

Are we trying to control that whole list, or just those that are specifically built into rustc?

That’s the full list of dependencies from cargo metadata --manifest-path rust/src/Cargo.toml…

Yeah, I figured it was something like that, but that includes dependencies of every in-tree tool too. For instance, xz2 is only used by the installer to generate tarballs; it’s not in anything we ship to users.

Maybe we do want to track all that fully – that’s what I’m asking.

2 Likes

Hmm… that’s a great question. @nikomatsakis @alexcrichton ?

This does raise an interesting question: is xz2 actually ok to use in the compiler? The whitelist prototype based on cargo metadata can’t differentiate…

You can use “resolve” field of metadata to lerarn about precise dependencies of each package