When we initially moved crates out of the rust repository and into the rust-lang organization, we ended up with some duplication. For example, the in-tree libterm is in theory the same as rust-lang/term, except that rust-lang/term has drifted over time and now differs from the in-tree libterm. This drift is problematic and can cause confusion. Additionally, it’s somewhat odd to have the same code in two different locations.
Now that Rust and the standard library are stable, there should in theory be no breaking changes sweeping through the compiler that require updating old code. Such changes are why we have avoided submodules in the past: landing a change in the submodule and then landing the submodule update is a difficult process. With a stable release, however, the submodules should be guaranteed to compile until the end of time, perhaps with some minor elaboration along the way.
I would like to propose that we start using submodules for the following crates:
term
serialize (rustc-serialize)
getopts
log - also add env_logger
libc - new APIs are added here somewhat regularly, but they are rarely used in the compiler and standard library. The standard library has its own area for defining new APIs.
Other candidates are the following, along with reasons to not pursue them at this time:
flate - this crate is not in rust-lang, and the “replacement” of flate2 also has a dependency on miniz_sys, and it’s a little bit overkill to be including these two for something so minor
rand - this dependency is underneath the standard library, and the crate in rust-lang/rand is not configured to have such placement (it depends on the standard library)
Having this infrastructure and precedent should make it easier to add new dependencies in the future, such as pulldown-cmark (a markdown parser written in Rust). All submodules would be required to use stable Rust (and probably have no dependencies of their own for the time being).
What do others think about this strategy? Perhaps it’s not worth it?
Yes, I’d like to be able to delete a lot of the legacy code we’re accumulating. In particular, rustc uses lots of collections that could reasonably be maintained out-of-tree, but aren’t necessarily worth exposing in std.
Ah, I just remembered a few points, so I wanted to add some clarifications to make sure we’re all on the same page. Primarily, for each external dependency, I wouldn’t expect the rust repository to run tests, build documentation, or run doc tests. Much of this is managed by these repos themselves (through Cargo), and I don’t want to re-encode all of Cargo’s logic into our own build system.
The second part I remembered is that the more dependencies we add as git submodules, the more likely it is that over time you won’t be able to build old revisions of Rust. We try to be quite diligent with the rust-lang/llvm submodule by making sure that every branch always exists, but once we add a lot of upstream git dependencies there’s a high likelihood that historic git hashes disappear, git repos go away altogether, etc. Some possibilities to mitigate this are:
We could download all crates from crates.io instead
Mirrors could be made into the rust-lang organization specifically for hosting submodules of the rust repository
We could stop development in-tree, but copy in the source from the external crate and add a stern warning that none of the files should be modified except by pulling in an upstream version.
I’m somewhat worried about this aspect, but not overly so as it would indeed be quite nice to start picking up some more flavorful projects outside of the compiler to make the compiler itself.
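As a rough illustration of the third option above (copied-in source that must not be edited locally), the "stern warning" could in principle be backed by a mechanical check. The following is only a sketch under my own assumptions: the function names tree_digest and is_unmodified are hypothetical, nothing like this exists in the rust build system as far as this thread says, and the digest would have to be recorded somewhere whenever an upstream version is pulled in.

```python
import hashlib
import os


def tree_digest(root):
    """Hash relative paths and file contents under root in a stable order.

    Hypothetical helper: the name and the exact hashing scheme are
    assumptions made for this sketch.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as f:
                data = f.read()
            # Mix in the path and the length so that moving bytes between
            # files or renaming a file changes the digest.
            digest.update(rel.encode("utf-8") + b"\0")
            digest.update(str(len(data)).encode("ascii") + b"\0")
            digest.update(data)
    return digest.hexdigest()


def is_unmodified(root, recorded_digest):
    """True if the copied-in tree still matches the digest recorded at import."""
    return tree_digest(root) == recorded_digest
```

The idea would be to store the digest next to the copied-in crate when an upstream version is imported, and have a lint step fail if is_unmodified returns False.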
The problem with subtrees (and the reason I don’t like them as much as submodules) is that they don’t store meta-info about the source repos in the original repo. So you have to include instructions like “after cloning, if you want to work with subtreed projects, do ‘git remote add other-repo-name git@github.com:some/other-repo.git’”, whereas with submodules you already have this meta-info in the .gitmodules file after cloning.
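To make the point about .gitmodules concrete, here is a small sketch of reading that meta-info straight out of a fresh clone. The sample contents are hypothetical: the path src/libterm and the exact URL are assumptions for illustration, and parse_gitmodules is a made-up helper that handles only the simple section/key layout shown here, not the full git config syntax.

```python
def parse_gitmodules(text):
    """Return {name: {"path": ..., "url": ...}} from .gitmodules-style text.

    Hypothetical helper for illustration; it handles only simple
    [submodule "name"] sections with "key = value" lines.
    """
    modules = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[submodule") and line.endswith("]"):
            # Section header looks like: [submodule "name"]
            name = line[len("[submodule"):-1].strip().strip('"')
            current = modules.setdefault(name, {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return modules


# Assumed example of what a .gitmodules entry for term might look like.
SAMPLE = """
# hypothetical entry, for illustration only
[submodule "src/libterm"]
	path = src/libterm
	url = https://github.com/rust-lang/term.git
"""

if __name__ == "__main__":
    for name, info in parse_gitmodules(SAMPLE).items():
        print(name, "->", info["url"])
```

With subtrees there is no equivalent file to read, so the upstream URL has to be communicated out of band, for example in a README.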
Oh thanks for the pointer! I'm somewhat worried by the state of the README though:
I would not use it for important things yet, but playing around with it is cheap (this is not git submodule), and not permanent (if you do not push to public remotes).
It sounds to me like git subtrees are the best way forward here, as we're very rarely going to need to update these crates, and they're definitely not a regular piece of the codebase that needs frequent changes. I think the key aspect is that these libraries will not be actively developed in the rust-lang/rust repo, only periodically updated to their upstream equivalents as necessary.