Submodules in rust-lang/rust for external repositories?

alexcrichton · June 4, 2015, 8:43pm

When we initially moved crates into the rust-lang organization from the rust repository we ended up with some duplication. For example, the in-tree libterm is in theory the same as rust-lang/term, but rust-lang/term has drifted away from the in-tree libterm over time. This drift is problematic and can cause confusion. Additionally, it’s somewhat odd having the same code in two different locations.

Now that Rust and the standard library are stable, there should in theory be 0 breaking changes sweeping through the compiler to update old code. Those sweeping changes are why we have avoided submodules in the past: landing changes in the submodule, then landing the submodule update, is a difficult process. With a stable release, however, the submodules should be guaranteed to compile until the end of time, perhaps with some minor elaboration along the way.

I would like to propose that we start using submodules for the following crates:

term
serialize (rustc-serialize)
getopts
log - also add env_logger
libc - new apis are added here somewhat regularly, but they are rarely used in the compiler and standard library. The standard library has its own area for defining new APIs.

Other candidates are the following, along with reasons to not pursue them at this time:

flate - this crate is not in rust-lang, and the “replacement” of flate2 also has a dependency on miniz_sys, and it’s a little bit overkill to be including these two for something so minor
rand - this dependency is underneath the standard library, and the crate in rust-lang/rand is not configured to have such placement (it depends on the standard library)

Having this infrastructure and precedent will allow us to perhaps easily add in new dependencies in the future, such as pulldown-cmark (a markdown parser written in Rust). All submodules would be required to use stable Rust (and probably have no dependencies of their own for the time being).

What do others think about this strategy? Perhaps it’s not worth it?

aturon · June 4, 2015, 8:45pm

Yes please. Submodules can be a bit of a pain, but duplication and drift are even worse.

Gankra · June 4, 2015, 8:47pm

Yes, I’d like to be able to delete a lot of the legacy code we’re accumulating. In particular, rustc uses lots of collections that could reasonably be maintained out-of-tree, but aren’t necessarily worth exposing in std.

brson · June 4, 2015, 8:47pm

OK.

20 chars 20 chars

nrc · June 4, 2015, 8:57pm

Yes please, I would also like to do this.

retep998 · June 4, 2015, 8:58pm

This could also mean using winapi for WinAPI instead of having it scattered throughout libc and std::sys::windows::c.

jroesch · June 4, 2015, 10:06pm

I agree, I think submodules are much better than duplication and drift.

alexcrichton · June 5, 2015, 12:27am

Ah, I just remembered a few points, so I wanted to add some clarifications to make sure we’re all on the same page. Primarily, for each external dependency, I wouldn’t expect the rust repository to run tests, build documentation, or run doc tests. Much of this is managed by these repos (through Cargo), and I don’t want to re-encode all of Cargo’s logic into our own build system.

The second part I remembered is that the more dependencies we add on git submodules the more likely it is that over time you won’t be able to build old revisions of Rust. We try to be quite diligent with the rust-lang/llvm submodule by making sure that every branch always exists, but once we add a lot of upstream git dependencies there’s a high likelihood that historic git hashes disappear, git repos go away altogether, etc. Some possibilities to mitigate this are:

We could download all crates from crates.io instead
Mirrors could be made into the rust-lang organization specifically for hosting submodules of the rust repository
We could stop development in-tree, but copy in the source from the external crate and add a stern warning that none of the files should be modified without just pulling in an upstream version.
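The third option depends on nobody editing the copied files in place, and a warning alone is easy to miss. Here is a minimal sketch of how that rule could be checked mechanically, assuming the import commit is marked with a tag. The tag name `vendored-term`, the path `src/libterm`, and the helper `vendored_dir_untouched` are all hypothetical and are not part of any existing tooling.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q .

# Copy in a pretend upstream snapshot and mark the import with a tag.
mkdir -p src/libterm
echo 'pub fn version() -> u32 { 1 }' > src/libterm/lib.rs
git add src/libterm
git commit -q -m "Copy in term from upstream"
git tag vendored-term

# Hypothetical guard: succeeds only if nothing under the given directory
# differs between the import tag and HEAD.
vendored_dir_untouched() {
    git diff --quiet vendored-term HEAD -- "$1"
}

if vendored_dir_untouched src/libterm; then before=clean; else before=modified; fi

# Someone edits the copied code in place instead of pulling in an upstream version.
echo '// local tweak' >> src/libterm/lib.rs
git commit -q -am "Tweak vendored term in place"

if vendored_dir_untouched src/libterm; then after=clean; else after=modified; fi
echo "before=$before after=$after"
```

A real update from upstream would move the tag along with the new copy, so only out-of-band edits would trip the check.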

I’m somewhat worried about this aspect, but not overly so as it would indeed be quite nice to start picking up some more flavorful projects outside of the compiler to make the compiler itself.

stebalien · June 5, 2015, 2:28am

@alexcrichton Why not use git subtrees?

alexcrichton · June 5, 2015, 2:53am

Whoa, I had never even heard of subtrees before! This sounds… perfect

Thanks for the pointer @stebalien!

Manishearth · June 5, 2015, 7:30am

Careful, subtrees can be pretty treacherous (like submodules). But they might just work.

Big +1 for this.

dan_t · June 5, 2015, 7:50am

There’s also subrepo[1], which has a lot in common with subtrees but a somewhat nicer user interface.

[1] https://github.com/ingydotnet/git-subrepo/blob/master/Intro.pod#git-subtrees

kstep · June 5, 2015, 12:31pm

The problem with subtrees (and the reason I don’t like them as much as submodules) is that they don’t store meta-info about the source repos in the original repo. So you have to put in phrases like “after cloning, if you want to work with subtreed projects, do ‘git remote add other-repo-name git@github.com:some/other-repo.git’”, while with submodules you already have this meta-info after cloning, in the .gitmodules file.
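To make the difference concrete, here is a minimal sketch of where each approach keeps the upstream URL. The path `src/libterm`, the remote name `term-upstream`, and the URL are illustrative assumptions. The `.gitmodules` entries are written by hand with `git config -f` so the sketch needs no network access; `git submodule add` records entries of the same shape.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Submodule style: the upstream URL lives in .gitmodules, a file that is
# committed to the repository, so every clone gets it.
# (Written by hand here; the values are illustrative.)
git config -f .gitmodules submodule.src/libterm.path src/libterm
git config -f .gitmodules submodule.src/libterm.url https://github.com/rust-lang/term.git
cat .gitmodules

# Subtree style: the upstream URL is only a remote in this clone's
# .git/config, which is not transferred by push or clone.
git remote add term-upstream https://github.com/rust-lang/term.git
git config --get remote.term-upstream.url
```

With subtrees, that last remote has to be re-created by hand in every fresh clone that wants to pull from upstream.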

dan_t · June 5, 2015, 1:04pm

That’s one of the key things that subrepo[1] does better than subtrees.

[1] https://github.com/ingydotnet/git-subrepo

kstep · June 5, 2015, 1:24pm

Yes, I’ve just read about them, thank you for the link! It looks like a very promising improvement over subtrees.

alexcrichton · June 5, 2015, 6:06pm

Oh thanks for the pointer! I'm somewhat worried by the state of the README though:

I would not use it for important things yet, but playing around with it is cheap (this is not git submodule), and not permanent (if you do not push to public remotes).

It's sounding to me like git subtrees is the best strategy forward here as we're very rarely going to need to update these, and it's definitely not a regular piece of the codebase which needs to be updated. I think the key aspect is that these libraries will not be actively developed in the rust-lang/rust repo, only periodically updated to their upstream equivalents as necessary.

dan_t · June 5, 2015, 7:28pm

I don’t know how up to date the README is, because then he writes about git-subrepo:

Bad:
   --Subrepo is very new.-- (no longer true)
   --Not well tested in the wild.-- (no longer true)

In the end, both subtree and subrepo will most likely just use the subtree merge strategy of git merge, which does most of the work for both commands.
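For reference, a subtree-style merge can be done with plain git and no extra tooling. The sketch below follows that workflow using two throwaway local repositories as stand-ins. The directory names, the remote name `term-upstream`, and the file contents are assumptions made for illustration. It passes the explicit `-X subtree=<path>` option instead of letting `-s subtree` guess the prefix.

```shell
set -eu
work=$(mktemp -d)
# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the upstream crate repository.
git init -q "$work/term"
git -C "$work/term" symbolic-ref HEAD refs/heads/main
echo 'pub fn version() -> u32 { 1 }' > "$work/term/lib.rs"
git -C "$work/term" add lib.rs
git -C "$work/term" commit -q -m "term: initial commit"

# Stand-in for the main repository.
git init -q "$work/rust"
cd "$work/rust"
git symbolic-ref HEAD refs/heads/main
echo 'main repository' > README
git add README
git commit -q -m "rust: initial commit"

# Import upstream under src/libterm/, recorded as a merge so that later
# merges have a common ancestor to work from.
git remote add term-upstream "$work/term"
git fetch -q term-upstream
git merge -q -s ours --no-commit --allow-unrelated-histories term-upstream/main
git read-tree --prefix=src/libterm/ -u term-upstream/main
git commit -q -m "Import term under src/libterm"

# Upstream moves on...
echo 'pub fn version() -> u32 { 2 }' > "$work/term/lib.rs"
git -C "$work/term" commit -q -am "term: bump version"

# ...and the periodic update is an ordinary merge with a path shift.
git fetch -q term-upstream
git merge -q -X subtree=src/libterm -m "Update src/libterm from upstream" term-upstream/main
cat src/libterm/lib.rs
```

The imported files are regular tracked files in the main repository, so a plain clone builds without fetching anything else.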

Just not using submodules is already a good solution.

alexcrichton · June 5, 2015, 10:19pm

I’ve now created a PR to implement this change for the first few crates!

ahmedcharles · June 8, 2015, 9:55am

Just replying to give a +1.

Nashenas88 · June 9, 2015, 5:46pm

Not to mention that it has the potential to be dangerous too. Security bug fixed in one, but not the other...
