Submodules in rust-lang/rust for external repositories?

When we initially moved crates into the rust-lang organization from the rust repository, we ended up with some duplication. For example, the in-tree libterm is in theory the same as rust-lang/term, but rust-lang/term has drifted over time and now differs from the in-tree copy. This drift is problematic and can cause confusion. Additionally, it’s somewhat odd to have the same code in two different locations.

Now that Rust and the standard library are stable, there should in theory be no breaking changes sweeping through the compiler that force us to update old code. Such breaking changes are why we have avoided submodules in the past: landing a change in the submodule and then landing the submodule update is a difficult process. With a stable release, however, the submodules should be guaranteed to compile until the end of time, perhaps with some minor elaboration along the way.

I would like to propose that we start using submodules for the following crates:

  • term
  • serialize (rustc-serialize)
  • getopts
  • log - also add env_logger
  • libc - new apis are added here somewhat regularly, but they are rarely used in the compiler and standard library. The standard library has its own area for defining new APIs.

Other candidates are the following, along with reasons to not pursue them at this time:

  • flate - this crate is not in rust-lang, and the “replacement” of flate2 also has a dependency on miniz_sys, and it’s a little bit overkill to be including these two for something so minor
  • rand - this dependency is underneath the standard library, and the crate in rust-lang/rand is not configured to have such placement (it depends on the standard library)

Having this infrastructure and precedent should make it easier to add new dependencies in the future, such as pulldown-cmark (a markdown parser written in Rust). All submodules would be required to use stable Rust (and probably have no dependencies of their own for the time being).

What do others think about this strategy? Perhaps it’s not worth it?

Yes please. Submodules can be a bit of a pain, but duplication and drift are even worse.

Yes, I’d like to be able to delete a lot of the legacy code we’re accumulating. In particular, rustc uses lots of collections that could reasonably be maintained out-of-tree but aren’t necessarily worth exposing in std.

OK.

Yes please, I would also like to do this

This could also mean using winapi for WinAPI instead of having it scattered throughout libc and std::sys::windows::c. :smiley:

I agree, I think submodules are much better than duplication and drift. :+1:

Ah, I just remembered a few points, so I wanted to add some clarifications to make sure we’re all on the same page. Primarily, for each external dependency, I wouldn’t expect the rust repository to run its tests, build its documentation, or run its doc tests. Much of this is already managed by those repos (through Cargo), and I don’t want to re-encode all of Cargo’s logic into our own build system.

The second part I remembered is that the more git submodule dependencies we add, the more likely it is that over time you won’t be able to build old revisions of Rust. We try to be quite diligent with the rust-lang/llvm submodule by making sure that every branch always exists, but once we add a lot of upstream git dependencies there’s a high likelihood that historic git hashes disappear, git repos go away altogether, etc. Some possibilities to mitigate this are:

  • We could download all crates from crates.io instead
  • Mirrors could be made into the rust-lang organization specifically for hosting submodules of the rust repository
  • We could stop development in-tree, but copy in the source from the external crate and add a stern warning that none of the files should be modified without just pulling in an upstream version.

I’m somewhat worried about this aspect, but not overly so as it would indeed be quite nice to start picking up some more flavorful projects outside of the compiler to make the compiler itself.

@alexcrichton Why not use git subtrees?

Whoa, I had never even heard of subtrees before! This sounds… perfect :slight_smile:

Thanks for the pointer @stebalien!

Careful, subtrees can be pretty treacherous (like submodules). But they might just work.

Big +1 for this.

There’s also subrepo[1], which has a lot in common with subtrees but a somewhat nicer user interface.

[1] https://github.com/ingydotnet/git-subrepo/blob/master/Intro.pod#git-subtrees

The problem with subtrees (and the reason I don’t like them as much as submodules) is that they don’t store meta-info about the source repos in the original repo. So you have to add instructions like “after cloning, if you want to work with subtreed projects, run ‘git remote add other-repo-name git@github.com:some/other-repo.git’”, whereas with submodules you already have this meta-info in the .gitmodules file after cloning.
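To make the difference concrete, here is roughly what a submodule entry in .gitmodules looks like. This is only an illustrative sketch: the submodule name and path (src/libterm) and the exact URL are assumptions, not what rust-lang/rust actually uses.

```ini
# Hypothetical entry: the name, path, and URL below are illustrative only.
[submodule "src/libterm"]
	path = src/libterm
	url = https://github.com/rust-lang/term.git
```

Because this file is committed alongside the code, a fresh clone already knows where each submodule comes from, and running git submodule update --init can fetch it without any manual remote setup. A plain subtree leaves no equivalent record in the repository.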

That’s one of the key things that subrepo[1] does better than subtrees.

[1] https://github.com/ingydotnet/git-subrepo

Yes, I’ve just read about it, thank you for the link! It looks like a very promising improvement over subtrees.

Oh thanks for the pointer! I'm somewhat worried by the state of the README though:

I would not use it for important things yet, but playing around with it is cheap (this is not git submodule), and not permanent (if you do not push to public remotes).

It's sounding to me like git subtrees is the best strategy forward here as we're very rarely going to need to update these, and it's definitely not a regular piece of the codebase which needs to be updated. I think the key aspect is that these libraries will not be actively developed in the rust-lang/rust repo, only periodically updated to their upstream equivalents as necessary.

I don’t know how up to date the README is, because then he writes about git-subrepo:

Bad:
   --Subrepo is very new.-- (no longer true)
   --Not well tested in the wild.-- (no longer true)

In the end, both subtree and subrepo will most likely just use the subtree merge strategy of git merge, which does most of the work for both commands.

Just not using submodules is already a good solution. :slight_smile:

I’ve now created a PR to implement this change for the first few crates!

Just replying to give a +1.

Not to mention that it has the potential to be dangerous too. Security bug fixed in one, but not the other...