Docs.rs build prioritization

For at least 12 hours now, there has been a 6+ hour build wait on docs.rs, courtesy of an unknown user and a publicly empty repository.

In my opinion, we should consider implementing prioritization on docs.rs. It should probably take into account the popularity of the crate (recent downloads and/or stars) and how many crates the user has published within a reference time period (perhaps only above some threshold). This would prioritize well-known crates when only one or a few of them are being updated, while forcing crates like the one linked to bear the cost of their own wait.

I don't have a particular formula in mind. Am I overreacting to this? It seems sensible in any circumstance: ensure the most popular things are up to date first without harming smaller one-off publishes.

While there’s no code on GitHub, the stuff uploaded to crates.io is obviously proper Rust code; describing it as “an empty repository” seems slightly misleading.

I’d disagree with giving priority to “popular” things. That seems to introduce a huge potential for starvation of crates that aren’t “popular”. I’d also disagree with using only the number of crates as a reference, since documenting crates can take vastly different amounts of time, and I wouldn’t want to stop anyone from getting 5 or 10 fast-compiling (few-dependency) crates documented quickly.

Perhaps something like recent build time per owner could be considered, with action taken once it reaches a threshold. The maximum build time per crate is 15 minutes, but in my opinion it’s also reasonable to, e.g., push all the pending new publishes from the same owner to the back of the queue once a certain time limit is exceeded, for example 15 minutes as well¹ or 30 minutes, etc. (which would mean that a single user could add at most 30 or 45 minutes, etc., of delay/latency to the build queue, respectively).

¹ With 15 minutes, you could still always get at least 2 crates documented in a row, because the first crate alone cannot exceed the 15 minutes; typically more than that, because documenting a crate should take considerably less than 15 minutes.
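
A minimal sketch of this per-owner time budget, assuming a single FIFO queue, owners identified by plain strings, demotion only when the budget is strictly exceeded, and a budget that resets each time an owner's remaining jobs are sent to the back (otherwise a lone bulk publisher would be demoted forever). `Job` and `BudgetQueue` are hypothetical names, not docs.rs types.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::Duration;

/// A pending documentation build (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub owner: String,
    pub krate: String,
}

/// FIFO queue that moves an owner's remaining jobs to the back once that
/// owner has used more than `limit` of build time (assumed policy).
pub struct BudgetQueue {
    queue: VecDeque<Job>,
    spent: HashMap<String, Duration>,
    limit: Duration,
}

impl BudgetQueue {
    pub fn new(limit: Duration) -> Self {
        Self { queue: VecDeque::new(), spent: HashMap::new(), limit }
    }

    pub fn push(&mut self, owner: &str, krate: &str) {
        self.queue.push_back(Job { owner: owner.to_string(), krate: krate.to_string() });
    }

    pub fn pop(&mut self) -> Option<Job> {
        self.queue.pop_front()
    }

    /// Call when a build finishes, with how long it took.
    pub fn record_build(&mut self, owner: &str, took: Duration) {
        let spent = self.spent.entry(owner.to_string()).or_default();
        *spent += took;
        if *spent > self.limit {
            // Budget exceeded: reset it, then put this owner's pending jobs
            // behind everyone else's. The partition keeps both groups in order.
            *spent = Duration::ZERO;
            let (mine, others): (VecDeque<Job>, VecDeque<Job>) =
                self.queue.drain(..).partition(|job| job.owner == owner);
            self.queue = others;
            self.queue.extend(mine);
        }
    }
}
```

With a 15-minute limit, an owner whose first two crates take 10 minutes each gets both built back to back; their third crate then goes behind anyone who queued in the meantime, matching the "at least 2 crates in a row" bound in the footnote.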

Another (less important) thing that could make sense IMO is to give priority to initial releases. It’s annoying when you’re setting up a crate and the docs.rs/crate-name… link doesn’t work at all for quite some time. This would presumably need limits, e.g. give priority to at most 1 initial release per owner per day, and always move all the releases with priority to the front of the queue.
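
One way to sketch this, assuming a two-lane queue where fast-tracked initial releases always run before ordinary ones (FIFO within each lane) and a cap per owner per day. `Release`, `InitialFirstQueue`, and the `day` field are hypothetical.

```rust
use std::collections::{HashMap, VecDeque};

/// A newly published release (hypothetical shape; `day` could be days since
/// the Unix epoch).
pub struct Release {
    pub owner: String,
    pub krate: String,
    pub initial: bool,
    pub day: u64,
}

/// Two-lane queue: fast-tracked initial releases are built before everything
/// else, but each owner only gets `per_day` fast-tracked releases per day.
pub struct InitialFirstQueue {
    priority: VecDeque<String>,
    normal: VecDeque<String>,
    granted: HashMap<(String, u64), u32>, // (owner, day) -> fast-tracks used
    per_day: u32,
}

impl InitialFirstQueue {
    pub fn new(per_day: u32) -> Self {
        Self {
            priority: VecDeque::new(),
            normal: VecDeque::new(),
            granted: HashMap::new(),
            per_day,
        }
    }

    pub fn enqueue(&mut self, r: Release) {
        if r.initial {
            let used = self.granted.entry((r.owner, r.day)).or_insert(0);
            if *used < self.per_day {
                *used += 1;
                self.priority.push_back(r.krate);
                return;
            }
        }
        // Non-initial releases, and initial ones over the daily cap.
        self.normal.push_back(r.krate);
    }

    /// Next crate to build: the priority lane is always drained first.
    pub fn pop(&mut self) -> Option<String> {
        self.priority.pop_front().or_else(|| self.normal.pop_front())
    }
}
```

A second initial release from the same owner on the same day simply joins the normal lane, so the fast track cannot be used to jump the queue in bulk.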

My thought wasn't to have popular crates always prioritized, just with all else being equal. Having a build-time-per-user restriction is probably the most accurate, though I'd still like to take popularity into account to an extent. Initial releases is also reasonable to prioritize.

Mainly I was looking to see if this idea is even sensible. There would obviously be a ton of discussion about how to prioritize things.

I'd also consider any release that bumps the major or minor version relative to the previous latest release (if greater than it) to have some priority. Docs for 1.3.0 are "more important" than docs for 1.2.3, since the 1.2.2 docs are (supposed to be) sufficient at the API level by definition.
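
A sketch of the check, assuming the criterion is "newer than the current latest and not just a patch bump", i.e. the major or minor number changed. The hand-rolled `parse` and `deserves_priority` are hypothetical and deliberately ignore pre-release and build suffixes; a real implementation would more likely rely on the `semver` crate.

```rust
/// Parses "MAJOR.MINOR.PATCH", ignoring any "-pre-release" or "+build" suffix.
fn parse(version: &str) -> Option<(u64, u64, u64)> {
    let core = version.split(|c: char| c == '-' || c == '+').next()?;
    let mut parts = core.split('.').map(|p| p.parse::<u64>().ok());
    let triple = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None; // more than three numeric components
    }
    Some(triple)
}

/// True if `new` is newer than `latest` and changes the major or minor
/// number, i.e. it is more than a patch release of the documented API.
pub fn deserves_priority(new: &str, latest: &str) -> bool {
    match (parse(new), parse(latest)) {
        // Tuples compare lexicographically: major, then minor, then patch.
        (Some(n), Some(l)) => n > l && (n.0, n.1) != (l.0, l.1),
        // Unparseable versions fall back to normal priority.
        _ => false,
    }
}
```

Under this rule, 1.2.2 → 1.2.3 keeps the default priority while 1.2.3 → 1.3.0 or 1.9.9 → 2.0.0 would be bumped.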

How are the jobs executed on docs.rs? I'm completely unfamiliar with its build system, so I don't know how one empty repository is blocking everything...

As mentioned, it’s not an empty repository blocking stuff; instead, it’s just one user with a bunch of (apparently related) crates whose source code does not appear to be posted in a repository (at least there’s no repo linked on crates.io).

Take a look for crates starting with “surge…” on the first few pages of recently released crates, as well as in the queue. I count about 51 crates recently built and 42 still in the queue; there seem to be two releases: “0.1.30-alpha.0” and “0.1.29-alpha.0”.

:confused: How'd he pull that off??? I thought that docs.rs had to pull from the repo to build it!

Got it, I misinterpreted what @jhpratt was saying. Is it a bad bot that went wild? Maybe prioritization should include how many crates a given user already has in the queue. Basically, the niceness level would be the current number of crates you have in the queue, capped at 19. That should slow this behavior down to something reasonable.
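
A sketch of the niceness idea, assuming the queue is a plain slice in arrival order and each owner's niceness is recomputed from the current queue contents whenever a job is picked. `Job` and `next_job` are hypothetical names.

```rust
use std::collections::HashMap;

/// A queued build (hypothetical; `&'static str` keeps the sketch short).
pub struct Job {
    pub owner: &'static str,
    pub krate: &'static str,
}

/// Picks the index of the next job to build. An owner's niceness is the
/// number of their jobs currently queued, capped at 19 (mirroring Unix
/// `nice`); the lowest niceness wins and ties go to the earliest job.
pub fn next_job(queue: &[Job]) -> Option<usize> {
    let mut counts: HashMap<&str, u32> = HashMap::new();
    for job in queue {
        *counts.entry(job.owner).or_insert(0) += 1;
    }
    queue
        .iter()
        .enumerate()
        .min_by_key(|(i, job)| (counts[job.owner].min(19), *i))
        .map(|(i, _)| i)
}
```

A lone crate from a new owner therefore jumps ahead of an owner with dozens of queued crates, and everyone with 19 or more queued crates falls back to plain FIFO among themselves.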

Crates.io doesn't get code from git repositories. Crates don't have to have any repository at all. cargo publish uploads source code directly from a local directory to crates.io.

I suggest implementing this at the crate owner level. Build crates from different owners, so that each crate owner gets a turn.

e.g. after building a crate (or two) from owner X, move all other crates owned by X to the end of the queue. This will end up doing round-robin over crate owners.
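
A sketch of round-robin over owners, assuming each owner gets a burst of up to `burst` crates per turn and rejoins the back of the rotation if they still have work queued. `RoundRobin` and its methods are hypothetical.

```rust
use std::collections::{HashMap, VecDeque};

/// Round-robin over owners with per-owner FIFO queues.
pub struct RoundRobin {
    owners: VecDeque<String>,                   // rotation order
    pending: HashMap<String, VecDeque<String>>, // owner -> queued crates
    burst: usize,
}

impl RoundRobin {
    pub fn new(burst: usize) -> Self {
        assert!(burst > 0, "each turn must build at least one crate");
        Self { owners: VecDeque::new(), pending: HashMap::new(), burst }
    }

    pub fn push(&mut self, owner: &str, krate: &str) {
        let jobs = self.pending.entry(owner.to_string()).or_default();
        if jobs.is_empty() {
            // Owner had nothing queued: they join the rotation at the back.
            self.owners.push_back(owner.to_string());
        }
        jobs.push_back(krate.to_string());
    }

    /// Returns the next batch (at most `burst` crates, all from one owner).
    pub fn next_batch(&mut self) -> Vec<String> {
        let owner = match self.owners.pop_front() {
            Some(owner) => owner,
            None => return Vec::new(),
        };
        let jobs = self.pending.get_mut(&owner).expect("owner in rotation has jobs");
        let n = jobs.len().min(self.burst);
        let batch: Vec<String> = jobs.drain(..n).collect();
        if !jobs.is_empty() {
            // Still has work: go to the back and wait for the next turn.
            self.owners.push_back(owner);
        }
        batch
    }
}
```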

That is unfavorable to owners who publish a large number of fast-building crates.

Why not track the total build time spent by each owner, and have a priority queue sorting owners by total-build-time-spent-so-far? That way, build time is equally distributed over owners, but owners who need less build time than average get visited the most promptly.
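
A sketch of this fair-share idea, assuming build time is charged to an owner only after the build finishes and ties between owners are broken by name for determinism. `FairShare`, `next_build`, and `record` are hypothetical names.

```rust
use std::collections::{HashMap, VecDeque};

/// Always builds next for the owner with the least total build time so far.
#[derive(Default)]
pub struct FairShare {
    spent_secs: HashMap<String, u64>,
    pending: HashMap<String, VecDeque<String>>,
}

impl FairShare {
    pub fn push(&mut self, owner: &str, krate: &str) {
        self.pending
            .entry(owner.to_string())
            .or_default()
            .push_back(krate.to_string());
    }

    /// Picks the next (owner, crate) to build, or None if nothing is queued.
    pub fn next_build(&mut self) -> Option<(String, String)> {
        let owner = self
            .pending
            .iter()
            .filter(|(_, jobs)| !jobs.is_empty())
            .map(|(owner, _)| owner)
            // Least time spent first; owner name as a deterministic tie-break.
            .min_by_key(|o| (self.spent_secs.get(*o).copied().unwrap_or(0), (*o).clone()))?
            .clone();
        let krate = self.pending.get_mut(&owner)?.pop_front()?;
        Some((owner, krate))
    }

    /// Charges the measured build time to the owner once the build finishes.
    pub fn record(&mut self, owner: &str, secs: u64) {
        *self.spent_secs.entry(owner.to_string()).or_insert(0) += secs;
    }
}
```

As written, totals never decay, so an owner with a long history would always lose to newcomers; counting only a recent window of builds would be one way around that, which is an assumption beyond what is proposed here.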

I would probably also include some popularity factor in the prioritization, because docs are a service to the users of a crate, not just its authors.

The website already has a priority feature and I have seen it used for large sets of related crates. This crate is currently in the queue

That's actually what triggered my idea. I hadn't realized docs.rs had such capabilities until I saw something was already deprioritized.

Docs.rs already has support for deprioritizing large projects that could impact the build queue! The docs.rs team monitors the queue, and when we notice a big project causing disruptions we set the default priority for those crates to be -1.

The current way projects are added to the "deprioritized list" is manual: the docs.rs team gets alerts when the build queue gets too long, and if we determine the project is using too many resources we add the relevant rules to the list.
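
A sketch of how such a manually maintained list could feed the queue, assuming rules match crate-name prefixes and that larger numbers build first, so -1 sorts behind the default of 0. Both the prefix format and the ordering convention are assumptions for illustration, not how docs.rs actually stores or applies its list; `Rule`, `default_priority`, and `order` are hypothetical.

```rust
use std::cmp::Reverse;

/// One entry in a hypothetical deprioritization list: crates whose names
/// start with `prefix` get `priority` instead of the default.
pub struct Rule {
    pub prefix: &'static str,
    pub priority: i32,
}

pub const DEFAULT_PRIORITY: i32 = 0;

/// Default priority for a newly published crate; the first matching rule wins.
pub fn default_priority(krate: &str, rules: &[Rule]) -> i32 {
    rules
        .iter()
        .find(|r| krate.starts_with(r.prefix))
        .map_or(DEFAULT_PRIORITY, |r| r.priority)
}

/// Orders queued (crate, priority) pairs so higher priority builds first.
/// `sort_by_key` is stable, so equal priorities keep their FIFO order.
pub fn order(queue: &mut [(String, i32)]) {
    queue.sort_by_key(|&(_, p)| Reverse(p));
}
```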

We're using this approach rather than an algorithm to automatically deprioritize to avoid false positives. Up until now the approach worked well, and the problem this time was the docs.rs team missing the alert. Unless this starts happening consistently I'm not sure this warrants a ton of design work around automation.

We're also planning some infrastructure upgrades to better handle the increase in the number of publishes: right now it's kinda hard to scale due to some architectural decisions, but we have a workable plan to be able to scale in the coming months.

Is it possible to get some real data from docs.rs (submission time, latency, build duration, etc.)? If so, this could be used as input to simulate how a proposed prioritization algorithm would behave and gather the same stats from such simulations to compare them. I feel like I've seen enough of these kinds of situations where actually running a simulation of the model tells you more than trying to compare abstract prose descriptions quantitatively (because of the second-order effects that tend to happen).
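
A minimal single-worker simulation harness along these lines, assuming one build runs at a time and a policy only chooses among jobs that have already arrived. The job data below is made up for illustration; real inputs would be docs.rs submission times and build durations. `Job`, `simulate`, `fifo`, and `least_spent_owner` are hypothetical names.

```rust
use std::collections::HashMap;

/// One build: when it was queued, who owns it, how long it ran (seconds).
#[derive(Clone, Copy, Debug)]
pub struct Job {
    pub arrival: u64,
    pub owner: u32,
    pub duration: u64,
}

/// Replays `jobs` through a single build worker. `pick` receives the indices
/// of queued (arrived, not yet built) jobs and returns the one to build next.
/// Returns how long each job waited before its build started.
pub fn simulate(jobs: &[Job], mut pick: impl FnMut(&[usize]) -> usize) -> Vec<u64> {
    let mut waits = vec![0; jobs.len()];
    let mut done = vec![false; jobs.len()];
    let mut now = 0;
    for _ in 0..jobs.len() {
        let mut ready: Vec<usize> =
            (0..jobs.len()).filter(|&i| !done[i] && jobs[i].arrival <= now).collect();
        if ready.is_empty() {
            // Worker is idle: jump ahead to the next arrival.
            now = (0..jobs.len()).filter(|&i| !done[i]).map(|i| jobs[i].arrival).min().unwrap();
            ready = (0..jobs.len()).filter(|&i| !done[i] && jobs[i].arrival <= now).collect();
        }
        let i = pick(&ready);
        waits[i] = now - jobs[i].arrival;
        now += jobs[i].duration;
        done[i] = true;
    }
    waits
}

/// Baseline policy: first come, first served.
pub fn fifo(jobs: &[Job]) -> Vec<u64> {
    simulate(jobs, |ready| *ready.iter().min_by_key(|&&i| (jobs[i].arrival, i)).unwrap())
}

/// Alternative policy: build for the owner with the least build time so far.
pub fn least_spent_owner(jobs: &[Job]) -> Vec<u64> {
    let mut spent: HashMap<u32, u64> = HashMap::new();
    simulate(jobs, |ready| {
        let i = *ready
            .iter()
            .min_by_key(|&&i| (spent.get(&jobs[i].owner).copied().unwrap_or(0), jobs[i].arrival, i))
            .unwrap();
        *spent.entry(jobs[i].owner).or_insert(0) += jobs[i].duration;
        i
    })
}
```

For example, with one owner queuing four 600 s builds at t = 0 and another owner queuing a single 60 s build at t = 10, FIFO makes the one-off build wait 2390 s, while the least-spent policy makes it wait 590 s and delays the bulk publisher's last crate from 1800 s to 1860 s. Comparing whole distributions on real data is what would expose the second-order effects.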

@jyn514 and I were actually talking about an automatic deprioritization strategy the other day and opened an issue for it.

OK, I looked at the documentation for docs.rs, and it looks like each crate is built within its own Docker container. So this actually gives us a really simple procedure for dealing with the issue that lets us reweight the relative priority of each build on the fly:

  1. On startup, use docker update -c to set the container's CPU share to 8 (see this article for an explanation of the CPU share option).
  2. For every 15 minutes of runtime of a given container, decrease the container's CPU share by 1, to a minimum of 1.
  3. There is no step 3.
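
The steps above can be sketched as a pure function from a container's runtime to its CPU share, plus the corresponding `docker update` invocation (built but not executed here). The 8 / 1 / 15-minute numbers come from the steps; the function names are hypothetical, and the loop that would periodically call this is left out. Docker's CPU shares are relative weights (the default is 1024), so only the ratios between concurrently running containers matter.

```rust
use std::process::Command;
use std::time::Duration;

const START_SHARES: u64 = 8;
const MIN_SHARES: u64 = 1;
const STEP: Duration = Duration::from_secs(15 * 60);

/// CPU share for a container that has been running for `runtime`: start at
/// 8 and lose one share per full 15 minutes, never going below 1.
pub fn cpu_shares(runtime: Duration) -> u64 {
    let steps = runtime.as_secs() / STEP.as_secs();
    START_SHARES.saturating_sub(steps).max(MIN_SHARES)
}

/// Builds (but does not run) `docker update --cpu-shares N <container>`.
pub fn update_command(container: &str, runtime: Duration) -> Command {
    let shares = cpu_shares(runtime).to_string();
    let mut cmd = Command::new("docker");
    cmd.args(&["update", "--cpu-shares", shares.as_str(), container]);
    cmd
}
```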

Basically, this is kind of like using renice, but on the container as a whole, which means that resources are idle only if there's no work to do.

The discussed monopolization isn't per crate (per docker container); that's already capped to 15 minutes (modulo special exemptions). The problem at hand is one publisher publishing multiple crates, the combination of which monopolizes the queue.

I'm aware of the issue, and I'm also aware of how my original understanding was flawed (see the issue on GitHub). I originally thought that there were multiple containers running concurrently, one for each crate being documented, but that isn't the case; there is currently one container running at a time.

That said, if and only if the docs team thinks it's worth the effort, we can combine approaches to solve the various issues:

  1. All crates that are owned by the same entity are built within the same container. Within a given container crates can be built either concurrently or serially, it doesn't really matter which.
  2. Containers are ephemeral: when the last build process completes, the container is disposed of in the same manner as containers are disposed of currently (I don't know how that's currently handled, which is why I'm hand-waving it away). That said, if a given owner is manually pushing crates (so that there is a reasonably large period of time between publications to crates.io) and their associated container is still alive, docs.rs will push the build request to their currently living container rather than spin up a new one.
  3. Containers are executed concurrently and have their relative CPU shares adjusted according to the algorithm I gave above.

The trick is that if a given owner publishes multiple crates at once, and docs.rs is already running a container for that owner, then the build request is passed to the currently running container to process. Since the container isn't terminated until after the last build process completes, it will get progressively less CPU to execute, which will slow down that owner's crates without affecting other owners. Once the container has executed all build processes, it can be allowed to die naturally. If the owner publishes another crate after their container has shut down, then a new container is spun up with the default CPU share. That way no one is permanently punished, but if you are taking up resources (either because you have a lot of small crates or one gargantuan one), you only affect your own crates rather than everyone else's.
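
A sketch of the routing part of this proposal, assuming one live container per owner that is torn down when its last in-flight build finishes. Container ids are plain counters standing in for real Docker ids, and `Router`/`Routed` are hypothetical names.

```rust
use std::collections::HashMap;

/// Where a build request was sent.
#[derive(Debug, PartialEq)]
pub enum Routed {
    Existing(u64),
    Spawned(u64),
}

/// Tracks one live container per owner and its number of in-flight builds.
#[derive(Default)]
pub struct Router {
    live: HashMap<String, (u64, usize)>, // owner -> (container id, builds in flight)
    next_id: u64,
}

impl Router {
    /// Sends the build to the owner's running container, or spins up a new one.
    pub fn submit(&mut self, owner: &str) -> Routed {
        if let Some((id, in_flight)) = self.live.get_mut(owner) {
            *in_flight += 1;
            return Routed::Existing(*id);
        }
        self.next_id += 1;
        self.live.insert(owner.to_string(), (self.next_id, 1));
        Routed::Spawned(self.next_id)
    }

    /// Marks one build as done; returns true if that was the container's last
    /// build, in which case the container is disposed of.
    pub fn finish(&mut self, owner: &str) -> bool {
        let remaining = match self.live.get_mut(owner) {
            Some((_, n)) => {
                *n -= 1;
                *n
            }
            None => return false,
        };
        if remaining == 0 {
            self.live.remove(owner);
            true
        } else {
            false
        }
    }
}
```

Once an owner's container is gone, their next publish starts a fresh container, which is what resets the CPU share back to the default in the scheme described above.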

Note that the outlined method won't work as docs.rs is currently set up. Since it only executes a single container at a time, adjusting the relative CPU share is meaningless. So before this method could be applied, a great deal of work would need to be done to change the entire architecture. And as @pietroalbini has already said in the other thread, this is pretty much a non-issue for the docs team. In short, I'm 99.99% sure that we are now beating a dead horse :wink:

Tangentially, what about giving crates.io uploaders the option to offload the generation of their docs to GitHub Actions and their own "page"? I don't have precise ideas, but roughly, instead of the docs.rs-generated docs at docs.rs/that-crate/…/, we'd have a proxy page that would say:

this crate has¹ opted out of docs.rs-rendered docs by hosting them on their own external site: https://someusername.github.io/that-crate/…/

  • ¹ Perhaps just temporarily: docs.rs would nonetheless be allowed, when idle, to actually generate the docs and replace the redirection with them.

  • with the link warning about an external redirect, and an option, through cookies, to register that you trust specified hosts, so as to skip that redirect warning on your local machine and have the redirect happen automatically (potentially with a docs.rs banner at the top reminding you that the site is external?)

  • In other words, I may be too naïve w.r.t. the ways this idea could be exploited by evil users (although arbitrary JS "injection" in docs.rs-generated docs is trivial, so I don't think a warned external-link redirect would be much worse); but the core idea is that people such as myself, and I imagine many other Rustaceans, wouldn't mind setting up a GH Action that renders the docs somewhere, and then telling docs.rs that rendering their page is not that urgent or important, or can be skipped.

Disabling docs.rs and having it redirect to another place has been brought up before. It's controversial, at best.