build prioritization

My thought wasn't to have popular crates always prioritized, just when all else is equal. A build-time-per-user restriction is probably the most accurate approach, though I'd still like to take popularity into account to an extent. Initial releases are also reasonable to prioritize.

Mainly I was looking to see if this idea is even sensible. There would obviously be a ton of discussion about how to prioritize things.

I'd also give some priority to any release that is newer than the previous latest release and is more than a patch bump of it. Docs for 1.3.0 are "more important" than docs for 1.2.3, since the 1.2.2 docs are (supposed to be) by definition sufficient for 1.2.3 at the API level.
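To make that ordering concrete, here is a minimal Python sketch of a sort key, assuming each queue entry knows its version, the previous latest version, and a download count as a stand-in for popularity. The `Release` type and its fields are hypothetical, not docs.rs's schema. It treats a release as API-changing the way the 1.3.0 vs 1.2.3 example does (newer than the previous latest and not just a patch bump) and ignores Cargo's special case for 0.0.x versions and pre-release tags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Release:
    # Hypothetical fields for illustration; not docs.rs's actual data model.
    name: str
    version: tuple[int, int, int]                    # (major, minor, patch)
    previous_latest: Optional[tuple[int, int, int]]  # None for an initial release
    downloads: int                                   # crude popularity proxy


def is_api_changing(rel: Release) -> bool:
    """True if the release is newer than the previous latest release and
    differs from it in major or minor version (i.e. not just a patch bump)."""
    prev = rel.previous_latest
    if prev is None:
        return True  # an initial release has no earlier docs to fall back on
    return rel.version > prev and rel.version[:2] != prev[:2]


def priority_key(rel: Release) -> tuple:
    # Sorting ascending by this key builds initial releases first, then
    # API-changing releases, and uses popularity only as a tie-breaker.
    return (
        rel.previous_latest is not None,  # False (initial release) sorts first
        not is_api_changing(rel),         # False (API-changing) sorts first
        -rel.downloads,                   # more downloads sorts first
    )
```

Popularity only decides between releases in the same category, which matches the "all else being equal" intent: a patch release of a very popular crate still queues behind a minor bump of an obscure one.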


How are the jobs executed on docs.rs? I'm completely unfamiliar with its build system, so I don't know how one empty repository is blocking everything...

As mentioned, it’s not an empty repository blocking stuff; instead it’s just one user with a bunch of (apparently related) crates whose source code is not (or does not appear to be) posted in a repository. (At least there’s no repo linked on crates.io.)

Look for crates starting with “surge…” on the first few pages of recently released crates, as well as in the queue. I count about 51 crates recently built and 42 still in the queue; there seem to be two releases, “0.1.30-alpha.0” and “0.1.29-alpha.0”.

:confused: How'd he pull that off??? I thought docs.rs had to pull from the repo to build it!

docs.rs doesn't get code from git repositories. Crates don't have to have any repository at all: cargo publish uploads source code directly from a local directory to crates.io.

Got it, I misinterpreted what @jhpratt was saying. Is it a bad bot that went wild? Maybe prioritization should include how many crates a given user already has in the queue. Basically, the niceness level would be the number of crates you currently have in the queue, capped at 19. That should slow this behavior down to something reasonable.
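A minimal sketch of the niceness idea, assuming each queued build is tagged with its owner. It computes each submission's niceness once, at submission time, from how many of that owner's crates are already waiting (capped at 19, like the Unix nice range), then orders by niceness and arrival. A real queue would recompute this as builds complete; `niceness` and `enqueue_order` are hypothetical names.

```python
from collections import Counter

MAX_NICENESS = 19  # mirrors the Unix nice range 0..19 mentioned above


def niceness(owner: str, queued_per_owner: Counter) -> int:
    """Niceness = number of the owner's crates already in the queue, capped."""
    return min(queued_per_owner[owner], MAX_NICENESS)


def enqueue_order(submissions: list[tuple[str, str]]) -> list[str]:
    """Given (owner, crate) pairs in submission order, return crate names
    sorted by (niceness at submission time, submission order)."""
    queued: Counter = Counter()
    keyed = []
    for seq, (owner, crate) in enumerate(submissions):
        keyed.append((niceness(owner, queued), seq, crate))
        queued[owner] += 1  # this crate now counts against later submissions
    return [crate for _, _, crate in sorted(keyed)]
```

An owner's first crate is treated like anyone else's; only their backlog gets pushed back.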


I suggest implementing this at the crate-owner level: build crates from different owners so that each crate owner gets a turn.

For example, after building a crate (or two) from owner X, move all other crates owned by X to the end of the queue. This ends up doing round-robin over crate owners.
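The owner-level round-robin can be sketched with one FIFO per owner: take up to `burst` crates from the owner at the front, then move that owner to the back if they still have crates waiting. This is an illustrative Python sketch; `round_robin` and the `(owner, crate)` input shape are assumptions, not anything docs.rs provides.

```python
from collections import OrderedDict, deque


def round_robin(queue: list[tuple[str, str]], burst: int = 1) -> list[str]:
    """Return a build order for (owner, crate) pairs, taking up to `burst`
    crates from one owner before moving that owner to the back."""
    per_owner = OrderedDict()  # owner -> deque of crates, in first-seen order
    for owner, crate in queue:
        per_owner.setdefault(owner, deque()).append(crate)

    order = []
    while per_owner:
        owner, crates = next(iter(per_owner.items()))  # owner at the front
        for _ in range(min(burst, len(crates))):
            order.append(crates.popleft())
        if crates:
            per_owner.move_to_end(owner)  # still has work: go to the back
        else:
            del per_owner[owner]
    return order
```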


That is unfavorable to owners who publish a large number of fast-building crates.

Why not track the total build time spent on each owner, and keep a priority queue ordering owners by total build time spent so far? That way build time is distributed equally across owners, but owners who need less build time than average get served most promptly.

I would probably also include some factor for popularity, because docs are a service to the users of the crate and not just the authors.
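A sketch of the build-time-based alternative, assuming each build's duration is recorded when it finishes: a min-heap keyed on the total build seconds each owner has consumed so far, with first-submission order as the tie-breaker. `fair_share_order` and its input format are hypothetical.

```python
import heapq
from collections import deque


def fair_share_order(jobs: list[tuple[str, str, float]]) -> list[str]:
    """jobs: (owner, crate, build_seconds) in submission order. Repeatedly
    builds the next crate of the owner with the least build time spent so far."""
    pending: dict[str, deque] = {}
    first_seen: dict[str, int] = {}
    for i, (owner, crate, secs) in enumerate(jobs):
        pending.setdefault(owner, deque()).append((crate, secs))
        first_seen.setdefault(owner, i)

    # Heap entries: (seconds spent so far, first submission index, owner).
    # first_seen is unique per owner, so entries never tie on all fields.
    heap = [(0.0, first_seen[o], o) for o in pending]
    heapq.heapify(heap)
    order = []
    while heap:
        spent, seen, owner = heapq.heappop(heap)
        crate, secs = pending[owner].popleft()
        order.append(crate)  # "build" it; its duration is now known
        if pending[owner]:
            heapq.heappush(heap, (spent + secs, seen, owner))
    return order
```

With two 600-second builds from one owner and three 30-second builds from another, this yields `g1, s1, s2, s3, g2`, whereas plain round-robin over owners gives `g1, s1, g2, s2, s3`; that is the difference behind the objection to round-robin. Popularity could be folded in by dividing each owner's spent time by a weight, which this sketch does not do.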


The website already has a priority feature, and I have seen it used for large sets of related crates. This crate is currently in the queue.

That's actually what triggered my idea. I hadn't realized docs.rs had such capabilities until I saw something was already deprioritized.

docs.rs already has support for deprioritizing large projects that could impact the build queue! The team monitors the queue, and when we notice a big project causing disruptions, we set the default priority for those crates to -1.

The current way projects are added to the "deprioritized list" is manual: the team gets alerts when the build queue gets too long, and if we determine the project is using too many resources we add the relevant rules to the list.

We're using this approach rather than an algorithm that deprioritizes automatically, to avoid false positives. Up until now it has worked well; the problem this time was that the team missed the alert. Unless this starts happening consistently, I'm not sure this warrants a ton of design work around automation.
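For illustration, the manual deprioritization list could be modeled as pattern rules consulted when a crate is queued: matching crates get priority -1, everything else the default 0, and higher priorities build first. The glob-style matching via `fnmatchcase`, the rule format, and the example pattern are all assumptions; this is not how docs.rs actually stores or matches its rules.

```python
from fnmatch import fnmatchcase

DEFAULT_PRIORITY = 0
DEPRIORITIZED = -1

# Hypothetical, manually curated rules: (glob pattern, priority).
RULES = [("bulk-project-*", DEPRIORITIZED)]


def priority_for(crate: str, rules=RULES) -> int:
    """First matching rule wins; unmatched crates get the default priority."""
    for pattern, priority in rules:
        if fnmatchcase(crate, pattern):
            return priority
    return DEFAULT_PRIORITY


def build_order(queue: list[str], rules=RULES) -> list[str]:
    """Higher priority builds first. sorted() is stable, so submission
    order is preserved among crates with the same priority."""
    return sorted(queue, key=lambda crate: -priority_for(crate, rules))
```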

We're also planning some infrastructure upgrades to better handle the increasing number of publishes: right now it's kinda hard to scale due to some architectural decisions, but we have a workable plan to be able to scale in the coming months.


Is it possible to get some real data from docs.rs (submission time, latency, build duration, etc.)? If so, it could be used as input to simulate how a proposed prioritization algorithm would behave, gathering the same stats from those simulations to compare them. I feel like I've seen enough of these situations where actually running a simulation of the model tells you more than trying to compare abstract prose descriptions quantitatively (because of the second-order effects that tend to happen).
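A minimal single-worker simulation harness along these lines, assuming a trace of jobs with submission time, owner, and build duration. Swapping the `pick` policy replays the same trace under FIFO or under a least-build-time-per-owner policy so per-job latencies can be compared. `Job`, `simulate`, and both policies are hypothetical names, and no real docs.rs data is involved.

```python
from dataclasses import dataclass


@dataclass
class Job:
    submitted: float  # seconds since the start of the trace
    owner: str
    duration: float   # build time in seconds


def simulate(jobs: list[Job], pick) -> list[float]:
    """Single-worker queue. `pick(ready_jobs, spent)` returns the index of the
    next job in `ready_jobs`; `spent` maps owner -> build seconds used so far.
    Returns each job's latency (finish - submit), in input order."""
    pending = sorted(range(len(jobs)), key=lambda i: jobs[i].submitted)
    ready: list[int] = []  # job indices, kept in submission order
    spent: dict[str, float] = {}
    latency = [0.0] * len(jobs)
    clock = 0.0
    while pending or ready:
        # If the worker is idle, jump ahead to the next arrival.
        if not ready and jobs[pending[0]].submitted > clock:
            clock = jobs[pending[0]].submitted
        while pending and jobs[pending[0]].submitted <= clock:
            ready.append(pending.pop(0))
        i = ready.pop(pick([jobs[j] for j in ready], spent))
        clock += jobs[i].duration
        spent[jobs[i].owner] = spent.get(jobs[i].owner, 0.0) + jobs[i].duration
        latency[i] = clock - jobs[i].submitted
    return latency


def fifo(ready: list[Job], spent: dict) -> int:
    return 0  # oldest ready job first


def least_spent_owner(ready: list[Job], spent: dict) -> int:
    # Owner with the least build time so far; earlier submission breaks ties.
    return min(range(len(ready)), key=lambda k: (spent.get(ready[k].owner, 0.0), k))
```

On a toy trace where one owner submits five 60-second builds at t=0 and another owner submits one at t=1, the second owner waits 359 s under `fifo` but 119 s under `least_spent_owner`, while the mean latency over all six jobs is identical in both runs. That kind of second-order effect is what a simulation surfaces and a prose comparison tends to miss.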


@jyn514 and I were actually talking about an automatic deprioritization strategy the other day and opened an issue for it.

OK, I looked at the documentation for docs.rs, and it looks like each crate is built within its own Docker container. So this actually gives us a really simple procedure for dealing with the issue, one that lets us reweight the relative CPU allocation of each build on the fly:

  1. On startup, use docker update -c to set the container's CPU share to 8 (see this article for an explanation of the CPU share option).
  2. For every 15 minutes of runtime of a given container, decrease the container's CPU share by 1, to a minimum of 1.
  3. There is no step 3.
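A sketch of the share schedule from steps 1 and 2, plus the Docker command that would apply it. The sketch uses the long form `--cpu-shares` of the post's `-c` flag. As an assumption, the post's weights of 8 down to 1 are scaled by 128: Docker's default weight is 1024, and on cgroup v1 the kernel enforces a minimum of 2, so a raw value of 1 would not behave as intended. CPU shares are relative weights and only matter when containers compete for CPU.

```python
import subprocess

START_SHARE = 8          # step 1: initial relative weight
MIN_SHARE = 1            # step 2: floor
STEP_SECONDS = 15 * 60   # step 2: lose one unit per 15 minutes of runtime


def cpu_share(runtime_seconds: float) -> int:
    """Relative share after `runtime_seconds` of container runtime."""
    return max(MIN_SHARE, START_SHARE - int(runtime_seconds // STEP_SECONDS))


def update_command(container: str, runtime_seconds: float, scale: int = 128) -> list[str]:
    # Assumed scaling so that the starting share equals Docker's default (8 * 128 = 1024).
    share = cpu_share(runtime_seconds) * scale
    return ["docker", "update", "--cpu-shares", str(share), container]


def apply_share(container: str, runtime_seconds: float) -> None:
    # Requires a Docker daemon; raises CalledProcessError if the update fails.
    subprocess.run(update_command(container, runtime_seconds), check=True)
```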

Basically, this is kind of like using renice, but on the container as a whole, which means that resources are idle only if there's no work to do.

The discussed monopolization isn't per crate (per Docker container); that's already capped at 15 minutes (modulo special exemptions). The problem at hand is one publisher publishing multiple crates, the combination of which monopolizes the queue.

I'm aware of the issue, and I'm also aware of how my original understanding was flawed (see the issue on GitHub). I originally thought that there were multiple containers running concurrently, one for each crate being documented, but that isn't the case; there is currently one container running at a time.

That said, if and only if the docs team thinks it's worth the effort, we can combine approaches to solve the various issues:

  1. All crates that are owned by the same entity are built within the same container. Within a given container crates can be built either concurrently or serially, it doesn't really matter which.
  2. Containers are ephemeral; when the last build process completes, the container is disposed of in the same manner as containers are disposed of currently (I don't know how that's currently handled, which is why I'm hand-waving it away). That said, if a given owner is manually pushing crates (so that there is a reasonably large period of time between publications to crates.io) and their associated container is still alive, docs.rs will push the build request to their currently living container instead of spinning up a new one.
  3. Containers are executed concurrently and have their relative CPU shares adjusted according to the algorithm I gave above.

The trick is that if a given owner publishes multiple crates at once and docs.rs is already running a container for that owner, the build request is passed to the running container to process. Since the container isn't terminated until the last build process completes, it gets progressively less CPU, which slows down a given owner's crates without affecting other owners. Once the container has executed all build processes, it can be allowed to die naturally. If the owner publishes another crate after their container has shut down, a new container is spun up with the default CPU share. That way no one is permanently punished, but if you are taking up resources (either because you have a lot of small crates or one gargantuan one), you only affect your own crates rather than everyone else's.
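The routing rule in this proposal can be modeled without Docker: keep a map from owner to live container, reuse it while it still has pending builds, and dispose of it when its last build finishes. This is a hypothetical in-memory model (`ContainerPool` and its methods are illustrative names); starting, stopping, and reweighting real containers are left out.

```python
from dataclasses import dataclass, field


@dataclass
class OwnerContainer:
    owner: str
    started_at: float                             # seconds; drives the CPU-share decay
    pending: list = field(default_factory=list)   # crates still to build


class ContainerPool:
    """One ephemeral container per owner, reused while it has work."""

    def __init__(self) -> None:
        self.live: dict[str, OwnerContainer] = {}

    def submit(self, owner: str, crate: str, now: float) -> bool:
        """Route a build request; returns True if a new container was started."""
        container = self.live.get(owner)
        started = container is None
        if started:
            # No live container: start a fresh one (default CPU share).
            container = self.live[owner] = OwnerContainer(owner, now)
        container.pending.append(crate)
        return started

    def finish(self, owner: str, crate: str) -> None:
        container = self.live[owner]
        container.pending.remove(crate)
        if not container.pending:
            del self.live[owner]  # last build done: dispose of the container

    def runtime(self, owner: str, now: float) -> float:
        # The longer an owner's container lives, the lower its CPU share
        # under the 15-minute decay schedule described earlier in the thread.
        return now - self.live[owner].started_at
```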

Note that the outlined method won't work as docs.rs is currently set up. Since it only executes a single container at a time, adjusting the relative CPU share is meaningless. So before this method could be applied, a great deal of work would be needed to change the entire architecture. And as @pietroalbini has already said in the other thread, this is pretty much a non-issue for the docs team. In short, I'm 99.99% sure that we are now beating a dead horse :wink:

Tangentially, what about giving some uploaders the option to offload doc generation to GitHub Actions and their own "page"? I don't have precise ideas, but roughly, instead of the docs at …/, we'd have a proxy page that would say:

this crate has¹ opted out of docs.rs-rendered docs by hosting them on its own external site: …/

  • ¹ perhaps just temporarily: docs.rs would nonetheless be allowed to perform an actual generation of the docs when idle and replace the redirection with that.

  • with the link warning about an external redirect, and an option, stored in cookies, to register that you trust specified hosts, so that on your local machine the redirect warning is skipped and the redirect happens automatically (potentially with a banner at the top reminding you that the site is external?)

  • In other words, I may be too naïve w.r.t. ways this idea could be exploited by evil users (although arbitrary JS "injection" in docs is trivial, so I don't think a warned external-link redirect would be that much worse). But the core idea is that people such as myself, and I imagine many other Rustaceans, wouldn't mind setting up a GH action that renders the docs somewhere, and then telling docs.rs that rendering their page is not that urgent or important, or can be skipped.
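The trusted-host check in this redirect idea is small enough to sketch. It assumes a hypothetical cookie holding a comma-separated list of host names the reader has chosen to trust; in this sketch the redirect happens automatically only for HTTPS URLs whose exact host is on that list, and everything else gets the warning page. Exact host matching, rather than substring matching, avoids treating a host like `user.github.io.attacker.example` as trusted.

```python
from urllib.parse import urlsplit


def parse_trusted_cookie(value: str) -> set[str]:
    # Hypothetical cookie format: comma-separated host names.
    return {h.strip().lower() for h in value.split(",") if h.strip()}


def redirect_mode(target_url: str, trusted_hosts: set[str]) -> str:
    """Return "auto" to redirect immediately, or "warn" to show the
    external-redirect warning first."""
    parts = urlsplit(target_url)
    host = (parts.hostname or "").lower()  # hostname excludes port and userinfo
    if parts.scheme == "https" and host in trusted_hosts:
        return "auto"
    return "warn"
```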

Disabling docs.rs and having it redirect to another place has been brought up before. It's controversial, at best.

My main problem with this is that this will (usually) only document the latest primary branch state, maybe the latest release, and almost certainly nothing "historical".
