Since the original thread was about a single user who was responsible for too much queue delay, I personally still favor an approach that simply limits the amount of delay a single user can create.
E.g., we could say that a user may add at most 15 minutes of delay to the queue (on average; worst case < 30 min). The example in the other thread featured a single user allegedly responsible for an increase of multiple hours in queue delay/latency; this would solve that without being overly restrictive, while effectively taking the current workload of the queue into consideration.
We could proceed as follows (I just came up with this algorithm, so feel free to point out flaws):
Once a crate of a (currently untracked) user U starts building, insert a marker M (for user U) after the current end of the queue and start tracking the build time of U's crates. As long as the time spent on U's crates stays below 15 minutes, proceed as usual until the marker M is reached. Once the time spent on U's crates exceeds 15 minutes, then, after the build of the crate that pushed the tracked time over 15 minutes has finished, all remaining crates of U that are in the queue before M are moved back to right after M. When M is reached, if U had exceeded their 15 minutes, reduce U's tracked time by 15 minutes (so it drops from a number between 15 and 30 minutes to a number between 0 and 15 minutes) and insert a new marker M' (for user U) at the then-current end of the queue; proceed the same way with M' in place of M, and so on. When a marker M for user U is reached without the user having used up more than 15 minutes, their tracked time is reset to zero (it was a number between 0 and 15 minutes before); if they still have crates anywhere in the queue, create a new marker M' right away and keep tracking the user. If there are no more crates of U in the queue, just discard the marker and stop tracking the user.
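To check that the marker scheme behaves as described, here is a minimal single-threaded simulation in Rust. It is only a sketch under assumptions: build times are known up front, building a crate is modeled as appending its name to a list, and the names `Scheduler`, `Entry`, `submit` and `BUDGET_MIN` are hypothetical, not anything docs.rs actually has.

```rust
use std::collections::{HashMap, VecDeque};

/// Per-user delay budget in minutes (the "15 minutes" from the proposal).
const BUDGET_MIN: u32 = 15;

#[derive(Debug)]
enum Entry {
    /// A queued build: publisher, crate name, assumed build time in minutes.
    Crate { user: String, name: String, minutes: u32 },
    /// A per-user marker; reaching it refills that user's budget.
    Marker { user: String },
}

#[derive(Default)]
struct Scheduler {
    queue: VecDeque<Entry>,
    /// Build minutes tracked for users that currently have a marker in the queue.
    tracked: HashMap<String, u32>,
}

impl Scheduler {
    fn submit(&mut self, user: &str, name: &str, minutes: u32) {
        self.queue.push_back(Entry::Crate { user: user.into(), name: name.into(), minutes });
    }

    fn has_crates(&self, user: &str) -> bool {
        self.queue.iter().any(|e| matches!(e, Entry::Crate { user: u, .. } if u == user))
    }

    /// Moves all crates of `user` that sit before the user's marker to right after it,
    /// keeping their relative order.
    fn defer_until_marker(&mut self, user: &str) {
        let Some(pos) = self.queue.iter().position(|e| matches!(e, Entry::Marker { user: u } if u == user)) else {
            return;
        };
        let old = std::mem::take(&mut self.queue);
        let mut deferred = Vec::new();
        for (i, entry) in old.into_iter().enumerate() {
            let is_users_crate = matches!(&entry, Entry::Crate { user: u, .. } if u == user);
            if i < pos && is_users_crate {
                deferred.push(entry);
            } else {
                self.queue.push_back(entry);
                if i == pos {
                    self.queue.extend(deferred.drain(..));
                }
            }
        }
    }

    /// Processes the queue until it is empty; returns crate names in build order.
    fn run(&mut self) -> Vec<String> {
        let mut built = Vec::new();
        while let Some(entry) = self.queue.pop_front() {
            match entry {
                Entry::Crate { user, name, minutes } => {
                    if !self.tracked.contains_key(&user) {
                        // Start tracking: the marker goes after the current end of the queue.
                        self.tracked.insert(user.clone(), 0);
                        self.queue.push_back(Entry::Marker { user: user.clone() });
                    }
                    built.push(name); // "build" the crate
                    let total = {
                        let t = self.tracked.get_mut(&user).unwrap();
                        *t += minutes;
                        *t
                    };
                    if total > BUDGET_MIN {
                        self.defer_until_marker(&user);
                    }
                }
                Entry::Marker { user } => {
                    let t = self.tracked[&user];
                    if t > BUDGET_MIN {
                        // Budget exceeded: carry the excess over and set a new marker.
                        self.tracked.insert(user.clone(), t - BUDGET_MIN);
                        self.queue.push_back(Entry::Marker { user });
                    } else if self.has_crates(&user) {
                        // Within budget but more crates pending: reset and keep tracking.
                        self.tracked.insert(user.clone(), 0);
                        self.queue.push_back(Entry::Marker { user });
                    } else {
                        // Nothing left for this user: drop the marker and stop tracking.
                        self.tracked.remove(&user);
                    }
                }
            }
        }
        built
    }
}

fn main() {
    let mut s = Scheduler::default();
    // "big" publishes four 10-minute crates, then "small" publishes one 1-minute crate.
    for i in 1..=4 {
        s.submit("big", &format!("big{i}"), 10);
    }
    s.submit("small", "small1", 1);
    println!("{:?}", s.run()); // ["big1", "big2", "small1", "big3", "big4"]
}
```

In this example `small1` starts after 20 build-minutes instead of the 40 it would wait under plain FIFO, because `big3` and `big4` are moved behind the first marker once `big2` pushes the tracked time to 20 minutes.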
As mentioned above, we don't need to be overly restrictive, and I don't think we need to anticipate people "gaming" the system too much. In particular, creating multiple user accounts seems unlikely as long as the mechanism is not overly annoying. (IMO, ever leaving the queue idle would immediately qualify as overly annoying.) This is mostly meant to prevent unintentionally overloading the queue, and in particular unintentionally creating large delays in the docs.rs queue.
I'd find it unacceptable if a publisher of a single (fast-compiling) crate (or a small number of very fast-compiling crates) had to wait significantly longer than it takes for all the crates currently in the queue to finish before their crate(s) are built. Using popularity instead of FIFO has huge starvation potential AFAICT.
And IMO a publisher of multiple crates should wait at most as long as if they had published each crate individually, always waiting for one crate to make it through the entire queue and get built before publishing the next one. (The proposal at the top of this thread would violate this latter condition as well.)
One thing we need to plan for in any scheduling algorithm is avoiding starvation. Every crate needs to eventually build. Popularity might be fed into some other algorithm (e.g., you could use popularity to decide how many build-minutes to give someone in the leaky bucket algorithm above), but on its own it won't guarantee this.
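The starvation concern can be made concrete with a toy simulation. This sketch assumes every build takes exactly one slot and that one new popular crate (popularity `POPULAR = 100`, a made-up value) arrives per slot; the function names are hypothetical. Under pure highest-popularity-first ordering, the unpopular crate never gets built, while FIFO builds it once the backlog ahead of it is done.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, VecDeque};

/// Popularity of the hypothetical stream of popular crates.
const POPULAR: u32 = 100;

/// Highest popularity first, ties broken by submission order (older first).
/// Returns the slot in which the target crate was built, or None if it starved.
fn popularity_first(backlog: u32, target_pop: u32, ticks: u32) -> Option<u32> {
    // Heap entries: (popularity, Reverse(submission seq), is_target).
    let mut heap = BinaryHeap::new();
    let mut seq = 0u32;
    for _ in 0..backlog {
        heap.push((POPULAR, Reverse(seq), false));
        seq += 1;
    }
    heap.push((target_pop, Reverse(seq), true));
    seq += 1;
    for tick in 1..=ticks {
        // One new popular crate arrives per build slot.
        heap.push((POPULAR, Reverse(seq), false));
        seq += 1;
        if let Some((_, _, true)) = heap.pop() {
            return Some(tick);
        }
    }
    None
}

/// Plain FIFO with the same backlog and arrival pattern.
fn fifo(backlog: u32, ticks: u32) -> Option<u32> {
    let mut queue: VecDeque<bool> = std::iter::repeat(false).take(backlog as usize).collect();
    queue.push_back(true); // the target crate
    for tick in 1..=ticks {
        queue.push_back(false);
        if queue.pop_front() == Some(true) {
            return Some(tick);
        }
    }
    None
}

fn main() {
    // Five popular crates are already queued ahead of one unpopular crate.
    println!("FIFO: {:?}", fifo(5, 1_000)); // FIFO: Some(6)
    println!("popularity-first: {:?}", popularity_first(5, 1, 1_000)); // popularity-first: None
}
```

Whatever role popularity ends up playing, the queue discipline underneath still has to bound how long any single crate can wait.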
I don't really want to use popularity anyway. It creates feedback loops: Crates with good documentation become more popular, and popular crates get better documentation? Please no!
How is that? I thought the OP would fulfill this condition, so either I'm failing to do the math, or failing to explain myself properly.
That's a very good point. To handle it, instead of raw popularity I think it would be better to use popularity divided by estimated build time (although this estimate is itself somewhat expensive to compute; maybe count the number of tokens in the source code using the RA tokenizer?).
In theory, this would spend docs.rs build time in the way that benefits the most people.
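A sketch of that scoring rule, under loud assumptions: popularity is an arbitrary number supplied from outside, and instead of rust-analyzer's tokenizer it uses a crude hypothetical `estimate_tokens` that counts identifier/number runs and individual punctuation characters as a rough proxy for build cost. On its own, this ordering is still exposed to the starvation concern raised above.

```rust
/// Crude stand-in for a real tokenizer (hypothetical): each run of alphanumeric
/// or '_' characters counts as one token, and each other non-whitespace char as one.
fn estimate_tokens(src: &str) -> usize {
    let mut count = 0;
    let mut in_word = false;
    for c in src.chars() {
        if c.is_alphanumeric() || c == '_' {
            if !in_word {
                count += 1;
                in_word = true;
            }
        } else {
            in_word = false;
            if !c.is_whitespace() {
                count += 1;
            }
        }
    }
    count
}

/// Expected benefit (popularity) per unit of estimated build cost (tokens).
fn priority(popularity: f64, estimated_tokens: usize) -> f64 {
    popularity / estimated_tokens.max(1) as f64
}

/// Orders (name, popularity, estimated tokens) by descending priority; stable on ties.
fn order_by_priority(crates: &[(&str, f64, usize)]) -> Vec<String> {
    let mut scored: Vec<(f64, &str)> = crates
        .iter()
        .map(|&(name, pop, tokens)| (priority(pop, tokens), name))
        .collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().map(|(_, name)| name.to_string()).collect()
}

fn main() {
    let src = "pub fn add(a: u32, b: u32) -> u32 { a + b }";
    println!("tokens: {}", estimate_tokens(src)); // tokens: 20
    let queue: [(&str, f64, usize); 3] = [("big", 1000.0, 50_000), ("tiny", 10.0, 100), ("mid", 50.0, 5_000)];
    println!("{:?}", order_by_priority(&queue)); // ["tiny", "big", "mid"]
}
```

Here the small, moderately used crate goes first (score 0.1), ahead of the very popular but expensive one (0.02).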
As soon as we're talking about malicious actors, all rules regarding acceptable behavior go out the window, and if you're willing to sock puppet, then you're (IMHO) a malicious actor. I don't think that we're at the point of thinking about real threat models yet, we're just trying to make it so that if someone makes a mistake they don't bring everything down with them. Once we get to the point where someone is deliberately trying to break docs.rs, we'll have to figure out other mechanisms to force better behavior.
Sure. I'm saying that reporting such behavior to GitHub is likely to get their abuse team on the case and shut down the backing GitHub accounts too (that is, crates.io is not necessarily on its own when dealing with account abuse).
I shared my thoughts in the other thread, but to summarize: we have a way to manually deprioritize huge projects. This time we missed the alerts, but in the past this approach has been successful. Unless this starts happening consistently, I'm not convinced we need to spend a lot of time on designing rate limiting.
The docs.rs team is also working on removing the blockers that prevent us from scaling the build system to handle the increased load. We don't expect this to happen instantly due to some architectural limitations, but we have a clear path on how to solve them!
Yup, and as long as they have GitHub accounts, that'll be a good way to handle it. But as @kornel pointed out in the other topic, you don't need a GitHub account to publish to crates.io (and thence to docs.rs). So relying on GitHub's policies won't solve it.
So, you need a GitHub account to authenticate with crates.io, but you don't need to actually publish the source code on GitHub (or any git repository). Still, the crates.io team has the tools needed to handle such cases.
All crates.io accounts go back to GitHub right now. Sure, I can give an API token for my account to someone without their own, but then I'm putting my account on the line. And the linked post mentions that the code doesn't need to live on GitHub, not that one doesn't need a GitHub account.
Thank you to both of you for explaining this to me, I clearly misunderstood what @kornel was saying. In that case, @mathstuf is right, and working with the GitHub abuse team would work.
OK, so if I understand everything that's been said thus far correctly, a) this hasn't been a really big issue in the past, b) it's not expected to be a major issue in the future (this was a one-off issue), and c) the docs team already has a way of dealing with the issue that works well enough for the moment, and this is just one that got away from everyone. Is this all correct?
If so, it sounds like reworking the entire docs.rs pipeline would involve a lot of work for very little gain. Maybe the outcome of this Pre-RFC chat should be an RFC that outlines what possible courses of action could be taken in the future, if the need arises. That way we'll have a battle plan that can be dusted off, freshened up, and put in place with less effort than trying to invent something on the spot if/when abuse becomes a serious issue. Would that work for everyone?
This topic seems related to the Adios Pagers article in TWiR. I was personally surprised to hear crates.io team members were rotating pager duty, and it sounds like this is a rare case where the larger user community was affected, but I get the impression the crates.io team is already working hard on it.
Speaking with my docs.rs hat on, and having dealt with abuse situations like this in the past, I'm wary of having an RFC specifying what the docs.rs team has to do in case of abuse. We can't foresee every possible abuse that could happen and the best way to address it, and tying the docs.rs team's hands in those cases would likely worsen the situation, not improve it.
By the way, the crates.io team is completely separate from the docs.rs team (the only membership overlap is me, as I'm on both teams), both from a decision-making and from an operational point of view. There isn't anyone on pagerduty for docs.rs at the moment.
NONONONONONO!!! I'm sorry if I gave the impression that the RFC would be prescriptive. I meant that the RFC would be a collection of already thought-through ideas that the docs.rs team could reference if/when abuse picks up, kind of like having a strategy guide or a tactics manual available. The RFC would outline possible courses of action, their perceived advantages and disadvantages, and maybe when/how to apply them quickly. That way, if abuse suddenly picks up, you don't have to invent something while under attack; you can quickly browse a set of ideas, pick the ones you think might work, and adapt them to the current situation. And if none of the tactics fit, at least you won't waste time trying ones whose disadvantages already show they won't work in the given situation.