"No telemetry in the Rust compiler: metrics without betraying user privacy"
I have some thoughts about something I believe we need: local-only stable-compiler metrics. For a while I've felt that we don't really have enough visibility into the way rustc really works on users' machines. Long gone are the times when most of the community relied on nightly, so some features now face a trial by fire where their only real usage happens after their stabilization. There are issues that only show up in a transient manner, particularly with malformed code. People try to file tickets for problems that can no longer be reproduced.
Anything even resembling telemetry is always a contentious topic, so I want to clarify that I am not proposing any kind of telemetry for end users (though I would expect us to have a telemetry service for crater and perf using these metrics).
I've written the above post both as a way to start a conversation on the matter and as a signal, to the project and the community at large, of where I believe we should stand and which clear lines we should not cross.
I'm personally very pleased to see your proposed design of separating the local recording of metrics from their submission.
One of the (many) advantages of this scheme is that, if it's successful, it may eventually generalize: other programs could also record local metrics and there could be a single suite of tools for manipulating and submitting them (complete with support for privacy-preserving schemes like RAPPOR).
I don't mean to suggest that all this work needs to be done before Rust can adopt this scheme, but rather that the many upsides validate that this is a good design.
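To make the record/submit split concrete, here is a minimal sketch of what the local half could look like. Everything in it is an assumption for illustration: the `Metric` type, the tab-separated file format, and the `record` and `load` functions are hypothetical and not part of rustc or any proposed design. The point is only that the recording side needs nothing beyond local file access, and that a separate tool can later read the same file for review.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// One recorded measurement. Both fields are hypothetical.
pub struct Metric {
    pub name: String,
    pub millis: u64,
}

/// Recording step: append one line to a local file.
/// Nothing here touches the network.
pub fn record(path: &Path, metric: &Metric) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{}\t{}", metric.name, metric.millis)
}

/// Review step: load everything recorded so far, so that a separate tool
/// (or the user) can inspect it before deciding whether to share anything.
pub fn load(path: &Path) -> io::Result<Vec<Metric>> {
    let text = fs::read_to_string(path)?;
    let mut metrics = Vec::new();
    for line in text.lines() {
        // Skip lines that do not match the "name<TAB>millis" shape.
        if let Some((name, millis)) = line.split_once('\t') {
            if let Ok(millis) = millis.parse::<u64>() {
                metrics.push(Metric { name: name.to_string(), millis });
            }
        }
    }
    Ok(metrics)
}
```

Because the file is plain text, a user could also open it in an editor to see exactly what was recorded before any submission tool is ever run.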
I agree. I would love to do something like this with ripgrep, and following whatever is developed here seems like a great start.
I love the "rustc should never have network access" idea.
I definitely appreciate the anticipation of distaste toward metrics in the open-source community. One thing I'll add, however, is that rustc seems to be in a unique position as an open-source developer tool with a strong core community, because:
- The users of the compiler are developers themselves, many of whom can verify that the metrics don't violate privacy by grepping and skimming the rustc source. Many others are probably willing to just trust those in the community who have verified it themselves, or even just trust the compiler devs outright (after all, we trust you with our computers already, right?). This is usually called "trust, but verify".
- I don't see many ways in which privacy is a meaningful concept at least wrt metrics in rustc (i.e. who cares about exposing how long rustc took to do X?). Core dumps or logs might leak the source code it was compiling, however.
- The metrics client and server would be open-source and thus verifiable, and the metrics data could even be made public.
Also, this may be off-topic, but a small warning regarding putting hopes into RAPPOR et al.: I did some research about using differential privacy techniques for Signal Messenger about 5 years ago, and determined that 1) the metrics would become too lossy with a small user base (of millions of users, vs. billions for the authors of RAPPOR); 2) the privacy guarantee relies on persistent state, which is probably problematic for rustc given devs are likely to use multiple machines / clear local state on their machines occasionally; and 3) there weren't many useful statistics you could actually derive from the data besides boolean "yes/no" questions. Some of this has probably improved since I researched it, or I may also just have been wrong, so this is just a heads up for anyone going down that route.
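As a rough illustration of why noise grows as the user base shrinks, here is a sketch of classic randomized response for a single yes/no question. This is the basic idea RAPPOR builds on, not RAPPOR itself, and every name in it is made up for the example. Each user tells the truth with probability 1/2 and otherwise reports a fresh coin flip; the collector then inverts that noise in aggregate. A small hand-written generator (SplitMix64) is used only so the sketch needs no external crates.

```rust
/// Tiny deterministic generator (SplitMix64), used so the sketch is self-contained.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// A fair coin taken from the top bit of the next output.
    pub fn coin(&mut self) -> bool {
        self.next_u64() >> 63 == 1
    }
}

/// One user's report: with probability 1/2 the truth, otherwise a fresh coin flip.
pub fn randomized_report(truth: bool, rng: &mut SplitMix64) -> bool {
    if rng.coin() { truth } else { rng.coin() }
}

/// Collector-side estimate of the true "yes" rate.
/// P(report yes) = 0.25 + 0.5 * rate, so rate = 2 * (observed - 0.25).
pub fn estimate_rate(reports: &[bool]) -> f64 {
    let yes = reports.iter().filter(|&&r| r).count() as f64;
    2.0 * (yes / reports.len() as f64 - 0.25)
}

/// Approximate standard error of that estimate for `n` users.
pub fn standard_error(true_rate: f64, n: usize) -> f64 {
    let p_yes = 0.25 + 0.5 * true_rate;
    2.0 * (p_yes * (1.0 - p_yes) / n as f64).sqrt()
}

/// Simulate `n` users, of whom a `true_rate` fraction would honestly answer yes.
pub fn simulate(true_rate: f64, n: usize, seed: u64) -> f64 {
    let mut rng = SplitMix64(seed);
    let holders = (true_rate * n as f64).round() as usize;
    let reports: Vec<bool> = (0..n)
        .map(|i| randomized_report(i < holders, &mut rng))
        .collect();
    estimate_rate(&reports)
}
```

Under these assumptions, with a true rate of 30%, `standard_error(0.3, 100)` is about 0.098 (roughly ten percentage points of noise), while `standard_error(0.3, 1_000_000)` is about 0.001. The error shrinks only with the square root of the number of reports, which is one way to see why a scheme tuned for billions of users gets lossy at smaller scales.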
It's definitely possible that, with some statistics, this data may be enough to infer which open source crates you were likely compiling and, by analyzing how they fit together, to partially reconstruct your Cargo.lock to a certain degree of accuracy. This is especially true if your sequence of uploaded metrics is correlated in some way (sent by the same IP, sent using a unique identifier, etc.).
Measurement noise (like whether you were running other programs on each compilation) may or may not make this impractical, however.
I just wanna say that, in regards to this:
"user information should never leave their machine in an automated manner"
If there was a way to opt-in to telemetry I'd happily turn it on. I always use nightly for my personal projects, and I trust y'all more than I trust all the organizations that are currently spying on me every day.
If people are scared by the mere existence of telemetry, e.g. that they'll somehow accidentally switch it on without realizing it, then maybe there could be a notification every time rustc phones home. That notification could tell the user which configuration option is causing telemetry to be sent and how to put it back to its default disabled state.
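A sketch of what that notification logic might look like, purely as an illustration. The `telemetry.enabled` option, the `TelemetryConfig` type, and the message wording are all hypothetical; no such setting exists in rustc or cargo today. The default is off, and a notice is produced only when something would actually be sent.

```rust
/// Hypothetical settings; `telemetry.enabled` is not a real rustc or cargo option.
pub struct TelemetryConfig {
    pub enabled: bool,
    /// Where the setting was read from, so the notice can point the user at it.
    pub source: String,
}

impl Default for TelemetryConfig {
    /// Telemetry is off unless the user explicitly opted in.
    fn default() -> Self {
        TelemetryConfig {
            enabled: false,
            source: String::from("built-in default"),
        }
    }
}

/// Returns the notice to print before any submission,
/// or `None` when nothing would be sent.
pub fn submission_notice(config: &TelemetryConfig) -> Option<String> {
    if !config.enabled {
        return None;
    }
    Some(format!(
        "note: sending metrics because `telemetry.enabled = true` is set in {}; \
         set it to `false` to restore the default",
        config.source
    ))
}
```

With the default configuration `submission_notice` returns `None`, so a user who never opted in never sees (or triggers) anything. Once the option is enabled, every submission names the file that enabled it.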