Thanks!
Any chance you could post a script so we can scrape the data ourselves, or upload the database somewhere? I’d love to tinker with it!
Yep, all the code is on the repo, and there’s a bootstrap.sql which will build a PostgreSQL DB with the GitHub data (it doesn’t include CREATE DATABASE, but if you point it at an empty database it will init the schema and insert all the scraped values). I’m using pg 9.5 on my machine, but I don’t think I’m relying on any behavior not available in 9.3/9.4 (I could be wrong though). The code is still super rough, as is the database schema, but it mostly works for now! I also uploaded a file (queries.sql) that has some test queries for most of these metrics that you can try out and mess with. Issue reports are more than welcome if you find something wrong while poking around.
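For example, loading it from Python might look something like this (just a sketch: “github_metrics” is a placeholder database name, and it assumes bootstrap.sql is plain SQL with no psql meta-commands):

```python
# Rough sketch of loading bootstrap.sql into an already-created, empty database.
# "github_metrics" is a placeholder name; adjust connection details as needed.
import psycopg2

conn = psycopg2.connect(dbname="github_metrics")
conn.autocommit = True
with conn.cursor() as cur, open("bootstrap.sql") as f:
    cur.execute(f.read())  # creates the schema and inserts the scraped rows
conn.close()
```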
Apart from point estimates like mean/median, you could try some graphs. In this case, I would plot some sort of histogram where the X axis is categories like “closed within one hour”, “one day”, “one week”, etc., and the Y axis is the number of issues/PRs in each category.
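Something like this would produce those buckets (just a sketch in Python; the (opened_at, closed_at) pairs are assumed to come out of the scraped database):

```python
# Rough sketch: bucket issue/PR close times into the categories above.
from collections import Counter
from datetime import timedelta

BUCKETS = [
    ("closed within one hour", timedelta(hours=1)),
    ("within one day", timedelta(days=1)),
    ("within one week", timedelta(weeks=1)),
    ("within one month", timedelta(days=30)),
]

def bucket_counts(rows):
    """rows: iterable of (opened_at, closed_at) datetime pairs for closed items."""
    counts = Counter()
    for opened_at, closed_at in rows:
        age = closed_at - opened_at
        label = next((name for name, limit in BUCKETS if age <= limit),
                     "longer than a month")
        counts[label] += 1
    return counts
```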
I agree that graphs are useful! I’ve been thinking that most of the metrics discussed here should be presented as a time series of some sort (in addition to histograms/pie charts/etc., which I agree would be super useful). I think time series are useful for dashboards because they allow one to see incremental creep that might not be noticeable otherwise. The only problem is that a good time series needs a decent metric for each point (thus my suggestion of a scalar metric for open time).
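For concreteness, one candidate scalar per point could be something like the median age of still-open issues on a given day (purely an illustration, not necessarily the right metric):

```python
# One possible per-point scalar for "open time": the median age (in days) of
# issues that were still open on a given day. Illustrative only.
import statistics
from datetime import date

def median_open_age(issues, day):
    """issues: iterable of (opened_at, closed_at) dates; closed_at is None if still open."""
    ages = [(day - opened).days
            for opened, closed in issues
            if opened <= day and (closed is None or closed > day)]
    return statistics.median(ages) if ages else 0

# e.g. one point per day in the series: median_open_age(rows, date(2016, 5, 1))
```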
Yeah, you could get a percentage out of this. I think “over the past 365 days we’ve uploaded 360 nightly releases” sounds nicer though.
In terms of visualizations, I was picturing something like a 15x30 (or whatever) grid with red and green squares to give a quick, at-a-glance representation (eyes are good at telling proportions, IME, and it’d be easy to see “streaks” that way too).
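A quick-and-dirty terminal version shows the idea (the real thing would presumably be HTML/CSS on the dashboard):

```python
# Rough sketch of the at-a-glance grid: one cell per nightly, green if it
# released, red if it didn't, wrapped into rows of 30 so streaks stand out.
GREEN = "\033[42m  \033[0m"  # green background block
RED = "\033[41m  \033[0m"    # red background block

def render_grid(released, width=30):
    """released: list of booleans, one per nightly, oldest first."""
    cells = [GREEN if ok else RED for ok in released]
    return "\n".join("".join(cells[i:i + width])
                     for i in range(0, len(cells), width))

# print(render_grid([True] * 300 + [False] + [True] * 64))
```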
AFAIK, all the builds (each and every platform) have to succeed to get a nightly release. If you go ahead and parse the manifests, you’ll find out when the first binary release for each platform was produced though.
I’m just now finishing up a scraper that says “if we hit the manifest URL and get a 404, the nightly failed; if we get a 200, the nightly build released.” If it’d be useful to have a per-platform/per-binary breakdown (maybe for tier-2/3 builds?), I can add that in, but I imagine there are more important data points hiding in TravisCI and the buildbots.
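Something along these lines, as a sketch (the exact manifest URL pattern below is my assumption about the dist server layout, not necessarily what the scraper uses):

```python
# HEAD the dated nightly manifest and treat a 200 as "nightly released"
# and a 404 as "no nightly that day". URL pattern is an assumption.
import requests
from datetime import date, timedelta

MANIFEST_URL = "https://static.rust-lang.org/dist/{day}/channel-rust-nightly.toml"

def nightly_released(day):
    resp = requests.head(MANIFEST_URL.format(day=day.isoformat()))
    return resp.status_code == 200

# e.g. count releases over the past 365 days
days = [date.today() - timedelta(days=n) for n in range(365)]
released = sum(nightly_released(d) for d in days)
```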