I played around with it a little bit. I created a table of Rust files by:
- Filtering for repos that GitHub reports as containing Rust files
- Then keeping only the files that end with ‘.rs’
Maybe I can make this available to the public? I don’t know yet. The content of all the Rust files is about 1.4 GB, so storing that table is practically free. However, I don’t think you can get around the fact that you need to process 1.5 TB to collect the Rust files, so updating the database will cost about $7 (about $2.50 if you only do it once a month).
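For anyone who wants to redo the cost arithmetic, here is a rough Python sketch. The $5 per TB rate and the 1 TB free monthly allowance are my assumptions about on-demand query pricing, not numbers taken from this post; only the 1.5 TB figure comes from above. The names are made up for illustration.

```python
# Assumed pricing, NOT from the post: adjust to whatever the current rates are.
PRICE_PER_TB_USD = 5.00      # assumed on-demand price per TB processed
FREE_TB_PER_MONTH = 1.0      # assumed free allowance per month

# From the post: data processed to rebuild the table of Rust files.
TB_PROCESSED_PER_REFRESH = 1.5


def refresh_cost(tb_processed, free_tb=0.0):
    """Cost in USD of one refresh after subtracting any free allowance."""
    billable_tb = max(tb_processed - free_tb, 0.0)
    return billable_tb * PRICE_PER_TB_USD


# Refresh with no free allowance left.
full_price = refresh_cost(TB_PROCESSED_PER_REFRESH)
# Single refresh in a month, with the whole free allowance available.
with_free_tier = refresh_cost(TB_PROCESSED_PER_REFRESH, FREE_TB_PER_MONTH)

print(f"full price: ${full_price:.2f}, once a month: ${with_free_tier:.2f}")
```

Under those assumptions the two figures land close to the "about $7" and "about $2.50" estimates above.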
Anyway, I don’t know much about SQL except that you need to SELECT stuff FROM something, but these results may be interesting nonetheless. If you have a specific query in mind, let me know, or perhaps I’ll just figure out how to make the table public.
Fun note: This table sources data from 11,947 repos, which contain 160,447 Rust files, totaling 1.40 GB.
#[unstable(…)]
I selected everything that matched (optional !)[unstable(...)]. I guess I forgot to add a # (shrugs). Inside the ..., I selected the feature name, i.e. the stuff inside feature = "...". It turns out that everybody except for one person actually used a space between feature and =, so that was nice. Here is the query:
SELECT line, count(*) as n
FROM (
    SELECT
        REGEXP_EXTRACT(
            REGEXP_EXTRACT(
                content,
                r'(?s)!?\[unstable\((.*?)\)\]'
            ),
            r'feature = \"(.*?)\"'
        ) as line
    FROM rust_lang.contents
)
GROUP BY line
ORDER BY n DESC
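If you want to sanity-check the two regexes without running a query, here is a minimal Python sketch of the same nested extraction. It is not the query itself: the sample file contents and the function name are made up, and I am assuming that REGEXP_EXTRACT only returns the first match per row, so the sketch also looks at just the first #[unstable(...)] in each file.

```python
import re
from collections import Counter

# Same patterns as in the query above.
UNSTABLE_RE = re.compile(r'(?s)!?\[unstable\((.*?)\)\]')
FEATURE_RE = re.compile(r'feature = "(.*?)"')


def first_unstable_feature(content):
    """Return the feature named in the first unstable attribute, or None."""
    outer = UNSTABLE_RE.search(content)
    if outer is None:
        return None
    inner = FEATURE_RE.search(outer.group(1))
    return inner.group(1) if inner else None


# Hypothetical file contents, just for illustration.
files = [
    '#![unstable(feature = "core", reason = "wip")]\npub fn a() {}\n',
    '#[unstable(feature = "rand")]\nfn b() {}\n'
    '#[unstable(feature = "core")]\nfn c() {}\n',
    'fn main() {}\n',
]

# Equivalent of GROUP BY line / ORDER BY n DESC.
counts = Counter(first_unstable_feature(f) for f in files)
for feature, n in counts.most_common():
    print(feature, n)
```

Note that the second sample file only contributes "rand": its later "core" attribute is never seen. If the query behaves the same way, the counts above are closer to "files whose first unstable attribute names this feature" than to total uses.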
Here are the results: https://pastebin.mozilla.org/8884291. So the top 10 are:
- core (1429)
- rustc_private (542)
- std_misc (313)
- collections (220)
- alloc (88)
- rand (84)
- hash (73)
- unicode (71)
- io (55)
- as_slice (52)
#[feature(…)]
I figured that knowing which features are being used could potentially be useful. I first selected things that looked like [feature(...)], and from there I split by commas and stripped whitespace. Here is the query:
SELECT line, count(*) as n
FROM (
    SELECT
        REGEXP_REPLACE(
            SPLIT(
                REGEXP_EXTRACT(
                    content,
                    r'(?s)!?\[feature\((.*?)\)\]'
                ),
                ','
            ),
            r'[\r\n\s]+',
            ''
        ) as line
    FROM rust_lang.contents
)
GROUP BY line
ORDER BY n DESC
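The same extract, split, and strip steps can be tried locally with a small Python sketch. Again, this is only an approximation of the query: the sample source and the function name are invented for illustration, and only the first [feature(...)] in the text is considered.

```python
import re
from collections import Counter

# Same patterns as in the query above.
FEATURE_ATTR_RE = re.compile(r'(?s)!?\[feature\((.*?)\)\]')
WHITESPACE_RE = re.compile(r'[\r\n\s]+')


def first_feature_list(content):
    """Return the feature names from the first feature attribute in content."""
    match = FEATURE_ATTR_RE.search(content)
    if match is None:
        return []
    # Split on commas, then remove all whitespace from each piece.
    return [WHITESPACE_RE.sub('', part) for part in match.group(1).split(',')]


# Hypothetical file contents, with the list spread over two lines.
sample = '#![feature(core, std_misc,\n    collections)]\nfn main() {}\n'

features = first_feature_list(sample)
counts = Counter(features)
print(features)
```

One thing this makes easy to spot: a trailing comma inside the attribute would produce an empty string after the split, so an empty entry in the results may just be that.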
Here are the results: https://pastebin.mozilla.org/8884293.
Please feel free to let me know if I screwed up any of the queries.
cc @brson