I imagine there’s room to come up with a very simple index here that doesn’t use a full-blown search engine. The core ideas in tantivy (and even Lucene) are deliciously simple. Here’s a brief sketch. Please ask questions if anything is unclear!
- rustdoc can build an n-gram index (probably with n={2,3,4}). This index is a simple map from every n-gram to a list of the words/functions/types that contain that n-gram.
- A search is broken into three phases. The first phase extracts all of the n-grams from the query. The second phase uses the aforementioned index to find all words/functions/types corresponding to each n-gram in the query. The third phase executes the same search that is done today on the results from the second phase.
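The index and the first two phases can be sketched in a few lines of Rust. This is just an illustration, not how tantivy does it; the trigram choice and all names here are mine:

```rust
use std::collections::{HashMap, HashSet};

/// Extract all n-grams from a name, lowercased. Names shorter than `n`
/// are indexed whole so they remain findable.
fn ngrams(s: &str, n: usize) -> HashSet<String> {
    let chars: Vec<char> = s.to_lowercase().chars().collect();
    if chars.len() < n {
        return std::iter::once(chars.iter().collect()).collect();
    }
    chars.windows(n).map(|w| w.iter().collect()).collect()
}

/// Phase 0 (done once, at doc build time): map each n-gram to the
/// indices of the names that contain it.
fn build_index(names: &[&str], n: usize) -> HashMap<String, Vec<usize>> {
    let mut index: HashMap<String, Vec<usize>> = HashMap::new();
    for (i, name) in names.iter().enumerate() {
        for g in ngrams(name, n) {
            index.entry(g).or_default().push(i);
        }
    }
    index
}

/// Phases 1 and 2 of a search: extract the query's n-grams, then
/// intersect the posting lists to get candidate names. Phase 3 (today's
/// search) would then run only over these candidates.
fn candidates(index: &HashMap<String, Vec<usize>>, query: &str, n: usize) -> HashSet<usize> {
    let mut result: Option<HashSet<usize>> = None;
    for g in ngrams(query, n) {
        let hits: HashSet<usize> = index
            .get(&g)
            .map(|v| v.iter().copied().collect())
            .unwrap_or_default();
        result = Some(match result {
            None => hits,
            Some(r) => r.intersection(&hits).copied().collect(),
        });
    }
    result.unwrap_or_default()
}
```

The point is that the intersection in phase 2 shrinks the candidate set dramatically before the (comparatively expensive) existing matching logic ever runs.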
If you stopped right there, I bet you’d wind up with a nice win over the current state. Further tweaking may be desirable:
- If there is an n-gram that occurs in the vast majority of all possible search results, then a query that contains that n-gram will cause this search technique to degrade to the current implementation of search. (This is typically where a search engine will use something called stop words. The cost is a drop in recall, but it’s usually worth it.)
- If you wanted to get really fancy, you could add the documentation text itself to the n-gram index and then apply tf-idf to rank the search results.
- Experiment with tokenizing schemes other than n-grams, particularly if you’re indexing natural language rather than just programming language identifiers. (A search engine will use stemming or lemmatization, depending on your sophistication level.)
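For the tf-idf idea above, the standard scoring formula is small enough to sketch directly. This is the textbook formulation with a smoothed idf, not anything tantivy-specific, and the function name is mine:

```rust
/// Classic tf-idf: how often a term appears in one document, weighted by
/// how rare the term is across the whole collection. `doc_freq` is the
/// number of documents containing the term; `num_docs` is the collection
/// size.
fn tf_idf(term_count: usize, doc_len: usize, doc_freq: usize, num_docs: usize) -> f64 {
    let tf = term_count as f64 / doc_len as f64;
    // Smoothed idf: a term appearing in every document still scores > 0,
    // but rare terms are weighted much more heavily.
    let idf = ((num_docs as f64 + 1.0) / (doc_freq as f64 + 1.0)).ln() + 1.0;
    tf * idf
}
```

A query term that appears in only a handful of docs (say, `BufWriter`) would then outrank a term that appears nearly everywhere (say, `new`), which is exactly the behavior you want from ranked results.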
I think this could be a fun project for anyone who wanted to take it on. I’d be happy to help mentor it.
Of course, parallelizing the search might go a long way to solving the most pressing problem.
And also, FWIW, the searches in the servo docs aren’t that slow for me. There’s clearly a little bit of a delay, but it’s not awful.