Appreciable compilation slowdown with Sync + Send markers

I have a project developed with Rust, which contains code like:

use rustc_serialize::json::Object;

pub type SessionMap = HashMap<String, String>;
pub type WorkerResult = Result<Option<Box<Iterator<Item=Object>>>, String>;

pub trait Worker {
    fn next(&mut self, context: &mut SessionMap) -> WorkerResult;
}

impl<F> Worker for F where F: FnMut(&mut SessionMap) -> WorkerResult {
    fn next(&mut self, context: &mut SessionMap) -> WorkerResult {
        self(context)
    }
}

Project compiles about 25 seconds, but if I change generic Worker trait implementation for closures with (add Sync + Send for F):

impl<F> Worker for F where F: FnMut(&mut ContextMap) -> WorkerResult, F: Sync + Send {
    fn next(&mut self, context: &mut SessionMap) -> WorkerResult {
        self(context)
    }
}

than project compiles more than 4.5 minutes (10x times slower at least).

I have some questions because of it:

  1. What takes the long compilation time in the second case?
  2. Is it possible to increase speed of compilation in that case?
  3. How to benchmark compiler phases?

rustc 1.5.0-nightly (81b3b27cf 2015-10-11), x86_64-pc-windows-gnu, win10+msys2

Nice find! Sounds like a great opportunity to improve compiler perf :smile:

The -Z time-passes and -Z time-llvm-passes flags can help debug.

May be related to issue #26325.

The difference was more than I thought (600x times slower with time passes).

The major perfomance impact with item-bodies checking:

time: 0.003; rss: 73MB    item-types checking
time: 0.317; rss: 106MB    item-bodies checking
time: 0.000; rss: 106MB    drop-impl checking

Case with Sync + Send:

time: 0.004; rss: 73MB    item-types checking
time: 196.509; rss: 106MB    item-bodies checking
time: 0.000; rss: 106MB    drop-impl checking

There is no difference where I place the specification (in where clause or in the impl type parameters). Seemingly it depends on trait implementors. When I use something like:

pub trait DivertTrait: Sync + Send {}

impl<F> Worker for F where F: FnMut(&mut ContextMap) -> WorkerResult, F: DivertTrait {
    fn next(&mut self, context: &mut ContextMap) -> WorkerResult {
        self(context)
    }
}

that code compiles during the normal time. But the following code also compiles slowly:

trait NeverImplementedTrait {}

impl<T: NeverImplementedTrait + Sync + Send> Worker for T {
    fn next(&mut self, _context: &mut ContextMap) -> WorkerResult {
        Ok(None)
    }
}

It looks that compiler tries to find every implementor for every item of said triple, but uses agressive algorithm to detect them (why it tries to find everything if I have NeverImplementedTrait?).

Is this behaviour by design or possible to be fixed?

P.S. I’ve warm up GDB already :grinning:

It would be nice if you could create a short example demonstrating the problem so we can profile it.

It was not simple to reproduce, but example is ready :tada:: GitHub - therustmonk/syncsendslow

Repo contains branches with fixes:

  • impl-fix - where I removed Sync+Send pair
  • rename-fix - where the method of trait was renamed
  • master branch contains buggy example

But I am concerned about the new discovery: slowdown exists when name of trait's method is next! Now it looks like a compiler's bug :hushed:

2 Likes

Well, of course the compiler has to do a lot more work when so many types implement the method.

What do you mean? next is a method of my trait, in the earlier example I bound generic with a trait without any implementation. It also has a different declaration in comparsion with other next methods of std traits.

Why I can't use next name for method of own trait?

Not to defend the compiler’s performance in this case, but in general when we see a call like foo.next(..) we do not know what trait the method next belongs to, and we have to figure it out by examining what traits foo implements. Still, I suspect we’ll be able to improve things here quite a bit.

For which reason this approach is used? Why trait's names don't take into consideration? I don't see duck typing feature. Traits ignoring seems to me strange.

Is there a list of never to do things for Rust like Python has: "no cpu-bound tasks because of GIL"?

@DenisKolodin

If you could give a code example demonstrating the slowdown, I will try to track it down.

@arielb1 Here it is:

extern crate mongodb;
use mongodb::coll::Collection;

pub trait Worker {
    fn next(&mut self);
}

impl<F> Worker for F where F: FnMut(), F: Sync + Send {
    fn next(&mut self) { self() }
}

pub struct GetRecord {
	collection: Collection,
}

impl Worker for GetRecord {
	fn next(&mut self) {
		let mut cursor = self.collection.find(None, None).unwrap();
		let _ = cursor.next();
	}
}

fn main() {
}

An example without external deps would have been nicer.

I regret that I can’t reproduce slowdown without the dependecy. I think because it also has the next method.

May be useful: I use cargo in verbose mode to get rustc command with arguments to call it under debugger later with dependencies (especially when there are a lot of them).

It looks like winnowing could use some memoization. There is a different O(2^n) issue that involves projections, but it doesn’t look like the problem here.

struct S0<T>(T,T);
struct S1<T>(Box<S0<T>>,Box<S0<T>>);
struct S2<T>(Box<S1<T>>,Box<S1<T>>);
struct S3<T>(Box<S2<T>>,Box<S2<T>>);
struct S4<T>(Box<S3<T>>,Box<S3<T>>);
struct S5<T>(Box<S4<T>>,Box<S4<T>>);
struct S6<T>(Box<S5<T>>,Box<S5<T>>);
struct S7<T>(Box<S6<T>>,Box<S6<T>>);
struct S8<T>(Box<S7<T>>,Box<S7<T>>);
struct S9<T>(Box<S8<T>>,Box<S8<T>>);
struct Sa<T>(Box<S9<T>>,Box<S9<T>>);
struct Sb<T>(Box<Sa<T>>,Box<Sa<T>>);
struct Sc<T>(Box<Sb<T>>,Box<Sb<T>>);
struct Sd<T>(Box<Sc<T>>,Box<Sc<T>>);
struct Se<T>(Box<Sd<T>>,Box<Sd<T>>);
struct Sf<T>(Box<Se<T>>,Box<Se<T>>);

trait Foo { fn xxx(&self); }
trait Bar {} // anything local or #[fundamental]

impl<T> Foo for T where T: Bar, T: Sync {
    fn xxx(&self) {}
}

impl Foo for Sf<u32> { fn xxx(&self) {} }

fn main() {
    let _ = <Sf<u32>>::xxx;
}
struct S0<T>(T,T);
struct S1<T>(Box<S0<T>>,Box<S0<T>>);
struct S2<T>(Box<S1<T>>,Box<S1<T>>);
struct S3<T>(Box<S2<T>>,Box<S2<T>>);
struct S4<T>(Box<S3<T>>,Box<S3<T>>);
struct S5<T>(Box<S4<T>>,Box<S4<T>>);
struct S6<T>(Box<S5<T>>,Box<S5<T>>);
struct S7<T>(Box<S6<T>>,Box<S6<T>>);
struct S8<T>(Box<S7<T>>,Box<S7<T>>);
struct S9<T>(Box<S8<T>>,Box<S8<T>>);
struct Sa<T>(Box<S9<T>>,Box<S9<T>>);
struct Sb<T>(Box<Sa<T>>,Box<Sa<T>>);
struct Sc<T>(Box<Sb<T>>,Box<Sb<T>>);
struct Sd<T>(Box<Sc<T>>,Box<Sc<T>>);
struct Se<T>(Box<Sd<T>>,Box<Sd<T>>);
struct Sf<T>(Box<Se<T>>,Box<Se<T>>);

trait Foo { fn xxx(&self); }
trait Bar {} // anything local or #[fundamental]

impl<T> Foo for T where T: Bar, T: Sync {
    fn xxx(&self) {}
}

impl Foo for Se<u32> { fn xxx(&self) {} }

fn main() {
    let _ = <Se<u32>>::xxx;
}

Cool example! :+1:

playpen: timeout triggered!

Could you help me to find it in the rustc sources? Which functions I have to track?

I'm interesting to see how it works internally.

It is somewhat subtle - but the sources are at src/librustc/middle/traits. I would prefer @nikomatsakis to help me with that.

I ran @arielb1’s example with perf record --call-graph dwarf rustc ..., and then perf report indicates that 60+% of the compilation time is spent in (demangled) middle::traits::select::SelectionContext<...>::confirm_candidate and its descendants. (Any C/C++ profiler should work for this task, e.g. perf on Linux or Instruments.app on OSX)

(The crate name isn’t included in the symbols but the rest of the path is usually more than enough, e.g. the middle module is in the rustc crate, and even without that info, using grep and/or DXR’s search for the method name works well.)

3 Likes