Idea: Make `std::mem::needs_drop` smarter

Currently it is possible and useful to use ManuallyDrop, MaybeUninit, or std::ptr::drop_in_place for writing generic datatypes Foo<T> that implement Drop in a way that actually does nothing when std::mem::needs_drop::<T>() == false. I can imagine things like owned pointers into the stack or an arena, or using ManuallyDrop just to specify drop order. And there are probably lots of scenarios I'm not thinking of: basically everything where the destructor must be written manually but does not handle any synchronization logic or deallocation. I suppose this can also include Box<T> for zero-sized, drop-glue-free T.

Right now, for such a type Foo<T>, std::mem::needs_drop::<Foo<T>>() evaluates to true. This in turn means that something like Vec<Foo<T>> will loop through the whole vector running a no-op destructor (unless the optimizer catches this, which seems likely; but there ought to be better examples, otherwise needs_drop() wouldn't exist, right?)

As far as I’m aware, needs_drop is allowed to change its implementation without this being considered a breaking change. This means the current behavior can be changed, and my proposed new behavior below is also not a semver hazard (even though it introspects the actual Drop implementation’s source code, which breaks encapsulation a bit).


So my idea is that we could have the Rust compiler realize certain kinds of no-op Drop implementations are doing nothing and use this to improve the information provided by std::mem::needs_drop.

Possible cases for improvement:

  • Definitely possible and quite minimal approach: if a type whose fields all lack drop glue implements Drop such that the whole body of fn drop is wrapped in an if COND { ... }, where COND is an expression that can be const-evaluated (i.e., something that could be assigned to a const or static), then std::mem::needs_drop() should return false whenever COND evaluates to false; in this case the type itself would also be considered not to have drop glue. This would give people who want their types to have no drop glue a quite powerful way to achieve it.
  • Maximal approach: whenever a destructor (including its drop glue) compiles to a no-op after optimization, the corresponding needs_drop() check returns false for that type. This is possibly difficult or impossible, since it would make a const fn depend on the result of the optimizer, which probably includes LLVM (however, I have no idea how the Rust compiler actually works, so who knows). This approach would, however, give the greatest performance benefit to unmaintained or non-hand-optimized code.
  • Middle ground: allow a few more things alongside the if COND guard, like MaybeUninit::assume_init, std::ptr::drop_in_place, or ManuallyDrop::drop, so that some very simple Drop implementations work without needing changes. It is, however, highly nontrivial to decide what to include in such an analysis.

I’m interested in whether you think this is a reasonable idea and a valuable improvement, and if so, which version/kind of analysis you think is possible and useful. Links to previous discussion about anything related would also be appreciated. With positive feedback, we could try to turn this into an RFC.

As a more heavy-handed alternative: offer a default-provided const Drop::NEEDS_DROP: bool to give an explicit hint for types that are trivially droppable in some configurations.

(Reminder: it's never unsafe to fail to drop something (that isn't pinned).)

8 Likes

Presumably setting such a constant to false should only make needs_drop return false when none of the fields need to be destroyed either...? Feels like that would make it somewhat safer to use.

Also, your suggestion is technically a breaking change, because it makes Drop no longer object-safe. (Because dyn Drop is a type that we all desperately need...?)

I would maybe accept a middle ground where we stick an attribute on the Drop implementation, along the lines of #[maybe_nop], and only if that is applied do we perform the if COND optimization.

Alternatively you could introduce a new trait (and even make it derive-able based on the type's fields) that has a const NEEDS_DROP: bool. That would avoid the breaking change.

Another alternative would be const eval in where clauses. Something like:

impl<T> Drop for MyType<T> where core::mem::needs_drop::<T>() {
  fn drop(&mut self) {
    // ...
  }
}

People have brought up const eval in where clauses before, and there are a number of proposed syntaxes for it. I'm not interested in bikeshedding the syntax (that should be in its own thread). But you don't really even need full const eval. Just const generics and specialization is enough, because you can use something like:

trait True {}
struct If<const B: bool>;
impl True for If<true> {}

impl<T> Drop for MyType<T> where If<{ core::mem::needs_drop::<T>() }>: True {
  fn drop(&mut self) {
    // ...
  }
}

But this currently generates an error because "Drop impl requires If<...>: True but the struct it is implemented for does not". I'm not sure why Rust has that restriction or if it can/should be relaxed.

1 Like

I think this works quite well:

pub trait Drop {
    const NEEDS_DROP: bool = true;
    fn drop(&mut self);
}

struct S1;

// still works as before
impl Drop for S1 {
    fn drop(&mut self) {}
}

// type with manual drop order
use std::mem::{needs_drop, ManuallyDrop};
struct S2<T1, T2> {
    x: ManuallyDrop<T1>,
    y: ManuallyDrop<T2>,
}
impl<T1, T2> Drop for S2<T1, T2> {
    const NEEDS_DROP: bool = needs_drop::<T1>() || needs_drop::<T2>();
    fn drop(&mut self) {
        unsafe {
            ManuallyDrop::drop(&mut self.y);
            ManuallyDrop::drop(&mut self.x);
        }
    }
}

// Box, but without support for unsized T
use std::marker::PhantomData;
use std::ptr::NonNull;
struct Box<T>(NonNull<T>, PhantomData<T>);

use std::mem::size_of;
impl<T> Drop for Box<T> {
    const NEEDS_DROP: bool = needs_drop::<T>() || size_of::<T>() > 0;
    fn drop(&mut self) {
        if Self::NEEDS_DROP {
            // do deallocation and drop_in_place
        }
    }
}

(extended example in playground)

1 Like

I think this is a good idea. It would mean that defining NEEDS_DROP and then guarding the drop implementation with if Self::NEEDS_DROP is a straightforward way to keep the needs_drop information consistent with the actual drop implementation plus its drop glue. (An incorrect NEEDS_DROP value would otherwise be a pretty hard-to-catch bug.)

Regarding your Box: it's incorrect for it to ever not need drop, as it should always deallocate.

It's already maximally correct for a Box to drop_in_place and deallocate (as drop_in_place has an internal needs_drop check).

I thought that for a zero-sized type inside the Box, one would not need to allocate anything, and hence not deallocate on drop either. Any correctly aligned non-null pointer should be sufficient; no need to talk to an allocator.

needs_drop returns false for any type without drop glue, not just zero sized types.

Oh wait, your Box also checks size_of; my bad. (I'm going to blame horizontal scroll on mobile, because I'm in the middle of a move.)

The suggestions that people have come up with in this thread are so heavy-handed that I have to point out: all of the performance gain we are talking about is pure speculation until it is actually and comprehensively measured.

Just how big of a problem is missed drop-optimization today? Does it really warrant adding new lang-item traits, or breaking one of the most fundamental traits, or heavily impeding decoupling in the architecture of the compiler by requiring feedback from the optimizer at semantic analysis time?

1 Like

Indeed, having at least one use case with performance benefits would be great. That’s why I remarked

I’m still waiting for anyone to mention a type in the standard library or a popular crate that uses and benefits from its use of needs_drop. Also, I’m not yet sure which existing types from the standard library or common crates, apart from Box (and similarly Vec) in the uncommon case of storing zero-sized, drop-glue-free contents, could actually change their NEEDS_DROP value.

However, regarding

I don’t see (yet?) how adding an associated const with a default value breaks anything. The only thing that would change for the user is the documentation of the Drop trait.

I’m agreeing that

is not a reasonable approach for this.

The benefit is second-order, like in hashbrown, where it can avoid iterating the map to drop all the members if they're all !needs_drop (or if the map is empty, which is what the linked PR is adjusting).

So for a significant advantage of a more accurate needs_drop, you'd need a collection of whatever trivial case of an otherwise interesting type to drop... which is probably not all that common.

Something like HashMap<u32, ArrayVec<[u8; N]>> would be an example. (ArrayVec has a no-op destructor when its element type has no destructors.) It would be interesting to see if this impacts benchmarks that use maps/sets whose elements contain ArrayVecs.

4 Likes

In my experiments with this, it appears that the iteration in the HashMap destructor gets optimized away completely, despite the "false positive" from needs_drop. For example, both of the following benchmarks return 0 ns/iter, no matter how many elements or sets they create:

#![feature(test)]
extern crate test;
use arrayvec::ArrayVec;
use std::{collections::HashSet, hash::Hash, iter::repeat};
use test::{Bencher, black_box};

fn bench_drop_sets<T>(b: &mut Bencher, n: usize, len: usize)
where T: Default + Clone + Hash + Eq
{
    let set: HashSet<T> = repeat(T::default()).take(len).collect();
    let sets: Vec<HashSet<T>> = repeat(set).take(n).collect();
    let mut sets = black_box(sets);
    b.iter(|| sets.clear());
}

#[bench]
fn arrayvec_drop(b: &mut Bencher) {
    bench_drop_sets::<ArrayVec<[usize; 31]>>(b, 100, 1000);
}
#[bench]
fn array_drop(b: &mut Bencher) {
    bench_drop_sets::<[usize; 32]>(b, 100, 1000);
}
1 Like

I'm not sure that proves anything: Vec::clear has nothing to drop after the first iteration.

1 Like

I changed that to use hashbrown directly, and clearing the sets individually:

    b.iter(|| sets.iter_mut().for_each(|set| set.clear()));

With hashbrown 0.8.1 that only checked needs_drop::<T>():

test array_drop    ... bench:       2,235 ns/iter (+/- 11)
test arrayvec_drop ... bench:       6,937 ns/iter (+/- 120)

With hashbrown 0.8.2 that checks needs_drop::<T>() && len() != 0:

test array_drop    ... bench:       2,175 ns/iter (+/- 35)
test arrayvec_drop ... bench:       2,125 ns/iter (+/- 40)

So before, the bogus needs_drop did indeed cause extra work.

3 Likes

Ah, thanks for fixing my broken benchmark.