Implicit drop considered harmful

Hello Rust Community,

First of all: Thanks for your effort with this incredible language. I've been using Rust for quite some time now and am still amazed by its beauty.

While developing a new programming language, I encountered a problem that surprised me quite a bit, and I think it might be relevant for other projects as well.

Initially, I was very happy with the performance of my interpreter:

for i = 0 to 10_000_000 {
   x = 1
}
Elapsed: 66.3538ms

Then I worked on other parts of the project. When I later ran the same test again, the runtime was suddenly drastically slower:

for i = 0 to 10_000_000 {
   x = 1
}
Elapsed: 144.8416ms

This confused me because I hadn't changed anything in the interpreter itself. After extensive debugging, I finally found the reason: a change to the enum that represents variables in the interpreter.

Before:

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
enum Var {
   Void,
   Int(i32),
   Float(f32),
   ...
}

Then I added a new variant:

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
enum Var {
   Void,
   Int(i32),
   Float(f32),
   Array(Vec<Var>),  // New variant
   ...
}

Although I hadn't even used this new variant in the interpreter logic, its mere existence caused the performance to drop significantly. At first, I thought cloning the vector might be the problem, but it turned out that it had nothing to do with cloning.

So what was the problem?

When I store a variable, it looks like this:

stack[var_idx] = y.clone();

I'm using clone() here to make copies explicit, as Var is quite large, making copying expensive.

If any variant of Var owns something that needs dropping, Var itself gets drop glue, and Rust must run that glue on every assignment to free the old value if necessary. That is, after the introduction of the Array variant, the code above effectively compiles to something like:

// conceptually: run the old value's drop glue first
match stack[var_idx] {
   Var::Array(v) => drop(v),  // the Vec in the old `Array` value must be freed
   _ => {},
}
stack[var_idx] = y.clone();

Even though in my case only the Var::Int variant was used, the new Var::Array variant caused Rust to always perform the drop check. This is correct behavior, but it wasn't immediately obvious to me.

The problem also occurs when an enum has a variant containing a struct that in turn contains an enum with a dropping variant, which makes it even harder to track down. A practical example:

enum Var {
    ...
    Function(Function),
}

struct Function {
    result_type: Type,
    ...
}

enum Type {
    ...
    ComplexType(Box<ComplexType>) // We use Box to save memory
}

Because of the Box in Type, every assignment of a Var, and every place where a Var becomes invalid, now runs drop glue. Instead of the expected speedup from the more compact type, the program slows down because of the invisible drop checks.

In my opinion, this behavior somewhat collides with the "principle of least surprise" and the zero-overhead principle ("You don't pay for what you don't use") and can lead to unintended performance degradation, as in my case.

What do you think: Would it make sense to make this behavior more explicit? One idea might be to introduce a marker trait that explicitly signals that a type contains drop logic. This would allow developers to ensure that unwanted drop checks are avoided.

I look forward to hearing your thoughts!

1 Like

Feels semver-check adjacent to me, like changing variance or auto-traits. (Going from having a trivial destructor to a non-trivial destructor is a breaking change, so far anyway.)

You can write an assertion using needs_drop.
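For example, a minimal sketch (the Var here is just a cut-down stand-in for yours):

use std::mem::needs_drop;

// Cut-down stand-in for the interpreter's value enum.
#[allow(dead_code)]
enum Var {
    Void,
    Int(i32),
    Float(f32),
}

// Compile-time guard: the build fails if `Var` ever gains drop glue,
// e.g. by adding an `Array(Vec<Var>)` variant.
const _: () = assert!(!needs_drop::<Var>());

fn main() {}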

5 Likes

In your language, variables are presumably dynamically typed and may contain arrays which do need to be deallocated. Therefore the type check is not something you don't use -- you need the dynamic type check to see if it's not an array. So you can't expect zero overhead for this.

1 Like

One approach could be an SoA (struct of arrays) style representation, like Zig uses:

  • Have one array of u8 tags
  • Have another array of 32-bit "things" that are interpreted either as immediate values or as indexes into a third, extra array (depending on the value at the corresponding tag index).
  • Have a third array with the large and expensive data. One way is essentially to make this a bump allocator (so you can mix types). Or you could have a separate array for each of your large types.

The first two arrays are always the same length and offer good cache locality thanks to the SoA approach. The third (or more) array(s) are used for the uncommon case.

Now you no longer pay for what you don't use. Plus you likely get better cache locality (especially if you are using this to represent an AST, where locality has a useful meaning).

You also get rid of unneeded padding bytes this way, since every array is compact and has no per-entry padding.
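A rough sketch of what such a layout could look like (all names and the choice of u32 payloads are just illustrative, not taken from any particular implementation):

// Tag saying how to interpret the corresponding payload slot.
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(u8)]
enum Tag {
    Void,
    Int,   // payload holds an immediate i32
    Float, // payload holds the bits of an f32
    Array, // payload is an index into `heap`
}

// Struct-of-arrays value store: two dense arrays whose elements have no
// drop glue for the common case, plus a side table for the expensive data.
#[allow(dead_code)]
struct VarStore {
    tags: Vec<Tag>,
    payload: Vec<u32>,   // immediate value or index, depending on the tag
    heap: Vec<Vec<u32>>, // large/expensive data, referenced by index
}

impl VarStore {
    fn set_int(&mut self, slot: usize, v: i32) {
        // Overwriting a slot never touches `heap`, so no drop glue runs here.
        self.tags[slot] = Tag::Int;
        self.payload[slot] = v as u32;
    }

    fn get_int(&self, slot: usize) -> Option<i32> {
        match self.tags[slot] {
            Tag::Int => Some(self.payload[slot] as i32),
            _ => None,
        }
    }
}

fn main() {
    let mut store = VarStore {
        tags: vec![Tag::Void; 4],
        payload: vec![0; 4],
        heap: Vec::new(),
    };
    store.set_int(0, 42);
    assert_eq!(store.get_int(0), Some(42));
}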

Of course there is no free lunch, and the cost here is that it is much more awkward to use and you lose type safety (you can add that back with an abstraction on top, but that is still some extra work).

Edit: There is a crate for SoA via proc-macros: Soars on lib.rs (but it doesn't work for enums, just plain structs).

Edit 2: There are actually a few different crates for this when I search for SOA on lib.rs. YMMV, haven't tried any of them.

3 Likes

Are you sure it's the drop check affecting perf, and not the change in type size? That feels like a much bigger change to me.

5 Likes

Thanks for your response, Vorpal. Your approach is really interesting, and I can see how it could improve cache locality and help with the performance issues.

After figuring out what was causing my problem, I came up with a somewhat similar solution. I now use two enums: one for simple types and one for complex types. The complex values are stored in a slab, and the simple enum just holds the slab index (a plain usize). This way, I was able to get back to the original performance.
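Roughly like this (a simplified sketch, not my actual code; the names are made up):

// Hot enum: no variant owns heap data, so it has no drop glue and
// overwriting a stack slot is a plain copy.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum SimpleVar {
    Void,
    Int(i32),
    Float(f32),
    Complex(usize), // index into the slab below
}

// Cold enum: the expensive values live here, behind one indirection.
#[allow(dead_code)]
enum ComplexVar {
    Array(Vec<SimpleVar>),
}

#[allow(dead_code)]
struct Interpreter {
    stack: Vec<SimpleVar>,
    slab: Vec<ComplexVar>, // a real slab/arena could be used instead of a Vec
}

fn main() {
    let mut interp = Interpreter {
        stack: vec![SimpleVar::Void; 8],
        slab: Vec::new(),
    };
    // The hot assignment from my first post is now a plain copy with no drop glue:
    interp.stack[0] = SimpleVar::Int(1);
}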

But what still bothers me is how drops are kind of "invisible" to the programmer.

In a language with manual memory management like C, the programmer obviously sees every free call. Since free is quite expensive, this is really important for performance-sensitive code.

In Rust, drops are handled automatically, which is fantastic for safety and usability. I have absolutely no complaints about that—I wouldn't want it any other way. But the question is: Could we make Rust even better by making drops more explicit for the programmer when it matters, especially in performance-critical areas?

Drops are "inherited" by every struct or enum that contains a member, directly or indirectly, that implements Drop, which means a single Drop impl can have non-obvious consequences.

For example:

struct A(B, ...);
struct B(C, ...);
struct C(D, ...);

// And somewhere deep in your code base or another crate
struct D(...);

If, for some reason, D starts implementing Drop (or contains a member that implements Drop, such as Box, Vec, etc.), the compiler won't give any warning or error. But now A, B, and C also have to handle the drop of D whenever a variable of type A, B, or C becomes invalid.

Since dependencies between structs are usually more complex and less obvious than in this toy example, all structs that directly or indirectly use D are affected. This also applies to enums, Result, Option, etc. What was previously zero-cost can suddenly become costly without any clear indication.
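To make the propagation concrete, here is a tiny self-contained check using the needs_drop hint mentioned above (with cut-down versions of the toy structs):

use std::mem::needs_drop;

#[allow(dead_code)] struct D(i32); // change this to `struct D(Box<i32>);` ...
#[allow(dead_code)] struct C(D);
#[allow(dead_code)] struct B(C);
#[allow(dead_code)] struct A(B);

fn main() {
    // ... and this prints `true` instead of `false`, because the drop glue
    // propagates up through C, B, and A without any warning.
    println!("A needs drop: {}", needs_drop::<A>());
}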

Even though this behavior is absolutely correct and sound, and in my opinion, can't really be abstracted or optimized away, it makes me wonder if it would be helpful for Rust programmers to be able to track this kind of behavior more explicitly.

Maybe a marker trait like MayDrop or NoDrop could help track down unwanted or unexpected drops and make these situations easier to catch, especially in performance-sensitive code.

I'm curious to hear what the Rust community thinks about making drops more explicit in these contexts.

Thanks again for your input, and I'm looking forward to hearing more thoughts on this!

1 Like

Yes, I'm quite sure. The slowdown occurs even when the size of the type in question is actually reduced.

1 Like

NoDrop would be a backwards-compatibility hazard. Right now programmers widely assume that adding a Drop impl (explicitly or implicitly) is a backwards-compatible change. You could use a Copy bound as a proxy for NoDrop, but it has obvious limitations.
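Something along these lines (a sketch; the helper is made up):

// Accepts only types without drop glue: a `Copy` type cannot implement
// `Drop` or contain anything that does.
fn store_no_drop<T: Copy>(slot: &mut T, value: T) {
    *slot = value; // plain overwrite, no drop glue runs for the old value
}

fn main() {
    let mut x = 1_i32;
    store_no_drop(&mut x, 2);
    assert_eq!(x, 2);
    // store_no_drop(&mut vec![1], vec![2]); // rejected: Vec<i32> is not Copy
}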

4 Likes

I think this would be a bad change, especially since it is hard to automatically tell when something is performance critical. In most cases it would add noise to the code and result in error messages telling you that you forgot to add an explicit drop. Not having to do so is one of the big benefits of having the borrow checker.

But there is a different way that could be used here (assuming you're using a language server):

  • For types and variable names, rust-analyzer already shows inlay hints, making the code easier to read.
  • The same could be done (opt-in?) for drop locations and other performance relevant/impacting things. That way it can easily be disabled but still quick/easy to access when needed.
  • Something similar could for example be done for optimizations like inlining (though rust-analyzer probably doesn't have enough information for that), showing when a function is or is not inlined (again, as a configurable opt-in).
  • Or in struct/enum definitions for padding bytes that are added, thus showing where you can add more data at little to no cost.

cargo check and clippy give you hints when they are pretty sure something could be done better, i.e. when there is a way to detect it. But besides such hints from rust-analyzer (of which there could/should be more), there is no easy way to see such cases in performance-critical code unless you already know what you're looking for (memory layout, alignment issues, allocations, deallocations/drops, ...).

2 Likes

They do, but it's not. (Related discussion.)

2 Likes

You may be right. While I think a NoDrop trait might not be a breaking change (since it could be used to make sure no drop is 'inherited', just like the Copy trait already does as stated by newpavlov) and might not really clutter the code since it would only be used when needed in struct/enum definitions, adding hints to the toolchain might actually be the better approach.

This could also help address other potential performance issues, like nested inlining. I ran into that when optimizing the interpreter — it's possible to control inlining somewhat with the #[inline(always)] annotation, but even then, if a function is recursive, the compiler has to decide how deep to inline. That led to some interesting effects where small changes in code or annotations made a big difference in performance.

Anyway, after spending a few afternoons scratching my head, I finally ran a real-world benchmark yesterday (quicksort of an array) on the interpreter. And guess what? Even though the interpreter is written in plain safe Rust without any fancy optimizations on the AST, it ended up being about as fast as Lua 5.3.

I was really surprised that it ended up that good, because Lua is written in plain C and one of the fastest scripting languages I know. I expected the additional safety of Rust to add at least a small penalty.

It would be nice if it were possible to have "for optimisation only" traits. They could not be used to prevent code from working; code just might run slower if you don't implement them.

How? I think it might need specialisation to be useful (so that's a pipe dream). And I don't see a way to prevent you from just statically asserting false in one variant of specialisation, which would make it a semver hazard again :frowning:

How would a "for optimization only" trait ever be a SemVer hazard? The definition of optimization that I work with is that it does not change semantics, only runtime characteristics, which would mean it's not a SemVer hazard by definition.

It might be unpleasant if you depend on something being optimized a certain way, and it stops being optimized that way, but that's a different statement.

That was the point. But I don't think it is possible to make an optimisation-only trait that is available to user code. (I'm not a fan of things that only the standard library can do.)

Assuming that this works via a trait that can only be used in specialisation, not as a normal trait bound (because that is the only approach I can come up with), what is to prevent someone from specialising on this trait in a way that breaks semver? In particular, you could write code in the specialised implementation that statically asserts on false with a "not implemented" message.

I'm not entirely sure if I'm getting your post right, but here’s what I had in mind.

I was thinking about introducing a NoDrop trait. It would be an automatic marker trait that the compiler assigns to any type that doesn’t implement Drop.

Something like this:

auto trait NoDrop {}

// Automatically generated by the compiler
impl NoDrop for i32 {}
// ...

// Box and other types with `Drop` wouldn’t implement NoDrop
// impl<T> !NoDrop for Box<T> {}

struct D(i32); // Compiler automatically implements NoDrop
struct C(D);   // Compiler automatically implements NoDrop
struct B(C);   // Compiler automatically implements NoDrop

#[derive(NoDrop)] // Explicitly require NoDrop, satisfied by the compiler
struct A(B);

Now, if D changes to:

struct D(Box<i32>);

D, C, B, and A would no longer implement NoDrop. Nothing happens with C and B since they're not marked, but A would produce a compile error, and you'd get a heads-up about a potential problem.

To me, this doesn’t really mess with the existing language semantics. It’s more of a tool that lets you explicitly mark types where you don’t expect any Drop behavior.

In a way, Copy already does this — when a type is Copy, it can’t contain anything that implements Drop. So this concept is already kind of baked into the compiler.

That said, Copy has its downsides: you have to explicitly implement it for each type, it requires Clone, and it changes move semantics, which can be undesirable for larger data structures.

I think this could be a handy addition to help spot performance issues from implicit drop checks more easily, without going against the language's core principles or adding any breaking changes.

Not sure if I’m overlooking something here, so I’d love to hear your thoughts!

I see what you're getting at, but that's a general problem with introducing a new specialization; any new specialization is a SemVer hazard, no matter what trait you specialize over. Optimization-only traits don't change this in any meaningful fashion - for example, I can already add a specialization for Copy types that fails where my specialization for Clone types does not.

Fair enough. The root issue is that adding Drop is not really a breaking change today, but it would become one with this. (There was an example above of how it is technically already a breaking change, but I feel like it would be way more prevalent if you could have bounds on it.)

And the whole point of optimisation-only traits would be that they are not semver hazards (apart from possibly worse performance, which might be a deal breaker for you sure, but it won't prevent the code from compiling), so if there is no way to make that work, then the whole point is moot.

1 Like

Reading your proposal, this sounds like it's closely related to negative trait impls; what do you see as the difference between impl !Drop for A and #[derive(NoDrop)]?

I note that Tracking issue for negative impls · Issue #68318 · rust-lang/rust · GitHub says that there's plans to forbid impl !Drop, which is relevant to your interests, and might need more discussion; is there anything beyond that where impl NoDrop is better than impl !Drop?

I think that if NoDrop were an explicit annotation which required all members to implement NoDrop, it wouldn't be a semver hazard (just like Copy). In fact, it could be a supertrait of Copy.

1 Like