Supporting 'janitorial' style RAII

That's getting closer. I was just thinking of two issues, and sorry for not pushing these up front, I've used this stuff for so long that I forget some of reasoning behind it sometimes...

One is that there are those janitors that just save and restore a value, and there are those that make calls to change and restore the state of the target. Obviously it's fine to do dedicated janitor types for the latter scheme that's not a problem, and I do the same in C++. I guess in that scheme, the lambda is set to call the cleanup method, but I'm not up enough on Rust macros to know if that would complicate the wrapperage since now there's no value to restore, there's a method to call.

The other is that, for the value swaps, it may need to be atomic in some cases. So that might add some complications (though I'm not up enough yet on Rust threadiness to know exactly what.)

Another is the some folks are just creating something in main and applying this sanitation functionality onto that from the outside, when really it has to be applied from within a method call to HeavyStruct itself to really be representative (without preventing further mutable access to self of course), which is more sort of where the borrowing complications seem to arise. Maybe there's no difference on that front with your scenario, it would take me a while to understand it well enough.

So, just for comparison, here's my C++ implementation of a boolean flag janitor. It doesn't care really where the boolean is from (a local, a member, whatever). I could have done a template for all of the fundamental types, but I avoid templates where they aren't necessary.

One very useful thing it does is support releasing the janitor, so it does nothing on destruction. It also deals with the incoming target being null, which is very useful since that allows you to not have to do some otherwise really annoying stuff to deal with the target being null. This way it just does nothing.

class TBoolJanitor
{
    public  :
        TBoolJanitor(       tCIDLib::TBoolean* const pbToRestore
                    , const tCIDLib::TBoolean        bNewValue) :

            m_bOldValue(kCIDLib::False)
            , m_pbToSanitize(pbToRestore)
        {
            if (m_pbToSanitize)
            {
                m_bOldValue = *m_pbToSanitize;
                *m_pbToSanitize = bNewValue;
            }
        }

        ~TBoolJanitor()
        {
            if (m_pbToSanitize)
                *m_pbToSanitize = m_bOldValue;
        }

        tCIDLib::TVoid Release()
        {
            m_pbToSanitize = nullptr;
        }
        tCIDLib::TBoolean   m_bOldValue;
        tCIDLib::TBoolean*  m_pbToSanitize;
};

Here's a typical 'call a method' type, which in this case takes a window object and suppressing painting on it until it's destructed.

class TWndPaintJanitor
{
    public :
        TWndPaintJanitor(TWindow* const pwndTar) : m_pwndTar(pwndTar)
        {
            if (pwndTar)
                pwndTar->SuppressPaint(kCIDLib::True);
        }

        ~TWndPaintJanitor()
        {
            if (m_pwndTar)
                m_pwndTar->SuppressPaint(kCIDLib::False);
        }

    private :
        TWindow*    m_pwndTar;
};

If you have &mut, there is only one viewer of the target state, so there cannot be a need of the swap to be atomic, because nobody else can see the swap happening anyway.

You cannot do this, because this is unsound. You have to access the "reset on end of scope"d object from the wrapper. This kind of subtle problems with your design is the reason why Rust doesn't like it and forces you to use a different design. Here's the proof upthread, but to summarize:

  • If you can still access the original binding, you still have a longer lifetime
  • And that means you can spawn a thread that respects that longer lifetime and holds the &mut borrow and uses it
  • And that means that your reset on drop aliases the still running other thread's borrow.

This is super important. Your C++ design is never provably sound, so it cannot be expressed in safe Rust. "Only accesses the pointer at drop time" is not actually enough to make the pattern sound in the face of (scoped) concurrency.

Phrased in Rust terms, that means you cannot have a borrow lifetime that is valid across the point at which you do the mutation. Plus, "just make the compiler smarter" isn't a viable solution in this case, because borrowing rules are already the most complicated thing in Rust that many developers struggle with, so making it more complicated will just make the problem worse. On top of that, we barely have a formal understanding of why Rust's lifetime rules are sound, let alone any proposed extensions to the rules. And finally, we have a way of telling the compiler that we know what we're doing: using raw pointers. You just have to only use raw pointers, so that the useful qualities of references still stay true.

Thankfully, that's not a worry in Rust, because there isn't a concept of a null missing value.

That's not too difficult to support properly on top of my TemporarilySet:

impl<'a, Type, Field: 'a, Accessor> TemporarilySet<'a, Type, Field, Accessor>
where
    Accessor: for<'b> FnMut(&'b mut Type) -> &mut Field,
{
    fn forget(self) {
        let mut this = std::mem::ManuallyDrop::new(self);
        unsafe {
            std::ptr::drop_in_place(&mut this.accessor);
            std::ptr::drop_in_place(&mut this.reset_to);
        }
    }
}

The only reason this isn't just a std::mem::forget is to make sure that we clean up our members properly. scopeguard also offers a defusing into_inner function for their guard.

This is where scopeguard is typically used.

scopeguard::defer! { pwndTar.SuppressPaint(false) }
pwndTar.SuppressPaint(true);

This is a lot clearer locally I'd say than having to pull up the docs of TWndPaintJanitor to know what state exactly it manages.


I made a different version that's more evil, hiding the wrapper entirely.

#[derive(Debug, Default)]
struct HeavyStruct {
    is_critical_section: bool,
    // other fields
}

fn main() {
    let mut resource = HeavyStruct::default();
    dbg!(&resource);
    {
        temporarily_set!(&mut resource, resource.is_critical_section, true);
        dbg!(&resource); // note: this is actually a different resource binding
    }
    dbg!(&resource);
}
5 Likes

So, if there's no way to do that, how do you deal with this situation:

  1. You enter a scope and make a change that has to be undone before you exit
  2. There are ten or twenty or more places in that scope that are using ? to implicitly return errors to underlying calls.

Do you get rid of all of those ? and do match on every one of those so that you can undo that change? For one that's a huge amount of tedium. And it's guaranteed errors over time that someone misses undoing something or uses ? by accident, who would notice that in a fairly complicated chunk of code with a lot of changes?

Do you factor out every such scope to a separate method and pass along all of the state it might need from the calling method and return it back again? That's also pretty bad.

It just seems to me that Rust, being so oriented towards not letting you make mistakes, would have a way to help us avoid one of the easiest mistakes to make.

I'm sure someone will say that no one ever needs to modify the called object in such a way that it has to be undone before return, but I don't know what planet they are living on. It's pretty common.

On the threading thing in C++, it really doesn't have much in the ways of practical issues in C++. The thing the janitor took a pointer to isn't going to go away since it's either a local or a member of the class that was just called. The janitor's scope is inside that of the thing being modified by definition. And if the method is legally callable from multiple threads it's going to be locking for exclusive access to the called object, else you are already screwed in C++ (unless it's a trivial return of a fundamental member or some such.)

Obviously there's nothing in C++ that enforces legally callable from multiple threads. But if you do call something that way that's not explicitly indicated MT safe you are in UB territory no matter what

You just use this instead of self for every instance of self in the critical area.

impl MyStruct {
    fn do_stuff(&mut self) {
        let this = self; // whoops my macro doesn't work with self
        let this = temporarily_set!(this, this.whatever.needs.to.reset, get_value());
        // perfectly fine:
        this.do_more_stuff();
        // only this isn't possible:
        self.do_more_stuff();
    }

    // or with scopeguard
    fn do_more_stuff(&mut self) {
        let this = scopeguard::guard(self, |this| this.finish_render_task());
        this.start_render_task();
        this.do_even_more_stuff();
    }
}

It is not as much of a burden as you seem to make it out to proxy any further calls through the proxy. And in fact, this exact thing is Rust preventing you from making an error: if it let you write this as you've written it in C++, it would either be immediate UB in Rust (because &mut is a unique reference, always) or in the best case a lingering footgun that means using other safe APIs would suddenly be able to cause UB.

Unfortunately, that's not good enough for Rust. For it to be safe, it needs to be impossible to misuse in safe code.


Ultimately, this all comes off as "I do it this way in my replacement for the C++ standard library, CIDLib, so I should be able to do it this way in Rust as well," and that just isn't and doesn't need to be true. At this point you have all the information available, as well as a proof of why your desired syntax isn't possible to make sound, so I'm not going to be able to provide any more evidence to convince you one way or the other.

Rust is harder to use than C++, nobody is going to argue that (successfully). But the tradeoff is that the hardness is Rust preventing you from sloppy code that while it might just work in C++, will potentially cause UB if misused (due to ignorance or just a bad day). In a way, Rust moves (some of) the hardness of C++ debugging up to design/compile time.

I'm not going to participate further in this thread, as there's nothing more for me to say.

9 Likes

I think we are talking past each other. I could live with the proxy if I absolutely had to, though I think it's not very self-documenting at all. But you seemed to be saying it wasn't possible, and sometimes I'm not sure what you are saying is impossible exactly.

And I get that it's harder because Rust tries to make things safer. But bugs are bugs, and a lot of bugs can happen if you don't correctly undo things if something goes wrong. I'm not trying to say make it just like C++, I'm saying it's something that a language so anal about preventing errors should address directly, however that is done. None of the above is really addressing it directly.

Anyway, approaching this from a completely different angle, how about something even more generic, that would provide the same effective capability, plus presumably other things, and wouldn't have the mutability issue. So some language support capability for this (or maybe there's something that already effectively does this):

pub fn some_method(&mut self)
{
     let x = yamama();

     // Some other stuff (not a function, it's a scope, with attached fn)
     scoped_fn(|?| println("{}, {}}. self.something, x)
     {
           // do other stuff here
     }
}

Where the passed 'closure or function' is guaranteed to be called on exit of the scope, can access anything in the scopes above it (implicitly captured), and it gets the same self as the enclosing method. The values it gets aren't given to it at the start of the scope, they are given to it at the end, where it is invoked, so it it would be no different from you just calling it at the end.

That should be compile time checkable, and you can store away anything you want before you enter the scope, and therefore it's available to you at the end of the scope just by it's normal name, no need to capture a bunch of stuff that may never be used. If it's invoked it's like a closure with implicit capture of everything it has legal access to, and the accessibility of those things are no different than they would normally be, so it's all checkable.

Maybe we could discuss this further in a more productive way if we consider an acutal concrete example instead of the foo-bar stuff where one person sees totally differently requirements than the next one. I'd be curious about seeing one or more common/ideomatic practical concrete cases where you need a janitor in C++. Excuse me if there already was any concrete example in this thread I've skipped reading everything in detail, in this case just point to it ^^

4 Likes

I went through my C++ code and here are all the things I used janitors to do. There are probably some I missed,. The ultimate point here is that, all of these are things that are done, and MUST be undone before exit. These are always acting on a local variable or a member of the called class within a method call.

In a world with manual error returns and no janitorial type concept, that means that (unless I'm seriously missing something) I cannot use ? type returns because I have to check every call's return and undo the thing manually and then return if there's an error. That is both very tedious and guaranteed to introduce errors over time.

Obviously a few of these are provided via special means in Rust and at least two aren't relevant (well they may still be in unsafe code.)

  • Set and restore the active state of a speech recognition grammar
    
  • Set a new and restore original value of a bool, number or string
    
  • Insure that an allocated single whatever gets deleted if not stored away
    
  • Insure that an allocated raw array gets deleted if not stored away
    
  • Invoke 'block mode' on a collection (batches pub/sub reports on changes to a pub enabled collection)
    
  • Lock and release a critical section, mutex, or semaphore
    
  • Cleans up a data source object (going through clean shutdown to remote and then freeing)
    
  • Lock the environment so that multiple changes can be made atomically
    
  • Change current directory and restore back to original
    
  • Push a value onto a stack collection and pop it back off
    
  • Increment and then decrement a safe counter
    
  • Enable signal handling within a scope
    
  • Get an object from a pool and return it to the pool
    
  • Set the formatting options on a text output stream and restore them
    
  • Set indentation on a text output stream and restore it
    
  • Change thread priority (not much used, but available if needed)
    
  • Sync with another thread and release it
    
  • Batch up changes (or discard them if aborted) to an undo stack
    
  • Insures a socket stream does a clean shutdown and then deletes it
    
  • Set and then release pens, fonts, regions, brushes, etc... on a graphical output device
    
  • Save and restore the current position on a graphical output device
    
  • Set a graphical device object over a bitmap (for drawing to the bitmap) and release it
    
  • Insure the CML (macro language) call stack gets popped back on exit from method processing
    
  • Mark certain types of objects as entered and maintain entrance count
    
  • Set and restore the pointer appearance for a window
    
  • Remove a branch from a tree collection if branch building is not finished successfully
    
  • Capture the mouse and release it
    
  • Make sure a temporary window is destroyed (window style) and then destroyed (object style)
    
  • Disable a window timer and then start it back up
    
  • Start a local window timer
    
  • Stop all timers on a window
    
  • Set and restore a window's visibility
    
  • Delay Window repaint until a batch of operations have been completed

I do not know if Rust has considered or would consider a similar feature, but would you be meaning something like Swift's defer statement? (official docs, scroll down to "Specifying Cleanup Actions")

Ah, well, ..., this list is longer than Iā€™ve asked for. Also a disclaimer, Iā€™ve myself not really used Rust in practice in an actual bigger project yet, so I wouldnā€™t really know for sure how many of these would be handled in Rust idiomatically. Some of the things on your list also are still pretty abstract and without context they seem rather pointless and it isnā€™t clear why theyā€™re actually relevant, for example

or

But there are lots of things that are concrete, too, so Iā€™ll ignore those abstract ones ^^

The main point in Rust safety actually is that APIā€™s would usually (if possible) be done in such a way that it is impossible to forget cleaning up.

I think this is one of the best examples to start getting into that idea. Try taking a look into Rustā€™s Mutex type which offers exclusive access to some shared data in a way where you cannot accidentally forget releasing the lock, perhaps similar as C++ā€™s unique_lock works, utilizing RAII.

Also your examples regarding allocation usually wouldnā€™t apply to Rust as youā€™d be using RAII interfaces like Box there, too; accidentally forgetting to free memory is hard with the standard library. Even further, manually de-allocating things is actually not really possible in Rust without going into unsafe Rust since doing that twice would be a double-free that safe Rust guarantees never happens.

I wonā€™t say thereā€™s never the need for something like janitory, after all someone created the scopeguard crate for a reason, but I guess the general gist of how Rust tries to make things safer is by avoiding the need of manual cleanup all-together in many places. (After all, using janitors could easily be forgotten, too.)

Also letā€™s not turn this thread into a Rust language tutorial xD

You can do this by creating and then immediately invoking a closure:

pub fn some_method(&mut self)
{
    let x = yamama();
    let result = (|| {
        // do other stuff here
    })();
    println!("{}, {}", self.something, x);
    result
}

That way, any use of ? or return within the closure will only leave that closure, and the println will still be executed. This avoids the need to rebind self to another variable.

If you don't like that the cleanup code is written after the main body rather than before it, you could write a macro to flip the order:

macro_rules! cleanup {
    ($cleanup:block, $main:block) => {
        let result = || $main;
        $cleanup;
        result
    }
}

cleanup!({ println!("{} {}", self.something, x); }, {
    // do other stuff here
});

On nightly there's also the option of using try blocks, but those are probably worse for your use case, since they only affect where ? jumps to, not return.

2 Likes

This is exactly what scopeguard does. Note that the try operator isn't all that special, it basically desugars to a match and conditional return. So scopeguard (and any other RAII structure) will be triggered even if you return with the ? operator. The only way that a scopeguard won't be triggered is if you leak it. But that's the same as any RAII construct. If this was not the case, it would as you noted, make the ? operator pretty useless.

Not all bugs are created equal, the bugs that Rust the language concerns itself with are all undefined behavior. Undefined behavior attacks our ability to find other bugs, and so it makes it that much harder fix bugs1. While C++ is laissez faire about undefined behavior ("just don't introduce UB"), Rust actively tries to prevent UB with safe defaults. This inevitably means that some patterns that work well in C++ just don't translate well to Rust if at all. Yes, this makes C++ more convenient to use, but Rust trades convenience for safety.

Note: as @steffahn has pointed out, many of the concrete things on your list are already handled through RAII structures. For example allocations (Box) and releasing locks (MutexGuard). The other things can all be handled by scopeguard, yes you will have to work through a proxy but once you do you don't have to worry about subtle lifetime issues that could come up when you decide to add threads or some other new feature.

Borrowing from @steffahn's earlier example,

let's say you have a the following code:

// does not compile, but shows a point

{
    assert_eq!(some_bool, true);
    
    spawn_scoped_thread(|| { some_bool = false; });

    let janitor = Janitor::reset_bool(&mut some_bool);

    // do some work

} // scoped thread should be joined here,
  // and the janitor will also be cleaned up here

assert_eq!(some_bool, ???);

What happens at the end of the scope? (if this compiled)

  1. some_bool is false
  2. some_bool is true
  3. undefined behavior
answer there is undefined behavior because of a race condition, `some_bool = false` races with the reset from the janitor. If this was C++, your code would likely appear to work fine for a while until it doesn't. With Rust, it detects this at compile time and so you never need to worry about it once deployed.
4 Likes

Two quick points to follow up on my last post:

  • The approach I suggested has the downside that it will not execute the cleanup when unwinding from a panic within the main body. Personally, I usually prefer to just turn on panic=abort mode and not deal with panic unwinding, but if you do need panic unwinding, it's an issue to watch out for. There are ways to fix that, but...

  • ...the RAII-based approaches others have proposed work with panics 'out of the box', and I don't actually understand what you find unsatisfactory about them. I only mentioned the closure approach because it's simple to understand and closely matches what you asked for, with the main body inside an indented block and the cleanup code outside.

    But RAII has the advantage that you don't need the indented block, and it's more directly analogous to the C++ behavior you're trying to replicate. As downsides go, having to write this instead of self and maybe insert some *s is aesthetically annoying, but not a big deal. Beyond that, the RAII approach should be able to accomplish all the use cases you mentioned, so what's the issue?

4 Likes

To be clear, this is what I was talking about.

1 Like

I think this is fair - scopeguard is designed for developer convenience, and tries to keep out of your way. That makes it less clear that the methods are being proxied through, and you could always use a version that made you explicitly call use_inner to get the mutable reference.

In case it hadn't been made clear: Drop implementations are called whenever a value goes out of scope. This includes returns, ?s, and panics. scopeguard uses Drop to call your guard, and will therefore run in all of these cases.

As a higher order function, scopeguard::guard can also be specialized to a more local usecase. For example, to set and reset a value, you could use this:

fn set_and_reset<'a, Receiver, Func, Value>(
    owner: &'a mut Receiver,
    mut map: Func,
    v: Value,
) -> ScopeGuard<&'a mut Receiver, impl FnOnce(&'a mut Receiver), Always>
where
    Func: 'a + FnMut(&mut Receiver) -> &mut Value,
    Value: 'a,
{
    let initial = std::mem::replace(map(owner), v);
    guard(owner, move |this| *map(this) = initial)
}

let this = set_and_reset(this, |this| &mut this.primed, true);

// `primed` will be reset so long as you don't call `this.into_inner()`

Imo, scopeguard could do with better support for this. That return type is painful to write.

There's nothing you can do to avoid giving the guard ownership of the reference, because it is mandatory for locally guaranteeing safety. That means the call has to include

  • The value being modified
  • A mapping from that value to a reference to want to change
  • The temporary value to set it to

At this point, I don't see any value in adding a language feature to support this. All it could be doing would be hiding one of these relationships, which can already be done using a macro if you really want to:

macro_rules! set_and_reset {
    ($receiver:ident . $field:ident, $tmp:expr) => {
        let $receiver = $crate::set_and_reset($receiver, |receiver| &mut receiver.$field, $tmp);
    }
}

set_and_reset!(this.primed, true);

Keep in mind though - so long as you give the destructor to a consumer of your API, they may forget it. This means this pattern could never be used to uphold invariants as a part of safety. I'd say this is why closures are considered more idiomatic: they allow a library to keep responsibility for maintaining their invariants.

9 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.