[IDEA] Context Managers

Context managers are Python's way to do RAII:

with open("foo.txt", "w") as f:
    f.write("bar")

Unlike destructor-based RAII (as in C++ and Rust), where the cleanup is implicit (even if well defined), with context managers the cleanup is explicit - it happens at the end of the context block. This also allows an async version where the cleanup itself is asynchronous:

async with aiofiles.open("foo.txt", "w") as f:
    await f.write("bar")

I think something like this could be good for Rust - it would solve the fallible cleanup issue and the async cleanup issue. Also, context managers are good for more than just resource cleanup.
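For context, the fallible-cleanup issue in today's Rust looks like this: Drop silently discards errors from closing a File, so code that cares has to call a fallible finalizer such as sync_all by hand. A minimal sketch (write_file is a made-up helper, not part of this proposal):

```rust
use std::fs::File;
use std::io::{self, Write};

// Sketch of the fallible-cleanup issue today: Drop ignores errors from
// closing a File, so careful code calls a fallible finalizer by hand.
// `write_file` is a hypothetical helper.
fn write_file(path: &std::path::Path) -> io::Result<()> {
    let mut f = File::create(path)?;
    f.write_all(b"bar")?;
    f.sync_all()?; // surfaces close-time errors that Drop would swallow
    Ok(())
    // `f` is dropped here; any error from the implicit close is lost
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("cm_idea_demo.txt");
    write_file(&path)?;
    assert_eq!(std::fs::read_to_string(&path)?, "bar");
    std::fs::remove_file(&path)
}
```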

I'm not suggesting 1:1 feature parity with Python's context managers - there are many things that make sense in one language but not the other - but that's my main source of inspiration.

Syntax

To avoid introducing a new keyword, I'll be borrowing the use keyword. My precedent is C#, which uses the using keyword both for importing things from modules and for IDisposables (C#'s version of context managers - much less powerful than Python's, but the basic idea is similar).

I also think we should follow the .await example and make this postfix:

let data = File::open_cm("foo.txt").use f in {
    let mut content = String::new();
    f.read_to_string(&mut content)?;
    content
}?;

Or the async version:

use tokio::fs::File;
let data = File::open_cm("foo.txt").async use f in {
    let mut content = String::new();
    f.read_to_string(&mut content).await?;
    content
}?;

Note that:

  1. .async use (is this good syntax? I'm not sure I like it...) makes both entering and exiting the context asynchronous.
  2. Using ? and .await inside the context does not need explicit support from the context manager. Unlike a function that receives a closure, a context manager does not open a new frame, so ? and .await operate in the same frame as the enclosing function.
  3. The ? at the end of the context block applies to the Out value returned by the context manager's exit - in the example, an io::Result<String>.

open_cm here will need to return a value that implements:

pub trait ContextManager<'a, Mid> {
    type In: 'a;
    type Out;

    fn enter(&'a mut self) -> Self::In;
    fn exit(self, mid: Mid) -> Self::Out;
    fn bail(self);
}

pub trait AsyncContextManager<'a, Mid> {
    type In: 'a;
    type Out;

    async fn enter(&'a mut self) -> Self::In;
    async fn exit(self, mid: Mid) -> Self::Out;
    async fn bail(self);
}

(we may want to bikeshed the names of the types (In/Mid/Out))

The same type can implement both traits, and the one that gets used will be decided based on whether .use or .async use was written.

Semantics

The .use syntax invokes the enter method of the context manager, which returns the In - in the example, that would be the file handle. In is allowed to mutably borrow the context manager, which means we can guarantee the file handle does not escape the scope of the context, and also allows us to handle these very same resources in exit/bail.

mid: Mid is the value returned by the block. In the example - that would be the content String. If the block finishes, exit is called and receives that value. exit returns Out - which is the value of the entire .use expression. In the example - that would be an io::Result<String>.
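To make the enter/exit flow concrete, here is a manual desugaring of a .use block in today's Rust, using the trait proposed above with a toy context manager. BufferCm is hypothetical and "opens" an in-memory string rather than a real file, so the sketch actually runs:

```rust
use std::io;

// The trait from the proposal above, with a toy implementor. The `.use`
// desugaring in `main` is done by hand.
pub trait ContextManager<'a, Mid> {
    type In: 'a;
    type Out;

    fn enter(&'a mut self) -> Self::In;
    fn exit(self, mid: Mid) -> Self::Out;
    fn bail(self);
}

struct BufferCm {
    buf: String,
}

impl<'a> ContextManager<'a, usize> for BufferCm {
    type In = &'a mut String;
    type Out = io::Result<usize>;

    fn enter(&'a mut self) -> Self::In {
        &mut self.buf
    }
    fn exit(self, mid: usize) -> Self::Out {
        Ok(mid) // fallible cleanup would go here
    }
    fn bail(self) {}
}

fn main() -> io::Result<()> {
    // Manual desugaring of:
    //     let len = BufferCm { .. }.use b in { b.push_str("bar"); b.len() }?;
    let mut cm = BufferCm { buf: String::new() };
    let mid = {
        let b = cm.enter(); // `b` mutably borrows `cm` for the block
        b.push_str("bar");
        b.len()
    };
    let len = cm.exit(mid)?; // the borrow has ended, so `cm` can be consumed
    assert_eq!(len, 3);
    Ok(())
}
```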

If the .use block is terminated early (with return/break/continue/?/whatever new thing gets added in the future), bail is called instead of exit. It does not receive a Mid - because the context block was early returned from and thus did not result in any value. And it does not return an Out - because the program is not going through a path that would be able to do something with that output. But in the async case - it'd still be awaited on.

Note that this means if you bail from the File::open_cm context you won't see the error from closing the file. I think we can live with that, because:

  • The typical early exit is via ? - which means we already have an error to propagate, and it's probably the error we care about (rather than the closing error)
  • We can have a Clippy lint that warns about other kinds of early exits when the context's Out is marked #[must_use].
  • We can always add a combinator method for error handling:
    let data = File::open_cm("foo.txt").on_error(|cm, err| {
        panic!("Got error when trying to close the file!");
    }).use f in {
        let mut content = String::new();
        f.read_to_string(&mut content)?;
        content
    }; // no ? here - on_error already handled it
    

Option<ContextManager>

Python has an ExitStack class that can be used to dynamically decide which and how many context managers to run. A full ExitStack implementation may be better left to third-party crates, but I do think one of its use cases should be supported in the standard library: deciding whether or not to use a context manager.

In Python, it'd look like this:

with ExitStack() as stack:
    if want_to_use_context:
        stack.enter_context(my_context_manager())
    something_that_may_or_may_not_happen_inside_context()

With Rust, we should just use Option:

impl<'a, T, Mid> ContextManager<'a, Mid> for Option<T> where T: ContextManager<'a, Mid> {
    type In = Option<T::In>;
    type Out = Option<T::Out>;

    fn enter(&'a mut self) -> Self::In {
        Some(self.as_mut()?.enter())
    }

    fn exit(self, mid: Mid) -> Self::Out {
        Some(self?.exit(mid))
    }

    fn bail(self) {
        if let Some(inner) = self {
            inner.bail()
        }
    }
}

And then we can do this:

want_to_use_context.then(|| my_context_manager()).use _ in {
    something_that_may_or_may_not_happen_inside_context()
}
1 Like

Thanks for sharing! However, please pick an appropriate topic; this does not seem to have any relation to unsafe code.

1 Like

Oops. I saw "language design" but the rest of the topic was cut off...

Fixed it.

What is the advantage of that over, for instance:

{
    let f = File::open(...)?;
    func(f);
} // destruction happens here
more_code();

async Drop is being worked on.

7 Likes

In simple cases, nothing, although it is still much clearer about which destructors are being relied on for their behavior. As a simple example:

{
    let x = mutex.lock().unwrap();
    do_stuff(&x);
}
// ... other code ...

has very different behavior from

let x = mutex.lock().unwrap();
do_stuff(&x);
// ... other code ...

In a perfect world there would maybe be a larger difference here. In practice I just have alarm bells for checking braces when mutexes are being used. In any case the equivalent context manager code is a little harder to mess up.

mutex.lock().unwrap().use x in {
    do_stuff(&x);
};
// ... other code ...

In theory a context manager construct could provide extra guarantees over normal dropping, but I'm not sure that's desirable.

4 Likes

If there is an error when you exit the context (e.g. from closing the file), you can handle that error with a ?.

The try blocks RFC has been accepted, with a nightly feature available, for a long while now.

2 Likes

try blocks are an orthogonal feature:

  • A try block creates a new scope for the ? operator. A context manager lets the ? operator work on the outer scope.
  • A context manager allows the cleanup to return an error. A try block leaves cleanup to the regular drop rules (which don't allow returning errors).

(NOT A CONTRIBUTION)

I wrote about this last year: Asynchronous clean-up

A lot of challenges make it hard to make progress on async destructors. A better way to handle async clean up in my view would be a kind of ad hoc destructor mechanism, which I called "do ... final", though defer blocks are another syntax for the same semantics. This lets you run arbitrary clean up code whenever a block exits and handles the problem of "destructors" that return a non-() type, whether it be a future or a Result or whatever.

In the longer term, this would also be an important foundation for adding linear types, because linear types effectively also have a "non-()" destructor that must be called.

2 Likes

Using blocks to limit the lock scope is what the Mutex documentation currently recommends. Blocks are also used in meaningful ways in other contexts, such as #[cfg()]s inside methods or labeled breaks.

If you want to be more explicit what the scope is tracking you could write a macro, something like with!($expr as $ident do $block)

For situations where drop-based RAII isn't sufficient, we already have the common pattern of a method that takes a FnOnce()->T closure. Instead of a completely new mechanism, I'd rather see some improvements to the ergonomics there, like some way that allows a break or continue inside the closure to propagate through the method call to an enclosing loop.
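A minimal sketch of that existing FnOnce pattern, assuming a hypothetical with_locked helper (not a std API): the guard lives only inside the call, so the lock is provably released when the helper returns:

```rust
use std::sync::Mutex;

// Sketch of the FnOnce pattern: `with_locked` scopes the guard to the
// closure call, so the lock cannot outlive it. Hypothetical helper.
fn with_locked<T, R>(m: &Mutex<T>, f: impl FnOnce(&mut T) -> R) -> R {
    let mut guard = m.lock().unwrap(); // acquire
    f(&mut guard) // guard drops (unlocking) when this call returns
}

fn main() {
    let m = Mutex::new(0);
    let doubled = with_locked(&m, |v| {
        *v += 21;
        *v * 2
    });
    assert_eq!(doubled, 42);
    assert_eq!(*m.lock().unwrap(), 21); // the lock is free again here
}
```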

Maybe something like a nonlocal closure keyword (akin to move) that makes the closure return a special type lifetime-restricted to the defining scope, which is allowed to perform loop control actions via the ? operator (or similar) from within that scope. Closure-capturing methods that transparently pass through the output would then "just work" with nonlocal closures, and we could provide map-style methods on that type for when an intermediary wants to inspect the closure's return value.

1 Like

The issue with closures is that the compiler does not know whether or not they'll run. Sometimes it makes more sense to initialize, inside a block, a variable declared outside of it:

let data;
File::open_cm("foo.txt").use f in {
    let mut content = String::new();
    f.read_to_string(&mut content)?;
    data = content;
}?;

use_the_data(data);

With a closure:

let data;
File::open_in("foo.txt", |f| {
    let mut content = String::new();
    f.read_to_string(&mut content)?;
    data = content;
})?;

use_the_data(data);

The compiler does not know whether or not the closure was invoked inside the higher-order function, and thus cannot guarantee the safety of the use_the_data(data); after it.
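The usual workaround today is to route the data through the closure's return value, which sidesteps the deferred initialization entirely. A sketch with a hypothetical with_reader helper, using an in-memory reader in place of the file:

```rust
use std::io::{self, Read};

// Workaround sketch: the closure *returns* the data, so it flows out
// through the helper's return value and no `let data;` is needed.
// `with_reader` is a hypothetical helper.
fn with_reader<R: Read, T>(
    mut reader: R,
    f: impl FnOnce(&mut R) -> io::Result<T>,
) -> io::Result<T> {
    f(&mut reader)
}

fn main() -> io::Result<()> {
    let data = with_reader("bar".as_bytes(), |r| {
        let mut content = String::new();
        r.read_to_string(&mut content)?;
        Ok(content)
    })?;
    assert_eq!(data, "bar");
    Ok(())
}
```

This compiles because the data reaches the caller as a return value, but it still can't express break/continue into an enclosing loop, which is the ergonomics gap discussed above.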

1 Like

No please.

1 Like

It won't ever happen, but I'd be pretty happy if Rust got Ruby-style blocks that execute in the outer control-flow context. You can sorta emulate it with std::ops::ControlFlow, but that requires boilerplate on both sides, and doesn't have any optimization guarantees.

2 Likes

I think something like what's proposed in the opening post is what rust would really need to be easier to read and to be more memory efficient.

Currently, external resources such as locks and files are treated like any other normal variable. They are not highlighted in any way, making it easy to overlook them. I believe this is bad, as especially the scope of acquired locks is very critical for the correctness of a program.

Further, for most variables, the time of their destruction does not affect the correctness of the program. They could just be destroyed as early as possible to save memory. But right now, destructors are only called at the end of a block, which is a less memory-efficient default. Why not just call them right after the last use of a variable? The only reason not to (from what I have picked up so far) is to allow using locks without having to explicitly drop their guard:

{
  let guard = lock.lock();
  do_stuff_while_holding_the_lock();
}
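For comparison, today's explicit alternative to the extra block is to drop the guard by hand the moment the critical section ends (a minimal sketch):

```rust
use std::sync::Mutex;

fn main() {
    let lock = Mutex::new(0);
    let mut guard = lock.lock().unwrap();
    *guard += 1;
    drop(guard); // lock released here, before the rest of the function
    assert_eq!(*lock.lock().unwrap(), 1); // re-locking works: it's free
}
```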

But creating special "resource blocks" as proposed in this thread would provide an alternative to the above method of holding locks. It is more readable than the above method, especially in cases where the block does many more things, or possibly holds more than one lock. And it makes it natural for the reader to assume the lock/resource is held until exactly the end of the special resource block.

Then, there would be a new, better way of holding locks, and destructors could in future be called as early as possible.

2 Likes

It's any value with side-effectful drop glue. If the drop glue of a type is trivially known to not have any effect at all, it is "run" as soon as possible; this is what allows e.g. reference lifetimes to expire at last use instead of only at block scope (like they always did in the early versions of Rust before NLL).
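A small demonstration of that scheduling: a value whose drop glue has an observable effect runs it at the end of the block, after its last use (Noisy is a made-up type that logs when its destructor runs):

```rust
use std::cell::RefCell;

// `Noisy` records into a shared log when its drop glue runs, making the
// end-of-scope timing observable.
struct Noisy<'a>(&'a RefCell<Vec<&'static str>>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.borrow_mut().push("dropped");
    }
}

fn main() {
    let log = RefCell::new(Vec::new());
    {
        let _n = Noisy(&log);
        log.borrow_mut().push("in scope, after last use of _n");
    } // _n's drop glue runs here, at the closing brace
    log.borrow_mut().push("after scope");
    assert_eq!(
        *log.borrow(),
        ["in scope, after last use of _n", "dropped", "after scope"]
    );
}
```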

But when there are effects such as the release of resources, including of allocation, we run the drop glue at the end of scope also because Rust values predictability of resource utilization. It may not be ideal for a large buffer to stay around longer than necessary, but it's probably preferable to it being dropped before sending of a response message where you care about that tail latency moreso than reclaiming that buffer a few milliseconds sooner.

And what if dropping values panics? Now, moving drop cleanup sooner introduces yet more potential points for hidden exceptional control flow, which is already footgun-y enough that people are considering making any attempt to panic unwind from drop glue into an abort.

Eager drops are a thing people do feel the desire for, and marking "meaningful" cleanup does hold water in that problem space, but the needed context goes deeper than that.

1 Like