Memory management concepts


#1

Hello, everyone!

For sure, Rust have the most innovative and one of the most interesting memory management designs out there. Having full memory safety without any runtime overhead at all (e.g. garbage collection or reference counting) is an enormous achievement, but, still, I am wondering if there are any other innovative solutions to be discovered?

I ended up on this forum because I wasn’t able to think of a place with bigger concentration of similar-minded people. It would be cool if you, guys, have any innovative ideas and concepts, so we can start a little discussion!

As for now, these are the interesting ideas I have encountered:

  • Region based memory management Quite old, but still interesting idea, which was already implemented and tested several times in different programming languages. It appears not to have any memory safety guarantees, as some forms of automatic region-based allocation/deallocation of memory can suffer a lot from iteration and loops. Also, requires nearly the same amount of effort to maintain as manual memory management, except guarantees deallocation at some point (?).
  • Reference counting with eventual cycle collection It sounds like quite a nice idea from both performance and memory preservation standpoints. However, it ruins the idea of leveraging compile-time information for smart, statically-analyzed memory model. Also it introduces some overhead. It is interesting how many interesting ideas there is posted in form of white-papers, even dating back to 1982, but nearly all of them require eventual garbage collection to be totally safe.
  • Rust’s concept I am definitely not the person to evaluate and discuss Rust’s genius memory management concept. Nevertheless, in my opinion, if such a memory model is possible, there must be some other yet undiscovered concepts which are, maybe, less safe, but also more expressive and easy to use for beginners.

I also tried to make up my own concept of compile-time data based memory system, but have gradually reinvented the same ideas rust have already introduced, so, no fresh discoveries here.

Thanks for your attention! Hope to hear your thoughts and ideas.


#2

I’ll throw in a couple of interesting papers on the topic :slight_smile:


#3
  1. https://en.wikipedia.org/wiki/Cyclone_(programming_language)

  2. http://blog.pnkfx.org/blog/2016/01/01/gc-and-rust-part-2-roots-of-the-problem/


#4

Yeah, I know about Cyclone, it was one of the reasons I included region-based memory allocation / deallocation to the list. Maybe there are some subtleties I haven’t noticed?


#5

I’m really interested in this, particularly in non-GC/no-overhead ways to manage memory. One approach I’m interested in exploring is per-task pools with different allocation policies. I’m thinking along the lines of how Erlang treats processes - for a short-lived task, you can have an allocation policy of pFree += sizeof(object) and just leak the memory, and the pool can be reused by other tasks later.

This approach does require some type of coarse-grained region tracking, manual memory management for long-running tasks (probably via RAII and/or refcounting), and pool GC, but -

  • The common case is easy and fast.
  • The GC only operates on unclaimed pools so it doesn’t interfere with any other threads.
  • The worst case is no worse than C++.

The idea is inspired by the way operating systems deal with process memory and resources, and it’s a very early-stage incomplete idea, but I do want to explore it further. I’m interested in doing audio processing with this, so low-level, unsafe, statically typed.

I think I remember PHP or HHVM using a similar strategy per-request, but that’s a very specific application of this.


#6

Thanks for your response :slight_smile:. It’s indeed quite an interesting idea. Still, there are some potential problems I can think of. First of all, any possibility of leaking memory feels kind of incomplete, when you don’t put it in the realm of “unsafe”, as, for example, Rust does. Also it can repel newcomers and/or enterprise people, who want “safety guarantees”, when their software is written by 100 isolated programmers. Secondly, the main reason I don’t want GC is easy interoperability with some low-level libraries that already exist (e.g. ones written in C, for C). Any form of garbage collection can potentially mess things up?

All of these points are quite vague, as far as I can see, so I am really interested in your opinion.


#7

I’m with you that it’d be nice to not need any type of GC background process, but aside from Rust’s model I just can’t think of a way to do that. This was the best I could do at minimizing it, and working with Unity where you have to actively avoid the GC, at least this way it doesn’t get in the way of doing real-time work. To actually get away from it, you’d need to either do some self-management of pools or not allow tasks to create new ones, and then you have to be really careful to not run out of memory.

As far as your concern about leaking being bad, I agree that it’d probably put some people off, although I don’t think that’s necessarily rational - real GC languages do that too, it’s just you have more of a guarantee that memory will be available when you need it. I think the biggest issue here is you’d have to be really aware of how much memory you’re going to need for each task, otherwise you could easily run out if you have a few concurrent tasks that take more than you expected. You’d have to make sure that if your usage was unbounded you use a real allocator and RAII. It also would make allocation really tedious, because you have to either handle the error case everywhere (imagine if Box::new returned an Option in Rust) or accept that tasks crash like Erlang, which needs a runtime. I don’t really like either of those.

I’m thinking though, it might work to not encourage leaking but just say don’t worry about it too much? So you’d use RAII objects, but in the case where you have cycles or shared references, you just forget about it and they’ll be reclaimed when the task finishes. I think you could even get by without a runtime if you have language support for tasks as regions and don’t build in green threads like Go. The pool manager would post notifications about memory usage to a user defined callback, which you could either handle yourself or use the default background thread handler.

Not sure how threading would work yet, since that would not be a subset of the parent task unless you had something like rayon::Scope. I guess you’d need some type of move/copy semantics between regions, and probably refcounted pointers.

Hmm, FFI is gonna be tricky. I guess you’re kind of on your own there. I think maybe to make it safer you’d restrict it to memory that’s not part of an auto-freed pool, although I think it should be up to the user to decide whether that particular call holds on to the memory or not.

I think if I go too much further I’m just going to end up confusing myself, but go ahead and nitpick, I need it if I’m gonna have any chance of making this work.


#8

To be honest, after reading your post I am starting to think, that basic per-task refcounting should do for some time. Still, I am quite interested in your idea and it would be better for both of us to see a proof of concept :slight_smile:. However, I don’t really like the fact, that we are not using any compile-time information to help programmer think less about tedious stuff. As for now, I haven’t really thought through, I only theorize that it’s possible. Also, what about forcing using lightweight cooperative threads everywhere, where you want to “allocate” a new “pool” of memory? Making programmers architect their programs around coroutines, if they are writing something more complex.

For example:

struct Test:
    Test other // some kind of self-referencing struct
    int value
end
fun foo():
    fib add_fib: // some form of fiber (coroutine) initialization
        let test1 = new Test
        let test2 = new Test
        test1.other = test2
        test2.other = test1
        // refcounted objects are created
    end
    add_fib.run()
    // fiber is run, resources are allocated
    // in it's local "context"
    // when scope closes, resources are not deleted
    // because of ref cycle
    // but immediately all the resources are freed
    // because of the fiber finishing it's work
end

With this example, you can always adapt language’s syntax to make all this coroutine stuff less tedious. However, it feels like some form of hack for me, as nearly the same can be achieved with usual closures. Also, what about sharing resources between coroutines? At this point we might also need some kind of “move” semantics and ownership, so eventually we end up with concept similar to rust.


#9

However, if you really split coroutines and closures, letting only coroutines have this “property” of initializing new reference contexts, you can add some kind of freedom to dealing with memory inside given language.

fib foo:
    let test1 = new Test
    fun bar():
        test1.other = new Test
        test1.other.other = test1
    end
    bar()
    // at this point cyclic references are created
    // which won't be resolved until foo
    // exits
end

However, for such solution to work we might restrict the possibility of fibers to access the outer variables and maybe provide some ways to “move” values into the fibers, letting them own an them.

let test1 = new Test
fib foo(test):
    // at this point we own "test1"
    // ... do some stuff
    // at this point "test1" is destroyed forever
end
foo.run(test1)
// cannot use "test1" anymore
// "Cannot use moved values"

I might be missing something too obvious, but as for now, it looks like something quite flexible, easy to grasp and original. Or maybe not original at all?

Edit: The main problem I can see now is some kind of mechanism for borrowing objects to fibers. It might not be necessary, but I can see lots of situations, where you would end up returning the same object you provided as an argument. (The reason Rust added borrowing in the first place)