For anyone skipping straight to the comments, just for context: this article is not about how to nest arena allocators in Rust today; it's more about imagining how the syntax ought to look once someone picks up the work.
Say what you will about C++, but allocators are something it gets incredibly right. Bloomberg led the effort to standardize std::pmr (derived from a similar implementation in their internal codebase), and the work and thought that went into that strongly shows. If you do it right, you end up with code that largely reads as normal C++ without any sacrifice in performance -- the allocation details can mostly be embedded into the type system itself. I don't see that in this article, and I think if Rust wants to beat C++ in this space it's going to need to do something similar.
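A rough Rust analogue of what "allocation details embedded in the type system" looks like: a container generic over its allocator, in the spirit of std::pmr containers or Rust's unstable `Vec<T, A>`. All names here are hypothetical, and the "allocation" is a toy owned buffer, not real raw memory:

```rust
// Hypothetical sketch: the allocator is part of the container's type,
// so call sites still read like ordinary code.
trait RawAlloc {
    fn alloc(&self, size: usize) -> Vec<u8>; // toy "allocation": an owned buffer
}

struct Heap;
impl RawAlloc for Heap {
    fn alloc(&self, size: usize) -> Vec<u8> {
        vec![0u8; size]
    }
}

// Generic over its allocator, like std::pmr containers are generic
// over a memory_resource (via polymorphic_allocator).
struct Buffer<A: RawAlloc> {
    data: Vec<u8>,
    _alloc: A, // kept so the allocator outlives the allocation
}

impl<A: RawAlloc> Buffer<A> {
    fn new(alloc: A, size: usize) -> Self {
        let data = alloc.alloc(size);
        Buffer { data, _alloc: alloc }
    }
    fn len(&self) -> usize {
        self.data.len()
    }
}

fn main() {
    let buf = Buffer::new(Heap, 64);
    println!("{}", buf.len()); // 64
}
```

The point of the type parameter is that swapping allocators changes a type annotation, not the body of the code that uses the container.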
I wish there were more projects happening atop std::pmr. NVIDIA's cccl has an experimental memory_resource for CUDA memory (and their RMM library has a lot of nifty resource adapters), and it's cool to see how they're adapting this to heterogeneous compute, but I haven't seen anything interesting in the open-source world that tries to build atop the learnings of mimalloc/glibc/etc. to beat the STL pool resources. Probably, they exist but are just kept proprietary.
tialaramex 107 days ago [-]
> Probably, they exist but are just kept proprietary.
So your theory is that this is an excellent design, but for some reason all of the implementations are proprietary and by chance we never saw any of them in the years since.
Is the alternative hypothesis too obvious to spell out? This is a bad design, Bloomberg got the thing they wanted baked into the C++ ISO document so it's a success for that team but nothing more.
Big pieces of the PMR design rely on an old C++ fallback which isn't available in (safe) Rust. Undefined Behaviour. This simplifies implementation greatly of course, you just don't need to care about those cases at all, even if they're widespread, since you said they were "Undefined Behaviour" so it's not your fault when everything catches fire. And it looks like it simplifies end user code too, their code is often faulty of course, but it compiles and if you get lucky it doesn't blow up at runtime.
malkia 107 days ago [-]
In C++ land, I've actually rolled std::pmr with jemalloc in order to have a better allocator than the OS one, keep things clean, and avoid global hooks [1].
Two big issues (among others) were found:
- std::pmr:: introduces new types - e.g. std::pmr::string is a different type from std::string
- std::pmr::string becomes much more expensive than std::string, especially for small strings, as 8 bytes are added to each string to hold the pointer to the pmr allocator.
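That per-object cost is easy to demonstrate in Rust too: keeping a thin allocator pointer next to a container adds one word. A hedged sketch; the concrete sizes assume a typical 64-bit target, and `MemoryResource` is just a stand-in type:

```rust
use std::mem::size_of;

// Stand-in for a pmr-style memory resource; the container keeps a thin
// pointer to it, the way std::pmr::string keeps a memory_resource*.
struct MemoryResource;

// A container that remembers its allocator pays one extra pointer-sized field.
struct PmrVec {
    buf: Vec<u8>,
    resource: *const MemoryResource,
}

fn main() {
    // On a typical 64-bit target this prints "24 32":
    // Vec<u8> is three words, PmrVec adds a fourth for the allocator.
    println!("{} {}", size_of::<Vec<u8>>(), size_of::<PmrVec>());
}
```

For a 32-byte small string stored by the thousands, one extra word per object is a real cost, which matches the complaint above.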
So we removed this code, and went back to a globally hooked allocator (mimalloc in our case) - targeting Windows mostly.
[1] - Globally hooked allocators are magic: mimalloc, tbbmalloc, google's, etc. The issue becomes apparent when you try to use the now-optimized code in a different host - for example, you have a 3D model exporter that works great in your tools, but poorly under 3DSMax or Maya, where a different global allocator is used and the code no longer performs as expected.
samatman 108 days ago [-]
It's odd to me to call this a capability. We have a term for this already: dynamic bind, known as dynamic scope when it's the main scoping mechanism of a language.
That said, the mechanism seems like the right way for Rust to solve this problem, since ambient allocation is deeply baked into the language, and taming implicit globals with dynamic bind has a long history, and works fairly well.
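Dynamic binding of an ambient allocator can be sketched in today's Rust with a thread-local stack; the names here are made up for illustration, and real allocator handles would be pushed instead of strings:

```rust
use std::cell::RefCell;

thread_local! {
    // The ambient "allocator" is just a name in this sketch; the bottom
    // entry plays the role of the global allocator.
    static AMBIENT: RefCell<Vec<&'static str>> = RefCell::new(vec!["global"]);
}

fn current_allocator() -> &'static str {
    AMBIENT.with(|a| a.borrow().last().copied().unwrap())
}

// Dynamically bind a different allocator for the extent of `f`,
// restoring the previous binding on exit.
fn with_allocator<R>(name: &'static str, f: impl FnOnce() -> R) -> R {
    AMBIENT.with(|a| a.borrow_mut().push(name));
    let result = f();
    AMBIENT.with(|a| {
        a.borrow_mut().pop();
    });
    result
}

fn main() {
    assert_eq!(current_allocator(), "global");
    with_allocator("arena", || {
        assert_eq!(current_allocator(), "arena"); // dynamically scoped
    });
    assert_eq!(current_allocator(), "global"); // restored on exit
}
```

Every allocation inside the closure sees the innermost binding, which is exactly the dynamic-scope behavior described above.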
cwzwarich 108 days ago [-]
According to the linked proposal, these implicit capability parameters are lexically bound, not dynamically bound.
thomasmg 107 days ago [-]
I wonder how to mix multiple allocators in a safe way. Say an arena allocator and the default one. How do you prevent a non-arena object from pointing into the arena? (The problem: the arena could get wiped, so that pointer would become invalid.) The post is about Rust, so I was hoping this was addressed...
I'm working on my own programming language and want to support multiple allocators. Usually languages just support one OR the other, safely.
dwattttt 107 days ago [-]
I would imagine the normal lifetime arrangements in Rust would prevent this, the same way they prevent nesting a shorter-lived pointer inside a longer-lived struct when they're all from the same allocator.
thomasmg 107 days ago [-]
OK, interesting! If I understand correctly, that means even within the arena, lifetime is tracked. That makes sense - it is Rust after all.
If each arena maintains a counter of live objects, then the arena can be dropped if it reaches zero.
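Rust's Rc already implements that counting pattern at the library level: give each allocation a handle that shares ownership of the arena, and the arena's storage is released when the last handle drops. A toy sketch with hypothetical names:

```rust
use std::rc::Rc;

// Toy arena: one block of storage shared by all handles.
struct ArenaInner {
    storage: Vec<u8>,
}

// A handle to one "allocation": it keeps the whole arena alive.
#[derive(Clone)]
struct Handle {
    arena: Rc<ArenaInner>,
    offset: usize,
}

impl Handle {
    fn get(&self) -> u8 {
        self.arena.storage[self.offset]
    }
}

fn main() {
    let arena = Rc::new(ArenaInner { storage: vec![7, 8, 9] });
    let h = Handle { arena: Rc::clone(&arena), offset: 1 };

    // Live count: the arena binding itself plus one handle.
    assert_eq!(Rc::strong_count(&arena), 2);

    drop(arena); // the handle alone keeps the storage alive
    assert_eq!(h.get(), 8);
    // When `h` drops, the count reaches zero and the storage is freed.
}
```

The trade-off versus plain lifetimes is a runtime count and delayed reclamation; the compile-time approach frees the whole arena at a statically known point instead.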
jamesmunns 107 days ago [-]
In general in Rust, lifetimes enforce that references to a thing do not outlive the thing itself.
Even in unsafe code, it is possible to tie the lifetime of the allocations to the lifetime of the thing handing out the allocations. If you ever attempt to "escape" the scope - e.g. by storing a shorter-lifetime allocation in a longer-lifetime allocation - that outer item can now only live as long as the shorter lifetime (even though it comes from the longer-lifetime allocator). Any violation of this becomes a compile-time error.
For example, within a function, you can have a Vec of references to local items, and although the Vec is an allocation that COULD live as long as necessary (allocations have 'static lifetime), that Vec MUST be dropped at or before the point where the references it contains would become invalid.
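That example can be written down directly; the commented-out lines at the end show the escape the borrow checker rejects:

```rust
fn main() {
    let a = 1;
    let b = 2;
    {
        let c = 3;
        // A heap allocation holding references to locals: the Vec could
        // live arbitrarily long, but because it borrows `c`, it must be
        // dropped no later than `c` is.
        let refs: Vec<&i32> = vec![&a, &b, &c];
        assert_eq!(refs.iter().map(|r| **r).sum::<i32>(), 6);
    } // `refs` must end here, at or before the end of `c`'s scope.

    // let escaped: Vec<&i32> = { let c = 3; vec![&c] };
    // ^ compile error: `c` does not live long enough
}
```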
Gibbon1 106 days ago [-]
If you're designing a language you might be interested in this paper: https://www.cs.purdue.edu/homes/rompf/papers/xhebraj-ecoop22...
> We design a type system that tracks the underlying storage mode of values, and when a function returns a stack-allocated value, we just don’t pop the stack. Instead, the stack frame is de-allocated together with a parent the next time a heap-allocated value or primitive is returned.
> Our evaluation shows that this execution model reduces heap and GC pressure and recovers spatial locality of programs improving execution time between 10% and 25% with respect to standard execution.
malkia 105 days ago [-]
Wouldn't that be expensive to use with FFI, or when calling OS functions? What about interrupt handling? Didn't Go go through a similar phase with segmented (spaghetti) stacks, only to revert to a standard one?
Then it'll be tricky to write yet another "stack" walker to obtain performance metrics (eBPF, or, say, on Windows through the standard ETW mechanisms).