Next.js App Router + React Server Components Demo

new
past
show
ask
show
jobs
submit

▲Fundamentals of garbage collection (2023) (learn.microsoft.com)

142 points by b-man 213 days ago | 32 comments

Metalnem 210 days ago [-]

I recently re-read this article and can confirm that it's excellent—not just this specific page, but all the other sections under "Garbage Collection" as well.

If you want to dive deeper into memory performance analysis in .NET, this is another must-read: https://github.com/Maoni0/mem-doc/blob/master/doc/.NETMemory...

It was written by Maoni Stephens, the architect of .NET's garbage collection.

BlimpSpike 210 days ago [-]

Kindof unrelated to the article, but I was recently wondering if it would be possible to detect and deny pointer cycles in a language in an efficient way, so that you could then use simple reference counting instead of full-blown garbage collection.

It probably wouldn't be usable for a general-purpose programming language, but for a special-purpose scripting language I could see it making the language implementation easier.

jlouis 210 days ago [-]

You can make a programming language where cycles are impossible. Erlang is a prime example.

Region inference is another strategy in this space. It can limit the need for full-blown garbage collection in many cases, but also comes with its own set of added trade-offs.

Reference counting is just a different kind of garbage collection, really. It acts like a dual construction to a tracing GC in many cases. If you start optimizing both, you tend to converge to the same ideas over time. Refcounting isn't void of e.g. latency problems either: if I have a long linked list and snip the last pointer, then we have to collect all of that list. That's going to take O(n) time in the size of the list. For that reason, you'd have to delay collecting the large list right away, which means you are converging toward a tracing GC that can work simultaneously with the mutator. See e.g., Go's garbage collector.

zozbot234 210 days ago [-]

> latency problems either: if I have a long linked list and snip the last pointer, then we have to collect all of that list. That's going to take O(n) time in the size of the list. For that reason, you'd have to delay collecting the large list right away

These latency issues are inherent to deterministic destruction, which is an often desirable feature otherwise; they have little to do with reference counting itself. In principle, they can be addressed by "parking" objects for which delayed disposal is non-problematic onto a separate, lower-priority task.

battle-racket 210 days ago [-]

> It acts like a dual construction to a tracing GC in many cases

yeah one of the most helpful realizations I’ve read is that tracing and ref counting are essentially two formulations of the same problem - one is finding objects that are alive (by tracing), and the other is finding things that are dead (i.e. their ref counts reach zero). and of course, every object is either dead or alive!

fc417fc802 209 days ago [-]

It's a useful realization but the follow on (unfortunately rather popular) claim that this inverse relation makes them the same thing is clearly wrong. They exhibit entirely different performance characteristics in places where it matters.

creata 210 days ago [-]

One solution is to forbid recursive data types - e.g., require every struct type to only reference types that have already been defined. I can't think of any languages that do this.

Another solution is to make things immutable (like Erlang), or "as-if" immutable (like Koka), which guarantees that data can only point to things that have already been defined, preventing cycles.* Erlang uses this to simplify generational collection - because old data can't point to young data, it doesn't need a card table or anything like that.

I think it's perfectly possible to have a general purpose language without cycles: you can just use integer indices into an array instead of pointers if you want cyclic data structures. This is common in Rust, when people want to avoid the overhead of reference counting, but don't want to use unsafe code.

* A hidden assumption here is that the language is eagerly evaluated. There are languages like Haskell that have immutability and cyclic data structures.

asplake 210 days ago [-]

Even with cyclic relationships between types, immutability makes cycles within instances difficult (without laziness anyway). A syntax tree would be a good example.

xscott 210 days ago [-]

Yes, and the nice thing about doing it with immutability is you can still have recursive types to build linked lists, trees, and/or dags. From there you can build hash-array-mapped-tries, finger-trees, and so on, giving you friendly dict/list or JSON style data structures.

creata 210 days ago [-]

Yes, either is sufficient, I think.

Edit: I think the common idea with both solutions is that our objects have some weak order (the order in which their types were defined, and the time at which the object was created, respectively), and objects are only allowed to point to objects strictly less than them in this order.

daxfohl 210 days ago [-]

Couldn't you make lazily evaluated code in erlang too, even if it's not lazy by default like Haskell? You'd just need function pointers, right? Or is that not enough?

creata 209 days ago [-]

Haskell can have circular references because its laziness is implemented with thunks, which have a mutable cell in which to store the computed value so that terms don't get evaluated more than once. Here's a Haskell function that makes a circular linked list:

    -- circular linked list with one item
    repeat x = let xs = x:xs in xs

Here's a rough equivalent in JavaScript that doesn't use thunks, just functions:

    function repeat(x) {
        function xs() {
            return [x, xs]; // [first, rest]
        }
        return xs;
    }

The Haskell version has a cycle because, after evaluation, `repeat x` will be a circular linked list, but all the "lists" we create in the JavaScript code above are just the closure `xs`.

For completeness, here's a JavaScript version that uses thunks:

    class Thunk {
        constructor(f) { this.f = f; }
        get() { if (this.f) { this.v = (this.f)(); delete this.f; } return this.v; }
    }

    function repeat(x) {
        let xs = new Thunk(() => {
            return [x, xs]; // [first, rest]
        });
        return xs;
    }

If you try calling `x = repeat(1); x.get()`, you can see that we get a circular list.

Someone 210 days ago [-]

> I was recently wondering if it would be possible to detect and deny pointer cycles in a language in an efficient way

In general, I think that cannot be done, but if one restricts what programs can do, solutions exist.

A simple way to do it is by requiring all references “pointing out of” an object to be set the moment the object is created, and be immutable afterwards (that’s what Lisp cons (https://en.wikipedia.org/wiki/Cons) does. Without setf or similar, lisp code cannot create cycles)

That disallows quite a ome code that modifies structures without introducing cycles, but still allows for quite some code to work.

One could also store an ‘age’ field with each object and check, when a reference is updated in an object, that it points to an object that is older than the one being modified. That gives some more leeway, at the price of using more (a lot more, in code using small objects) memory.

Another idea is to add a bit to each object “there are no cycles containing this object”, and have the runtime clear that when it no longer can guarantee that (edit: unfortunately, maintaining that invariant can be very costly. Whenever code does foo.field = bar, with both foo and bar known to be not part of a cycle, you still have to do a search through all objects reachable from bar to check whether a cycle was created and, if so, clear that bit in all objects in the cycle(s). That makes this idea impractical)

If, as I suspect happens in programming languages which are “mostly immutable”, there are many objects for which that flag stays set, that can significantly speed up checking for the creation of cycles.

andreamonaco 210 days ago [-]

Hello, I'm writing an implementation of the Common Lisp language that uses an enhanced reference counting algorithm (that I've taken from literature) that detects and handles cycles. Performance seems okay, though I still haven't tried large programs.

https://savannah.nongnu.org/p/alisp

zozbot234 210 days ago [-]

A somewhat different approach was recently proposed here: https://news.ycombinator.com/item?id=44319427 but it seems to have non-trivial overhead. (Still very much worthwhile, given the potential advantages of deterministic cycle collection.) The paper you reference is quite a bit older so it would of course be interesting to do a proper comparison.

Findecanor 210 days ago [-]

The talk for this paper came up on YouTube just the other day: https://www.youtube.com/watch?v=GwXjydSQjD8

andreamonaco 210 days ago [-]

I'll look at that. About performance: people in practice have always favored GC, so I think there's a lot to be discovered in optimization of reference counting algorithms, including concurrent traversal (which is easier because each node has local info in the form of refcounts and flags) and maybe detection of problematic worse-case graphs

sfink 208 days ago [-]

Naive ref counting (RC) and tracing GC are very different, but they start looking more and more similar the more you optimize them. Adding cycle collection to RC means adding some tracing. Adding deferred/batched destruction to RC is similar to making a tracing GC incremental. Saturated ref counts (or otherwise avoiding updates) are similar to creating an older generation in a tracing GC. Barriers in a tracing GC (for incremental/generational/concurrent collection) are similar to the ref count updates when mutating RC objects. RC cycle collection time is heavily determined by how much of the graph is traced through from "suspected" roots, so it can be optimized by tracing known-live stuff and removing it from consideration.

But some significant performance-relevant differences remain. RC's cycle collection tends to take time proportional to the amount of dead stuff. Tracing GC tends to take time proportional to the amount of live stuff. (Both use optimizations that weaken the connection, but they still show their origins.)

kazinator 209 days ago [-]

Lexical closures easily give rise to cycles, without the program doing any imperative pointer swizzling to make a cycle explicitly.

For instance, a named lexical function can have itself in scope so that it can call itself recursively. This means that, as an object, it has a pointer to an environment, and that environment has an entry which contains that function itself: cycle.

If you deny cycles, that blows up at the starting line.

louthy 210 days ago [-]

On any one object you can just follow the references to see if you get back to the same object. Not super efficient as you’d have to do it for each reference as it is set.

But if it was a simple scripting language and you needed that constraint, it’s relativity easy to implement.

Findecanor 210 days ago [-]

That would still be tracing. The problem is that if there is a cycle, the reference count would be too high, and you'd not detect that the object should be reclaimed.

louthy 207 days ago [-]

I was replying to the OP:

> "but I was recently wondering if it would be possible to detect and deny pointer cycles in a language in an efficient way"

...not whatever issue you are worried about.

210 days ago [-]

rurban 209 days ago [-]

reference counting is never simple. It blows up the data size, and it costs twice for each setter.

n_plus_1_acc 210 days ago [-]

Like Rust if it has no Rc?

tonyedgecombe 210 days ago [-]

Rc is implemented in Rust so it would be possible to create an equivalent in your own code.

pjmlp 210 days ago [-]

On the context of .NET runtime, as missing from the title.

ozim 210 days ago [-]

Question: does anyone run "Server GC" for the ASP.NET applications?

There is bunch of people copy pasting documentation to SO "explaining" server GC. I am running bunch of .NET stuff in VMs and never set "Server GC" and never ran into issues with default but also not sure if it is worth testing out.

I guess it does not matter much if you are running in containers but I am running on VMs in IIS.

Lariscus 210 days ago [-]

Server GC is the default garbage collector for Asp.net Core.

> https://github.com/dotnet/AspNetCore.Docs/blob/main/aspnetco...

ozim 209 days ago [-]

Great one to see, definetly vote up!

diggan 210 days ago [-]

When I play around with changing various GCs for Java (via Clojure), then I always setup benchmarks measuring what kind of thing I want to improve, run all GCs via that benchmark to chose which to use for that service/project and call it a day. There is a lot of theorizing and navel-gazing around GCs it seems to me, and in the end it is the results that count so setup some way to measure, find the differences then move on from there :)

bob1029 210 days ago [-]

Server GC is a tradeoff between latency and throughput. It makes a ton of sense for a web server where a small additional overhead of a few milliseconds on some responses won't matter.

Workstation GC is what you want when latency is critical. This is what you'd use if you were developing a UI or game engine.

I've seen workstation GC stay in the microsecond region when strategically executing GC.Collect at allocation batch boundaries.

ozim 209 days ago [-]

Well great but you did not write out anything more than I could understand as a 15+ years developer of C#/.Net from whatever all those people in Stack Overflow wrote.

Do you have anything like running a business line application for couple of years on Server GC to write about?

Rendered at 06:02:31 GMT+0000 (Coordinated Universal Time) with Vercel.