The Slotted Counter Pattern (2020) (planetscale.com)
oftenwrong 11 hours ago [-]
This is similar to how java.util.concurrent.atomic.LongAdder works:

https://github.com/openjdk/jdk/blob/master/src/java.base/sha...
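
For anyone who hasn't read the article: the pattern is N counter rows per logical counter, written via an upsert into a random slot and read back with a SUM. A rough sketch using Python's built-in sqlite3 (the article targets MySQL; the table and column names here are made up):

  import random
  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("""CREATE TABLE slotted_counter (
      counter TEXT, slot INTEGER, cnt INTEGER,
      PRIMARY KEY (counter, slot))""")

  SLOT_COUNT = 100  # more slots = less write contention, slower reads

  def increment(name, n=1):
      # upsert into a random slot so concurrent writers rarely collide on a row
      conn.execute(
          "INSERT INTO slotted_counter (counter, slot, cnt) VALUES (?, ?, ?) "
          "ON CONFLICT (counter, slot) DO UPDATE SET cnt = cnt + excluded.cnt",
          (name, random.randrange(SLOT_COUNT), n))

  def read(name):
      # reads pay the price: sum every slot
      return conn.execute(
          "SELECT COALESCE(SUM(cnt), 0) FROM slotted_counter WHERE counter = ?",
          (name,)).fetchone()[0]

  for _ in range(1000):
      increment("page_views")
  print(read("page_views"))  # 1000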

o11c 10 hours ago [-]
Huh, that's ... still quite inefficient. Since multiple variables in a cache line are fine if they will all be accessed from the same CPU, you ideally want a separate allocator for that; then you can avoid all the spooky dynamic growth. And most CPUs offer a direct "atomic add" instruction which is much faster than a CAS loop. For pure stat counters you generally want `relaxed` memory ordering on that; for other cases acquire and/or release may be desired (this is tricky to get performance-optimal, given that some platforms upgrade to a stronger memory ordering for free, so it's best to require it in your algorithm, whereas others are best with an explicit fence in the edge case). I've never found a real-world use for `seq_cst`.

It's unfortunate that per-CPU variables are difficult in userland, but there are at least 2 ways to fully emulate them - rseq and pinning - and you can also just revert to full-blown thread-locals (which have much better tooling support) if you aren't constructing a gratuitously large number of threads, or shared thread-locals if you do have a lot of threads. If you make the wrong choice here, correctness never suffers, only performance.
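
Python can't express the memory-ordering side of this, but the thread-local fallback is easy to sketch: each thread gets its own cell and a read just sums them. The cell registry and names below are made up; writes never take a lock on the hot path:

  import threading

  class PerThreadCounter:
      """Thread-local slots: each thread increments only its own cell,
      so writes never contend; reads sum every cell and may be slightly
      stale, which is fine for stat counters."""

      def __init__(self):
          self._local = threading.local()
          self._cells = []                   # every thread's cell, for reads
          self._register = threading.Lock()  # taken once per thread, at creation

      def _cell(self):
          cell = getattr(self._local, "cell", None)
          if cell is None:
              cell = self._local.cell = [0]
              with self._register:
                  self._cells.append(cell)
          return cell

      def increment(self, n=1):
          self._cell()[0] += n               # no lock on the hot path

      def value(self):
          with self._register:
              return sum(c[0] for c in self._cells)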

BeeOnRope 5 hours ago [-]
An uncontended CAS without carried dependencies on the result (almost always the case in this use case) is similar in performance to an atomic add on most platforms.

The CAS is the price they pay for contention detection, though it would be interesting to consider solutions which usually use unconditional atomics with only the occasional CAS in order to check contention, or which relied on some other contention detection approach (e.g., doing a second read to detect when the value incremented by more than your own increment).

The solution looks reasonable to me given the constraints.

o11c 4 hours ago [-]
Part of my point was that "check for contention" is often a silly thing to do in the first place, when you can just avoid it entirely (and with simpler code too).

Admittedly, Java is fundamentally incapable of half of the solutions, but making a simple bump allocator (called once per statistic at startup) over per-thread arrays is still possible.

BeeOnRope 11 hours ago [-]
Yeah exactly, and this is a commonly used trick in concurrent data structures in general. The Java implementation has the additional twist that they don't use a fixed number of "slots" but rather start with one and use CAS failures as a mechanism to detect contention, growing the number of slots until there are no longer CAS failures.
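
A rough Python analogue of that grow-on-contention behaviour, if it helps: a failed non-blocking lock acquire stands in for LongAdder's failed CAS, and the (arbitrary) 64-slot cap stands in for its CPU-count bound:

  import threading

  class StripedCounter:
      """Pick a slot from the current thread's id; treat a failed
      non-blocking acquire as a contention signal and double the slots."""

      MAX_SLOTS = 64  # arbitrary cap; LongAdder bounds stripes near the CPU count

      def __init__(self):
          self._slots = [[threading.Lock(), 0]]  # [lock, value] cells, start with one
          self._grow_lock = threading.Lock()

      def increment(self, n=1):
          slots = self._slots                     # growth swaps in a longer list
          cell = slots[threading.get_ident() % len(slots)]
          if cell[0].acquire(blocking=False):     # uncontended fast path
              cell[1] += n
              cell[0].release()
              return
          self._grow()                            # contention: add more slots
          with cell[0]:                           # then finish this add the slow way
              cell[1] += n

      def _grow(self):
          with self._grow_lock:
              if len(self._slots) < self.MAX_SLOTS:
                  self._slots = self._slots + [
                      [threading.Lock(), 0] for _ in range(len(self._slots))]

      def value(self):
          return sum(cell[1] for cell in self._slots)  # like sum(): not a strict snapshot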
hassleblad23 12 hours ago [-]
Are there any other approaches to this problem? Seems like there should be.
Squid_Tamer 9 hours ago [-]
I think probabilistic counters are really interesting for this situation, as long as you don't need exact numbers.

  import random

  if random.randint(1, 100) == 1:  # only ~1 in 100 calls touches the database
      db.increment(row)
Then, looking at the database, you can know that the actual count is approximately 100x the number of times the row was incremented.

There's an even more aggressive version that I've seen called a 'Morris Counter' which uses just a few bits to build an order-of-magnitude estimate of the number of events: https://en.wikipedia.org/wiki/Approximate_counting_algorithm
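
The base-2 version is only a few lines if anyone wants to play with it (a sketch, not tuned; real implementations pick a base closer to 1 for better accuracy):

  import random

  class MorrisCounter:
      """Stores only the exponent c; each event bumps c with
      probability 2**-c, so the estimate is 2**c - 1."""

      def __init__(self):
          self.c = 0

      def increment(self):
          if random.random() < 2.0 ** -self.c:
              self.c += 1

      def estimate(self):
          return 2 ** self.c - 1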

coolhand2120 9 hours ago [-]
You could add a row for each count. You could add more associated information with the count too, like date, why, etc. Then that insert would trigger a debounced method that would query the event stack to give you a count. You can make periodic snapshots of the total to avoid O(n) table scans. This allows scale without row locking and a much more accurate accounting of when the counts happen. You can rewind/fast-forward, get a time series, whatever, with the event stack. This is all part of the CQRS pattern.

https://martinfowler.com/bliki/CQRS.html
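
A toy version of that shape using Python's sqlite3, with invented table names; a real system would debounce the snapshot step off the insert or run it on a schedule, as described above:

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
      CREATE TABLE count_events (id INTEGER PRIMARY KEY, counter TEXT, at TEXT);
      CREATE TABLE count_snapshots (counter TEXT PRIMARY KEY,
                                    total INTEGER, last_event_id INTEGER);
  """)

  def record(counter):
      # appending never locks an existing row, and you keep the "when/why"
      conn.execute("INSERT INTO count_events (counter, at) "
                   "VALUES (?, datetime('now'))", (counter,))

  def snapshot(counter):
      # fold events newer than the last snapshot into the running total
      prev = conn.execute("SELECT total, last_event_id FROM count_snapshots "
                          "WHERE counter = ?", (counter,)).fetchone() or (0, 0)
      added, max_id = conn.execute(
          "SELECT COUNT(*), COALESCE(MAX(id), ?) FROM count_events "
          "WHERE counter = ? AND id > ?", (prev[1], counter, prev[1])).fetchone()
      conn.execute(
          "INSERT INTO count_snapshots VALUES (?, ?, ?) "
          "ON CONFLICT (counter) DO UPDATE SET total = excluded.total, "
          "last_event_id = excluded.last_event_id",
          (counter, prev[0] + added, max_id))

  record("likes"); record("likes")
  snapshot("likes")
  print(conn.execute("SELECT total FROM count_snapshots").fetchone()[0])  # 2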

aqueueaqueue 10 hours ago [-]
Outside of the DB: some kind of queue.

Could be SQS or an in-memory queue in your app (hand-waving about multiple nodes, but that can be dealt with, or use Redis). Have a single processor that aggregates and does writes. You can use a DLQ if something goes wrong to avoid data loss.
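
The in-process version is tiny. Here the queue and the flush interval stand in for SQS and a real batching policy, and write_batch is a placeholder for one UPDATE per counter:

  import queue
  import threading
  import time
  from collections import Counter

  increments = queue.Queue()          # producers only ever enqueue

  def write_batch(batch):
      # placeholder for one UPDATE ... SET n = n + ? per counter
      print("flushing", dict(batch))

  def aggregator(flush_every=1.0):
      while True:
          batch = Counter()
          deadline = time.monotonic() + flush_every
          while time.monotonic() < deadline:
              try:
                  name, n = increments.get(timeout=0.1)
                  batch[name] += n
              except queue.Empty:
                  pass
          if batch:
              write_batch(batch)      # single writer, so no row contention

  threading.Thread(target=aggregator, daemon=True).start()
  increments.put(("page_views", 1))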

Inside the DB, maybe just append rows for each add to a table without indexes. That way the engine probably just needs to write out a journal record and append, but doesn't spend time updating indexes. Then periodically read everything from the table and aggregate into another. You could cycle between 2 tables.

These are eventually consistent in aggregation but transactional in the source of truth, i.e. if you write, you can be sure the data is stored, but it may take time to aggregate.

Or use a more exotic DB that is distributed but still supports transactions.

smlacy 11 hours ago [-]
Usually for this kind of use case you want both INCREMENT and DECREMENT. Using a random number for the "Slot" (AKA "Shard") seems like an anti-pattern. If the Slot were derived from a hash of the data itself, then DECREMENT would be much more straightforward.
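
If I'm reading the suggestion right, the slot becomes a pure function of the data that caused the count (a user id, say), so an INCREMENT and its later DECREMENT hit the same row. A sketch with a stable hash; table and column names are invented:

  import zlib

  SLOT_COUNT = 100

  def slot_for(data: str) -> int:
      # crc32 rather than hash(): stable across processes and restarts
      return zlib.crc32(data.encode()) % SLOT_COUNT

  def adjust(cursor, counter: str, data: str, delta: int):
      # data is whatever caused the count, so +1 and its matching -1
      # always land on the same slot
      cursor.execute(
          "INSERT INTO slotted_counter (counter, slot, cnt) VALUES (?, ?, ?) "
          "ON CONFLICT (counter, slot) DO UPDATE SET cnt = cnt + excluded.cnt",
          (counter, slot_for(data), delta))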
roywiggins 10 hours ago [-]
Hm, why wouldn't decrements work with random slots, if you're always just summing up all the slots to get the final tally?
immibis 10 hours ago [-]
Yes: use a different tool, such as Redis.

Updating a SQL database row can be expensive because of the transaction overhead. Once you increment this counter, the row with the counter remains locked until the transaction is committed. But the actual useful work of incrementing a number is already quicker than selecting a random slot. If the increment can be a simple ADD instruction, without transactional semantics, then this random-slot-based approach is slower than just incrementing it. Especially if you don't care about missing a few increments and therefore don't need a mutex.
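
A sketch of that split, assuming the redis-py client and Redis >= 6.2 (for GETDEL); the SQL names are placeholders. The hot counter lives in Redis and a periodic job folds it back into the relational table:

  import redis  # assumes redis-py and a local Redis server

  r = redis.Redis()

  def bump(counter: str, n: int = 1):
      r.incrby(counter, n)   # one atomic command, no row locks

  def flush_to_sql(counter: str, cursor):
      # GETDEL returns the current value and resets the key atomically;
      # increments that arrive afterwards start a new tally
      pending = r.getdel(counter)
      if pending:
          cursor.execute("UPDATE counters SET count = count + ? WHERE name = ?",
                         (int(pending), counter))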

For even more scale, don't use Redis but have a shared memory block containing your integer.

For even more scale, you can have one such slot per CPU. Make sure they're in different cache lines, to avoid false sharing. Now you have slots again, but they're per-CPU and so you still avoid all the SQL database overhead.

Performance is all about structuring the task as close to the machine as possible.

jeffrallen 10 hours ago [-]
Some people, when they have the problem of incrementing integers think, "I'll use Redis."

Now they have two problems.

esafak 3 hours ago [-]
Can you go into more detail? Redis supports counters: https://redis.io/glossary/storing-counters-in-redis/
whirlwin 12 hours ago [-]
It's an interesting approach. In similar cases, has anyone got experience using a database more optimized for writes, e.g. Cassandra?
eatenfill 11 hours ago [-]
Phillip, it's called lock striping.
smlacy 11 hours ago [-]
Really just a special case of sharding.