Next.js App Router + React Server Components Demo

new
past
show
ask
show
jobs
submit

▲Analyzing the codebase of Caffeine, a high performance caching library (adriacabeza.github.io)

268 points by synthc 150 days ago | 47 comments

jedberg 150 days ago [-]

It would be interesting to see this on reddit's workload. The entire system was designed around the cache getting a 95%+ hit rate, because basically anything on front page of the top 1000 subreddits will get the overwhelming majority of traffic, so the cache is mostly filled with that.

In other words, this solves the problem of "one hit wonders" getting out of the cache quickly, but that basically already happened with the reddit workload.

The exception to that was Google, which would scrape old pages, and which is why we shunted them to their own infrastructure and didn't cache their requests. Maybe with this algo, we wouldn't have had to do that.

masklinn 150 days ago [-]

Wouldn’t one hit wonders still be an issue? They might get evicted relatively fast anyway but assuming an LRU each will still take a cache entry until they go through the entire thing and finally get evicted.

Although if that’s your concern you can probably just add a smaller admission cache in front of the main cache, possibly with a promotion memory.

JanecekPetr 149 days ago [-]

That's kind of the idea of Caffeine, it has admission buffers, and it adapts automatically between LRU and LFU. The original algorithm is called Windiw TinyLFU (design https://github.com/ben-manes/caffeine/wiki/Design), see it in action e.g. here: https://github.com/ben-manes/caffeine/wiki/Efficiency

masklinn 149 days ago [-]

I know that, but I’m not replying to a comment about caffeine, rather the opposite.

NovaX 148 days ago [-]

I think the idea is that the cache is so large that hot data won't be forced out by one-hit wonders. In this 2017 talk [1], the speaker says that Twitter's SLA depends on having a 99.9% hit rate. It is very common to have extremely over provisioned remote caching tier for popular sites. That makes eviction not as important and reducing their operational costs comes by purging expired data more proactively. Hence, memcached switched from away from relying on its LRU to discard expired entries to using a sweeper. Caffeine's approach, a timing wheel, was considered but dormando felt it was too much of an internal change for memcached and the sweeper could serve multiple purposes.

[1] https://www.youtube.com/watch?v=kxMKnx__uso

adbachman 149 days ago [-]

what are/were Reddit's top two or three cached structures / things?

guessing post bodies and link previews feels too easy.

comment threads? post listings?

was there a lot of nesting?

it sounds like you're describing a whole post--use message, comments, and all--for presentation to a browser or crawler.

(sorry, saw the handle and have so many questions :D)

NovaX 149 days ago [-]

Doesn’t reddit use Cassandra, Solr, and Kafka which uses Caffeine?

jbellis 150 days ago [-]

Caffeine is a gem. Does what it claims, no drama, no scope creep, just works. I've used it in anger multiple times, most notably in Apache Cassandra and DataStax Astra, where it handles massive workloads invisibly, just like you'd want.

Shoutout to author Ben Manes if he sees this -- thanks for the great work!

plandis 149 days ago [-]

Plus Ben made it extremely easy to migrate from Google Guava’s cache. It’s mostly the same API and way more performant to switch to Caffeine.

NovaX 149 days ago [-]

Thanks Jonathan!

khana 149 days ago [-]

[dead]

hinkley 149 days ago [-]

Years ago I encountered a caching system that I misremembered as being a plugin for nginx and thus was never able to track down again.

It had a clever caching algorithm that favored latency over bandwidth. It weighted hit count versus size, so that given limited space, it would rather keep two small records that had more hits than a large record, so that it could serve more records from cache overall.

For some workloads the payload size is relatively proportional to the cost of the request - for the system of record. But latency and request setup costs do tend to shift that a bit.

But the bigger problem with LRU is that some workloads eventually resemble table scans, and the moment the data set no longer fits into cache, performance falls off a very tall cliff. And not just for that query but now for all subsequent ones as it causes cache misses for everyone else by evicting large quantities of recently used records. So you need to count frequency not just recency.

YZF 149 days ago [-]

For every caching algorithm you can design an adversarial workload that will perform poorly with the cache. Your choice of caching algorithm/strategy needs to match your predicted workload. As you're alluding there's also the question of which resource are you trying to optimize for, if you're trying to minimize processing time that might be a little different than optimizing for bandwidth.

hinkley 149 days ago [-]

If you have to refetch on a cache miss you're going to be doing both. But all optimizations are always playing with the trigraph of cpu time, memory, and IO (with the hidden fourth dimension of legibility), so I don't think you're saying anything that can't be assumed as given. Even among people who tend to pick incorrectly, or just lose track of when the situation has changed.

YZF 149 days ago [-]

I understood the OP to have said something along the lines of if we have a fixed cost per object then we should bias towards smaller objects if we want to minimize that cost.

And totally legibility and/or simplicity. I'll take something I can reason about and maintain over something more complicated just to eek out a tiny better hit ratio. That said, if you're caching at scale your 0.05% hit ratio can be a big deal.

As a matter of personal taste/opinion I also shy away from close loop systems. Feedback makes things complicated in non-intuitive ways. Caffeine seems neat in terms of using feedback to adjust the cache to the workload - as always test with your workloads and pick what is best for your situation.

hinkley 149 days ago [-]

The thing is that untangling the logic often reveals either new feature opportunities that would have been ugly to implement previously, or new optimization opportunities because the code is clearer, and possibly two things that seemed unrelated before are now obviously related.

If I can't figure out how to make code faster without cheesing it I'll just follow the campsite rule and hope for inspiration (the more you work with code the more you understand it, and I might as well clean while I'm here)

NovaX 149 days ago [-]

You might be interested in this thread [1] where I described an idea for how to incorporate the latency penalty into the eviction decision. A developer even hacked a prototype that showed promise. The problem is that there is not enough variety in the available trace data to be confident that a design isn't overly fit to a particular workload and doesn't generalize. As more data sets become available it will become possible to experiment with ideas and fix unexpected issues until a correct, simple, elegant design emerges.

[1] https://github.com/ben-manes/caffeine/discussions/1744

hinkley 149 days ago [-]

I'll check it out.

The thing is that the math changes when the system is saturated. The closer you get to tipping over, the more each new request costs. I feel like I can clearly recall times when I had to make a Sophie's Choice between p50 and p95 times because of issues of this sort.

thomastay 149 days ago [-]

> However, diving into a new caching approach without a deep understanding of our current system seemed premature

Love love love this - I really enjoy reading articles where people analyze existing high performance systems instead of just going for the new and shiny thing

dan-robertson 149 days ago [-]

Near the beginning, the author writes:

> Caching is all about maximizing the hit ratio

A thing I worry about a lot is discontinuities in cache behaviour (simple example: let’s say a client polls a list of entries, and downloads each entry from the list one at a time to see if it is different. Obviously this feels like a bit of a silly way for a client to behave. If you have a small lru cache (eg maybe it is partitioned such that partitions are small and all the requests from this client go to the same partition) then there is some threshold size where the client transitions from ~all requests hitting the cache to ~none hitting the cache.)

This is a bit different from some behaviours always being bad for cache (eg a search crawler fetches lots of entries once).

Am I wrong to worry about these kinds of ‘phase transitions’? Should the focus just be on optimising hit rate in the average case?

NovaX 149 days ago [-]

As the article mentions, Caffeine's approach is to monitor the workload and adapt to these phase changes. This stress test [1] demonstrates shifting back and forth between LRU and MRU request patterns, and the cache reconfiguring itself to maximize the hit rate. Unfortunately most policies are not adaptive or do it poorly.

Thankfully most workloads are a relatively consistent pattern, so it is an atypical worry. The algorithm designers usually have a target scenario, like cdn or database, so they generally skip reporting the low performing workloads. That may work for a research paper, but when providing a library we cannot know what our users workloads are nor should we expect engineers to invest in selecting the optimal algorithm. Caffeine's adaptivity removes this burden and broaden its applicability, and other language ecosystems have been slowly adopting similar ideas in their caching libraries.

[1] https://github.com/ben-manes/caffeine/wiki/Efficiency#adapti...

hinkley 149 days ago [-]

I had a team that just did not get my explanations that they had created such a scenario. I had to show them the bus sized “corner case” they had created before they agreed to a more sophisticated cache.

That project was the beginning of the end of my affection for caches. Without very careful discipline that few teams have, once they are added all organic attempts at optimization are greatly complicated. It’s global shared state with all the problems that brings. And if you use it instead of the call stack to pass arguments around (eg passing ID instead of User and making everyone look it up ten times), then your goose really is cooked.

dan-robertson 149 days ago [-]

Interesting. I hadn’t really thought of global state as being a problem (I mostly think of caches as affecting performance but not semantics but I guess I didn’t really think about cache invalidation/poisoning either). My main worry would be more something like making a cold start very difficult or making things harder to change.

hinkley 149 days ago [-]

When you design a call tree so that any data used later is passed explicitly down the call tree instead of looked up by ID over and over, then you can be sure that all of the decisions about that data are made on a consistent copy of the data.

When you look up the same value 10 times, you not only pollute the flame graphs and call counts which makes proving that a better algorithm is necessary or has any effect much harder, but more importantly, you could get 3 different states and try to make a bunch of updates based on what should be mutually exclusive states in the system. That's the global shared state problem.

When you look up a value once and remember it throughout a calculation, it may not be the current state, but at least you have a clean snapshot of the data. Which in situations such as cancelling an account immediately after getting one last discount on a purchase, well, we know which scenario the customer probably meant.

zaphirplane 149 days ago [-]

Tell me more What if other values are looked up deep into the call stack, would that cause actual inconsistency as different values were looked up at different times

hinkley 148 days ago [-]

It can but it’s very hard to catch. It’s like running your database at the wrong isolation level. By the time the bug happens it’s under heavy load and the system is too noisy to catch the real problem. So you have glitches nobody can explain and they just deal with cleanup.

For this and other reasons I think that in addition to Functional Core, Imperative Shell, you want a “square” call tree in your code. Avoid functions with no fanout, and functions with high fanout. Rearrange code that uses the same data to happen as close together as you can, to improve local reasoning. When functions get unwieldy, or deleted code makes them too small, use the b-tree algorithm as inspiration to rebalance the tree.

Refactor when new features change the coupling of the code.

t0mas88 149 days ago [-]

These are exactly the things to worry about in an application that has enough scale for it. My usual approach is to have a wiki page or document to describe these limitations and roughly the order of magnitude where you will encounter them. Then do nothing and let them be until that scale is on the horizon.

There is no point fixing a "this could be slow if we have more than 65535 users" if you currently have 100 users.

I usually add a few pointers to the document on how to increase the scaling limit a bit without major rebuilding (e.g. make this cache size 2x larger). Those are useful as a short term solution during the time needed to build the real next version.

ratorx 149 days ago [-]

Caching itself is introducing a discontinuity, because whether a request does or does not hit the cache will have vastly different performance profiles (and if not, then the cache may be a bit useless).

I think the only way to approach this problem is statistically, but average is a bad metric. I think you’d care about some high percentile instead.

quotemstr 149 days ago [-]

Huh. Their segmented LRU setup is similar to the Linux kernel's active and inactive lists for pages. Convergent evolution in action.

NovaX 149 days ago [-]

I tried to reimplement Linux’s algorithm in [1], but I cannot be sure about correctness. They adjust the fixed sizes at construction based on device’s total memory, so it varies if a phone or server. This fast trace simulation in the CI [2] may be informative (see DClock). Segmentation is very common, where algorithms differ by how they promote and how/if they adapt the sizes.

[1] https://github.com/ben-manes/caffeine/blob/master/simulator/...

[2] https://github.com/ben-manes/caffeine/actions/runs/130865965...

nighthawk454 149 days ago [-]

Seems to be hugged, so here's a cached view

https://archive.is/w8yFG

https://web.archive.org/web/20250202094451/https://adriacabe... (images are cached better here)

dstroot 149 days ago [-]

Codebase has >16k stars on GitHub and only 1 open issue, and 3 open PRs. Never seen that before on a highly used codebase. Kudos to the maintainer(s).

bean-weevil 149 days ago [-]

I went through some of the issues to see how aggressively they close them and found this gem: https://github.com/ben-manes/caffeine/issues/1824#issuecomme...

ketzo 149 days ago [-]

Damn, I need that framed over my desk.

Lord_Zero 149 days ago [-]

I haven't looked, but stalebot can make repos look squeaky clean when in reality issues are ignored and then closed without being addressed.

calpaterson 149 days ago [-]

Sparing everyone else a browse of the bugtracker: the maintainer does not seem to use a bot to autoclose issues. The close issues appeared to be actually closed and it seemed from a quick glance that he actually investigated each filing.

NovaX 149 days ago [-]

Yep, no bots. A real bug not only means that I wasted someone else’s time, but reporting is a gift for an improvement. If a misunderstanding then it’s motivation that my project is used and deserves a generous reply. This perspective and treating as strictly a hobby, rather than as a criticism or demand for work, makes OSS feel more sustainable.

FridgeSeal 149 days ago [-]

Google-Adjacent OSS repos are the worst for this!

Passive-Aggressive Stalebot marches into the middle of an _active_ thread and announces that the issue is now stale/dead and everyone should go away.

I get that on large projects there’s a non-trivial percentage of issues which amount to “I’m holding it wrong, didn’t actually read the log messages, or the manual, fix it for me pls” which are just unhelpful noise. However more often than not they take every other thread- including important ones, with them.

homebrewer 149 days ago [-]

kitty is very close, which is impressive when you remember that the vast majority of the work is done by one guy (Kovid Goyal).

https://github.com/kovidgoyal/kitty/issues — 0.239% vs 0.137%

https://github.com/kovidgoyal/kitty/issues — 0.729% vs 0.317%

https://github.com/kovidgoyal/kitty/graphs/contributors

149 days ago [-]

jupiterroom 150 days ago [-]

really random question - but what is used to create the images in this blog post? I see this style quite often but never been able to track down what is used.

itishappy 150 days ago [-]

https://excalidraw.com/

https://d2lang.com/

https://www.drawio.com/

For something a bit lower level, try:

https://roughjs.com/

It's what powers the sketch-like look from many of the sites above.

atombender 150 days ago [-]

I suspect they used Excalidraw [1]. It's a nice, quick tool for this kind of sketching, and supports collaborative drawing.

[1] https://excalidraw.com/

jupiterroom 150 days ago [-]

i.e these - https://adriacabeza.github.io/img/tinylfu.png

syct 150 days ago [-]

Try this https://excalidraw.com/

homarp 149 days ago [-]

if you look at the metadata of each png, you will find the application/vnd/excalidraw/json field that contains the image in excalidraw format.

150 days ago [-]

synthc 150 days ago [-]

Interesting deep dive on the internals of Caffeine, a widely used JVM caching library.

150 days ago [-]

theandrewbailey 150 days ago [-]

[flagged]

DonHopkins 150 days ago [-]

[flagged]

150 days ago [-]

urbandw311er 149 days ago [-]

Caffeine is also the name of a macOS utility to stop the screen going to sleep. Be great if whichever came second could consider a name change.

unification_fan 148 days ago [-]

Actually, Caffeine is the name of a hello world script I wrote back in the 80s, so...

Rendered at 03:36:10 GMT+0000 (Coordinated Universal Time) with Vercel.