Hacker Next
Fsync() after open() is an elaborate no-op (despairlabs.com)
davisp 3 days ago
So there's a bit of a misunderstanding here in the chain of blog posts that I can clear up. First, from this article:

  That’s the question I’ve been mulling over for days, because
  I don’t see how this action can make any particular guarantees
  about durability, at least not in any portable way.
This part is super easy to clear up. CouchDB in no way relies on an fsync after open for any durability guarantee. As shown in [1], CouchDB has been running an fsync on file open since very early in its development. However, I can easily see how reading only the Neighbourhoodie article [2] would lead to this conclusion.

The missing context is that CouchDB primarily fsyncs after open because, when an empty database is created, we write a header to disk. The very early implementation in [1] just didn't limit the fsync to the cases where we write the header, and that general behavior has never been changed (the implementation is a bit different today, but the effect is the same).
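A minimal Python sketch of the open path described above, for illustration only: CouchDB itself is written in Erlang, and the 16-byte `INITIAL_HEADER` and the `open_db_file` name here are hypothetical. The header is written only when the file is empty, but the fsync runs on every open, which is the "not limited to the header case" behavior.

```python
import os

# Hypothetical header bytes for illustration; not CouchDB's real header format.
INITIAL_HEADER = b"HDR0" + b"\x00" * 12


def open_db_file(path: str) -> int:
    """Open (or create) a database file and fsync it unconditionally.

    The fsync is only *needed* after a fresh file gets its header, but
    (as in the behavior described above) it runs on every open.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if os.fstat(fd).st_size == 0:
            # New, empty database: write the initial header.
            os.write(fd, INITIAL_HEADER)
        # Runs for existing files too. Whether this also flushes pages
        # dirtied by an earlier process is filesystem-dependent.
        os.fsync(fd)
    except BaseException:
        os.close(fd)
        raise
    return fd
```

Reopening an existing file skips the header write but still pays for the fsync, which is cheap when nothing is dirty.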

Also, in hindsight, I believe this claim in the Neighbourhoodie article [2] is probably too strong:

  However, CouchDB is not susceptible to the sad path.
I didn't read the article super closely the first time, since I'd already been through the background discussions on the finer details, but today I'd probably hedge it a bit with language along the lines of:

  However, CouchDB is *probably* not susceptible to the sad
  path. While we can't guarantee it can't happen due to how
  various I/O operations are (not) specified, we're doing as much
  as we can to prevent it. Also, don't forget that your storage
  device might be lying about fsync anyway.
The underlying logic requires considering the original blog post in this chain [3]. That article posits a pathological failure sequence: we write something, crash, restart, issue a read that is served from the dirty page cache, and then the entire machine hard crashes. In that case, the database has returned data that was never durably committed.

As the author of this article (the one this thread is about) notes:

  Using OpenZFS as an example (hey, it’s what I know), fsync()
  always flushes anything outstanding for the underlying object,
  regardless of where the writes came from.
AFAIK, this is the norm and, I assume, the reason that the NULL BITMAP article [3] suggests the fsync on open. In CouchDB land, we just went back and said, "Oh nice, we already do that for other reasons anyway." Unfortunately the "we already do it for other reasons" aspect didn't really come through. So in the end, while none of the fsync-on-open behavior is guaranteed in any way, shape, or form, it's not impossible that it has saved our bacon a non-zero number of times. Even though it's not guaranteed, it's common for filesystems to perform those flushes regardless of which file descriptor is used.
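To make the sad path from [3] concrete, here is a toy in-memory model in Python (no real I/O; `ToyFile` and `sad_path` are hypothetical names). It assumes, as in the OpenZFS quote and as is common but not guaranteed, that fsync() flushes all dirty data for a file no matter which descriptor wrote it. Under that assumption, fsyncing on reopen means whatever the restarted process reads will survive a later machine crash.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFile:
    """Toy model of one file: durable bytes on disk plus dirty page cache.

    Assumption (not a POSIX guarantee): the page cache is shared by all
    processes, and fsync() flushes every dirty page for the file,
    regardless of which descriptor wrote it.
    """
    disk: dict = field(default_factory=dict)   # offset -> durable value
    cache: dict = field(default_factory=dict)  # offset -> dirty value

    def write(self, off, value):
        self.cache[off] = value  # lands in the page cache only

    def read(self, off):
        # Served from the dirty cache if present, else from disk.
        return self.cache.get(off, self.disk.get(off))

    def fsync(self):
        self.disk.update(self.cache)
        self.cache.clear()

    def process_crash(self):
        pass  # the OS keeps the page cache, so nothing is lost yet

    def machine_crash(self):
        self.cache.clear()  # power loss: dirty pages vanish


def sad_path(fsync_on_open: bool):
    f = ToyFile()
    f.write(0, "txn-1")     # 1. write, then crash before any fsync
    f.process_crash()       # 2. process dies; the OS keeps running
    if fsync_on_open:       # 3. restart and reopen the file
        f.fsync()
    seen = f.read(0)        # 4. read returns "txn-1" either way
    f.machine_crash()       # 5. the whole machine goes down
    return seen, f.read(0)  # (what we served, what survived)


if __name__ == "__main__":
    print(sad_path(False))  # ('txn-1', None): served data that was then lost
    print(sad_path(True))   # ('txn-1', 'txn-1'): served data survived
```

The model deliberately hard-codes the favorable fsync semantics; on a system where fsync only covers writes made through the same descriptor, step 3 would not help.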

Also, to make sure that we're not missing the field for the cornstalks, I want to point out that the double fsync commit protocol used by CouchDB is probably 99.some-more-nines responsible for CouchDB's durability guarantees. However, that's not 100%, so when we find weird edge cases like the one in [3], we try to make sure that we're as correct as can be. For instance, here's the response to fsync-gate [4].
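For readers unfamiliar with the idea, here is a minimal Python sketch of the general shape of a double-fsync, append-only commit: append the data and fsync it, then append a checksummed header that points at it and fsync again, and on recovery trust only the last intact header. The header layout, `MAGIC` marker, and function names are hypothetical; this is not CouchDB's on-disk format or code.

```python
import os
import struct
import zlib

MAGIC = b"HDR!"  # hypothetical marker, not CouchDB's format
# Layout: magic (4 bytes) | committed end offset (8) | crc32 of the first 12 (4)
HEADER = struct.Struct(">4sQI")


def commit(fd: int, payload: bytes) -> None:
    """Append payload, then a header that commits it, fsyncing in between."""
    os.lseek(fd, 0, os.SEEK_END)
    os.write(fd, payload)
    os.fsync(fd)  # fsync #1: data is durable before any header points at it
    end = os.lseek(fd, 0, os.SEEK_CUR)
    crc = zlib.crc32(MAGIC + struct.pack(">Q", end))
    os.write(fd, HEADER.pack(MAGIC, end, crc))
    os.fsync(fd)  # fsync #2: the header is durable; this is the commit point


def last_committed_len(path: str) -> int:
    """Return the file length up to the last intact header.

    Anything after it (e.g. a torn append) is treated as uncommitted.
    """
    with open(path, "rb") as f:
        data = f.read()
    best = 0
    pos = data.find(MAGIC)
    while pos != -1:
        chunk = data[pos:pos + HEADER.size]
        if len(chunk) == HEADER.size:
            magic, end, crc = HEADER.unpack(chunk)
            # Valid only if the checksum matches and it points at itself.
            if crc == zlib.crc32(magic + struct.pack(">Q", end)) and end == pos:
                best = pos + HEADER.size
        pos = data.find(MAGIC, pos + 1)
    return best
```

The ordering is the point: because the data is fsynced before the header that references it is even written, a crash at any step leaves either the old header or a fully durable new one as the last valid commit.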

[1] https://github.com/apache/couchdb/blob/956c11b35487fb8ffcf70...

[2] https://neighbourhood.ie/blog/2025/02/26/how-couchdb-prevent...

[3] https://buttondown.com/jaffray/archive/null-bitmap-builds-a-...

[4] https://github.com/apache/couchdb/commit/3505281559513e29224...
