CompactLog – Solving CT Scalability with LSM-Trees (github.com)
Eikon 18 hours ago
I'm excited to share CompactLog, a Certificate Transparency log implementation written in Rust that explores an alternative approach to CT scalability challenges.

Certificate Transparency logs face immense scale - millions of certificates daily, with logs growing to hundreds of millions of entries and multiple terabytes of data. Traditional implementations often struggle with:

    Database bottlenecks at multi-terabyte scale
    Expensive hardware requirements for adequate read performance
    Complex sharding and operational overhead
    Read operations that can overwhelm the system
The CT community has been exploring different solutions to these challenges. One approach involves creating new APIs - moving from the traditional RFC 6962 dynamic endpoints to static, tile-based systems that can leverage CDNs and object storage more effectively. These new designs offer compelling benefits like efficient caching and horizontal scaling.

CompactLog explores a different question: Can we achieve similar scalability improvements while maintaining the existing RFC 6962 API?

Instead of changing the API, CompactLog rethinks the storage and implementation layers:

1. LSM-Tree Storage via SlateDB

Moving from relational databases to LSM-tree storage provides:

    Natural fit for append-only workloads
    Direct integration with object storage (S3, Azure Blob)
    No artificial database size limitations
    Cost-effective scaling with commodity storage
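To make that mapping concrete, here's a minimal sketch of how Merkle tree nodes could be laid out as key-value pairs over a generic LSM-style store. The trait, key scheme, and in-memory stand-in are illustrative assumptions, not CompactLog's actual schema:

    use std::collections::BTreeMap;

    // Generic key-value interface standing in for an LSM engine such as SlateDB.
    trait KvStore {
        fn put(&mut self, key: Vec<u8>, value: Vec<u8>);
        fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    }

    // In-memory stand-in so the sketch is runnable.
    impl KvStore for BTreeMap<Vec<u8>, Vec<u8>> {
        fn put(&mut self, key: Vec<u8>, value: Vec<u8>) { self.insert(key, value); }
        fn get(&self, key: &[u8]) -> Option<Vec<u8>> { BTreeMap::get(self, key).cloned() }
    }

    // Address a node as (level, index), big-endian so lexicographic key order
    // matches tree order and appends land at the end of the keyspace.
    fn node_key(level: u8, index: u64) -> Vec<u8> {
        let mut key = vec![b'n', level];
        key.extend_from_slice(&index.to_be_bytes());
        key
    }

    // Appending a leaf touches the leaf plus O(log n) ancestors - sequential,
    // append-only writes that LSM-trees absorb cheaply.
    fn append_leaf(store: &mut impl KvStore, index: u64, leaf_hash: &[u8]) {
        store.put(node_key(0, index), leaf_hash.to_vec());
        store.put(node_key(1, index / 2), b"recomputed-parent-hash".to_vec());
    }

    fn main() {
        let mut store: BTreeMap<Vec<u8>, Vec<u8>> = BTreeMap::new();
        append_leaf(&mut store, 42, b"leaf-hash");
        assert!(KvStore::get(&store, &node_key(0, 42)).is_some());
    }
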
2. Zero Merge Delay Architecture

CompactLog eliminates merge delay entirely by incorporating certificates into the tree before issuing SCTs:

    Submit → Add to tree → Issue SCT (all within 500ms)
This removes a common failure mode where logs fail to meet their merge delay commitments.
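A rough sketch of that ordering, with all names (Tree, Sct, submit) assumed for illustration rather than taken from CompactLog:

    struct Tree { size: u64 }

    struct Sct { timestamp_ms: u64 } // log_id, extensions, signature elided

    impl Tree {
        // Append the leaf; a real log also recomputes the path to the root here.
        fn append(&mut self, _leaf_der: &[u8]) -> u64 {
            let index = self.size;
            self.size += 1;
            index
        }
    }

    // Submit -> add to tree -> issue SCT, in that order. Because the entry is
    // already incorporated when the SCT is signed, there is no backlog of
    // promised-but-unmerged entries, hence no merge-delay commitment to miss.
    fn submit(tree: &mut Tree, cert_der: &[u8]) -> Sct {
        let _leaf_index = tree.append(cert_der);
        Sct { timestamp_ms: now_ms() }
    }

    fn now_ms() -> u64 {
        use std::time::{SystemTime, UNIX_EPOCH};
        SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_millis() as u64
    }

    fn main() {
        let mut tree = Tree { size: 0 };
        let _sct = submit(&mut tree, b"certificate-der-bytes");
        assert_eq!(tree.size, 1); // the entry is in the tree before the SCT exists
    }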

3. Smart Versioning Strategy

By versioning tree nodes only at Signed Tree Head boundaries rather than on every update, storage overhead drops dramatically - potentially a 1000x reduction for typical publishing intervals.
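A back-of-the-envelope sketch of where that figure comes from, under an assumed publishing interval:

    // Nodes near the root mutate on every append. Per-update versioning
    // therefore stores a new root version per append; STH-boundary versioning
    // stores one per published STH. The interval below is an assumed example.
    fn main() {
        let appends_between_sths: u64 = 1_000; // assumed publishing interval
        let root_versions_per_update = appends_between_sths; // one per append
        let root_versions_per_sth: u64 = 1;                  // one per STH
        println!("upper-node version reduction: {}x",
                 root_versions_per_update / root_versions_per_sth); // 1000x
    }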

4. Performance

CompactLog serves gigabits per second of API responses on commodity hardware and can handle thousands of chain additions per second.

NoahZuniga 18 hours ago
Does this system reduce the need for temporal sharding?
Eikon 18 hours ago
Yes, temporal sharding would theoretically become unnecessary with this system. The current implementation should be capable of handling the entire historical CT dataset without requiring time-based partitioning.

On storage efficiency, CompactLog requires about 200-400GB per 100 million certificates and can be run even on slow hard drives (there's support for local caching, which would usually be NVMe drives).

agwa 17 hours ago
No. The purpose of temporal sharding is to allow log operators to reclaim storage space used for expired certificates, and to save monitors from having to sift through a huge number of expired certificates to get to the unexpired ones.
Eikon 17 hours ago
> The purpose of temporal sharding is to allow log operators to reclaim storage space used for expired certificates

But you lose historical auditability that way, for arguably very little space reclaimed. I'd say it's not worth it if you can have an implementation that "just works".

We are not talking about petabytes of data: the total number of certificates logged is around 18B, which is somewhere around 50TB using CompactLog - and that fits on ... two hard drives these days? In the real world it'd be far less for a single log, as not every log holds every issued certificate.
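A rough sanity check of that figure, taking the midpoint of the 200-400GB per 100 million estimate above:

    // Rough sanity check: midpoint of 200-400GB per 100M certificates.
    fn main() {
        let certs = 18e9_f64;                           // certificates ever logged
        let bytes_per_cert = 300e9_f64 / 100e6_f64;     // ~3KB per certificate
        println!("{:.0} TB", certs * bytes_per_cert / 1e12); // prints "54 TB"
    }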

agwa 16 hours ago
The primary purpose of CT is to provide transparency into browser-trusted certificates. Since expired certificates are not trusted by browsers, having them in browser-trusted CT logs is purely a liability.

Those who care about expired certificates can archive shards before they are decommissioned, or operate a non-sharded log for expired certificates that isn't trusted by browsers (e.g. Google Daedalus).

Temporal sharding was designed when there were far fewer than a billion certificates and logs weren't struggling with read or write load. The need for sharding will only grow as certificate lifetimes are capped at 45 days.

OsrsNeedsf2P 16 hours ago
This is a very neat solution. Kudos for solving it!

What's the average lookup time for a certificate as the DB grows into the TB range? Is it using Bloom filters to cut this down?

Eikon 16 hours ago
Thank you!

SlateDB is indeed using bloom filters. Here's the end of the tree on a test instance where I loaded 50 million certificates - the latency is basically indistinguishable from an instance with only a few thousand records: https://compact-log.pre-test.ct.merklemap.com/ct/v1/get-entr...

You can see the current tree size here: https://compact-log.pre-test.ct.merklemap.com/ct/v1/get-sth
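To illustrate why those filters keep point lookups flat as data grows, here's a hand-rolled sketch of the general LSM read path - not SlateDB's actual implementation:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Minimal bloom filter: k hash probes into a fixed bit array.
    struct Bloom { bits: Vec<bool>, k: u64 }

    impl Bloom {
        fn new(m: usize, k: u64) -> Self { Bloom { bits: vec![false; m], k } }
        fn probe(&self, key: &[u8], i: u64) -> usize {
            let mut h = DefaultHasher::new();
            (key, i).hash(&mut h);
            (h.finish() as usize) % self.bits.len()
        }
        fn insert(&mut self, key: &[u8]) {
            for i in 0..self.k {
                let p = self.probe(key, i);
                self.bits[p] = true;
            }
        }
        fn might_contain(&self, key: &[u8]) -> bool {
            (0..self.k).all(|i| self.bits[self.probe(key, i)])
        }
    }

    // A point lookup consults each table's filter first and skips tables that
    // definitely lack the key, so read latency stays near-flat as data grows.
    fn lookup(tables: &[(Bloom, Vec<(Vec<u8>, Vec<u8>)>)], key: &[u8]) -> Option<Vec<u8>> {
        for (filter, table) in tables.iter().rev() { // newest table first
            if !filter.might_contain(key) {
                continue; // no I/O spent on this table at all
            }
            if let Some((_, v)) = table.iter().find(|(k, _)| k.as_slice() == key) {
                return Some(v.clone());
            }
        }
        None
    }

    fn main() {
        let mut f = Bloom::new(1024, 3);
        f.insert(b"leaf:42");
        let tables = vec![(f, vec![(b"leaf:42".to_vec(), b"hash".to_vec())])];
        assert_eq!(lookup(&tables, b"leaf:42"), Some(b"hash".to_vec()));
        assert_eq!(lookup(&tables, b"leaf:43"), None);
    }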
