- Fork of SQLite with new features https://github.com/libsql/libsql
- And you can embed it in your application so that it talks to your SQLite/libsql database over the network (the subject of this blog post): https://github.com/libsql/sqld
- Oh, and if you're wondering about backup to S3, they have that too: https://github.com/libsql/bottomless
- Uh, sqld can be integrated with https://github.com/losfair/mvsqlite, so now your SQLite is backed by FoundationDB!?
- Meanwhile Litestream exists https://github.com/benbjohnson/litestream/
- Ben is developing https://github.com/superfly/litefs at Fly so that you can talk to your SQLite database through a FUSE filesystem. (And work has stopped on the streaming replication effort: https://github.com/benbjohnson/litestream/issues/8)
- And, of course, SQLite has announced a new backend that hopes to support concurrent writes and streaming replication: https://sqlite.org/hctree/doc/hctree/doc/hctree/index.html
What a time for SQLite
Presumably all of these things provide a wildly different experience in terms of consistency, availability, latency, complexity, and concurrent use. It would be so nice to read a lengthy blog post comparing all or some of these, with their pros and cons.
I’m glad you’re into it but none of that sounds fun or appealing to me. I have no issue with SQLite or any of these projects, but my default for stuff that needs e.g. concurrent writes or streaming replication is Postgres, which was designed from the ground up to be this sort of database and has had those features built into the mainline release for some time.
I trust the default of Postgres way more than I trust myself to pick some new fork of SQLite via what I learn in blog posts and have it be well fit to the problem and reliable and adaptable.
I get that SQLite can be easier to set up and faster on reads but I feel like we’ve been down this road before trading away Postgres for those sorts of seeming gains - I’m getting MongoDB flashbacks.
Don’t get me wrong, I’d choose SQLite for a local app storage solution in a heartbeat but I’m wary of overstretching it.
I've been dogfooding for a while by using it in my side projects.
And there's a basic API too, to use it over a network: https://github.com/aaviator42/StorX-API
- Looks like bottomless does automatic restoring of the database? Litestream seems to avoid this, I assume because of concerns about accidentally creating multiple writers on a rolling-deploy (or similar mistake). Any concerns about this possible footgun?
- Bottomless is a sqlite extension, not a separate process? Pros and cons compared to Litestream?
- Similar questions for sqld. How does sqld handle the lifecycle of deploys and multiple readers/writers talking to the database? Anything a new user should be concerned about?
> - Looks like bottomless does automatic restoring of the database? Litestream seems to avoid this, I assume because of concerns about accidentally creating multiple writers on a rolling-deploy (or similar mistake). Any concerns about this possible footgun?
It's a valid concern, but what always happens on boot is starting a new generation (a generation is basically a snapshot of the main file + its continuously replicated WAL), distinguished by a UUIDv7 - the timestamped variant. So even if a collision happens, it would be recoverable - e.g. one of the stray generations could be deleted.
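For context on why a timestamped UUID helps here: a UUIDv7 leads with a 48-bit Unix-millisecond timestamp, so generation IDs sort by creation time and a stray generation from an accidental second writer is easy to spot and delete. A minimal sketch of the layout (RFC 9562 style; this is not bottomless's actual code):

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Toy UUIDv7: 48-bit Unix-ms timestamp, version/variant bits,
    then random bits. Lexicographically sortable by creation time."""
    ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    value = (ts_ms & ((1 << 48) - 1)) << 80        # timestamp in top 48 bits
    value |= 0x7 << 76                             # version = 7
    value |= ((rand >> 68) & 0xFFF) << 64          # 12 random bits (rand_a)
    value |= 0b10 << 62                            # RFC 4122 variant bits
    value |= rand & ((1 << 62) - 1)                # 62 random bits (rand_b)
    return uuid.UUID(int=value)

earlier = uuid7()
time.sleep(0.005)                                  # force a later millisecond
later = uuid7()
```

Because the timestamp leads, plain string comparison of the IDs orders generations chronologically.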
> - Bottomless is a sqlite extension, not a separate process? Pros and cons compared to litestream?
The only con I see is that a bug in the extension could interfere with the database. As for pros: way less maintenance work, because everything is already embedded. We're also hooked into the database via a virtual WAL interface (libSQL-only), so we have full control over when to replicate, without having to observe the .wal file and reason about it from a separate process.
> - Similar questions for sqld. How does sqld handle the lifecycle of deploys and multiple readers/writers talking to the database? Anything a new user should be concerned about?
There are going to be multiple flavors of sqld, but the rough idea would be to trust the users to only launch a single primary. In the current state of the code, replicas contact the primary in order to register themselves, so the model is centralized. Once we build something on top of a consensus algorithm, leader election will be pushed to the algorithm itself.
The ROAPI (https://github.com/roapi/roapi) project I built happens to support a similar feature set, i.e. exposing SQLite through a variety of remote query interfaces, including the Postgres wire protocol, REST APIs, and GraphQL.
The other 50% of the value proposition is simplified operations in a B2B setting.
Postgres can't achieve these query times?
The nice thing about SQLite, IMO, is mainly that it is just one file.
You are still using the network stack and involving multiple processes.
In-process is where you get things like instruction-level parallelism.
I think our university programs have failed us the most in this regard. ILP should be drilled into every student's brain as the ultimate performance hack. Once you start separating everything into a bunch of different buckets/systems/processes, none of those advantages can be exploited anymore.
In some cases, ILP can give you a 100x improvement in performance. That's serial throughput. This is the reason many developers consider bit manipulation/hacking to be essentially free.
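Python won't show the hardware effect, but for a taste of the style ILP rewards: the classic branchless SWAR popcount is a chain of cheap, mostly independent shift/mask/add operations that a superscalar core can issue several-per-cycle, with no branches to mispredict - the kind of bit manipulation that ends up "essentially free":

```python
def popcount64(x: int) -> int:
    """Count set bits in a 64-bit value using the classic SWAR trick:
    sum bits pairwise, then nibble-wise, then byte-wise, then multiply
    to accumulate all byte counts into the top byte."""
    x = x - ((x >> 1) & 0x5555555555555555)                      # 2-bit sums
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)  # 4-bit sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F                      # byte sums
    return ((x * 0x0101010101010101) & 0xFFFFFFFFFFFFFFFF) >> 56

examples = [popcount64(0), popcount64(0b1011), popcount64((1 << 64) - 1)]
```

In C or Rust the same function compiles to a handful of data-parallel instructions (or a single `popcnt`); in-process code gets to exploit that, a network hop never does.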
As a data point, when PG is on the same box as the querying app, you can run the connection over a unix domain socket instead of going over localhost.
In theory (!), that's supposed to give lower latency + higher bandwidth too. It's been a bunch of years since I last saw benchmarks on it though, so things might have changed a bit in the meantime (no idea).
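Concretely, libpq-based clients treat a `host` value that starts with `/` as the directory holding the Unix socket rather than a TCP hostname, so switching is just a connection-string change. A tiny sketch (the socket path is the common Debian/Ubuntu default and may differ on your system; actually connecting requires a running Postgres, so that part is only shown in a comment):

```python
def pg_dsn(dbname: str, host: str) -> str:
    # libpq rule: a host beginning with "/" names the directory containing
    # the .s.PGSQL.<port> socket file; anything else is a TCP hostname.
    return f"host={host} dbname={dbname}"

tcp = pg_dsn("app", "localhost")            # TCP over loopback
uds = pg_dsn("app", "/var/run/postgresql")  # unix domain socket
# e.g. psycopg2.connect(uds)  # same API either way
```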
Attention is better spent designing an appropriate schema than worrying about the network being too slow to run a single query (if it is, you’ve already failed).
It's been ages since I worked in Microsoft land, but this is one of the reasons why MS shops often scale up instead of out. Imagine a single box with dozens of beefy cores and a couple hundred gigs of RAM running both IIS and SQL. That's... a significant amount of power.
It's also a potentially massive blast radius for security oopsies, but I don't know the practical history of how secure it has or hasn't been, or how it integrates with things like clustering etc.
I honestly think a lot of businesses would be better off with this kind of simplified "scale up" architecture.
The downsides are clear and it's not the right fit for everything, obviously. But I see the IT world defaulting to these complicated 95384-piece AWS instances "Because that's how Netflix does it" or whatever and it's like... you're not Netflix and hardly anybody is.
If you're relying on your app's execution time to demonstrate this to a user, then you have poor UX. Instant changes and updates have been the norm in mobile apps forever and optimistic UI is becoming pretty standard on the web too, see Linear for example.
But whether 10,000 requests _per second_ take 50ms or 300ms each matters a lot.
Is anybody here using it in production with success?
Works like a charm, you basically get to build your app like it was dealing with a single tenant DB. Just make sure it's as hard as humanly possible for application developers to forget setting that tenant ID when they query... In our case we have a custom JPA base repository which takes care of that, and some PMD rules to catch rogue repos/queries.
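The base-repository idea is language-agnostic; here's a toy sketch of it in Python with SQLite rather than JPA (class and table names are made up). The tenant id lives on the repository object, so call sites can't forget it:

```python
import sqlite3

class TenantRepo:
    """Hypothetical base repository: every query it issues is scoped to
    the tenant it was constructed with, so application code never passes
    (or forgets) the tenant id itself."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: int):
        self.conn = conn
        self.tenant_id = tenant_id

    def find_all(self, table: str):
        # table names come from trusted repository subclasses, never user
        # input; the tenant filter is appended here, once, for everything.
        cur = self.conn.execute(
            f"SELECT * FROM {table} WHERE tenant_id = ?", (self.tenant_id,)
        )
        return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, tenant_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])

repo = TenantRepo(conn, tenant_id=1)
rows = repo.find_all("orders")   # only tenant 1's orders
```

The static-analysis rules then only need to catch queries that bypass the base class, rather than audit every query's WHERE clause.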
The main practical downside is that it is more precise and rigorous than your app- or business-level auth logic is likely to be, and has no real escape hatches. If you're trying to drop it on an existing system you're going to spend a lot of time going back and forth with whoever to shake out edge-case ambiguities in your RBAC policy that weren't obvious or relevant before.
You can do pretty much everything within Postgres, from ETL to Authz to serving HTML templates and GIS apps. However, that doesn't mean you should, or that anyone is seriously using it in production on a large scale (after evaluating alternatives).
Postgres also can't be embedded (according to some brief googling), so you'd need to run it as a separate process.
Should my Rust server be running sqld, and I would add a passthrough endpoint through its existing http api to the sqlite?
Or alternatively, should I be using sqld locally to access my sqlite instance through SSH or something?
So you are suggesting to have both the "serverless" db and the storage? If so, you now have 3 problems.
EDIT: While there are things in their repos that suggest they might be thinking about moving towards allowing multiple writers, what's currently there suggests a single active instance of each database, with the WAL being synced to object storage so that in case of failure and/or when doing a cold start, the database can be brought back from object storage.
So you'd put up a proxy to handle auth and match an incoming request either to a running database or to a cold one. If it's for a running database, it replicates to S3 or similar; if it's for a cold database, you sync from S3 or similar and start the server-side process.
To your point, you absolutely need to be able to reliably grant a lease of some sort to whichever frontend pulls down the database and starts an endpoint, or, as you say, you'll have huge problems.
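A toy of that lease idea (in reality the lease store would be something with atomic conditional writes - DynamoDB, etcd, S3 conditional puts - not an in-process dict; names here are made up):

```python
class LeaseTable:
    """Single-writer leases per database: a frontend may only start
    serving a database while it holds an unexpired lease on it."""

    def __init__(self, ttl: float = 30.0):
        self.ttl = ttl
        self.leases = {}  # db_name -> (holder, expires_at)

    def acquire(self, db_name: str, holder: str, now: float) -> bool:
        current = self.leases.get(db_name)
        if current is not None:
            cur_holder, expires_at = current
            if expires_at > now and cur_holder != holder:
                return False  # someone else still holds an unexpired lease
        # free, expired, or a renewal by the current holder
        self.leases[db_name] = (holder, now + self.ttl)
        return True

leases = LeaseTable(ttl=30.0)
ok_a = leases.acquire("customer-1.db", "frontend-a", now=0.0)   # fresh grab
ok_b = leases.acquire("customer-1.db", "frontend-b", now=10.0)  # still held
ok_b2 = leases.acquire("customer-1.db", "frontend-b", now=40.0) # expired lease
```

The holder would renew well before the TTL elapses and stop serving the moment renewal fails - that's what prevents the two-writers-after-a-rolling-deploy footgun.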
Absolutely won't be suitable for every kind of workload, but if you've already committed to running your stuff in a serverless setup, having your database(s) handled that way might be appealing.
Sounds good, but I'm curious what the criteria (need) would be to architect like this in the first place.
I've run large numbers of Postgres databases, and it's not hard to automate, but it is hard to optimise for a setup where the usage patterns of individual databases are hard to estimate. Is your customer using it for batch jobs, or for persistent streams of data? Who can you colocate with whom? When the cost of shutting a database down in one place and migrating it elsewhere becomes very low, this kind of scenario can potentially become a lot easier.
From the user's perspective, I'd expect you really wouldn't care, other than in terms of cold-start times and cost. Except perhaps for batch jobs etc., where being able to write apps that check state, obtain a lease, download the most recent version, do their job, and upload the result to durable storage might well be convenient, rather than having to e.g. keep a bunch of databases constantly running.
Is this one.db per customer? If so, how do you deal with schema migration?
But consider that for setups like this the database might well be the customer's own database, so their schema might not be something it's your job to touch at all.
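When the schema *is* yours, one-db-per-customer migration is basically a loop over database files. SQLite's `user_version` pragma is a convenient free integer slot for tracking the last applied migration. A sketch (with `:memory:` connections standing in for the per-customer files, and made-up statements):

```python
import sqlite3

MIGRATIONS = [
    "CREATE TABLE orders (id INTEGER PRIMARY KEY)",
    "ALTER TABLE orders ADD COLUMN total REAL",
]

def migrate(conn: sqlite3.Connection) -> None:
    # user_version defaults to 0; treat it as the count of applied migrations.
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for i, stmt in enumerate(MIGRATIONS[version:], start=version):
        conn.execute(stmt)
        conn.execute(f"PRAGMA user_version = {i + 1}")  # pragmas can't be bound
    conn.commit()

# one database per customer; in reality these would be separate files
customers = {name: sqlite3.connect(":memory:") for name in ("acme", "globex")}
for conn in customers.values():
    migrate(conn)
    migrate(conn)  # idempotent: a second run finds nothing left to apply
```

The real operational question is the loop's failure mode - a customer db that's offline (cold in S3) when you roll out a migration has to be caught up lazily when it next wakes, which the version counter handles naturally.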
I wish people would at least vaguely try to be helpful naming products.
"SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine."
Wouldn't a fork also be a library?
It does seem like it would make more sense to call it something that alludes to the fact that it's Postgres.
The fact that you can build stuff around it doesn't invalidate the fact that the core of it is a library.