If you’re building Python async apps (FastAPI, background jobs, etc.) with SQLite, you’ll eventually hit two issues:
- Opening/closing connections is fast, but not free—overhead adds up under load
- SQLite writes are globally locked
aiosqlitepool is a tiny library that adds connection pooling for any asyncio SQLite driver (like aiosqlite):
- It avoids repeated database connection setup (syscalls, memory allocation) and teardown (syscalls, deallocation) by reusing long-lived connections
- Long-lived connections keep SQLite's in-memory page cache "hot." This serves frequently requested data directly from memory, speeding up repetitive queries and reducing I/O operations
- Allows your application to process significantly more database queries per second under heavy load
Enjoy!
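To make the pooling idea concrete: the thread doesn't show aiosqlitepool's own API, so here is a hand-rolled sketch of the same pattern using aiosqlite and an asyncio.Queue (the TinyPool name and app.db path are made up for illustration, not taken from the library):

    import asyncio

    import aiosqlite


    class TinyPool:
        """Open a few connections once, hand them out, put them back."""

        def __init__(self, path, size=4):
            self._path = path
            self._size = size
            self._conns = asyncio.Queue()

        async def open(self):
            for _ in range(self._size):
                conn = await aiosqlite.connect(self._path)
                await conn.execute("PRAGMA journal_mode = WAL")  # friendlier under concurrency
                await self._conns.put(conn)

        async def fetchone(self, sql, params=()):
            conn = await self._conns.get()       # borrow a warm connection (hot page cache)
            try:
                async with conn.execute(sql, params) as cur:
                    return await cur.fetchone()
            finally:
                await self._conns.put(conn)      # return it for the next task

        async def close(self):
            while not self._conns.empty():
                await (await self._conns.get()).close()


    async def main():
        pool = TinyPool("app.db")
        await pool.open()
        print(await pool.fetchone("SELECT sqlite_version()"))
        await pool.close()


    asyncio.run(main())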
gwbas1c 9 hours ago [-]
Important word:
> Python
Your repo and the readme.md don't say "python." The title of this post doesn't say "python."
It took me a while to realize that this is for python, as opposed to a general-purpose cache for, say, libsqlite.
sjsdaiuasgdia 7 hours ago [-]
Let's see...
There are tags showing which Python versions are supported.
The root dir of the repo contains a 'pyproject.toml' file.
The readme contains installation instructions for pip, poetry, and uv, all of which are Python package managers.
The readme contains example code, all of which is in Python.
The readme references asyncio, a Python module that is included in the standard library for Python 3.
The 'Languages' widget on the page shows 99.2% of the repo's code is in Python.
Every file not in the root dir has a .py extension.
Yeah, I can see why it was so hard to figure out...
tracker1 5 hours ago [-]
I'm mostly with you... it would still be nice if the title reflected the language limitation/feature.
kstrauser 8 hours ago [-]
The tag at the top of the readme, under the title, shows which Python versions it supports. If it never mentioned Python at all, that would be the tipster.
slashdev 1 days ago [-]
How does this help with the second issue, the write locks?
ncruces 1 days ago [-]
No idea if it applies, but one way would be to direct all writes (including any transaction that may eventually write) to a single connection.
Then writers queue up, while readers are unimpeded.
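A sketch of how that could look in an asyncio app, assuming aiosqlite (the helper names are hypothetical): one task owns the only write connection, other tasks enqueue their writes and await the result, and reads go through separate connections.

    import asyncio

    import aiosqlite


    async def writer_task(path, jobs: asyncio.Queue):
        """The only task that ever writes; it owns the single write connection."""
        conn = await aiosqlite.connect(path)
        await conn.execute("PRAGMA journal_mode = WAL")  # readers stay unimpeded
        while True:
            sql, params, done = await jobs.get()         # writers effectively queue up here
            try:
                await conn.execute(sql, params)
                await conn.commit()
                done.set_result(None)
            except Exception as exc:
                done.set_exception(exc)


    async def submit_write(jobs: asyncio.Queue, sql, params=()):
        """Call from any task: enqueue the write and wait for it to land."""
        done = asyncio.get_running_loop().create_future()
        await jobs.put((sql, params, done))
        await done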
dathinab 21 hours ago [-]
if you enable WAL mode with sqlite then readers are not blocked by the writer, so only writers queue up, without needing any special-case handling to achieve it
(in general you _really_ should use WAL mode if using sqlite concurrently, though you should also read the documentation about WAL mode)
ncruces 17 hours ago [-]
Writers won't queue up, rather they'll storm the place, taking turns at asking “can I go now” and sleeping for (tens, hundreds of) milliseconds at a time.
This only gets “worse” as computers get faster: imagine how many write transactions a serial writer could complete (WAL mode and normal synchronous mode) while all your writers are sleeping after the previous one left, because they didn't line up?
And, if you have a single limited pool, your readers will now be stuck waiting for an available connection too (because they're all taken by sleeping writers).
It's much fairer and more efficient for writers to line up with blocking application locks.
rich_sasha 16 hours ago [-]
I was running into some horrendous issues with WAL, where the WAL file would grow boundlessly, eventually leading to very slow reads and writes.
It's fixable by periodically forcing the WAL to be truncated, but it took me a lot of time and pain to figure it out.
dathinab 6 hours ago [-]
That is why I said to read the WAL doc page in a different answer ;) They do point out the risks here: https://sqlite.org/wal.html#avoiding_excessively_large_wal_f...
sqlite's design makes a lot of SQL concurrency synchronization edge cases much simpler, as you can rely on the single-writer-at-a-time limitation. And it has some great hidden features for using it as client application state storage. But there are use-cases it's just not very good at, and moving from sqlite to other DBs can be tricky (if you ever relied on the exclusive write transaction, or on the way cells are blobs which can mix data types, even if it was by accident)
rich_sasha 5 hours ago [-]
I did read it. For whatever reason, automatic checkpoints basically would stop from time to time, and the WAL file would start growing like crazy.
In the end I wrote an external process that forced a checkpoint a few times a day, which worked. I came across other exasperated people in various dark corners of the Internet with the same symptoms. If I had a blog, I'd be writing about it.
normie3000 15 hours ago [-]
Interesting, were there any warning signs beyond general query slowdown?
rich_sasha 12 hours ago [-]
No warning signs and very little about it on the Internet. Performance just slows to a crawl. It's also hard to replicate.
WAL doesn't cure concurrency issues for SQLite. WAL plus a single-writer, multiple-reader threading model is required. It's blazing fast though.
bootsmann 10 hours ago [-]
Is there a significant advantage of the sqlite in-memory page cache over the page cache that's already included with the operating system?
jitl 10 hours ago [-]
Yes: SQLite needs to inspect the schema when it opens a new connection object and does some O(number of conns) lookups in global state during this process. It’s best to avoid re-doing this work.
mostlysimilar 1 days ago [-]
Around what amount of load would you say the overhead of opening/closing becomes a problem?
jitl 10 hours ago [-]
It depends hugely on how you decide to manage the connection objects. If you have a single-thread / single-core server that only ever opens a single connection, then connection open overhead is never a problem, even under infinite load.
The two main issues w opening a connection are:
1. There is fixed cost O(database schema) time spent building the connection stuff. Ideally SQLite could use a “zygote” connection that can refresh itself and then get cloned to create a new one, instead of doing this work from scratch every time.
2. There is O(number of connections) time spent looking at a list of file descriptors in global state under a global lock. This one is REALLY BAD if you have >10,000 connections so it was a major motivator for us to do connection pooling at Notion. Ideally SQLite could use a hash table instead of a O(n) linear search for this, or disable it entirely.
Both of these issues are reasons I’m excited about Turso’s SQLite rewrite in Rust - it’s so easy to fix both of these issues in Rust (like a good hash table is 2 LoC to adopt in Rust) whereas in the original C it’s much more involved to safely and correctly fix the issue in a fork.
Furthermore, it would be great to share more of the cache between connections as a kind of “L2 cache”; again tractable and safe to build in Rust but complicated to build in a fork of the original C.
Notion uses a SQLite-backed server for our “Database” product concept that I helped write, and we ran into a lot of these kinds of issues scaling reads. We implemented connection pooling over the better-sqlite3 Node module to mitigate these issues. We also use Turso’s existing SQLite C fork “libsql” for some connections, since it offers a true async option backed by a thread pool under the hood in the Node driver, which helps in cases where you have a bottleneck serializing or deserializing data from “node” layout to “SQLite C” layout, or many concurrent writes to different DBs from a single NodeJS process.
manmal 23 hours ago [-]
Doesn’t SQLite have its own in-memory cache? Is this about having more control re cache size?
dathinab 21 hours ago [-]
yes, per "open connection", hence why not closing+reopening connections all the time helps the cache ;)
d1l 1 days ago [-]
This is strange on so many levels.
SQLite does not even do network I/O.
How does sharing a connection (and transaction scope) in an asyncio environment even work? Won’t you still need a connection per asyncio context?
Does sqlite_open really take long compared to the inevitable contention for the write lock you’ll see when you have many concurrent contexts?
Does sqlite_open even register in comparison with the overhead of the python interpreter?
What is an asyncio SQLite connection anyways? Isn’t it just a regular one that gets hucked into a separate thread?
simonw 1 days ago [-]
If you're talking to a 100KB SQLite database file this kind of thing is likely unnecessary, just opening and closing a connection for each query is probably fine.
If you're querying a multi-GB SQLite database there are things like per-connection caches that may benefit from a connection pool.
> What is an asyncio SQLite connection anyways? Isn’t it just a regular one that gets hucked into a separate thread?
Basically yes - aiosqlite works by opening each connection in a dedicated thread and then sending async queries to it and waiting for a response that gets sent to a Future. https://github.com/omnilib/aiosqlite/blob/895fd9183b43cecce8...
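A stripped-down sketch of that pattern (not aiosqlite's actual code, just the shape of it): a worker thread owns a plain sqlite3 connection, and async callers hand it work along with a Future to resolve.

    import asyncio
    import queue
    import sqlite3
    import threading


    class ThreadedConnection:
        """One worker thread owns the connection; async callers await Futures."""

        def __init__(self, path):
            self._jobs = queue.Queue()
            threading.Thread(target=self._worker, args=(path,), daemon=True).start()

        def _worker(self, path):
            conn = sqlite3.connect(path)  # lives entirely on this thread
            while True:
                sql, params, loop, fut = self._jobs.get()
                try:
                    rows = conn.execute(sql, params).fetchall()
                    loop.call_soon_threadsafe(fut.set_result, rows)
                except Exception as exc:
                    loop.call_soon_threadsafe(fut.set_exception, exc)

        async def execute(self, sql, params=()):
            loop = asyncio.get_running_loop()
            fut = loop.create_future()
            self._jobs.put((sql, params, loop, fut))
            return await fut  # the event loop stays free while the thread works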
That's even crazier - so you're using asyncio because you have a ton of slow network-bound stuff, but for your database access you are running every sqlite connection in its own thread and just managing those threads via the asyncio event loop?
reactordev 20 hours ago [-]
Thread pooling for databases, whether network based, or disk based, is common. A lot of times it will be baked into your client, so the fact that you think it’s crazy means you’ve only dealt with clients that did this for you.
For really large data sets, you can query and wait a few minutes before getting a result. Do you really want to await that?
quietbritishjim 24 hours ago [-]
What is crazy about that?
lttlrck 23 hours ago [-]
Of course I don't know what the parent is thinking, but my thought is: why can't it be entirely event loop driven? What are the threads adding here?
(I don't know anything about that project and this isn't meant as a criticism of its design or a challenge - cos I'd probably lose :-) )
eurleif 19 hours ago [-]
SQLite doesn't have a separate server process; it does all of the work for queries in your process. So it's intrinsically CPU-heavy, and it needs threads to avoid blocking the event loop.
One way to look at it is that with a client-server database and an async client library, you have a thread pool in the database server process to do the heavy lifting, and async clients talk to it via TCP. With SQLite, you have that "server" thread pool in the same process instead, and async "clients" talk to it via in-process communication.
mayli 23 hours ago [-]
Because the sqlite lib that Python ships isn't async, and sqlite itself doesn't usually provide an async API.
maxbond 23 hours ago [-]
Python's asyncio is single threaded. If you didn't send them into a different thread, the entire event loop would block, and it would degenerate to a fully synchronous single threaded program with additional overhead.
paulddraper 21 hours ago [-]
This is a common paradigm for blocking APIs (e.g. the sqlite driver)
crazygringo 1 days ago [-]
> If you're querying a multi-GB SQLite database
In which case SQLite is probably the wrong tool for the job, and you should be using Postgres or MySQL, which are actually designed from the ground up for lots of concurrent connections.
SQLite is amazing. I love SQLite. But I love it for single-user single-machine scenarios. Not multi-user. Not over a network.
Kranar 20 hours ago [-]
SQLite is a great database for organizing data in desktop applications, including both productivity software and even video games. It's certainly not at all unreasonable for those use cases to have files that are in the low GB and I would much rather use SQLite to process that data instead of bundling MySQL or Postgres into my application.
simonw 1 days ago [-]
Multi-GB is tiny these days.
I didn't say anything about concurrent access. SQLite with WAL mode is fine for that these days for dozens of concurrent readers/writers (OK only one writer gets to write at a time, but if your writes queue for 1-2ms who cares?) - if you're dealing with hundreds or thousands over a network then yeah, use a server-based database engine.
da_chicken 18 hours ago [-]
Multi GB is tiny, but that doesn't make SQLite magically better at large queries of multi GB databases. That's why DuckDB has been getting more popular.
benjiro 17 hours ago [-]
Sqlite != DuckDB... two totally different DB types. One is a row-based database, the other is column-based. They target different workloads, and both can handle extremely heavy ones.
da_chicken 1 hours ago [-]
Yes, that's the point I'm making. If SQLite didn't ever struggle with databases in the GB ranges, then there wouldn't be much call to replace it with DuckDB. The fact that there's significant value in an OLAP RDBMS suggests that SQLite is falling short.
brulard 24 hours ago [-]
I always had trouble getting multiple processes write access to the sqlite file. For example, if I have a node.js backend working with that file and I try to access the file with a different tool (adminer for example), it fails (file in use or something like that). Should it work? I don't know if I'm doing something wrong, but this is my experience across multiple projects.
dathinab 21 hours ago [-]
There are multiple aspects to it:
- sqlite is a bit like an RW-locked database: either any number of readers, xor exactly one writer and no readers
- but with WAL mode enabled, readers and writers (mostly) don't block each other, i.e. you can have any number of readers and up to one writer (so normally you want WAL mode if there is any concurrent access)
- if a transaction (including an implicit one from a single command without "begin", or e.g. one upgrading from a read to a write transaction) is blocked too long by a different process's write transaction, SQLITE_BUSY might be returned
- in addition, file locks might be used by SQL bindings or similar to prevent multi-application access; normally you wouldn't expect that, but given that sqlite had an OPEN_EXCLUSIVE option in the past (which should be ignored by halfway modern implementations) I wouldn't be surprised to find it
- your file system might also prevent concurrent access to sqlite db files; this is a super obscure niche case, but I have seen it once (in a shared server, network filesystem(??) context), probably because sqlite really doesn't like network filesystems, which often have unreliable implementations of the primitives sqlite needs for proper synchronization
As other comments pointed out, enabling WAL mode will (probably) fix your issues; a minimal example of turning it on follows below.
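For reference, a minimal way to apply that advice from Python's stdlib sqlite3 (the same pragmas work from any binding; the 5-second timeout is just an example value):

    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("PRAGMA journal_mode = WAL")    # readers and the single writer stop blocking each other
    conn.execute("PRAGMA busy_timeout = 5000")   # wait up to 5s for the lock instead of failing with SQLITE_BUSY
    conn.execute("PRAGMA synchronous = NORMAL")  # common companion setting for WAL (a durability trade-off)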
Your throughput will be much worse than a single process, but it's possible, and sometimes convenient. Maybe something in your stack is trying to hold open a writable connection in both processes?
simonw 21 hours ago [-]
That is because the default SQLite mode is journal, but for concurrent reads and writes you need to switch it to WAL.
brulard 13 hours ago [-]
I use WAL basically everywhere. I thought that would fix my problem some time ago, but it didn't
simonw 10 hours ago [-]
Are you seeing SQLITE_BUSY errors?
Those are a nasty trap. The solution is non-obvious: you have to use BEGIN IMMEDIATE on any transaction that performs at least one write: https://simonwillison.net/tags/sqlite-busy/
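A minimal sketch of that advice with the stdlib sqlite3 module (the accounts table is a made-up example; isolation_level=None so we control transactions explicitly):

    import sqlite3

    conn = sqlite3.connect("app.db", isolation_level=None)  # autocommit: we issue BEGIN/COMMIT ourselves


    def transfer(amount, src, dst):
        # BEGIN IMMEDIATE grabs the write lock up front, so the transaction won't
        # hit SQLITE_BUSY midway when it tries to upgrade from reading to writing.
        conn.execute("BEGIN IMMEDIATE")
        try:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise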
brulard 1 hours ago [-]
Thanks for the direction. I thought SQLite was limited in how multiple processes can access the db files, but now I see the problem is on my end. Btw. I'm a fan of your AI/LLM articles, thanks for your awesome work.
cyanydeez 23 hours ago [-]
PRAGMA journal_mode = WAL;
Asmod4n 10 hours ago [-]
An average human being can produce around 650MB of text over a whole working lifetime when doing nothing but writing text 4 hours per weekday without any interruptions.
Saying multi gigabyte databases for single user usage is the norm feels insane to me.
simonw 9 hours ago [-]
Have you seen the size of the database of email you've received?
jitl 9 hours ago [-]
Postgres will shit itself without a connection pooling proxy server like PgBouncer if you try even like 5000 concurrent connections, because Postgres spawns a UNIX process per inbound connection. There's much more overhead per connection in Postgres than in SQLite!
drzaiusx11 9 hours ago [-]
Likewise MySQL will shit itself with just a couple hundred connections unless you have a massive instance size. We use AWS' RDS proxy in front for a similar solution. I've spent way too many hours tuning pool sizes, resolving connection pinning issues...
naasking 24 hours ago [-]
> In which case SQLite is probably the wrong tool for the job
Why? If all it's missing is an async connection pool to make it a good tool for more jobs, what's the problem with just creating one?
nomel 18 hours ago [-]
It's a bit of reinventing the wheel, since solving all the problems that come with network access is precisely why those databases exist, and is work they've already done.
asyncpg is a nice python library for postgres.
I think postgres releasing a nice linkable, "serverless" library would be pretty amazing, to make the need for abusing sqlite like this (I do it too) go away.
jitl 9 hours ago [-]
Postgres has really not solved problems that come with being a networked server and will collapse under concurrent connections far before you start to feel it with SQLite. 5000 concurrent connections will already start to deadlock your Postgres server; each new connection in Postgres is a new Postgres process and the state for the connection needs to be written to various internal tracking tables. It has a huge amount of overhead; connection pooling in PG is required and often the total system has a rather low fixed limit compared to idk, writing 200 lines of python code or whatever and getting orders of magnitude more connections out of a single machine.
anarazel 8 hours ago [-]
A connection definitely has overhead in PG, but "5000 concurrent connections will already start to deadlock your Postgres server" is bogus. People completely routinely run with more connections.
Check the throughput graphs from this blog post from 2020 (for improvements I made to connection scalability): https://techcommunity.microsoft.com/blog/adforpostgresql/imp...
That's for read-mostly work. If you do write very intensely, you're going to see more contention earlier. But that's way way worse with sqlite, due to its single writer model.
EDIT: Corrected year.
jitl 4 hours ago [-]
Yeah, I think I'm conflating our fear of >5000 connections for our Postgres workload (read-write that is quite write heavy) with our SQLite workload, which is 99.9% read.
The way our SQLite workload works is that we have a pool of hundreds of read connections per DB file, and a single writer thread per DB file that keeps the DB up to date via CDC from Postgres; basically using SQLite as a secondary index "scale out" over data primarily written to Postgres. Because we're piping Postgres replication slot -> SQLite, we don't suffer any writer concurrency and throughput is fine to keep up with the change rate so far. Our biggest bottleneck is reading the replication slot on the Postgres side into Kafka with Debezium.
It really depends on what your workload looks like, but I think synchronous will win most of the time.
charleslmunger 17 hours ago [-]
A connection pool is absolutely a best practice. One of the biggest benefits is managing a cache of prepared statements, the page cache, etc. Maybe you have temp tables or temp triggers too.
Even better is to have separate pools for the writer connection and readers in WAL mode. Then you can cache write relevant statements only once. I am skeptical about a dedicated thread per call because that seems like it would add a bunch of latency.
pjmlp 9 hours ago [-]
For some strange reason, some people feel like using SQLite all over the place, even when a proper RDBMS would be the right answer.
9rx 8 hours ago [-]
It is not that strange when you consider the history. You see, as we started to move away from generated HTML into rich browser applications, we started to need minimal direct DBMS features to serve the rich application. At first, a few functions were exposed as "REST APIs". But soon enough those few features turned into full-on DBMSes, resulting in a DBMS in front of a DBMS. But then people, rightfully, started asking: "Why are we putting a DBMS in front of a DBMS?"
The trouble is that nobody took a step back and asked: "Can we simply use the backing DBMS?" Instead, they trudged forward with "Let's get rid of the backing DBMS and embed the database engine into our own DBMS!" And since SQLite is a convenient database engine...
fidotron 8 hours ago [-]
I recently encountered a shared SQLite db being used for inter process pub sub for real time data . . . in a safety critical system.
Wrong on so many levels it's frightening.
aynyc 6 hours ago [-]
Is it? It was designed for a damage control system on naval combat vessels. I have no idea what it does on a naval vessel, but I imagine there is a certain level of safety required.
wmanley 7 hours ago [-]
Regarding shared caching: Use `PRAGMA mmap_size` to enable mmap for reading your database. This way SQLite won't add another layer of page caching on top, saving RAM and making things faster. SQLite only uses mmap for reads and will continue to write to the database with pwrite().
You must set it to a value higher than the size of your DB. I use:
PRAGMA mmap_size = 1099511627776;
(1TB)
rogerbinns 7 hours ago [-]
Unless you compile SQLite yourself, you'll find the maximum mmap size is 2GB, i.e. even with your pragma above, only the first 2GB of the database are memory mapped. It is defined by the SQLITE_MAX_MMAP_SIZE compile time constant (https://sqlite.org/compile.html#max_mmap_size). You can use pragma compile_options to see what the value is.
That seems like a holdover from 32-bit days. I wonder why this is still the default.
bawolff 19 hours ago [-]
> The primary challenge with SQLite in a concurrent environment (like an asyncio web application) is not connection time, but write contention. SQLite uses a database-level lock for writes. When multiple asynchronous tasks try to write to the database simultaneously through their own separate connections, they will collide. This contention leads to a cascade of SQLITE_BUSY or SQLITE_LOCKED errors.
I really don't get it. How would this help?
The benchmarks don't mention which journal mode sqlite is configured with, which is very suspicious, as that makes a huge difference under concurrent load.
pornel 12 hours ago [-]
Sharing one SQLite connection across the process would necessarily serialize all writes from the process. It won't do anything for contention with external processes, but the writes within the process wouldn't be concurrent any more.
Basically, it adds its own write lock outside of SQLite, because the pool can implement the lock in a less annoying way.
bawolff 9 hours ago [-]
I don't understand, all writes to a single sqlite DB are going to be serialized no matter what you do.
> Basically, it adds its own write lock outside of SQLite, because the pool can implement the lock in a less annoying way.
Less annoying how? What is the difference?
pornel 3 hours ago [-]
SQLite's lock is blocking, with a timeout that aborts the transaction. An async runtime can have a non-blocking lock that allows other tasks to proceed in the meantime, and is able to wait indefinitely without breaking transactions.
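In code the difference is roughly this (a sketch; write_gate and serialized_write are hypothetical names): waiting tasks suspend cooperatively instead of sleeping inside SQLite's busy handler, so readers keep running and no busy_timeout ever expires.

    import asyncio

    write_gate = asyncio.Lock()  # application-level write lock, awaited cooperatively


    async def serialized_write(run_write_txn):
        async with write_gate:   # waits as long as needed without blocking the event loop
            return await run_write_txn()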
bawolff 1 hours ago [-]
What's the benefit of this over just doing PRAGMA busy_timeout = 0; to make it non-blocking?
After all, as far as I understand, the busy timeout only kicks in at the beginning of a write transaction, so it's not like you have to redo a bunch of queries.
slaily 8 hours ago [-]
When your program does heavy concurrent writing and opens/closes connections for each write, most of them will fail with SQLITE_BUSY or SQLITE_LOCKED errors.
This situation can be managed with a small pool (5 connections or fewer) to prevent spawning too many connections. This reduces racing between them and allows write operations to succeed.
mayli 23 hours ago [-]
FYI, I once had a few long-lived connections with WAL, and the WAL file just exploded. Turns out sqlite won't truncate the WAL if there are open connections.
infamia 21 hours ago [-]
Using WAL2 should make that problem better. It has two WAL files it alternates between when making writes, so the system has an opportunity to checkpoint the WAL file not in use: https://sqlite.org/src/doc/wal2/doc/wal2.md
I've been thinking about trying pre-serialization of SQLite commands to enable single-writer against a singleton SQLiteConnection using something like Channel<T> or other high performance MPSC abstraction. Most SQLite providers have an internal mutex that handles serialization, but if we can avoid all contention on this mutex things might go faster. Opening and closing SQLite connections is expensive. If we can re-use the same instance things go a lot faster.
anacrolix 13 hours ago [-]
When you have multiple sqlite connections, any write will flush the caches of other connections. So more connections is not always better.
ddorian43 9 hours ago [-]
No synchronous API support?