> Then, there’s a random byte like 0x25 or 0x07 followed by the column data – the rest of the columns are string types so they’re all stored in UTF-8. If you know what these inter-column bytes mean, leave a comment below! I can’t figure it out.
Next paragraph mentions TOAST and this byte is related to that. The low order bits (on little endian platforms) determine whether the value is stored inline (00, first 4 bytes are total length), is stored in TOAST table (11) or is shorter than 127 bytes (01 for even length, 10 for odd length, the total length is first byte >> 1). So for 0x25 you get 01, so length is 0x25 >> 1 = 18, which is that byte followed by "Equatorial Guinea".
Edit: the reason why endianness matters is that the same representation is also used in memory and the whole first word is interpreted as one length value. The toast tag bits have to be in first byte, which is most easily done as two highest order bits of that word on big endian. That means that it is placed in the two highest bits of the byte.
indoordin0saur 32 days ago [-]
This URL is blocked by my company's network because of a certain substring in the URL lol
In this case the substring is part of the author's name. Such names are not at all uncommon.
hinkley 32 days ago [-]
Alistair Cockburn is one of the signatories of the Agile Manifesto.
You may have heard of him more recently with the Hexagonal Architecture approach.
asciii 32 days ago [-]
I just saw something similar with a user on here dangsux...
Apparently might be Dang's UX and not against the Mod. ¯\_(ツ)_/¯
drewsberry 32 days ago [-]
There are dozens of us!
drewsberry 32 days ago [-]
Well if it isn't my arch-nemesis – my legally designated name. Maybe I should've gone for something with just my first name like drewsexpertblog.dev
yencabulator 31 days ago [-]
I'd recommend trying to host it at a more reputable domain. Like the commercial use subdomain of Cook Islands, .co.ck
rpmisms 31 days ago [-]
Have you been saving that one for a rainy day?
32 days ago [-]
LVB 32 days ago [-]
I remember us once giving a supplier access to our internal bug tracker for a collaborative project. They were unable to get to the “…/openissue” endpoint.
fullstop 32 days ago [-]
I was on a mailing list once where messages were blocked because of msexchange being in the headers.
Izkata 32 days ago [-]
Any relation to ExpertsExchange?
fullstop 32 days ago [-]
Ah, this was before ExpertsExchange existed. I think that it was Corel that had the mail filter, and it was related to Corel Linux.
You mean ExpertSexChange? If you’re gonna capitalize it, do it right.
01HNNWZ0MV43FF 30 days ago [-]
Fair, I would never do mine at Microsoft
forinti 32 days ago [-]
I once worked for a company that blocked Cuban sites because of .cu (which is the Portuguese word for the end of your digestive system), but did not block porn sites (or so I was told ;-).
CodesInChaos 32 days ago [-]
Are you sure it wasn't because of the US embargo on Cuba? Companies outside the US often participate(d) as well, because they want to do business with the US and US companies.
forinti 32 days ago [-]
The proxy replied with a message stating that the site was blocked because it was pornographic.
hinkley 32 days ago [-]
That seems like the sort of thing you check on your last day as you’re going out the door.
“The rumors are true!”
(Although less amusing, you could also just ask the IT guys and gals)
crngefest 32 days ago [-]
What why ?
Would people browse bigcocks.net unless it’s explicitly blocked?
What about cox?
Is „tube“ on a blocklist as well?
davidmurdoch 32 days ago [-]
Because a sales person at some "security" company convinced a CTO that it was as good idea.
actionfromafar 32 days ago [-]
The security company just follows "best practices" :)
indoordin0saur 32 days ago [-]
It is idiotic, yes. This feature is certainly outsourced to our mediocre IT contractor
twic 32 days ago [-]
My company also blocks it, but as phishing.
drewsberry 32 days ago [-]
Mine blocked it when I first created it, you can usually flag an incorrectly classified site and they'll correct it (I assure that you I'm not doing any phishing).
Brian_K_White 32 days ago [-]
Shut off like a petcock.
sgarland 32 days ago [-]
If anyone is interested in contrasting this with InnoDB (MySQL’s default engine), Jeremy Cole has an outstanding blog series [0] going into incredible detail.
> The Arrow columnar format includes a language-agnostic in-memory data structure specification, metadata serialization, and a protocol for serialization and generic data transport.This document is intended to provide adequate detail to create a new implementation of the columnar format without the aid of an existing implementation. We utilize Google’s Flatbuffers project for metadata serialization, so it will be necessary to refer to the project’s Flatbuffers protocol definition files while reading this document.The columnar format has some key features:
> Data adjacency for sequential access (scans)
> O(1) (constant-time) random access
> SIMD and vectorization-friendly
> Relocatable without “pointer swizzling”, allowing for true zero-copy access in shared memory
Are the major SQL file formats already SIMD optimized and zero-copy across TCP/IP?
Arrow doesn't do full or partial indexes.
Apache Arrow supports Feather and Parquet on-disk file formats. Feather is on-disk Arrow IPC, now with default LZ4 compression or optionally ZSTD.
Some databases support Parquet as the database flat file format (that a DBMS process like PostgreSQL or MySQL provides a logged, permissioned, and cached query interface with query planning to).
IIUC with Parquet it's possible both to use normal tools to offline query data tables as files on disk and also to online query tables with a persistent process with tunable parameters and optionally also centrally enforce schema and referential integrity.
> Parquet format is designed for long-term storage, where Arrow is more intended for short term or ephemeral storage
> Parquet is more expensive to write than Feather as it features more layers of encoding and compression. Feather is unmodified raw columnar Arrow memory. We will probably add simple compression to Feather in the future.
> Due to dictionary encoding, RLE encoding, and data page compression, Parquet files will often be much smaller than Feather files
> Parquet is a standard storage format for analytics that's supported by many different systems: Spark, Hive, Impala, various AWS services, in future by BigQuery, etc. So if you are doing analytics, Parquet is a good option as a reference storage format for query by multiple systems
Those systems index Parquet. Can they also index Feather IPC, which an application might already have to journal and/or log, and checkpoint?
Edit: What are some of the DLT solutions for indexing given a consensus-controlled message spec designed for synchronization?
> This process of retrieving the expected database state from the WAL is called logical decoding and Postgres stores files related to this process in here.
While logical decoding is about WAL, it is not related to the recovery process. Logical decoding is a mechanism to convert the WAL entries back into the high-level operations that caused the WAL entries, for example for replication or audit.
hinkley 32 days ago [-]
But in a power loss the WAL is also read to restore and transactions that were in flight, right?
ggm 31 days ago [-]
"in case of disk loss, we wrote the things we were going to write to disk, to the WAL. which is also on disk"
I never entirely got it. Either your WAL is on more reliable media, or duplicated. If its just "easier" to write the WAL and faster to read off properly indexed state, ok, thats a local optimisation.
If your WAL is on the same filesystem behind a vendor specific RAID controller, you're still stuffed, if that RAID card dies.
Doxin 31 days ago [-]
the WAL is not for when your disk dies. It's for when the server crashes. First writing what you're going to do to the WAL before doing it makes it possible to recover from a crash at any point. If it crashes during writing to the WAL? Ignore the half-written WAL records and keep trucking. If it crashes during writing to the data file? Use the WAL to rerun the writes that didn't complete.
It doesn't guarantee you don't lose data written during the crash, but it does guarantee you can get the database back into a usable state.
drewsberry 32 days ago [-]
Very good point, I've rephrased this.
dfox 32 days ago [-]
You still refer to logical decoding as part of recovery in the last section. The main point is that WAL by itself is designed only for data recovery and only contains information about what is going to be written where in terms of essentially raw disk accesses (notably, the writes do not have to be page aligned).
Logical decoding (which needs wal_level=logical which extends the WAL format with additional metadata) is about parsing the WAL for other purposes than performing the recovery (or physical replication, which is essentially the same thing as recovery, but performed on another instance of the same cluster). The name "logical decoding" is certainly intended to emphasize that there are other uses for that than logical replication, but these are not that different from logical replication on this level (get a stream of changed tuples in tables).
jmholla 31 days ago [-]
Handy tip: instead of Python you can use `xxd` to inspect hex strings:
Just curious if anyone else encountered this same error from the initial "docker run" command:
docker: Error response from daemon: create ./pg-data: "./pg-data" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path.
drewsberry 31 days ago [-]
It's probably complaining about the relative path, try replacing `-v ./pg-data:/var/lib/postgresql/data` with `-v "$PWD/pg-data:/var/lib/postgresql/data"`
nemothekid 32 days ago [-]
When I started my dev career, NoSQL was the rage and I remember reading about BigTable, Cassandra, Dynamo, and most importantly LSMs. They made a big deal about how the data on stored on disk was sorted. I never knew why this was a big deal but always kept it in mind, but I never bothered to understand how it was done previously.
>Something really important about tables which isn’t obvious at first is that, even though they might have sequential primary keys, tables are not ordered.
This was very surprising to read.
Izkata 32 days ago [-]
I'mma pop up again with this, since it's not mentioned - there's a CLUSTER command that lets you reorder the table data to match an index. It's a one-off so you'll need to run it regularly from a crontab or something, but it's important to be aware of because postgres keeps a "correlation" statistic between the indexes and the order on disk. It affects the query planner, biasing it against random disk access and towards sequential disk access. It's one of the possible reasons postgres might not use an index that otherwise makes sense - the random disk access penalty is too high.
nraynaud 32 days ago [-]
A bit of curiosity: how did Postgres choose 8k pages?
shouldn’t it be the FS page size to help with atomicity?
silvestrov 32 days ago [-]
Depends very much on how the SSDs are designed internally. I think these days we have to settle for "can never be sure" of the real page size for atomicity. Pages can also become corrupt in other ways.
It is weird that "--data-checksums" isn't the default for new databases, even when it cost a bit in performance. Integrity should be more important than performance.
isosphere 32 days ago [-]
Was thinking the same thing when I saw those zeros in the checksum field. Perhaps the consequences are significant.
> Any application with a high shared buffers hit ratio: little difference.
> Any application with a high ratio of reads/writes: little difference.
> Data logging application with a low ratio of reads/inserts, and few updates and deletes:
little difference.
> Application with an equal ratio of reads/inserts, or many updates or deletes, and a low
shared buffers hit ratio (for example, an ETL workload), especially where the rows are
scattered among disk pages: expect double or greater CPU and disk I/O use.
> Run pg_dump on a database where all rows have already been previously selected by
applications: little difference.
> Run pg_dump on a database with large quantities of rows inserted to insert-only tables:
expect roughly double CPU and disk I/O use.
silvestrov 32 days ago [-]
On my M1 mac "dd ... | cksum" takes 3 seconds while "dd | shasum" (sha1) takes 2 seconds. So cksum might not be the best tool for performance checking.
There is CPU specific code in the PG source in src/include/storage/checksum_impl.h
It is written as a plain nested loop in C. So performance is fully dependent on the compiler being able to parallelize or vectorize the code.
I would not be surprised if manually written SIMD code would be faster.
anarazel 32 days ago [-]
The bottleneck isn't at all the checksum computation itself. It's that to keep checksums valid we need to protect against the potential of torn pages even in cases where it doesn't matter without checksums (i.e. were just individual bits are flipped). That in turn means we need to WAL log changes we don't need to without checksums - which can be painful.
Joe_Cool 32 days ago [-]
Interesting. I guess M1 doesn't have the 'crc32' "acceleration" that is included in SSE4.2.
It lacks page numbers. Page 5 is first page with gray box at the top of the page.
loeg 31 days ago [-]
That's measuring 'cksum', which must have an awfully slow implementation. The document notes that this is distinct from measuring PG's checksum performance. (I think it's a pretty useless measurement.)
Earlier (page 4):
> How much CPU time does it take to checksum...
> ...a specific amount of data? This is easy to estimate because PostgreSQL uses the crc32 algorithm which is very simple, and (GNU) Linux has a command line program that does the same thing: cksum.
Yeah, using cksum as an estimate here appears to be very flawed.
Joe_Cool 30 days ago [-]
That is weird. Seems like crc optimization is quite a rabbit hole.
Yeah. crc32 may be simple in theory, but doing it as fast as possible utilizing the various execution units of modern hardware is challenging.
tremon 32 days ago [-]
I'm not sure how much bearing internal storage organization should have on Postgres' page size. Since pg explicitly chooses not to implement their own storage organization layer, there's always a filesystem between a pg database and the underlying storage.
berkes 32 days ago [-]
> Integrity should be more important than performance.
Most often it is. But not always. There certainly are cases where speed is far more important than integrity in databases.
I cannot think of a case where this would be true for a RDBMS or even a Document DB (Though MongoDB had different opinions on this...).
But e.g. redis as caching server, or memcached, or even these non-normalized data that I have in a PG that can be reproduced from other sources easily in case of corruption or stale-ness: it's fine to trade in integrity for speed there.
speed_spread 32 days ago [-]
If physical integrity is already provided by the backing filesystem, such as ZFS, wouldn't --data-checksums be redundant?
magicalhippo 32 days ago [-]
If data is served from ARC, it's primary cache, ZFS does not perform a checksum check before handing it to you, as the data was checked when it got read into the ARC.
If you use ECC you're quite safe, but ECC can't detect multi-bit errors, just single and double bit errors.
So if you care much about your integrity, you might want Postgres to do its checksum check as well.
ikiris 32 days ago [-]
If you somehow have 3+ bit errors coming out of ram on an ECC board, you have much bigger problems than trying to verify your postgres data via checksum.
magicalhippo 32 days ago [-]
AFAIK RowHammer attacks can cause multiple bits to flip in a row[1], no?
But sure, it's not for the vast majority of folks.
8k is a very common page size, but 4k isn't unheard of. Oracle's default is 4k.
The issue is that page size caps row size (for on-row storage). Also, if you have a smart clustering index, larger pages can be more efficient use of index addressing. So it's a trade-off.
Historically there was no atomicity at 4k boundaries, just at 512 byte boundaries (sectors). That'd have been too limiting. Lowering the limit now would prove problematic due to the smaller row sizes/ lower number of columns.
cedws 32 days ago [-]
I guess it's a tradeoff. Ideally you'd want a small enough page size such that pages are unlikely to be split across multiple LBAs, but the performance wouldn't be good. Standard filesystems don't really guarantee LBA alignment anyway.
thomasben 31 days ago [-]
Easy to read, well written and well structured. Great work Drew !
shreddit 32 days ago [-]
Why does closing the table of contents open the nav menu on mobile?
ZoomerCretin 32 days ago [-]
On mobile, it appears to open regardless of where you tap. This appears to be the culprit:
```const n = document.getElementById('nav-header');
document.addEventListener(
'click',
s => {
u.hidden ||
s.target === null ||
n === null ||
n.contains(s.target) ||
r()
}```
Above, in the same function, there exists the function `e.addEventListener('click', r);`, which is likely closer to what the author intended. This fires the 'click' event any time the page is clicked, which opens the nav menu when it shouldn't.
drewsberry 31 days ago [-]
Thanks for letting me know, it was a typo in the "click-away closes nav menu" logic that opened it on the first click – all fixed now.
topherjaynes 32 days ago [-]
Honestly, I couldn't click this link quick enough and it delivered. Looking forward to the "oversized values" follow-up!
HPsquared 32 days ago [-]
Title missing the leading "How"
theblazehen 32 days ago [-]
I believe HN auto-strips such "filler" words, needs to be added again by a mod
andersa 32 days ago [-]
Perhaps it should not? Why would such a silly feature be explained over and over rather than removed?
candiddevmike 32 days ago [-]
HN does a lot of title editorializing. Like the one a few days ago that turned the acronym RATs into Rats and completely changed the meaning.
codetrotter 32 days ago [-]
Because there are a lot of articles in the news that add this word for no reason.
nandhinianand 32 days ago [-]
Ahhh.. wouldn't that be better by using a percentage of filler words/total words in title threshold?? I don't know if a feature simply strips out the filler words in title, it's not always useful and rather harmful is what I would argue.
shakna 32 days ago [-]
Humans work better. HN is small scale enough that a moderator can come along, collapse the off topic comments and fix the title, and it's not an issue.
DavidMankin 26 days ago [-]
Seems like title cleanup is a good use for an LLM actually. (Not to remove the human option, but as the first smarter pass instead of simple stopword lists.)
shakna 25 days ago [-]
Is it, though?
An LLM burns through how much power, compared to the tiny Arc server? You could outsource it, of course... But then you have a third party dependency.
Then you have the problem that LLMs continually and constantly change words to be "simpler" and destroy the very sensitive meanings, especially words in tech. Like auto-correcting Fediverse into Universe. Or C into C++.
codr7 32 days ago [-]
Yeah, especially when it changes the meaning in silly ways.
My brain spent quite some time trying to figure out what the headline was trying to say.
drewsberry 32 days ago [-]
I did add it into the title originally, must've been stripped. Thanks for the heads up, didn't know HN had this "useful feature".
eatonphil 32 days ago [-]
The submitter can edit the title after it has been auto-modified. It doesn't take a mod to do it unless some amount of time has passed.
drewsberry 32 days ago [-]
Can't see any way of modifying the title, unless I'm missing something – maybe I missed the edit window.
eatonphil 32 days ago [-]
Yeah, after an hour, you've probably missed it. (Source: I frequently edit titles of posts I submit to HN because HN changed it to nonsense. :) )
drewsberry 32 days ago [-]
Thanks for letting me know, I'll make sure to edit it quickly next time :-)
eitland 32 days ago [-]
IIRC, if the original submitter edits the title once it has been posted, the edited version sticks, i.e. the filter only works the first time and you can override it if you notice it.
32 days ago [-]
mharig 32 days ago [-]
> Can’t we just store some data on disk and read / write from it when we need to? (Spoiler: no.)
I disagree. SQLite does a good job in uniting the 2 worlds: complex SQL queries with excellent data consistency and simple file(s). Although SQLite is for sure not the one size fits all solution.
berkes 32 days ago [-]
> SQLite is for sure not the one size fits all solution
Nor is Postgres. PG is surprisingly versatile. E.g. with some extensions can be used as key-value storage (hashtable), document database, time-series db and so on. And it works quite well. Beyond "good enough" for many use cases. Added benefit, aside from having to run only one db-server, is that you can mix it: part relational, part document, etc.
But the PG versions nearly ever get as good as focused, dedicated solutions get. Which makes sense if you think about it: a team developing a dedicated key-value storage that does that and only that, for years, will always produce a better key-value storage then one bolted onto a generic RDBMS.
A practical example was where we used ltree extension to store ever growing hierarchies. We needed access control over subtrees (so that the X report for John only includes the entities of Johns devision and lower). While it worked in PG, it turned out that "simply replacing" it with OpenLDAP, which had all this built in, made it faster, easier and above all easier to maintain.
mharig 31 days ago [-]
[dead]
ozim 32 days ago [-]
But SQLite doesn’t do concurrency on writing you lock the file. While other db engines deal with row/table locks concurrent connections etc.
mharig 31 days ago [-]
[dead]
llimllib 32 days ago [-]
@drewsberry: I wish you had an RSS feed! I tried to subscribe to your blog but if there is one it's not linked.
(Enjoyed the post)
drewsberry 32 days ago [-]
Thanks for the feedback, I really appreciate it :-) I've added the RSS feed to my home page now, as the other poster noted the URL is https://drew.silcock.dev/rss.xml.
Next paragraph mentions TOAST and this byte is related to that. The low order bits (on little endian platforms) determine whether the value is stored inline (00, first 4 bytes are total length), is stored in TOAST table (11) or is shorter than 127 bytes (01 for even length, 10 for odd length, the total length is first byte >> 1). So for 0x25 you get 01, so length is 0x25 >> 1 = 18, which is that byte followed by "Equatorial Guinea".
Edit: the reason why endianness matters is that the same representation is also used in memory and the whole first word is interpreted as one length value. The toast tag bits have to be in first byte, which is most easily done as two highest order bits of that word on big endian. That means that it is placed in the two highest bits of the byte.
In this case the substring is part of the author's name. Such names are not at all uncommon.
You may have heard of him more recently with the Hexagonal Architecture approach.
Apparently might be Dang's UX and not against the Mod. ¯\_(ツ)_/¯
https://en.wikipedia.org/wiki/Corel_Linux
“The rumors are true!”
(Although less amusing, you could also just ask the IT guys and gals)
Is „tube“ on a blocklist as well?
[0]: https://blog.jcole.us/innodb/
> The Arrow columnar format includes a language-agnostic in-memory data structure specification, metadata serialization, and a protocol for serialization and generic data transport. This document is intended to provide adequate detail to create a new implementation of the columnar format without the aid of an existing implementation. We utilize Google’s Flatbuffers project for metadata serialization, so it will be necessary to refer to the project’s Flatbuffers protocol definition files while reading this document. The columnar format has some key features:
> Data adjacency for sequential access (scans)
> O(1) (constant-time) random access
> SIMD and vectorization-friendly
> Relocatable without “pointer swizzling”, allowing for true zero-copy access in shared memory
Are the major SQL file formats already SIMD optimized and zero-copy across TCP/IP?
Arrow doesn't do full or partial indexes.
Apache Arrow supports Feather and Parquet on-disk file formats. Feather is on-disk Arrow IPC, now with default LZ4 compression or optionally ZSTD.
Some databases support Parquet as the database flat file format (that a DBMS process like PostgreSQL or MySQL provides a logged, permissioned, and cached query interface with query planning to).
IIUC with Parquet it's possible both to use normal tools to offline query data tables as files on disk and also to online query tables with a persistent process with tunable parameters and optionally also centrally enforce schema and referential integrity.
From https://stackoverflow.com/questions/48083405/what-are-the-di... :
> Parquet format is designed for long-term storage, where Arrow is more intended for short term or ephemeral storage
> Parquet is more expensive to write than Feather as it features more layers of encoding and compression. Feather is unmodified raw columnar Arrow memory. We will probably add simple compression to Feather in the future.
> Due to dictionary encoding, RLE encoding, and data page compression, Parquet files will often be much smaller than Feather files
> Parquet is a standard storage format for analytics that's supported by many different systems: Spark, Hive, Impala, various AWS services, in future by BigQuery, etc. So if you are doing analytics, Parquet is a good option as a reference storage format for query by multiple systems
Those systems index Parquet. Can they also index Feather IPC, which an application might already have to journal and/or log, and checkpoint?
Edit: What are some of the DLT solutions for indexing given a consensus-controlled message spec designed for synchronization?
- cosmos/iavl: a Merkleized AVL+ tree (a balanced search tree with Merkle hashes and snapshots to prevent tampering and enable synchronization) https://github.com/cosmos/iavl/blob/master/docs/overview.md
- Google/trillion has Merkle hashed edges between rows in order in the table but is centralized
- "EVM Query Language: SQL-Like Language for Ethereum" (2024) https://news.ycombinator.com/item?id=41124567 : [...]
While logical decoding is about WAL, it is not related to the recovery process. Logical decoding is a mechanism to convert the WAL entries back into the high-level operations that caused the WAL entries, for example for replication or audit.
I never entirely got it. Either your WAL is on more reliable media, or duplicated. If its just "easier" to write the WAL and faster to read off properly indexed state, ok, thats a local optimisation.
If your WAL is on the same filesystem behind a vendor specific RAID controller, you're still stuffed, if that RAID card dies.
It doesn't guarantee you don't lose data written during the crash, but it does guarantee you can get the database back into a usable state.
Logical decoding (which needs wal_level=logical which extends the WAL format with additional metadata) is about parsing the WAL for other purposes than performing the recovery (or physical replication, which is essentially the same thing as recovery, but performed on another instance of the same cluster). The name "logical decoding" is certainly intended to emphasize that there are other uses for that than logical replication, but these are not that different from logical replication on this level (get a stream of changed tuples in tables).
docker: Error response from daemon: create ./pg-data: "./pg-data" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path.
>Something really important about tables which isn’t obvious at first is that, even though they might have sequential primary keys, tables are not ordered.
This was very surprising to read.
It is weird that "--data-checksums" isn't the default for new databases, even when it cost a bit in performance. Integrity should be more important than performance.
Here's a benchmarking exercise I found: https://www-staging.commandprompt.com/uploads/images/Command...
With a tidy summary:
> Any application with a high shared buffers hit ratio: little difference. > Any application with a high ratio of reads/writes: little difference. > Data logging application with a low ratio of reads/inserts, and few updates and deletes: little difference. > Application with an equal ratio of reads/inserts, or many updates or deletes, and a low shared buffers hit ratio (for example, an ETL workload), especially where the rows are scattered among disk pages: expect double or greater CPU and disk I/O use. > Run pg_dump on a database where all rows have already been previously selected by applications: little difference. > Run pg_dump on a database with large quantities of rows inserted to insert-only tables: expect roughly double CPU and disk I/O use.
There is CPU specific code in the PG source in src/include/storage/checksum_impl.h
It is written as a plain nested loop in C. So performance is fully dependent on the compiler being able to parallelize or vectorize the code.
I would not be surprised if manually written SIMD code would be faster.
https://dougallj.wordpress.com/2022/05/22/faster-crc32-on-th...
https://github.com/corsix/fast-crc32?tab=readme-ov-file#appl...
Something is amiss here.
If a CPU can do 30 GB/s then a CRC check should not have any real performance impact.
It lacks page numbers. Page 5 is first page with gray box at the top of the page.
Earlier (page 4):
> How much CPU time does it take to checksum...
> ...a specific amount of data? This is easy to estimate because PostgreSQL uses the crc32 algorithm which is very simple, and (GNU) Linux has a command line program that does the same thing: cksum.
Yeah, using cksum as an estimate here appears to be very flawed.
https://github.com/komrad36/CRC has a massive section about it in the README. Really interesting.
Most often it is. But not always. There certainly are cases where speed is far more important than integrity in databases. I cannot think of a case where this would be true for a RDBMS or even a Document DB (Though MongoDB had different opinions on this...).
But e.g. redis as caching server, or memcached, or even these non-normalized data that I have in a PG that can be reproduced from other sources easily in case of corruption or stale-ness: it's fine to trade in integrity for speed there.
If you use ECC you're quite safe, but ECC can't detect multi-bit errors, just single and double bit errors.
So if you care much about your integrity, you might want Postgres to do its checksum check as well.
But sure, it's not for the vast majority of folks.
[1]: https://www.vusec.net/projects/eccploit/
The issue is that page size caps row size (for on-row storage). Also, if you have a smart clustering index, larger pages can be more efficient use of index addressing. So it's a trade-off.
https://docs.oracle.com/en/database/oracle/oracle-database/1...
> Default value 8192
```const n = document.getElementById('nav-header');
Above, in the same function, there exists the function `e.addEventListener('click', r);`, which is likely closer to what the author intended. This fires the 'click' event any time the page is clicked, which opens the nav menu when it shouldn't.An LLM burns through how much power, compared to the tiny Arc server? You could outsource it, of course... But then you have a third party dependency.
Then you have the problem that LLMs continually and constantly change words to be "simpler" and destroy the very sensitive meanings, especially words in tech. Like auto-correcting Fediverse into Universe. Or C into C++.
My brain spent quite some time trying to figure out what the headline was trying to say.
I disagree. SQLite does a good job in uniting the 2 worlds: complex SQL queries with excellent data consistency and simple file(s). Although SQLite is for sure not the one size fits all solution.
Nor is Postgres. PG is surprisingly versatile. E.g. with some extensions can be used as key-value storage (hashtable), document database, time-series db and so on. And it works quite well. Beyond "good enough" for many use cases. Added benefit, aside from having to run only one db-server, is that you can mix it: part relational, part document, etc.
But the PG versions nearly ever get as good as focused, dedicated solutions get. Which makes sense if you think about it: a team developing a dedicated key-value storage that does that and only that, for years, will always produce a better key-value storage then one bolted onto a generic RDBMS.
A practical example was where we used ltree extension to store ever growing hierarchies. We needed access control over subtrees (so that the X report for John only includes the entities of Johns devision and lower). While it worked in PG, it turned out that "simply replacing" it with OpenLDAP, which had all this built in, made it faster, easier and above all easier to maintain.
(Enjoyed the post)