Neat! I worked on a similar formal verification of Ghostferry, which is a zero downtime data migration tool that powers the shard balancing tool at Shopify, also using TLA+:
I also was able to find an concrrency bug before a single line of code was written with the TLC which saved a lot of time. It took about 4 weeks to design and verify the system in spec and about 2 weeks to write the initial code version, which mostly survived to this day and reasonably resembles the TLA+ spec. To my knowledge (I no longer work there) the correctness of the system was never violated and it never had any sort of data corruption. Would be a much harder feat without TLA+.
agentultra 3 days ago [-]
I recall learning TLA+ around 2015/2016 to solve a hard bug in an OpenStack deployment. I was miffed that I was some 17 years into my career and had only learned about it then. It was so useful!
Since then I've used it for similar purposes as this. I try to share with my team when I do this kind of work. But often folks are highly resistant to it so I use it on my own when I'm working on a hard project.
Spending the time up-front to work out the design is often more cost effective in terms of time and money than iterating in production towards a, "more correct design." Getting it right first sounds hard, and it's challenging, but it's worth it for the kinds of projects where mistakes are costly, difficult to diagnose and to fix.
And I think more software developers are starting to realize this [0] (even if they're using different methods).
This looks like a great example, I'll try to find some time to write a version of it in Quint. I have mentioned DB migration as an example of two phase commit usage before, but never as a standalone spec like this. It's definitely the kind of problem that made me anxious in the past, which means it's a good fit for formal verification :)
asah 3 days ago [-]
for fun, I popped it into ChatGPT o3-mini-high, had it generate the TLA+ code then compare its version to the human-written version:
- It says that `olddb = [k \in Keys |-> CHOOSE v \in Values: TRUE]` initializes olddb nondeterministically. This is NOT true: `CHOOSE` is guaranteed to be a deterministic expression. The actual thing is `olddb \in [Keys -> Values]`.
- `key \in Keys; val \in Values; newdb[key] := val;` is a syntax error. I think what it meant was `with key \in Keys, val \in Values { newdb[key] := val;`
- `newdb[key] := val` is an error anyway, because `newdb` was initialized as a set.
- ` for key \in DOMAIN(olddb) do` is straight up a syntax error, there's no for loop in PlusCal.
- It's mixing p-syntax (`if then end if`) with c-syntax (`if {}`).
code_monk666 3 days ago [-]
When I wrote this, I had tried getting help from several different chat models. None of them produced anything remotely useful—but they did help me learn formal verification, as well as syntax when writing in PlusCal
Rendered at 10:35:36 GMT+0000 (Coordinated Universal Time) with Vercel.
https://github.com/Shopify/ghostferry/blob/main/tlaplus/ghos...
I also was able to find an concrrency bug before a single line of code was written with the TLC which saved a lot of time. It took about 4 weeks to design and verify the system in spec and about 2 weeks to write the initial code version, which mostly survived to this day and reasonably resembles the TLA+ spec. To my knowledge (I no longer work there) the correctness of the system was never violated and it never had any sort of data corruption. Would be a much harder feat without TLA+.
Since then I've used it for similar purposes as this. I try to share with my team when I do this kind of work. But often folks are highly resistant to it so I use it on my own when I'm working on a hard project.
Spending the time up-front to work out the design is often more cost effective in terms of time and money than iterating in production towards a, "more correct design." Getting it right first sounds hard, and it's challenging, but it's worth it for the kinds of projects where mistakes are costly, difficult to diagnose and to fix.
And I think more software developers are starting to realize this [0] (even if they're using different methods).
[0]: https://www.youtube.com/watch?v=w3WYdYyjek4&t=1333s
Update: grammar/spelling
This looks like a great example, I'll try to find some time to write a version of it in Quint. I have mentioned DB migration as an example of two phase commit usage before, but never as a standalone spec like this. It's definitely the kind of problem that made me anxious in the past, which means it's a good fit for formal verification :)
https://chatgpt.com/share/67d0390c-4208-8007-a39d-8d9bfa7886...
- It says that `olddb = [k \in Keys |-> CHOOSE v \in Values: TRUE]` initializes olddb nondeterministically. This is NOT true: `CHOOSE` is guaranteed to be a deterministic expression. The actual thing is `olddb \in [Keys -> Values]`.
- `key \in Keys; val \in Values; newdb[key] := val;` is a syntax error. I think what it meant was `with key \in Keys, val \in Values { newdb[key] := val;`
- `newdb[key] := val` is an error anyway, because `newdb` was initialized as a set.
- ` for key \in DOMAIN(olddb) do` is straight up a syntax error, there's no for loop in PlusCal.
- It's mixing p-syntax (`if then end if`) with c-syntax (`if {}`).