Everything in this post stems from the assumption that you already know what you're doing, which is probably true for things you've built before. But I hope we can agree that you can't spec out something you have no clue how to build, let alone write the tests before you've even explored the boundaries of the problem space. That's completely unreasonable.
My second point is that this approach is fundamentally wrong for AI-first development. If the cost of writing code is approaching zero, there's no point investing resources to perfect a system in one shot. What matters more is how fast you can explore the edges. You can now spin up five agents to implement five different versions of the thing you're building and simply pick the best one.
In our shop, we have hundreds of agents working on various problems at any given time. Most of the code gets discarded. What we accept to merge are the good parts.
DaylitMagic 46 minutes ago [-]
If you don't mind the question with regard to your second point, couldn't what you've done in your shop be also used here? There's no reason why 'try to develop it five different ways and pick the best parts out of each' is incompatible with the 'VSDD' concept; seems like it could be included?
tikhonj 35 minutes ago [-]
> you can't spec out something you have no clue how to build
Ideally—and at least somewhat in practice—a specification language is as much a tool for design as it is for correctness. Writing the specification lets you explore the design space of your problem quickly with feedback from the specification language itself, even before you get to implementing anything. A high-level spec lets you pin down which properties of the system actually matter, automatically finds an inconsistencies and forces you to resolve them explicitly. (This is especially important for using AI because an AI model will silently resolve inconsistencies in ways that don't always make sense but are also easy to miss!)
Then, when you do start implementing the system and inevitably find issues you missed, the specification language gives you a clear place to update your design to match your understanding. You get a concrete artifact that captures your understanding of the problem and the solution, and you can use that to keep the overall complexity of the system from getting beyond practical human comprehension.
A key insight is that formal specification absolutely does not have to be a totally up-front tool. If anything, it's a tool that makes iterating on the design of the system easier.
Traditionally, formal specification have been hard to use as design tools partly because of incidental complexity in the spec systems themselves, but mostly because of the overhead needed to not only implement the spec but also maintain a connection between the spec and the implementation. The tools that have been practical outside of specific niches are the ones that solve this connection problem. Type systems are a lightweight sort of formal verification, and the reason they took off more than other approaches is that typechecking automatically maintains the connection between the types and the rest of the code.
LLMs help smooth out the learning curve for using specification languages, and make it much easier to generate and check that implementations match the spec. There are still a lot of rough edges to work out but, to me, this absolutely seems to be the most promising direction for AI-supported system design and development in the future.
politician 46 minutes ago [-]
"Most of the code gets discarded." If you don't mind sharing, what's your signal-to-token ratio?
DaylitMagic 49 minutes ago [-]
Some random (hopefully additive and helpful) thoughts:
Many companies have older code bases / databases that can be somewhat well defined (and somewhat not). If things have been slowly iterating over 35 years, there's a lot of undocumented edge behavior that may occur; it may be beneficial to have a step before Edge Case Catalog where there's some kind of prompting to catalogue how the inputs and outputs work, and then find the different inputs and outputs - and then confirm that with Input A and Output A that it works as expected. (Legacy systems often have weird orchestration that nobody remembers.)
(Sub-note: This is somewhat part of the provable properties catalog; while this step could be placed there, it would require a re-run of edge case catalog build potentially, which isn't a bad thing.)
A small note that I personally think is a good idea is better code commenting than has been outlined here - the spec itself should be woven into the code with potentially slightly over-commenting for each aspect, code spec gets lost. The code itself should serve as context, especially in the TDD stage.
I think it's implicit but may be worth overtly stating that for the Code Quality check in Phase 3 that it also checks on a zero-trust basis, and doesn't include things like hardcoded keys.
I'm not sure what Chainlink is (sorry!) but I like the ideas outlined around the decomposition - but it misses stringing everything together end-to-end in the way outlined here (it asks to create each part, but never actually weaves the whole together).
Something not covered - is sequencing work and decomposition of work. A spec can create multiple dependencies within itself, requiring things to be worked on in a specific order.
Claude or something different... there is life beyond Claude I assure you and it is quite good and colourful.
politician 1 hours ago [-]
This is a decent approach. My concern with TDD is that writing tests necessarily implies designing an API upon which those tests operate. Here, the agent is instructed to "not write code, write tests", and yet, in doing so it defines an API. This will cause the AI to hallucinate the API. Layering in yet more tests on top of this will cause that API to deform in strange ways that pass tests but that the adversary will not be able to cope with because it runs too late in the VSDD process.
I've seen this exact process play out in my own work. The AI generates code and tests that pass with high code coverage and honors invariants set by spec. I look at the code and find a rats nest / ball of mud that will cost 10x more tokens to enhance should I ever need to add a feature.
So, I think you're on to something, but I think the process might be discounting extensibility and resilience under change.
NeutralForest 1 hours ago [-]
> My concern with TDD is that writing tests necessarily implies designing an API upon which those tests operate
It really forces you to do outside-in testing; I usually describe the kind of API I want when chatting with the agents. For example, the CLI options, the routes that might be useful for an API, etc.
> I look at the code and find a rats nest / ball of mud that will cost 10x more tokens to enhance should I ever need to add a feature.
Agreed, I don't know if there are good forcing functions to avoid complexity. The providers have a huge incentive to have you waste your tokens (for example when it re-outputs the complete file when you ask for a tiny change).
bonesss 28 minutes ago [-]
“Test Driven Design” is another way to frame it.
DaylitMagic 53 minutes ago [-]
I see what you're saying. Modularity and interfaces are really important between the different aspects of what's being developed. And it is worth putting time into the question "if another will use this, what would they potentially use it for, and why?". It doesn't mean that it needs to be built now - but considering that and ensuring that the planned code executes against that is a good strategy.
galoisscobi 36 minutes ago [-]
I think this word salad doesn’t have enough buzzwords. Throw in a few more acronyms too.
Rendered at 18:44:11 GMT+0000 (Coordinated Universal Time) with Vercel.
My second point is that this approach is fundamentally wrong for AI-first development. If the cost of writing code is approaching zero, there's no point investing resources to perfect a system in one shot. What matters more is how fast you can explore the edges. You can now spin up five agents to implement five different versions of the thing you're building and simply pick the best one.
In our shop, we have hundreds of agents working on various problems at any given time. Most of the code gets discarded. What we accept to merge are the good parts.
Ideally—and at least somewhat in practice—a specification language is as much a tool for design as it is for correctness. Writing the specification lets you explore the design space of your problem quickly with feedback from the specification language itself, even before you get to implementing anything. A high-level spec lets you pin down which properties of the system actually matter, automatically finds an inconsistencies and forces you to resolve them explicitly. (This is especially important for using AI because an AI model will silently resolve inconsistencies in ways that don't always make sense but are also easy to miss!)
Then, when you do start implementing the system and inevitably find issues you missed, the specification language gives you a clear place to update your design to match your understanding. You get a concrete artifact that captures your understanding of the problem and the solution, and you can use that to keep the overall complexity of the system from getting beyond practical human comprehension.
A key insight is that formal specification absolutely does not have to be a totally up-front tool. If anything, it's a tool that makes iterating on the design of the system easier.
Traditionally, formal specification have been hard to use as design tools partly because of incidental complexity in the spec systems themselves, but mostly because of the overhead needed to not only implement the spec but also maintain a connection between the spec and the implementation. The tools that have been practical outside of specific niches are the ones that solve this connection problem. Type systems are a lightweight sort of formal verification, and the reason they took off more than other approaches is that typechecking automatically maintains the connection between the types and the rest of the code.
LLMs help smooth out the learning curve for using specification languages, and make it much easier to generate and check that implementations match the spec. There are still a lot of rough edges to work out but, to me, this absolutely seems to be the most promising direction for AI-supported system design and development in the future.
Many companies have older code bases / databases that can be somewhat well defined (and somewhat not). If things have been slowly iterating over 35 years, there's a lot of undocumented edge behavior that may occur; it may be beneficial to have a step before Edge Case Catalog where there's some kind of prompting to catalogue how the inputs and outputs work, and then find the different inputs and outputs - and then confirm that with Input A and Output A that it works as expected. (Legacy systems often have weird orchestration that nobody remembers.)
(Sub-note: This is somewhat part of the provable properties catalog; while this step could be placed there, it would require a re-run of edge case catalog build potentially, which isn't a bad thing.)
A small note that I personally think is a good idea is better code commenting than has been outlined here - the spec itself should be woven into the code with potentially slightly over-commenting for each aspect, code spec gets lost. The code itself should serve as context, especially in the TDD stage.
I think it's implicit but may be worth overtly stating that for the Code Quality check in Phase 3 that it also checks on a zero-trust basis, and doesn't include things like hardcoded keys.
I'm not sure what Chainlink is (sorry!) but I like the ideas outlined around the decomposition - but it misses stringing everything together end-to-end in the way outlined here (it asks to create each part, but never actually weaves the whole together).
Something not covered - is sequencing work and decomposition of work. A spec can create multiple dependencies within itself, requiring things to be worked on in a specific order.
I've seen this exact process play out in my own work. The AI generates code and tests that pass with high code coverage and honors invariants set by spec. I look at the code and find a rats nest / ball of mud that will cost 10x more tokens to enhance should I ever need to add a feature.
So, I think you're on to something, but I think the process might be discounting extensibility and resilience under change.
It really forces you to do outside-in testing; I usually describe the kind of API I want when chatting with the agents. For example, the CLI options, the routes that might be useful for an API, etc.
> I look at the code and find a rats nest / ball of mud that will cost 10x more tokens to enhance should I ever need to add a feature.
Agreed, I don't know if there are good forcing functions to avoid complexity. The providers have a huge incentive to have you waste your tokens (for example when it re-outputs the complete file when you ask for a tiny change).