A related method, though not quite as straightforward as running with and without the failing test and comparing coverage reports: this technique collects many test runs and identifies the lines that are only, or most often, covered in failing runs.
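(For the curious, here is a minimal sketch of that ranking step in Python. It assumes per-test line coverage is already available as sets of (file, line) pairs, e.g. collected with coverage.py, and uses Ochiai, one standard suspiciousness metric, which is not necessarily the exact scoring the linked technique uses.)

    import math
    from collections import defaultdict

    def rank_suspicious_lines(runs):
        """runs: list of (passed: bool, covered: set of (file, line)) tuples."""
        total_failed = sum(1 for passed, _ in runs if not passed)
        failed_cov = defaultdict(int)   # times a line was covered in failing runs
        passed_cov = defaultdict(int)   # times a line was covered in passing runs
        for passed, covered in runs:
            for line in covered:
                if passed:
                    passed_cov[line] += 1
                else:
                    failed_cov[line] += 1
        scores = {}
        for line, ef in failed_cov.items():
            ep = passed_cov.get(line, 0)
            # Ochiai: ef / sqrt(total_failed * (ef + ep)); a score of 1.0 means
            # the line was covered by every failing run and no passing run.
            scores[line] = ef / math.sqrt(total_failed * (ef + ep))
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)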
drewcoo 6 hours ago [-]
I had no idea this had (or was worthy of) a name.
That's the whole point of coverage diffs.
The tough ones are the tests that sometimes fail yet give you the same coverage results - the problem is not in the code under test! And the lazy (and common) fixes are to re-run the test or add a sleep to make things "work."
anougaret 2 hours ago [-]
[dead]
godelski 1 hour ago [-]
Are you saying that LLMs will generate shitty code and then you fix that by using your LLM? That seems... inconsistent...
anougaret 32 minutes ago [-]
we don't do the LLM part per se
we instrument your code automatically, which is a compiler-like approach under the hood, then we aggregate the traces (rough sketch of the idea below)
this lets us context-engineer the most exhaustive and informative prompt possible for LLMs to debug with
now if they still fail to debug, at least we've given them everything they should have needed
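(To make "instrument and aggregate the traces" concrete: a minimal stand-in sketch using Python's stdlib sys.settrace hook. This is not their compile-time rewriting, just the simplest way to get per-line traces that could then be aggregated across runs; the buggy() function is a hypothetical example.)

    import sys
    from collections import Counter

    class LineTracer:
        """Collect (filename, line number) execution counts for one run."""

        def __init__(self):
            self.hits = Counter()

        def _trace(self, frame, event, arg):
            if event == "line":
                self.hits[(frame.f_code.co_filename, frame.f_lineno)] += 1
            return self._trace        # keep tracing nested calls

        def __enter__(self):
            sys.settrace(self._trace)
            return self

        def __exit__(self, *exc):
            sys.settrace(None)

    def buggy(x):
        return 10 / x                 # fails when x == 0

    # Trace one failing run; aggregating several runs is then just
    # summing Counters before building the debugging context.
    with LineTracer() as tracer:
        try:
            buggy(0)
        except ZeroDivisionError:
            pass

    print(tracer.hits.most_common(5))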
saagarjha 1 hour ago [-]
Why do you need to store a copy of my code to support what seems to be a time traveling debugger?
anougaret 34 minutes ago [-]
valid concerns of course
- we are planning a hosted AI debugging feature that can aggregate multiple traces & code snippets from different related codebases and feed it all into one LLM prompt; that benefits a lot from having everything centralized on our servers
- for now the rewriting algorithms are quite unstable, and having the failing code files in sight helps me debug them
- we only store your code for 48 hours, as I assume it's completely unnecessary to store it for longer
- a self-hosted version will be released for users who cannot accept this, for valid reasons