So I gave goose a whirl and I actually really like the approach they are taking, especially because I use emacs and not vscode. I would recommend people try it out on an existing project—the results are quite good for small, additive features and even ones that are full stack.
Which LLM did you use with Goose? That really affects the quality of the outcome
alexkehayias 369 days ago [-]
I’m using gpt-4o which I think is the default for the OpenAI configuration.
Haven’t tried other models yet but I’d like to see how o3-mini performs once it’s been added.
alexjhancock 369 days ago [-]
Hi - Alex here from the Goose team. We do already support o3-mini. If you're using the OpenAI provider you can enter that model in the settings screen of the app. Let me know if you have any trouble.
Cursor also just got support this week. Overall it’s still early (MCP only came out a couple of months ago) but seeing multiple clients that allow it to be used with non-Anthropic models, and getting good results, makes me bullish for MCP.
My colleague has been working on an MCP server that runs Python code in a sandbox (through https://forevervm.com, shameless plug). I’ve been using Goose a lot since it was announced last week for testing and it’s rough in some spots but the results have been good.
XorNot 369 days ago [-]
I don't know how useful this is, but my immediate reaction to the animation on the front page was "that's literally worse than the alternative".
Because the example given was "change the color of a component".
Now, it's obviously fairly impressive that a machine can go from plain text to identifying a react component and editing it...but the process to do so literally doesn't save me any time.
"Can you change the current colour of headercomponent.tsx to <some color> and increase the vertical size to 15% of vh" takes longer to type than it would take to just open the file and do that.
Moreover, the example is in a very "standard" format. What happens if I'm not using styled components? What happens if that color is set from a function? In fact none of the examples shown seem game-changing in any way (i.e. the Confluence example is also what a basic script, a workflow, or anything else could do, and is still essentially "two mouse clicks" rather than writing out a long English sentence and then, I would guess, waiting substantially longer for inference to run).
taneq 369 days ago [-]
On the one hand, this isn’t a great example for you because you already knew how to do that. There’s probably no good way to automate trivial changes that you can make off the top of your head, and have it be faster than just doing it yourself.
I’ve found LLMs most useful for doing things with unfamiliar tooling, where you know what you want to achieve but not exactly how to do it.
On the other hand, it’s an okay test case because you can easily verify the results.
robertwt7 369 days ago [-]
I agree that the process doesn't save any of our time. However, aren't examples supposed to be simple?
Take the Aider example: https://github.com/Aider-AI/aider
It's asked to add a param and typing to a function. Would that save us more time? I don't think so, but it's a good peek at what it can do.
Just like any other hello world example, I suppose.
two_handfuls 369 days ago [-]
Examples are supposed to be simple when they illustrate a process we already know works.
With AI the challenge is that we need to convince the reader that the tool will work. So that calls for a different kind of example.
throwaway290 369 days ago [-]
If you don't know how to implement it, how can you be sure the LLM will do it correctly?
If the task is not simple then break it into simple tasks. Then each of them is as easy as a color change.
two_handfuls 368 days ago [-]
Not how it works. That it works.
throwaway290 368 days ago [-]
No, how it works.
pjm331 369 days ago [-]
Yeah, the fact that just composing the first prompt would take me longer than just doing the thing is my biggest blocker to using any of these tools on a regular basis.
ehnto 369 days ago [-]
Which also assumes it gets it right on the first prompt, and not after 15 minutes of prompt hacking, giving up, and doing it the old-fashioned way anyway.
The risk of wasted time is higher than the proposed benefit for most of my current use cases. I don't do heaps of glue code; it's mostly business logic and one-off fixes, so I have not found LLMs useful day to day at work.
Where it has been useful is when I need to do a task with tech I don't use often. I usually know exactly what I want to do but don't have the myriad arcane details. A great example would be needing to do a complex MongoDB query when I don't normally use Mongo.
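For concreteness, this is the kind of Mongo query I mean: "average order value per customer over the last 30 days, top 10". The collection and field names below are made up for illustration; the pipeline shape is the part the LLM fills in for you.

```python
# Hypothetical aggregation an LLM can draft from one English sentence.
# Collection/field names (orders, customer_id, total, created_at) are
# invented for this sketch.
from datetime import datetime, timedelta, timezone

def recent_top_customers_pipeline(days=30, limit=10):
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    return [
        {"$match": {"created_at": {"$gte": cutoff}}},
        {"$group": {"_id": "$customer_id",
                    "avg_order": {"$avg": "$total"},
                    "orders": {"$sum": 1}}},
        {"$sort": {"avg_order": -1}},
        {"$limit": limit},
    ]

# With pymongo this would run as:
# db.orders.aggregate(recent_top_customers_pipeline())
```

I'd still eyeball the pipeline before running it, but writing it from scratch is where all the arcane details live.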
oxidant 369 days ago [-]
Cursor + Sonnet has been great for scaffolding tests.
I'll stub out tests (just a name and `assert true`) and have it fill them in. It usually gets them wrong, but I can fix one and then have it update the rest to match.
Not perfect, but beats writing all the tests myself.
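Concretely, the stub-and-fill loop looks something like this (`slugify` is just a stand-in for whatever is under test):

```python
# Stage 1: I hand the model only names and `assert True`:
#   def test_lowercases(): assert True
#   def test_joins_with_hyphens(): assert True
# Stage 2: it fills in the bodies; I fix one by hand and ask it to
# update the rest to match that style.

def slugify(title: str) -> str:
    """Stand-in function under test."""
    return "-".join(title.lower().split())

def test_lowercases():
    assert slugify("Hello World") == "hello-world"

def test_joins_with_hyphens():
    assert slugify("a b c") == "a-b-c"

test_lowercases()
test_joins_with_hyphens()
```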
Keyframe 369 days ago [-]
I don't know if anyone finds this useful, but it seems rather useless / not working? I tried with numerous online and local LLMs for good measure. I installed the computerController extension and tried a couple dozen different variations of "open a website (url) in a browser and save a screenshot". Most of the time it wouldn't even open the website, and I never got a screenshot. At best it opened the website once and saved an HTML file (even though I asked for a screenshot); that was the one attempt in the bunch where it did something instead of complaining it couldn't find AppleScript or whatever on a Linux machine. I qualified the ask by telling it it was on Linux; it even managed to find which distro it was on. Really weird overall.
DrewHintz 369 days ago [-]
I’ve had luck with a workflow similar to:
git clone a repo
Open goose with that directory
Instruct it to discover what the repo does
Ask it to make changes to the code, being detailed with my instructions.
I haven't tried computerController, only Goose’s main functionality.
pzo 369 days ago [-]
This looks very promising. I only played a little bit yesterday, but they really need to polish the UI. Compared to the desktop versions of ChatGPT or Perplexity, it's in a much lower league. Some feedback for the team:
1) use a better font and size
2) allow adjusting shortcuts, with nice defaults that are easy to change
3) integrate with a local Whisper model so I can type with my voice, triggered by a global shortcut
4) change the background to blend with the default OS theme so we don't have a useless ugly top bar and ugly bottom bar
5) shortcut buttons to easily copy part of a conversation or the full conversation, activate web search, star a conversation so it's easy to find in history, etc.
They should take more inspiration from Raycast/Perplexity/ChatGPT/Arc browser/Warp AI/Cursor.
So, we're supposed to rely on LLM not hallucinating that it is allowed to do what it wants?
ramesh31 368 days ago [-]
>So, we're supposed to rely on LLM not hallucinating that it is allowed to do what it wants?
Yes. Frontier models have been moving at light speed over the last year. Hallucinations are almost completely solved, particularly with Anthropic models.
It won't be long before statements like this sound the same as "so you mean I have to trust that my client will always have a stable internet connection to reach out to this remote server for data?".
feznyng 368 days ago [-]
This is missing the human-language ambiguity problem. If you don't perfectly specify your requirements and it misinterprets what you're asking for, that's going to be a problem regardless of how smart it is. This is fine with code editing, since you've got version control, but not so great when running commands in your terminal that can't be as trivially reverted.
Hallucination might be getting better, gullibility less so.
threecheese 368 days ago [-]
As a regular Claude user, incorrectness is not anywhere near solved; it may be domain-dependent, but just yesterday 3.5 invented a Mac CLI tool that does not exist (and would’ve been pretty useful if it had). I cannot take anything factual at face value, which is actually OK as long as net/net I’m still more productive.
register 369 days ago [-]
But... how does it work? The documentation is really confusing. How do you make it aware of code files and the project structure?
DrewHintz 369 days ago [-]
I tell it to discover that itself by asking leading questions:
“What does this repo do?”
“How do you run its unit tests?”
“What does file foo do?”
bckr 369 days ago [-]
Today I decided that what I need is:
- prompt from command line directly to Claude
- suggestions dumped into a file under ./tmp/ (ignored by git)
- iterate on those files
- shuttle test results over to Claude
Getting those files merged with the source files is also important, but I’m not confident in a better way than copy-pasting at this point.
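A minimal sketch of that loop, assuming the Anthropic Python SDK; the model alias and file layout here are my own choices, not a fixed convention:

```python
# Prompt goes in on argv, the reply is dumped under ./tmp/ (gitignored)
# so I can iterate on the suggestion files.
import os
import sys
import time

def suggestion_path(tmp_dir="tmp", now=None):
    """Pick a timestamped file under tmp/ for the next suggestion."""
    os.makedirs(tmp_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    return os.path.join(tmp_dir, f"suggestion-{stamp}.md")

def save_suggestion(text, tmp_dir="tmp"):
    path = suggestion_path(tmp_dir)
    with open(path, "w") as f:
        f.write(text)
    return path

if __name__ == "__main__" and len(sys.argv) > 1:
    import anthropic  # pip install anthropic; needs ANTHROPIC_API_KEY
    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-3-5-sonnet-latest",  # alias may differ for you
        max_tokens=2048,
        messages=[{"role": "user", "content": " ".join(sys.argv[1:])}],
    )
    print(save_suggestion(msg.content[0].text))
```

Shuttling test results back is just another prompt with the failing output pasted in.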
lordswork 369 days ago [-]
I'm building this for myself because I want it too. I recommend you do the same, because it's been really fun to build and teaches you a lot about what LLMs can and cannot do well.
I plan to share it on GitHub, but I'm waiting for my employer's internal review process to let me open source it as my own project, since they can legally claim all IP I develop while employed there.
horsawlarway 369 days ago [-]
I also built this for myself, and I really do suggest it as a good project to get a grounded idea of what models can handle.
Mainly - tool calling support just merged in llama.cpp (https://github.com/ggerganov/llama.cpp/pull/9639) this week, and it's been a fun exercise to put local LLMs through the wringer to see how they do at handling it.
It's been a mixture of "surprisingly well" and "really badly".
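For anyone curious what that exercise looks like: llama-server exposes an OpenAI-style chat API, so a tool-calling probe is just a `tools` array in the request body. The `read_file` tool below is a made-up example, and the endpoint/model names are whatever your local server is running:

```python
# Build an OpenAI-style tool-calling request for a local llama.cpp
# server. The "read_file" tool is a hypothetical example tool.
def tool_call_request(prompt, model="local"):
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "read_file",
                "description": "Read a file from the repo",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
    }

# POST this as JSON to http://localhost:8080/v1/chat/completions and
# check whether the reply contains a well-formed tool_calls entry --
# that's the whole wringer test.
```

The "surprisingly well" models return a clean `tool_calls` entry; the "really badly" ones hallucinate the call inline as prose.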
vessenes 369 days ago [-]
Aider is fantastic. Worth a look.
bckr 369 days ago [-]
I’ve been playing with it and I don’t like it that much? I’m not sure why. It feels a little buggy and like it’s doing too much.
vessenes 369 days ago [-]
Interesting. I occasionally feel that way with Claude as the backend. Still the best backend that’s reliable, although o3-mini-high in architecture mode with Claude is very good.
I find Claude wants to edit files that I don’t like to be edited often. Two ways I deal with that - first, you can import ‘read only’ files, which is super helpful for focusing. Second, you can use a chat mode first to talk over plans, and when you’re happy say “go”.
I think the thing to do is to use it at a fairly high level, then drop down and critique. Essentially, treat it as a team with impressive motivation and mid quality.
yoyohello13 369 days ago [-]
You’re being downvoted for some reason, but I feel the same. It’s cool tech, but I’ve found I often need to revert changes; it’s far too aggressive with tweaking files. Maybe I can adjust that in the settings, idk. Also, it’s expensive as hell to run with Claude Sonnet. It cost me like $0.01 per action on a small project, which is insane. At this point I still prefer the chat interface.
extr 369 days ago [-]
You can basically get the same experience as aider with an MCP server like https://github.com/rusiaaman/wcgw. It's not perfect - sometimes has trouble with exact syntax of find/replace. But it's free to use up to your regular Claude subscription usage limit. I actually use it more than Cursor, because it's easier to flip back and forth between architecting/editing.
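To illustrate the find/replace failure mode: the model's search block has to match the file exactly, and near-misses in whitespace or quoting make the edit fail. A sketch of a tolerant applier (this is the general technique, not wcgw's actual format):

```python
# Apply a model-emitted search/replace edit, falling back to a
# whitespace-normalized line match when the exact text isn't found.
def apply_edit(source: str, search: str, replace: str) -> str:
    if search in source:
        return source.replace(search, replace, 1)
    # Fallback: compare lines with leading/trailing whitespace stripped.
    src_lines = source.split("\n")
    tgt = [l.strip() for l in search.split("\n")]
    for i in range(len(src_lines) - len(tgt) + 1):
        window = [l.strip() for l in src_lines[i:i + len(tgt)]]
        if window == tgt:
            return "\n".join(src_lines[:i] + replace.split("\n")
                             + src_lines[i + len(tgt):])
    raise ValueError("search block not found")
```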
yoyohello13 369 days ago [-]
Thanks! I’ll take a look at this. It always kind of annoyed me to pay for API credits on top of a subscription, lol.
M4v3R 369 days ago [-]
$0.01 per action that can potentially save you tens of minutes to hours of work sounds like a pretty good deal to me, if I compare this to my hourly wage.
sangnoir 369 days ago [-]
> $0.01 per action that can potentially save you tens of minutes to hours of work sounds like a pretty good deal to me
Save tens of hours in one commit?! No model is this good yet[1] - especially not Aider with its recommended models. I fully agree with parent - current SoTA models require lots of handholding in the domains I care about, and the AI chat/pairing setup works much better compared to the AI creating entire commits/PRs before a human gets to look at it.
1. If they were, Zuckerberg would have already announced another round of layoffs.
joshstrange 369 days ago [-]
Ten hours in one commit? Nope, not yet, but it works great to test out ideas and get unstuck when trying to decide how to proceed. Instead of having to choose, just do both or at least try 1 right away instead of bike-shedding.
I often get hung up on UI, I can’t make a decision on what I think will look decent and so I just sort of lock up. Aider lets me focus on the logic and then have the LLM spit out the UI. Since I’ve given up on projects before due to the UI aspect (lose interest because I don’t feel like I’m making progress, or get overwhelmed by all the UI I’ll need to write) this is a huge boon to me.
I’m not incapable of writing UI, I’m just slower at it, so Aider is like having a whiz junior developer who can crank out UI when I need it. I’m even fine rewriting every line of the UI by hand before “shipping”; the LLM just helps me not get stuck on what it should look like. It lets me focus on the feature.
joshstrange 369 days ago [-]
$0.01 per action? Yeah, and I’ve gotten up to 10 cents or so I think in a “single” action but so what? The most I’ve ever spent with it in one go has been like $5 for a couple hours (maybe even 6-8) of on and off usage. That $5 was completely worth it.
Also you can use local models if you want it to be “free”.
yoyohello13 369 days ago [-]
A running total of the "cost of my side project" doesn't feel particularly good.
joshstrange 368 days ago [-]
I understand that, I sympathize with that. I hate being “metered” and even low cost or high limits doesn’t fully remove that annoyance.
The thing is, all side projects have a cost, even if it’s just time. I’m happy to let some of my side projects move forward faster if it means paying a small amount of money.
anonzzzies 369 days ago [-]
There have been many (ignored) requests to have it automatically pick the relevant files, like Cursor, Copilot, and Cline do, without having to specify them. Not having that makes it much worse than those others. I was a fan before the others existed, but having to add your files manually is not a thing anymore.
bckr 369 days ago [-]
Hmm, I want to add my own files. This is because in my workflow I often turn to the web UI in order to get a fresh context.
I do like the idea of letting the model ask for source code.
It’s all about attention / context.
anonzzzies 369 days ago [-]
But one does not exclude the other; some like one, some like the other. I am used to Cline now and it's pretty good at picking the correct files; however, I get better results out of aider once the files are in.
bckr 369 days ago [-]
I’ve almost finished an interactive file selector inspired by git add interactive, with the addition of a tree display.
I’m giving myself the option to output collated code to a file, or copy it to clipboard, or just hold onto it for the next prompt.
I know aider does this stuff, but because I’m automating my own workflow, it’s worth doing it myself.
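For what it's worth, the non-interactive half is small. A sketch of the tree rendering and collation steps (the output format is my own, not aider's):

```python
# Render selected paths as an indented tree, then collate their
# contents into one prompt-ready blob.
def render_tree(paths):
    """Indented tree view of sorted relative paths."""
    lines, seen = [], set()
    for p in sorted(paths):
        parts = p.split("/")
        for depth in range(len(parts)):
            prefix = "/".join(parts[:depth + 1])
            if prefix not in seen:
                seen.add(prefix)
                lines.append("  " * depth + parts[depth])
    return "\n".join(lines)

def collate(files):
    """files: {path: contents} -> one blob for the next prompt."""
    chunks = [f"--- {path} ---\n{body}"
              for path, body in sorted(files.items())]
    return "\n\n".join(chunks)
```

From there, writing the blob to a file or pushing it to the clipboard is just plumbing.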
dbdoskey 369 days ago [-]
You should try Cline. I found it invaluable with Anthropic models.
bckr 368 days ago [-]
I may give it a spin after finishing this project :)
bArray 369 days ago [-]
Running locally is such an important feature; running elsewhere is an instant non-starter for me. I also want the LLM to be able to read the code to build an in-context solution, but not be able to make changes unless they are explicitly accepted.
93po 369 days ago [-]
I'm confused what this does that Cursor doesn't. The example it shows on the front page is something Cursor can also easily do.
lordswork 369 days ago [-]
The one that stands out most to me is that it doesn't bundle the AI features with the editor. This is a standalone application that runs on the side of your editor/tools of choice.
netfl0 369 days ago [-]
Open source licensing, where the limit is on the AI backend, right?
betimsl 369 days ago [-]
Does anyone know of alternatives to this? A GUI frontend for various AI providers (incl. Ollama, etc.)?
aantix 369 days ago [-]
How does this compare to Cline or Cursor's Composer agent?
esafak 369 days ago [-]
Are people finding agent frameworks useful, or are they unnecessary dependencies like Langchain?
goldenManatee 369 days ago [-]
Regarding the “Extensible,” doesn’t that completely moot its whole point?
letniq 369 days ago [-]
Is it the same as ollama?
juunpp 369 days ago [-]
It advertises that it runs locally and that it is "extensible" but then requires you to set up a remote/external provider as the first step of installation? That's a rather weird use of "local" and "extensible". Do words mean anything anymore?
raincole 369 days ago [-]
You went as far as checking how it works (thus "requires you to set up a remote/external provider as the first step").
But you didn't bother checking the very next section in the sidebar, Supported LLM Providers, where Ollama is listed.
The attention span issue today is amusing.
anonzzzies 369 days ago [-]
> The attention span issue today is amusing.
I find it rather depressing. I know it's a more complex thing, but it really feels like people IRL have no time for anything past a few seconds before moving on to the next thing. It shows in the results of their work all too often as well. Some programming requires a very long attention span, and if you don't have any, it's not going to be good.
unification_fan 368 days ago [-]
But this is an elevator pitch. I didn't come here to be marketed to, yet I am being marketed to.
So if you're going to market something to me at least do it right. My attention span is low because I don't really give a shit about this.
EVa5I7bHFq9mnYK 369 days ago [-]
But people really have no time. There is only one brain, and thousands of AI startups pitching something every day.
anonzzzies 369 days ago [-]
Yeah, no need to try any of them until everyone says 'you have to'. Which is what happened with Aider and later Cline & Cursor.
tonygiorgio 369 days ago [-]
Can’t you just run Ollama and point it at a localhost endpoint? I don’t think it’s within scope to reproduce the whole local LLM stack when anyone wanting to do this today can easily use existing, better tools for that part of it.
demarq 369 days ago [-]
Did you not see Ollama?
hiyer 369 days ago [-]
You can use it with ollama too
kylecazar 369 days ago [-]
Yeah, they seem to be referring to the Goose agent/CLI being local, not the models themselves.
anonzzzies 369 days ago [-]
You can run ollama, so no, not only Goose itself.
kylecazar 368 days ago [-]
Fair, but the repeated references to local/on-machine on the project's homepage, which OP criticized, are, I would think, in reference to the Goose agent.
Here's a short writeup of my notes from trying to use it: https://notes.alexkehayias.com/goose-coding-ai-agent/
https://block.github.io/goose/docs/goose-architecture/
> Make sure to confirm all changes with me before applying.
https://block.github.io/goose/docs/guides/using-goosehints
Besides that, you can absolutely still trick top of the line models: https://embracethered.com/blog/posts/2024/claude-computer-us...