Stop Using Ollama (sleepingrobots.com)
utopiah 2 minutes ago [-]
Not sure why VLC doesn't do that.

It's a joke... but also not really? I mean, VLC is "just" an interface to play videos. Videos are content files one "interacts" with, mostly play/pause and a few other functions like seeking. Because there are many different video formats, VLC relies on codecs to decode them, basically delegating the "hard" part to the codecs.

Now... what's the difference here? A model is a codec, the interactions are sending text/images/etc to it, and the output is text/images/etc. Models aren't even radically bigger in size, since videos can be huge too.

I'm confused as to why this isn't a solved problem, especially (and yes, I'm being a bit sarcastic here, can't help myself) in a time when "AI" has supposedly made all the smart, wise developers who rely on it 10x or even 1000x more productive.

Weird.

osmsucks 2 minutes ago [-]
I noticed the performance issues too. I started using Jan recently and tried running the same model via llama.cpp vs local ollama, and the llama.cpp one was noticeably faster.
TomGarden 6 minutes ago [-]
The performance issues are crazy. Thanks for sharing this
usernomdeguerre 2 hours ago [-]
Do they still not let you change the default model folder? You had to go through this whole song and dance to manually register a model via a pointless dockerfile wannabe that then seemed to copy the original model into their hash storage (again, unable to change where that storage lived).

At the time I dropped it for LM Studio, which to be fair was not fully open source either, but at least it exposed the model folder and integrated with HF rather than a proprietary model garden for no good reason.
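
For reference, the "dockerfile wannabe" is Ollama's Modelfile; registering a local GGUF looks roughly like this (filename hypothetical):

```
# Modelfile: register a local GGUF with Ollama
FROM ./my-model.gguf
```

Running `ollama create my-model -f Modelfile` then copies the weights into Ollama's content-addressed blob store under ~/.ollama/models, though I believe newer versions let you relocate that directory via the OLLAMA_MODELS environment variable.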

andreidbr 28 minutes ago [-]
This also annoyed me a lot. I was running it before upgrading the SSD storage and I wanted to compare with LM Studio. Figured it would be good to have both interfaces use the same models downloaded from HF.

Had to go down the same rabbit hole of finding where things are, how they're sorted/separated/etc. It was unnecessarily painful

0xbadcafebee 16 minutes ago [-]
No mention of the fact that Ollama is about 1000x easier to use. Llama.cpp is a great project, but it's also one of the least user-friendly pieces of software I've used. I don't think anyone in the project cares about normal users.

I started with Ollama, and it was great. But I moved to llama.cpp to have more up-to-date fixes. I still use Ollama to pull and list my models because it's so easy. I then built my own set of scripts to populate a separate cache directory of hardlinks so llama-swap can load the GGUFs into llama.cpp.
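
A minimal sketch of such a script, assuming Ollama's default blob layout (files named sha256-&lt;digest&gt; under ~/.ollama/models/blobs); the human-readable model names live in JSON manifests, so this version just keeps the digests:

```shell
# sync_ggufs SRC DST: hardlink large blobs from Ollama's blob store
# into a flat cache of .gguf files that llama.cpp can load directly.
# Small blobs (manifests, params) are skipped; model weights are large.
sync_ggufs() {
  src="$1"; dst="$2"
  mkdir -p "$dst"
  for blob in "$src"/sha256-*; do
    [ -f "$blob" ] || continue
    [ "$(wc -c < "$blob")" -gt 10000000 ] || continue
    name="$(basename "$blob").gguf"
    # Hardlink so the same bytes serve both Ollama and llama.cpp
    [ -e "$dst/$name" ] || ln "$blob" "$dst/$name"
  done
}

# Typical use against Ollama's default blob directory:
sync_ggufs "$HOME/.ollama/models/blobs" "$HOME/.cache/gguf"
```

Hardlinks cost no extra disk space, so the weights aren't duplicated between the two runtimes (both paths must be on the same filesystem).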

tyfon 28 minutes ago [-]
I think the biggest advantage of Ollama for me is the ability to "hotswap" models for different purposes without restarting the server, combined with the simple "ollama pull model". In other words, it has been quite convenient.

This post prompted me to search a bit, and it seems that llama.cpp recently got router support[1], so I need to have a look at that.

My main use for this is a Discord bot where I have different models for different features, like replying to messages with images/video or pure text, and non-reply generation of sentiment and image descriptions. These all perform best with different models, and it has been very convenient for the server to just swap models in and out on request.

[1] https://huggingface.co/blog/ggml-org/model-management-in-lla...
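
The swap-on-request pattern above can be sketched against the OpenAI-compatible /v1/chat/completions endpoint that Ollama, llama-swap, and the new llama.cpp router all expose; the task-to-model mapping and model names here are hypothetical placeholders:

```python
# Map each bot task to the model that handles it best. The server
# picks (and if needed, loads) the model named in the "model" field
# of the request body, so the bot never restarts anything itself.
TASK_MODELS = {
    "image_reply": "qwen2.5-vl",
    "text_reply": "llama3.1-8b",
    "sentiment": "phi-4-mini",
}

def chat_payload(task: str, messages: list[dict]) -> dict:
    """Build a /v1/chat/completions request body for the given task."""
    return {"model": TASK_MODELS[task], "messages": messages}

# POSTing this JSON to e.g. http://localhost:11434/v1/chat/completions
# (Ollama's default port) triggers the model swap on the server side.
```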

majorchord 14 minutes ago [-]
> the ability to "hotswap" models with different utility instead of restarting the server

The article mentions llama-swap does this
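
For anyone curious, a llama-swap config maps model names to llama-server commands, roughly like this (paths and model names hypothetical; check the llama-swap README for the exact schema):

```yaml
models:
  "qwen-8b":
    cmd: llama-server --port ${PORT} -m /models/qwen-8b.gguf
  "llama-vision":
    cmd: llama-server --port ${PORT} -m /models/llama-vision.gguf
    ttl: 300  # unload after 5 minutes idle
```

Requests naming a different model cause llama-swap to stop the current llama-server process and start the right one.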

segmondy 27 minutes ago [-]
You can do that with llama-server
speedgoose 22 minutes ago [-]
I prefer Ollama over the suggested alternatives.

I will switch once we have good user experience on simple features.

A new model is released on HF or the Ollama registry? One `ollama pull` and it's available. It's underwhelming? `ollama rm`.

kennywinker 10 minutes ago [-]
> This creates a recurring pattern on r/LocalLLaMA: new model launches, people try it through Ollama, it’s broken or slow or has botched chat templates, and the model gets blamed instead of the runtime.

Seems like maybe, at least some of the time, you're being underwhelmed by Ollama, not the model.

The better performance alone seems worth switching for.

fy20 9 minutes ago [-]
It feels like a bit of history is missing... If ollama was founded 3 years before llama.cpp was released, what engine did they use then? When did they transition?
Zetaphor 3 hours ago [-]
I got tired of repeating the same points and having to dig up sources every time, so here's the timeline (as I know it) in one place with sources.
brabel 3 minutes ago [-]
Thanks for writing this, I hope people here will actually read it and not assume it's some unfounded hit piece. I was involved a little bit in llama.cpp and knew most of what you wrote, and it's just disgusting how the Ollama founders behaved! For people looking for alternatives, I would also recommend llamafile, a single-file executable for any OS that includes your chosen model: https://github.com/mozilla-ai/llamafile?tab=readme-ov-file

It's truly open source, backed by Mozilla, openly uses llama.cpp, and was created by wizard Justine Tunney of Cosmopolitan Libc fame.

kelsolaar 26 minutes ago [-]
Great writing, thanks for the summary and timeline.
robot-wrangler 27 minutes ago [-]
Thanks, did not know any of this.
dnnddidiej 28 minutes ago [-]
On a practical note, it fumbles connection handling so badly as to be unusable for downloading anything.
yokoprime 25 minutes ago [-]
I had no idea about all this, especially the performance and bugs. Thanks for informing me!
dackdel 27 minutes ago [-]
i use goose by block
sudb 4 minutes ago [-]
seems pretty unrelated to the post?

also you might be the only person in the wild I've seen admit to this
