Show HN: We cut RAG latency ~2× by switching embedding model (myclone.is)
novoreorx 74 days ago [-]
Great article! I always feel that the choice of embedding model is quite important, but it's seldom mentioned. Most tutorials about RAG just tell you to use a common model like OpenAI's text embeddings, making it seem as though any model will do. But even though I'm somewhat aware of this, I lack the knowledge and methods to determine which model is best suited for my scenario. Can you give some suggestions on how to evaluate that? Also, I'm wondering what you think about open-source embedding models like embeddinggemma-300m or e5-large.
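[A common way to answer this question is to build a small labeled set of (query, relevant document) pairs from your own data and measure recall@k for each candidate model. A minimal sketch of such an evaluator, using NumPy cosine similarity; the toy vectors stand in for whatever your embedding API returns:]

```python
import numpy as np

def recall_at_k(query_vecs, doc_vecs, relevant, k=5):
    """Fraction of queries whose relevant doc id appears in the top-k results.

    query_vecs: (num_queries, dim) array of query embeddings
    doc_vecs:   (num_docs, dim) array of document embeddings
    relevant:   list of the relevant doc index for each query
    """
    # Normalize so the dot product equals cosine similarity.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = q @ d.T                               # (num_queries, num_docs)
    topk = np.argsort(-sims, axis=1)[:, :k]      # indices of top-k docs per query
    hits = [rel in row for rel, row in zip(relevant, topk)]
    return sum(hits) / len(hits)

# Toy example: two queries, each with one known relevant document.
# In practice these vectors would come from the embedding model under test.
queries = np.array([[1.0, 0.0], [0.0, 1.0]])
docs = np.array([[0.9, 0.1], [0.1, 0.9]])
print(recall_at_k(queries, docs, relevant=[0, 1], k=1))  # 1.0
```

[Run the same harness once per candidate model on the same query set; the model with the best recall@k at the dimensionality/latency you can afford wins for your scenario.]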
sippeangelo 74 days ago [-]
The biggest latency improvement I saw was switching off OpenAI's API, which would have a latency anywhere between 0.3 and 6 seconds(!) for the same two-word search embedding...
jawnwrap 74 days ago [-]
Cool article, but nothing groundbreaking? Obviously if you reduce your dimensionality, the storage and latency decrease: it's less data.
jimmySixDOF 72 days ago [-]
You're missing the point: accuracy stays the same.