Show HN: Sisi – Semantic Image Search CLI tool, locally without third party APIs (github.com)
notsylver 121 days ago [-]
I was planning to do this myself lol. I was going to use SQLite as the index, and use `sqlite-vec` or something similar to query for similar files directly. I think the only other things I was planning were more filters: `"positive term" -"negative term"` to be able to negate results, `>90"search"` to find images that match by >90%, and some generic filters like `--size >1mb` to help narrow it down when you are looking for a specific image. Quantizing embeddings to make them smaller/faster also seemed interesting but I haven't tried doing it yet.
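A minimal sketch of how the filter syntax described above could be parsed and applied. Everything here is an assumption for illustration, not part of sisi: the `Query` class, the regexes, and the scoring rule (best positive similarity minus best negative similarity) are hypothetical, and `similarity` stands in for whatever function returns an image-to-text score in [0, 1].

```python
import re
from dataclasses import dataclass, field


@dataclass
class Query:
    positive: list = field(default_factory=list)
    negative: list = field(default_factory=list)
    min_score: float = 0.0  # threshold in [0, 1], taken from a leading ">NN"


# Hypothetical syntax: optional ">NN" prefix, then quoted terms; "-" negates a term.
THRESHOLD = re.compile(r'^\s*>(\d{1,3})')
TERM = re.compile(r'(-?)"([^"]+)"')


def parse_query(text: str) -> Query:
    """Split a query string into positive terms, negative terms, and a threshold."""
    query = Query()
    match = THRESHOLD.match(text)
    if match:
        query.min_score = min(int(match.group(1)), 100) / 100.0
        text = text[match.end():]
    for sign, term in TERM.findall(text):
        (query.negative if sign else query.positive).append(term)
    return query


def score(query: Query, similarity) -> float:
    """Assumed rule: best positive match minus best negative match."""
    pos = max((similarity(t) for t in query.positive), default=0.0)
    neg = max((similarity(t) for t in query.negative), default=0.0)
    return pos - neg


def matches(query: Query, similarity) -> bool:
    """Keep an image only if its score reaches the query threshold."""
    return score(query, similarity) >= query.min_score
```

For example, `parse_query('"cat" -"dog"')` yields one positive and one negative term, and `parse_query('>90"search"')` sets `min_score` to 0.9. Subtracting the negative score is only one possible design; excluding any image whose negative similarity passes a cutoff would work as well.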
progx 121 days ago [-]
It uses only one core at 100% under Linux; can this be changed?

10 images, each ~20 kb size, took more than 10 minutes to index, is that normal without GPU-acceleration?

zcbenz 121 days ago [-]
No, it is not normal. I only tested on x64/arm64 Macs; I will try on Linux.
a_wild_dandan 121 days ago [-]
What’s normal? On your Apple silicon.
sureIy 121 days ago [-]
Wow that’s atrocious performance. So there’s no chance to use this on real photos
spullara 121 days ago [-]
Very cool! Here is a similar python version.

https://github.com/spullara/photoindex

Oh, and if you want to run something locally on your iPhone you can use my app, which I am still testing:

https://x.com/getrememberwhen

sureIy 121 days ago [-]
This is cool. Is there also a way to show contents of the image as indexed? i.e. image 1 has cat and dog

There are a lot of tools/apps that let you “search images” but not many that let you just as easily “read images”

kjeldsendk 120 days ago [-]
I have wanted to clean up my photo collection for ages and remove any nsfw picture that might hide somewhere.

Would this be able to do that, and how likely is it that it will see a PC release?

Eisenstein 120 days ago [-]
This script doesn't do search, but it generates keywords for images and places them in the image metadata. You can then search for keywords using something like Diffractor. I will warn though that any AI solution not geared towards NSFW will not give good information on NSFW images, though it may give a keyword such as 'intimate' or 'adult content' which is all you need.

* https://github.com/jabberjabberjabber/LLavaImageTagger/

petesergeant 121 days ago [-]
I've been enjoying https://github.com/mazzzystar/Queryable on iPhone
y04nn 120 days ago [-]
How does CLIP compare to YOLO[1]? I haven't looked into image classification/object recognition for a while, but I remember that YOLO was quite good and worked on realtime video too.

[1]: https://pjreddie.com/darknet/yolo/

Eisenstein 120 days ago [-]
CLIP and YOLO work completely differently and have different purposes. CLIP uses transformers and embeddings and can compare text with images for classification. YOLO uses a CNN, is trained with bounding boxes on images, and is used for object detection.

Give an image to CLIP and you can compare the similarity between the image and a sentence like 'a vase with roses in it'. Whereas with YOLO you give it an image and get the coordinates of bounding boxes around a vase, and around roses.
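To make the contrast concrete, here is a rough sketch of the two output shapes. The vectors are toy stand-ins, not real CLIP embeddings, and `rank_captions`, `Detection`, and the `logit_scale` default are names and values assumed for illustration only. The CLIP side is a softmax over scaled cosine similarities between one image embedding and several caption embeddings; the YOLO side is just a list of labeled boxes.

```python
import math
from dataclasses import dataclass


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_captions(image_vec, caption_vecs, logit_scale=100.0):
    """CLIP-style scoring: softmax over scaled image/caption cosine similarities."""
    logits = {c: logit_scale * cosine(image_vec, v) for c, v in caption_vecs.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(value - peak) for c, value in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


@dataclass
class Detection:
    """YOLO-style result: what was found and where (pixel coordinates)."""
    label: str
    confidence: float
    box: tuple  # (x_min, y_min, x_max, y_max)


# Toy embeddings (hypothetical values; a real model produces hundreds of dimensions).
image = [0.9, 0.1, 0.2]
captions = {
    "a vase with roses in it": [1.0, 0.0, 0.2],
    "a dog on a beach": [0.0, 1.0, 0.1],
}
probabilities = rank_captions(image, captions)

# What a detector might return for the same picture (made-up numbers).
detections = [
    Detection("vase", 0.91, (120, 80, 310, 420)),
    Detection("rose", 0.84, (150, 20, 260, 140)),
]
```

With CLIP you choose the candidate sentences yourself and get a relative score for each; with a detector the label set is fixed at training time and the answer includes locations.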

netdur 121 days ago [-]
I have made a similar Android app for semantic image search. It works offline too. I'm still gathering feedback and polishing the UI, but it works. If you are brave enough, here it is: https://drive.google.com/file/d/1tE0cY6umj5h5zCY_Jvaou1M8sCf...
nickphx 120 days ago [-]
Why yes, I'll download a 695MB APK file from an internet stranger.
netdur 120 days ago [-]
Yes, 99% of the size is the weights of the two models required to run inference offline; there is no way around it.
KetoManx64 121 days ago [-]
Is there a github link?
netdur 121 days ago [-]
We have not decided what to do with it yet. It could be free, paid, or open source. However, the logic code for using semantic search with CLIP-compatible models on Android will be available on GitHub.
ivanjermakov 121 days ago [-]
In Russian, "sisi" is a variation of "tits".

Is there a job/service that confirms that branding is appropriate across different languages? Seems like a non trivial problem to solve.

visarga 121 days ago [-]
> Seems like a non trivial problem to solve.

Took me 5 minutes to land this GPT prompt.

https://chatgpt.com/share/66e84c0c-a92c-800a-b452-255d6fe942...

Results:

- Chinese (Simplified) 四四 (sì sì) – sounds like "four-four", which can be associated with bad luck due to the number four in Chinese culture

- Arabic "Sisi" is a common nickname, also associated with Egypt's President Abdel Fattah el-Sisi

- Russian Сиси (sisi) – slang for breasts

- Bulgarian Сиси (sisi) – slang for breasts

- Serbian Сиси (sisi) – slang for breasts

- Croatian Sisi – slang for breasts

You should probably complement this with a web search and a Wiktionary search, because Wiktionary lists all languages on a single page.

pdimitar 121 days ago [-]
Does ChatGPT get anything right, ever?

In Bulgarian the slang is Цици (tsi tsi). I imagine it's near-identical for many other Slavic languages.

visarga 121 days ago [-]
Yeah, I noticed it was pretty shaky: change the prompt a bit and the result changes a lot. It is not very reliable by itself after all, but it can be used in conjunction with other methods.
sureIy 121 days ago [-]
It’s not that straightforward due to spelling. Does that catch køk? Tihts? P. Nus? For a non English swear word, I had to ask 3 times and about a specific language to finally make that connection.
Lockal 120 days ago [-]
Tried with "hui" - for ChatGPT this word "has no specific meaning and is safe for use in any language".
ivanjermakov 121 days ago [-]
Cool LLM application! Might not be enough though.
fedeb95 121 days ago [-]
that's a nice start, maybe does 99% of the job, but to be 100% sure, you still need additional (manual?) checks.
zcbenz 121 days ago [-]
That is sad. The name sisi comes from Empress Sisi: https://en.m.wikipedia.org/wiki/Empress_Elisabeth_of_Austria
kgeist 121 days ago [-]
kristopolous 121 days ago [-]
I read about a company in the 1990s that did that. They went one step further: picking culturally appropriate colors, shapes, and numbers, and then permuting the brand names to favorable variations for a country. My (probably wrong) 25-year-old recollection is that when they introduced Subway in China they basically found a way to pronounce it that translated to "this place is delicious". I bet it was in Wired. If not that, probably New York Magazine.
bjord 121 days ago [-]
it's also the name of Egypt's authoritarian leader

https://en.wikipedia.org/wiki/Abdel_Fattah_el-Sisi

jollyllama 121 days ago [-]
Yes this is the first thing that came to mind for me, strange name choice
fkyoureadthedoc 121 days ago [-]
Even if that was the intent, which it almost certainly isn't, why would it be strange enough to warrant discussion?
jollyllama 120 days ago [-]
As an American who monitors world affairs, the choice of a quasi-authoritarian junta leader as a name would be quite novel.
phito 121 days ago [-]
It's definitely not a good name in English either
Zambyte 121 days ago [-]
I assume you're reading it as "sissy", but I read it as "seesee", which is fine in English.
Narhem 121 days ago [-]
I read it as sisi, which means “thank you” in Vietnamese.
rlpb 121 days ago [-]
Sounds like something one might try to train an AI to do :)
philsnow 121 days ago [-]
In Cantonese it’s what a toddler might call poop
Narhem 121 days ago [-]
A lot of the predominantly used programming languages and libraries have references to feces if you speak Farsi.
Jack5500 120 days ago [-]
Isn’t CLIP superseded by multimodal LLMs?
Eisenstein 120 days ago [-]
In this program CLIP is being used to create embeddings. A multimodal LLM does something very similar. In this case the language model is not needed because the embeddings are being used to search directly.
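As a rough illustration of searching with embeddings directly, the sketch below ranks stored image vectors by cosine similarity to a query vector. It is not sisi's actual code: `build_index`, `search`, and the tiny two-dimensional vectors are assumptions made up for the example, and in practice both the image and the query vectors would come from the same CLIP-compatible model.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_index(embeddings):
    """Map each image path to its unit-length embedding (done once, at index time)."""
    return {path: normalize(vec) for path, vec in embeddings.items()}


def search(index, query_vec, k=5):
    """Return the k best (path, similarity) pairs, highest similarity first."""
    q = normalize(query_vec)
    scored = ((sum(a * b for a, b in zip(q, vec)), path) for path, vec in index.items())
    return [(path, similarity) for similarity, path in heapq.nlargest(k, scored)]


# Toy data: hypothetical paths and two-dimensional embeddings.
index = build_index({
    "a.jpg": [1.0, 0.0],
    "b.jpg": [0.0, 1.0],
    "c.jpg": [1.0, 1.0],
})
results = search(index, [1.0, 0.1], k=2)
```

No language model is involved at query time: the text query is embedded once, and the rest is a nearest-neighbor lookup. A brute-force scan like this is fine for small collections; a vector index becomes worthwhile as the collection grows.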