I asked it about "ocelot". It said it was "tlatlauelolotl", which comes from the Nahuatl for "tlatl" (field), "lauel" (unknown), and "olotl" (jaguar). Except according to [0], "tlatllauelolotl" doesn't exist, and neither does "lauel". (The search results for "tlatl" and "olotl" aren't in English, which means I can't judge their validity.)
There's "tlacoocelotl"[1] which seems to be the actual word for "ocelot" ("semi-jaguar").
I think that by "can guess fake words" in the title OP means that the tool can come up with a plausible guess for the etymology, even for fake words. Unfortunately, the more common reading of that phrase is that it can tell fake words from real ones.
Cthulhu_ 3 hours ago [-]
Yeah I just filled in "inbreathiate", which should be a fake / made up word but this tool generates a meaning anyway... which is also neat, but the "can guess fake words" description isn't accurate.
At least it did say "slartibartfast" was a fictional character.
muzani 3 hours ago [-]
I think OP means that it can make a guess at what a nonexistent word means, something Wiktionary and Urban Dictionary don't do as well.
andrelaszlo 6 hours ago [-]
Agree. It does a good job at coming up with a plausible meaning of novel words like "multiarborality" → "the state of relating to many trees" but it doesn't indicate that this is a "fake" word even though I just made it up.
TeMPOraL 5 hours ago [-]
Doesn't look fake to me. English is not a closed-world language, as far as I know.
Between things like "verbing" and "nouning", and the cultural acceptance of doing them in casual speech, I'd say English is a great language because you get to "invent" new words on the fly, and your interlocutors know what you mean.
In this sense, even if no one before ever said or wrote "multiarborality", it's pretty clear what it means (as long as you don't misread it), and IMHO it's perfectly fine to derive its etymology by deconstructing it back to "common" words and pulling etymology on those, recursively.
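The recursive deconstruction described above can be sketched roughly like this. It is only a toy: the `MORPHEMES` table and the `deconstruct` function are made up for illustration, the glosses are simplified, and nothing here reflects how the linked tool actually works.

```python
# Toy morpheme table: an illustrative assumption, not real etymological data
# and not how the deconstructor tool works.
MORPHEMES = {
    "multi": "many (Latin multus)",
    "arbor": "tree (Latin arbor)",
    "al": "relating to (Latin -alis)",
    "ity": "state or quality (Latin -itas)",
}


def deconstruct(word, table=MORPHEMES):
    """Split `word` into known morphemes, or return None if no split exists.

    Tries the longest known prefix first, then recurses on the remainder.
    """
    if word == "":
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in table:
            rest = deconstruct(word[end:], table)
            if rest is not None:
                return [head] + rest
    return None


parts = deconstruct("multiarborality")
if parts is None:
    print("no decomposition found")
else:
    for part in parts:
        print(f"{part}: {MORPHEMES[part]}")
```

A word with no known pieces, such as "refuggle", comes back as `None` instead of a forced guess, which is the behaviour some commenters in this thread were hoping for.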
latexr 4 hours ago [-]
> I'd say English is a great language because you get to "invent" new words on the fly, and your interlocutors know what you mean.
I bet you could do that with most languages. I don’t see why English would be especially great at it.
keiferski 2 hours ago [-]
I can think of three reasons:
1. English doesn’t really have an official regulation body, like French does.
2. The lack of cases and complex grammar. Any language with a case system is going to have more complexity when it comes to adding new words, because otherwise you end up with awkward looking constructions.
3. English itself is something of a hybrid between Latin and Germanic languages, which to my knowledge gives it a more diverse ancestry than the typical language. Ergo having a new word of dubious origins is more natural.
latexr 2 hours ago [-]
I won’t comment on the second reason, since that seems like something a linguist should address, but I don’t really buy the first and last reasons:
1. Doesn’t seem relevant, as we’re discussing making up a word in conversation and not putting it into dictionaries.
3. Especially in this globalised world, English loanwords are everywhere. No one bats an eye at it and plenty of languages distort those words to fit their own language. For example: when referring to an internet post you say you’re “posting”; another language would keep the “post” but replace the “ing” with the modifier appropriate for them.
keiferski 2 hours ago [-]
1. Well I think perhaps then we could reverse it and just see the lack of a regulatory body as a symptom of a culture that cares less about following strict linguistic rules. Compared to French, which also has a ton of slang and experimentation, but notably the power structures underlying the language care enough about maintaining a standard.
3. Loanwords are everywhere but I think they are easier to incorporate into everyday speech in English than in some other languages, especially ones with case endings. A word like taco, for example, has become indistinguishable from other “native” English words. Taco in say, Polish, requires more thinking about how it fits into the case system and what endings should be used. It’s a more complicated process than in English.
latexr 1 hour ago [-]
I still disagree, but I think this would be too hard to discuss over asynchronous text. You have some arguments I’d like to explore, spending some back and forth trying to understand your point and explaining mine better, but unfortunately I still have stuff to do today.
Still, I want to thank you for the polite and reasoned replies. I wish we were having this conversation in person, I’m confident it would’ve been interesting.
TeMPOraL 3 hours ago [-]
I bet you could, but in the other languages I know, I believe it would be frowned upon. The cultural acceptance of this feature is just as important as the feature itself.
latexr 2 hours ago [-]
Conversely, in the other languages I know I see it happening all the time. I don’t think frowning on it has to do with the language, but with the person. There are sticklers and SNOOTs¹ everywhere, even in English.
"Flonkers: A made-up word, likely humorous". Aren't all words made up? Edit: This one, unlike my previous example, is actually in use - flonkers: an animal that looks fat but is actually just fluffy.
nhinck3 4 hours ago [-]
turboencabulation, hydrocoptic. I bet you could look up a bunch of sci-fi for other examples of completely made up words
stavros 5 hours ago [-]
If you had used that word in a sentence, eg "what I like about this park is its multiarborality", I would have immediately understood what you meant.
sandworm101 5 hours ago [-]
>> even though I just made it up
One person inventing a word that they have never heard before doesn't negate the possibility of that word being in common use somewhere.
"An expansion of the 2020 Theodore Sturgeon Memorial Award winning story. Arboreality is a finalist for the Philip K. Dick Award and the winner of the 2023 Ursula K. Le Guin Prize for Fiction."
dizhn 4 hours ago [-]
Which it should since it knows all the words.
rvense 7 hours ago [-]
Fyllipig is derived from "fyllan" and "pigge" and means "descriptive of something that is both full/abundant and pig-like."
Terryjambled means "mixed up in a confused or disorderly state, and covered with or resembling terry cloth."
Refugglemander means "to manipulate electoral district boundaries in a way that impacts refugees."
I'm sorry, OP. This just isn't very good.
InsideOutSanta 5 hours ago [-]
It's funny; I'm looking at your examples and coming to the opposite conclusion. I feel like it is very good because it provides explanations for unusual or novel words that are similar to what a human might conclude.
rvense 3 hours ago [-]
Etymology is a science. This is random guesswork, and it's not even very precise (deriving refuggle from refugee is definitely objectively wrong).
Maybe I'm coloured by having spent half a decade of my life on a linguistics degree, though.
InsideOutSanta 37 minutes ago [-]
Would you expect that it would point out that "refuggle" is not a word with any documented usage rather than drawing a strenuous connection to a slightly similar word?
I find the connections it draws amusing. Since I'm not the inventor of the word "refuggle," I can't say that I know its etymology or how it relates to "refugee." But I guess this is one weakness of LLMs: they're bad at admitting they don't have an answer.
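One way to get the "not a documented word" caveat is to check a word list before showing the guess. A minimal sketch, assuming a hypothetical `ATTESTED` set standing in for a real dictionary or corpus lookup; `describe` is an invented helper, not part of the tool:

```python
# Hypothetical word list standing in for a real dictionary or corpus lookup.
ATTESTED = {"refugee", "gerrymander", "ocelot"}


def describe(word, guess, attested=ATTESTED):
    """Return the guess, prefixed with a warning when the word is unattested."""
    if word.lower() in attested:
        return guess
    return f"(no documented usage found; speculative) {guess}"


print(describe("refugglemander", "to redraw districts in a way that affects refugees"))
```

The guess is still shown, so the tool stays fun for invented words, but the reader can tell a speculative reading from a documented one.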
Cthulhu_ 3 hours ago [-]
Cool, on my end it said "refugglemander" means "One who repeatedly flees and wanders" (re- fugere -ler mandros)
stavros 5 hours ago [-]
What etymologies would you have expected instead?
muzani 3 hours ago [-]
I wish you could pick a language. Bahasa has a ton of dialects + txt, which make it difficult to guess sometimes.
Like "gua" in the dictionary means cave but in some dialects it's an informal "I". Sometimes it gets shortened to the phonetic "gue", sometimes "gu" which is similar to "aku" shortened to "ku", which is another form of "I". Really we have like 7 different ways to say informal "I". I think the tool guesses it as Chinese.
Malaysian tends to remove affixes when shortened, while Indonesian removes vowels but keeps the affixes, so something like "come here quickly" might be shortened to "come quick" or "here2 qkly". Less etymology, more about encoding.
thih9 7 hours ago [-]
When I asked it to deconstruct "Babbage"[1] I got "Derived from Babba's place". Some others:
- phonenose: The ability to detect sounds or voices through the nose
- legpc: Acronym for Laptop Easy Personal Computer
- gitls: A command in Git to list files
- housefreezing: The action of hardening a house with cold
- uncleftish beholding[2]: The act of viewing something that is whole
In any case it's fun to play with and the UI is nice too.
Note, the title looks editorialized, it's currently "A AI etymology deconstructor – can guess fake words", but the website says just "deconstructor.".
It doesn't really detect stringing together latinate nonsense, so all of my coinages got perfectly coherent definitions. On the flip side, I guess if you're erudite and willing to butcher other languages for sport, nothing has ever stopped you from making up perfectly quasiaryiumaryesque words.
It actually knows about "Lisa the Iconoclast". Cool.
Oarch 4 hours ago [-]
It's a perfectly cromulent tool
Fission 7 hours ago [-]
This is fun, but what I think would be even more interesting and potentially actually useful would be to generate a word from multiple ideas — to help express complex ideas in one word. Like Frühstücksbrotübersättigung
fcatalan 3 hours ago [-]
You can just prompt for it:
"KI-Philologenstaunverunsicherung"
Breaking it down:
KI – (Künstliche Intelligenz, Artificial Intelligence)
Philologen – (Philologists, scholars of language and linguistics)
Staun – (from staunen, to marvel or be astonished)
Verunsicherung – (a sense of uncertainty or unease)
> You've deconstructed 5 words! Enter your email to continue your language journey and get notified about new features.
Yeah, no. I tried the same word five times to test for inconsistencies. I don’t appreciate that you weren’t straightforward about the limits, and I am not going to give you my email to “continue my language journey”. The fact you’re already using euphemisms like that makes me strongly distrust what you’d do with an email address.
jonplackett 6 hours ago [-]
I love this kind of thing. It’s weird how many words we use without having any idea what they mean.
Words always end up getting broken up in speech differently from how their meaning divides them.
E.g. I put in pandemonium, which you say as pande-monium, but I just learned it is pan-demon-ium, which makes a lot more sense!
Very cool. Well done.
Timwi 5 hours ago [-]
Try “helicopter” for a real mind-blow.
stavros 5 hours ago [-]
I hate how everyone always breaks it up as "heli" and "copter" when it's "helico" and "pter".
Oarch 4 hours ago [-]
I'd often casually wondered where the word scaffolding comes from. Its explanation seems sort of plausible.
dkdbejwi383 3 hours ago [-]
What is a "fake word"? New terminology is coined all the time. It's one of the fundamentals of how language works.
moebrowne 3 hours ago [-]
I think they mean never before seen words aka made up words like "antecarburetorism"
It did not like supercalifragilisticexpialidocious
InsideOutSanta 5 hours ago [-]
I had the same idea, and it concluded that it meant "fantastic," which seems correct.
youdont 6 hours ago [-]
It really struggled with Antidisestablishmentarianism
fliglr 5 hours ago [-]
I'll no longer misunderestimate AI
renewiltord 3 hours ago [-]
Something I find interesting on the internet is the number of times an LLM can decode a human’s intent better than other humans. Claude 3.7 Sonnet also makes the common misinterpretation here that “can guess fake words” means “can detect words that don’t have an established meaning” rather than “can guess an etymology for words that don’t have an established meaning”. But when I gave it the chance to try a word on the tool and reinterpret the title, it did so successfully.
This is interesting because it may eventually become the case that we should interpret other human texts through an LLM because it is better at understanding than many of us are.
My prompt, in case you are curious:
> I came across a HN story titled “AI etymology deconstructor - can guess fake words”. I want you to interpret the title and describe what this does. Then I’ll give you the chance to provide a test word and you will reinterpret the title.
novemp 7 hours ago [-]
"Can guess fake words"? It told me "shitbutt" was a real word.
Timwi 5 hours ago [-]
Very cool to use initially, but then completely locks you out with a modal in your face demanding an email address. That is despicable and you should be ashamed.
0/10 for the AI there.
[0] https://nahuatl.wired-humanities.org/
[1] https://nahuatl.wired-humanities.org/content/tlacoocelotl
¹ http://www.jamesgerity.com/biblio/authority.pdf
I don't even know what a fake word is.
https://deconstructor.ayush.digital/w/flonkers
https://books.google.ca/books/about/Arboreality.html?id=S95Y...
[1]: https://en.wikipedia.org/wiki/Charles_Babbage
[2]: https://en.wikipedia.org/wiki/Uncleftish_Beholding
https://deconstructor.ayush.digital/w/Deundehydrate
"to reverse the process of reversing the removal of water"
There is clearly no such word, but a human would probably infer such a meaning.
I'm amazed at how unamazed we all seem to have become at such feats. We need more deunnamazing
In other words, a false etymology constructor, not an etymology deconstructor :)
https://chatgpt.com/share/67d80661-4bb4-8012-9328-77d56af52b...
https://deconstructor.ayush.digital/w/antecarburetorism