The ongoing AI chat heist that doesn't require a hack

June 20, 2024


Category: Vulnerability Explainers

On the night of August 1st, 1990, Frank Meeks learned of the upcoming Iraqi invasion of Kuwait before most Americans knew where Kuwait was on a map.

Meeks wasn’t privy to confidential information, nor was he a spy of some sort. His trick was to simply own 45 Domino’s Pizza locations. The number of deliveries to the White House and to the Pentagon skyrocketed that night, up more than ten-fold from a typical Wednesday evening, so Meeks knew something was up before anyone else.

Five months later, on the evening of January 16th, 1991, Meeks had more intel. The deadline for the Iraqis to leave Kuwait had come and gone, and the Pentagon was ordering pizzas. Hours later, the coalition launched Operation Desert Storm, an aerial bombardment campaign that largely eliminated Iraqi military assets located within Kuwait in less than 48 hours.

Meeks had discovered a side-channel vulnerability in the Pentagon. He used information that was incidentally leaked by a secret activity to uncover parts of the secret itself. This particular vulnerability was later dubbed the Pizza Meter, or the Washington Pizza Index.

These days, the Pentagon orders for pickup.

From words to tokens

AI systems like Trelent are powered by large language models (LLMs), which are designed to predict the next word in a sentence. The issue is that there are a lot of words! English alone has an estimated 170,000 words in active use, and over a million historically. Before we even start on other languages, the model would already face an overwhelming number of choices, which would dramatically hinder its performance.

Fortunately for us, the researchers behind these LLMs thought about that. They developed software called “tokenizers” that is trained on massive amounts of text spanning many languages: not only natural languages like English, Hindi, or Mandarin, but also programming languages like Python or C++. The goal is to cover every useful type of text that people might want to write. The tokenizer uses this giant dataset to answer one question: what is the minimum set of character strings from which we can compose every sequence in our dataset?

Let’s unpack that, because there’s a lot of jargon!

A set is just a list of unique items. In this case, you can picture it as a list of short character sequences, usually just a few characters long (the tokens). “Character strings” really just means chunks of words that appear frequently in the text. A common combination like “Th”, for example, would likely become a token, as would “stdin” or “😄”. So the goal of the tokenizer while it's being trained is to identify the smallest possible list of these short sequences that allows the entire dataset to be reconstructed without errors.
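Real tokenizers don't literally solve for the smallest possible list. Many, including the ones behind OpenAI's models, use a greedy technique called byte-pair encoding (BPE): start from single characters and repeatedly merge the most frequent adjacent pair into a new token. Here's a toy, character-level sketch of that idea, assuming a tiny made-up corpus in which each word counts once. It's an illustration of the technique, not any production tokenizer's code.

```python
from collections import Counter


def train_bpe(corpus: list[str], num_merges: int) -> list[str]:
    """Learn up to `num_merges` BPE merges and return the resulting vocabulary."""
    # Start with every word split into single characters.
    words = [list(word) for word in corpus]
    vocab = {ch for word in words for ch in word}

    for _ in range(num_merges):
        # Count how often each adjacent pair of symbols occurs.
        pairs = Counter()
        for word in words:
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break  # every word is already a single symbol
        (a, b), _count = pairs.most_common(1)[0]
        merged = a + b
        vocab.add(merged)

        # Replace each occurrence of the winning pair with the merged symbol.
        new_words = []
        for word in words:
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and word[i] == a and word[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_words.append(out)
        words = new_words

    # Longest tokens first, then alphabetical.
    return sorted(vocab, key=lambda t: (-len(t), t))


corpus = ["the", "then", "there", "this", "that"]
# "th" is the most frequent pair, so it is merged first; "the" follows.
print(train_bpe(corpus, num_merges=2))
```

Notice that the learned tokens ("th", "the") are exactly the kind of frequent chunks described above, and that they already have different lengths.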

For all intents and purposes, these tokens can be called words, but keep in mind that a token won't always look like a word you know.

Tokenizing the Pizza Index

These tokens have an interesting property - not all of them have the same length. In fact, the outliers can be dozens of letters long, while the average is only around 4! The security practitioners among you may already be picking up on a potential flaw innate to this design.

You see, when you use LLMs, they tend to “stream” back tokens as they are written. This has the satisfying typewriter effect, where you can see the start of the message before the end of it is ready - kind of like the message is being typed in front of you. What’s really happening here is that every time the LLM predicts a token, it sends it to you immediately. We call this a “token stream”, as if the tokens are flowing towards you as a stream would.

Now, thankfully for everyone involved, these token streams are encrypted in transit, which means that from the outside you can’t tell what’s inside the stream - there’s just scrambled data flying through at a remarkable rate. But if you slow things down, there’s a pattern.

Let’s zoom out for a minute and go look at OpenAI’s documentation to figure out why a pattern emerges:

{..., "choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{..., "choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}
...
{..., "choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}

As we can see, most of this data doesn't change between tokens! Apart from the final chunk, only the “delta” field changes, and it only ever carries the content of the token being transmitted. This matters because the encryption used in transit doesn't hide length: each encrypted record is the plaintext plus a roughly fixed overhead, so the length of the encrypted data tracks the length of the plaintext data.
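To see that correlation concretely, here's a sketch that serializes one streaming chunk per token. It uses a trimmed-down, hypothetical version of the chunk shape shown above (the real chunks carry more fields, elided as "..."), and it assumes plain ASCII tokens, since quotes or non-ASCII characters would be escaped and change the byte count.

```python
import json


def make_chunk(token: str) -> bytes:
    """Serialize a simplified, hypothetical streaming chunk carrying one token."""
    chunk = {
        "choices": [
            {
                "index": 0,
                "delta": {"content": token},
                "logprobs": None,
                "finish_reason": None,
            }
        ]
    }
    # Compact separators, matching the chunks shown above.
    return json.dumps(chunk, separators=(",", ":")).encode("utf-8")


baseline = len(make_chunk(""))  # size of the same chunk with empty content
for tok in ["Hello", " world", "!"]:
    size = len(make_chunk(tok))
    # For ASCII tokens, the extra bytes are exactly the token's length.
    print(repr(tok), size, size - baseline)
```

Everything except the token is identical from chunk to chunk, so the size difference is the token length, byte for byte.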

Now, in and of itself that’s typically okay! It doesn’t reveal too much on its own, especially when dealing with small amounts of information at a time as we are here. But because we know that token lengths vary, and that in the extreme cases they can be significantly longer than others, we can use the difference in size of the encrypted chunks to measure how long the token is likely to be. The keen among you are probably piecing together an awfully long list of pizza orders to the Pentagon…
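From the outside, an eavesdropper only sees the sizes of the encrypted records. Here's a sketch of turning those sizes back into token lengths under some loudly simplifying assumptions: one token per record, plain ASCII text, the same fixed overhead on every record, and a first record whose plaintext we already know, which lets us calibrate that overhead. The function name and the record sizes are made up for illustration.

```python
def infer_token_lengths(record_sizes: list[int], known_first_token: str) -> list[int]:
    """Estimate token lengths from encrypted record sizes (hypothetical helper).

    Assumes one token per record and a constant per-record overhead
    (JSON wrapper + framing + encryption tag). The first record's plaintext
    is known, so it is used to measure that overhead.
    """
    overhead = record_sizes[0] - len(known_first_token)
    return [size - overhead for size in record_sizes[1:]]


# Made-up sizes (in bytes) of five consecutive encrypted records.
observed = [200, 197, 196, 192, 191]
print(infer_token_lengths(observed, known_first_token="assistant"))  # [6, 5, 1, 0]
```

Once the overhead is known, every later record gives away its token's length for free.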

Token size matters

Because tokens have varying lengths, at the extreme end there may only be a single token that corresponds to that length. This gives us something to capitalize on, because we can scan for particularly long packets that are within our tokenizer’s range, and have a high degree of certainty that that long token is present in our input stream.
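Here's a small sketch of that idea: group a vocabulary by token length and flag the lengths that exactly one token occupies. The vocabulary is a tiny hypothetical stand-in (including the made-up long token "InitializeComponent"); a real tokenizer's vocabulary has tens of thousands of entries.

```python
from collections import defaultdict


def unique_length_tokens(vocab: list[str]) -> dict[int, str]:
    """Map each token length that exactly one token has to that token."""
    by_length = defaultdict(list)
    for token in vocab:
        by_length[len(token)].append(token)
    return {n: toks[0] for n, toks in by_length.items() if len(toks) == 1}


# Hypothetical toy vocabulary for illustration only.
vocab = ["a", "I", "the", "and", "Hello", "stdin",
         " world", "assistant", "InitializeComponent"]
unique = unique_length_tokens(vocab)

# A packet implying a 19-character token can only be carrying this token.
print(unique.get(19))
```

Lengths shared by many tokens (like 1, 3 and 5 here) stay ambiguous, which is why the long outliers are the most revealing.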

Here’s what we know so far:

<unknown 9> <unknown 6> <unknown 5> <unknown 1> <unknown 0>

We have some information to help us, though! For example, we know that the first token is always “assistant” and that the response ends with a fixed-length stop sequence. That leaves three tokens in the sentence: the first is 6 characters long, then 5, then 1. Now it's a simple matter of guessing the most likely 6-character token:

Assistant:
           Hello
           Evenin
           Hey th
           ...

Because we know the next token is 5 characters long, the best fit is “Hello ”. Now we try to guess the most likely 5-character token:

world - 48%
ere!  - 21%
g! My - 5%...

The most likely option here is now “world”, thanks to “Hello world” being a common phrase among programmers. You can also see that the model accounted for “Evenin” being cut off by using a token that combines the end of “Evening!” with “My” from a new sentence. The same goes for “Hey th” and “ere! ”.

Not only do we now know that the AI response was most likely “Hello world!”, but we also have a range of possibilities with specific probabilities assigned to each! The last two are actually quite similar, so it's easy to say that the conversation is either programming-related (with the “Hello world!” response) or a greeting (“Evening! My” or “Hey there! ”). This can reveal a lot about the conversation as more tokens come in.
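The guessing game above can be sketched as a search that extends candidate texts one token at a time and keeps only tokens matching the observed lengths. The next-token table and its probabilities are entirely hypothetical; a real attack would use a trained language model to propose and score candidates.

```python
# Hypothetical next-token probabilities, keyed by the text generated so far.
NEXT_TOKEN_PROBS = {
    "": {"Hello ": 0.5, "Evenin": 0.2, "Hey th": 0.2, "Hi": 0.1},
    "Hello ": {"world": 0.9, "there": 0.1},
    "Evenin": {"g! My": 0.6, "g, sir": 0.4},
    "Hey th": {"ere! ": 0.7, "ere,": 0.3},
    "Hello world": {"!": 0.8, ".": 0.2},
    "Hello there": {"!": 1.0},
    "Evening! My": {" ": 1.0},
    "Hey there! ": {"I": 1.0},
}


def reconstruct(lengths: list[int], next_probs: dict) -> list[tuple[str, float]]:
    """Return (text, probability) pairs consistent with `lengths`, best first."""
    candidates = [("", 1.0)]
    for n in lengths:
        extended = []
        for text, p in candidates:
            for token, q in next_probs.get(text, {}).items():
                if len(token) == n:  # discard tokens of the wrong length
                    extended.append((text + token, p * q))
        candidates = extended
    return sorted(candidates, key=lambda c: c[1], reverse=True)


# Token lengths between the known "assistant" role and the stop chunk.
for text, prob in reconstruct([6, 5, 1], NEXT_TOKEN_PROBS):
    print(f"{text!r}: {prob:.2f}")
```

The length constraint does most of the work: "Hi", "g, sir" and "ere," are ruled out without ever looking at the ciphertext contents.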

Ultimately, practitioners were able to train their own AI model to do this with up to 60% accuracy. That is, they could accurately recover ~60% of an encrypted AI chat conversation. Not only that, but they could predict the topic over 75% of the time!

The redemption arc

This type of side-channel vulnerability, one that reveals not just information but specific information, is what we call a “practical side-channel”. Practical side-channel attacks tend to be rare, but because of how LLMs work, this one is nasty. Thankfully, it is also easy to mitigate. All we have to do is add random data to every streamed packet, alongside the token information. For example, take our previous conversation:

{..., {"role":"assistant","content":""}}]}
{..., {"content":"Hello"}}]}
{..., {"content":" world"}}]}
{..., {"content":"!"}}]}
{..., {},"logprobs":null,"finish_reason":"stop"}]}

We would add a “patch” to turn it into something like this:

{..., {"role":"assistant","content":""}}], "patch": "abcajefajdlsfjaeifweh124"}
{..., {"content":"Hello"}}], "patch": "124212jladf"}
{..., {"content":" world"}}], "patch": "124abcajefajdlsfjaeifweh124212jladf"}
{..., {"content":"!"}}], "patch": "ncnvmx,euoadfhlz9135n,mfd"}
{..., {},"logprobs":null,"finish_reason":"stop"}], "patch": "uc1247nvmx,adbgse?;eGgl"}


Now, because we have added random data of varying lengths to every streamed packet, a packet's size no longer reliably reveals the length of the token inside it, so an observer can't accurately deduce which token is being sent. We have simply added too much entropy (randomness) for the data to be useful as a side-channel.
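Here's a minimal sketch of that mitigation, assuming a “patch” field like the one in the example above. The helper name, the 0 to 64 character padding range, and the use of Python's `secrets` module are our own illustrative choices, not necessarily what runs in production.

```python
import json
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def add_patch(chunk: dict, max_pad: int = 64) -> str:
    """Serialize `chunk` with a "patch" field of random length (hypothetical helper)."""
    pad_len = secrets.randbelow(max_pad + 1)  # 0..max_pad characters
    padding = "".join(secrets.choice(ALPHABET) for _ in range(pad_len))
    # Build a new dict so the caller's chunk is left unmodified.
    padded = {**chunk, "patch": padding}
    return json.dumps(padded, separators=(",", ":"))


chunk = {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}
for _ in range(3):
    print(len(add_patch(chunk)))  # the size varies from send to send
```

Clients simply ignore the extra field. Because the padding length is drawn independently of the token, one record size no longer maps to one token length; padding every chunk up to a fixed size is another option that achieves the same goal.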

This patch is live in Trelent today. We focus on identifying and remedying these sorts of vulnerabilities so that your organization can confidently use AI throughout day-to-day work, without the unnecessary risk of anybody else seeing your data. You can try Trelent for free today by signing up here, or by scheduling a demo.