We don't want these machines to get good at memorising. Here's why.

Stop hoping your LLM will get more factually accurate!

The problem of hallucinating AIs has been food for viral news stories for the last year. Those of us observing the evolution of Large Language Models (LLMs) may even be feeling a little nostalgic for the days when we could laugh at images of ‘optimised’ hands with 6 or more fingers – the AI logic being, if 5 fingers are good, 6 fingers must be even better! Those images were perfect for illustrating why we might not want to fully trust their logic. LLMs are getting more accurate, but that may actually be a cause for concern.

ChatGPT is not the same as Bing

One problem with LLMs is their ease of use. The user interface of a search tool looks very similar to that of an LLM that generates text or images, yet what sits behind each one is very different:

A search engine will provide you with (largely) human-created content, written by a person who actually researched the topic, or a photograph that someone took of a real scene. If it appears on the first few pages of search results, that suggests enough people thought it was good enough to endorse it.

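An LLM is a completely different technology. When an LLM produces, say, a meaningful sentence, it does not look anything up in a database. Its parameters (1.5 billion of them in the largest version of GPT-2, and far more in today’s leading models) encode statistical patterns learned during training. The model uses them to estimate how likely each ‘token’ in its vocabulary is to come next, where a token is a word or a piece of a word. If the previous token is ‘I’, then ‘think’ (say, token number 618 in the vocabulary) might be the best guess for what comes next. To learn these patterns, the machines have to process billions of pages of text, repeatedly adjusting their parameters along the way (pity the environment), before they can produce a fairly good prediction of sensible text.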
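
To make ‘predicting the next token’ concrete, here is a toy sketch in Python. It is a deliberate simplification and an assumption on our part, not how any production LLM works. Instead of a neural network with billions of parameters, it uses a bigram model that counts which word follows which in a tiny made-up corpus. The generation loop (pick the most probable next token, append it, repeat) is the same in spirit. All function names and the corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, which words follow it (a toy stand-in for training)."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()  # naive whitespace "tokeniser"
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev][nxt] += 1
    return follows


def next_token_probs(model: dict[str, Counter], prev: str) -> dict[str, float]:
    """Turn follow-up counts into probabilities for the next token."""
    counts = model.get(prev, Counter())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}


def generate(model: dict[str, Counter], start: str, max_tokens: int = 5) -> str:
    """Greedy generation: always append the single most probable next token."""
    tokens = [start.lower()]
    for _ in range(max_tokens):
        probs = next_token_probs(model, tokens[-1])
        if not probs:  # no known continuation: stop
            break
        tokens.append(max(probs, key=probs.get))
    return " ".join(tokens)


corpus = ["I think it will rain", "I think so", "I know it will"]
model = train_bigram(corpus)
print(next_token_probs(model, "i"))  # 'think' gets 2/3, 'know' gets 1/3
print(generate(model, "i"))          # i think it will rain
```

Notice that greedy generation here simply replays one of the training sentences word for word. That is a miniature version of the memorisation problem.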

The logic goes that ‘overfitting’, which in this context means reproducing words, whole sentences or entire texts from the training data at the wrong moment, should be overcome by the sheer scale of the training process. In other words, because the data set is so large, the model won’t regurgitate its inputs word for word and put people at risk. Is this the case?

New legal definitions needed

There is no question that some of the big LLMs have been built by teams with very limited understanding of the training data behind them. Scraping copyrighted webpages for training data without authorisation or oversight has landed OpenAI, Stability AI, Meta and Google in court. We need new legal terms even to describe the nature of the potential infringements, but a few existing technical definitions can help.

If, for example, a model is vulnerable to a ‘training data extraction attack’, someone can get it to reproduce individual training data points exactly. That might be a Git commit hash, a whole document or an untouched photo. This is the clearest case of copyright breach, and research teams have been able to get LLMs to produce, for example, a word-for-word page of Harry Potter that the model had memorised verbatim.
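
Spotting verbatim reproduction can be as simple as looking for long runs of words shared between a model’s output and a known source. The Python below is a minimal sketch under stated assumptions, not the method any particular research team used. It splits text on whitespace (real models work on tokens) and returns the longest run of consecutive words that appears in both texts. A long shared run hints at memorisation rather than coincidence. The function name and sample texts are hypothetical.

```python
def longest_shared_run(output: str, source: str) -> list[str]:
    """Return the longest run of consecutive words that appears in both texts."""
    a, b = output.split(), source.split()
    best_len, best_end = 0, 0
    # prev[j] holds the length of the shared run ending at a[i-2] and b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the run that ended one word earlier
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


source = "the quick brown fox jumps over the lazy dog near the river bank"
output = "yesterday the quick brown fox jumps over a sleeping cat"
run = longest_shared_run(output, source)
print(len(run), " ".join(run))  # 6 the quick brown fox jumps over
```

Extraction studies work at a much larger scale and against real training sets. The principle is the same, though: the longer the exact overlap, the less likely it arose by chance.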

It may seem logical to assume that memorising, say, a whole licence agreement is more troubling than memorising phrases or strings of words, but shorter strings can be just as problematic. What if that shorter string is your name and phone number or address, or the names of your children?

Hallucinations currently offer some protection

A ‘targeted adversarial attack’, where the user tries to extract specific information from an LLM, is not the purely theoretical problem some have claimed. It is more of a private data leak than a copyright concern. In theory, the data sets are so large that overfitting (the risk of a model reproducing a sentence verbatim) is unlikely, so targeted adversarial attacks should not be possible … until they were. Academics have shown that such attacks are not only achievable in practice but become more likely – up to 18 times more likely – as these models get larger and larger¹.

Google’s Gemini may have been slammed on social media for suggesting that the pope and the Founding Fathers of the USA could have been ethnically and gender diverse. But that same ‘well, it might be’ quality offers some protection against attacks on your personal information. If the machine is confused about the power dynamics of 18th-century America and the Catholic Church, it will also be confused about which school your children attend. Making these machines more ‘factual’ may be deeply undesirable.

What does the law say?

The LLM leaks currently going through the courts are technically called ‘model inversion attacks’. These are attacks in which someone gets the model to reconstruct training data points, not verbatim, but close enough that the ‘fuzzy’ representation makes identification possible. One example is the videos of Jennifer Aniston dancing that are currently doing the rounds on social media. We know it’s not really her, but we can all immediately recognise her likeness.

When these model inversion attacks happen in the wrong context, other rights are abused too. For example, if you work for a large Fortune 500 company, the name of a product your company is currently developing could be leaked publicly, along with R&D data about it. Your contact details in an online document, such as an academic journal article, feel safe because the document receives so little traffic from such a narrow audience. However, they may be leaked in a completely different context. Secrets are an essential part of how we work.

The solution: Get the training data right

These machines are fundamentally optimised to memorise. Memorising is the approach that produces the best result, with the lowest value of the loss function, so we should take the threat seriously. The solution is a more responsible development process in the first place.
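
A toy calculation shows why memorising wins on the training data. Language models are typically trained to minimise cross-entropy: the average of −log p over the probabilities the model assigns to each correct next token. In the sketch below, the probabilities are invented for illustration and not taken from any real model. A hypothetical ‘memoriser’ that has stored a training sentence exactly gives each correct word probability 1. A hypothetical ‘generaliser’ spreads some probability over plausible alternatives.

```python
import math


def cross_entropy(probs_of_correct: list[float]) -> float:
    """Average negative log-probability given to the correct next tokens."""
    return sum(-math.log(p) for p in probs_of_correct) / len(probs_of_correct)


# Probability each hypothetical model gives to the actual next word at four
# positions of one training sentence (illustrative numbers only).
memoriser = [1.0, 1.0, 1.0, 1.0]    # has stored the sentence verbatim
generaliser = [0.6, 0.5, 0.7, 0.4]  # hedges across plausible alternatives

print(cross_entropy(memoriser))              # 0.0, the lowest possible loss
print(round(cross_entropy(generaliser), 3))  # 0.619
```

On the training text itself, nothing beats a loss of zero, and zero is exactly what reproducing the data verbatim achieves.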

It is not ‘impossible’ to train an LLM without copyrighted and protected materials, as some developers have argued under questioning². Some companies have taken a more ethical approach to training data, with policies on compensating people for any creative input used in the training process. For example, Adobe’s Firefly is a little slower and a little more limited than other generative models, but the company can explain exactly what data went into it.

Tracing input data is also technically possible with hashing tools, provided you have a fingerprint of the digital content. So far this has mainly been deployed against the worst, most exploitative images, but it shows the approach is workable.
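
As a minimal sketch of the idea, the Python below fingerprints content with SHA-256 from the standard library. It then screens a batch of candidate training files against a set of known fingerprints. This is an exact-match hash, so changing a single byte changes the fingerprint. Systems built to catch altered or re-encoded images rely on perceptual hashes instead. The `known_fingerprints` registry, the helper names and the sample content are all hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest of the content: an exact-match fingerprint."""
    return hashlib.sha256(content).hexdigest()


# Hypothetical registry of fingerprints for content that must not be used.
known_fingerprints = {fingerprint(b"raw bytes of a protected photo")}


def is_known(content: bytes) -> bool:
    """True if this exact content has already been fingerprinted."""
    return fingerprint(content) in known_fingerprints


# Screen a hypothetical batch of candidate training files before use.
candidates = [b"raw bytes of a protected photo", b"raw bytes of a licensed photo"]
cleared = [c for c in candidates if not is_known(c)]

print(len(cleared))                                  # 1
print(is_known(b"raw bytes of a protected photo!"))  # False: one extra byte changes the hash
```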

Your LLM may feel very similar to a search engine, but its future applications are still very much open to debate.

¹ See the study on arXiv

² See The Guardian’s report on OpenAI’s arguments in court