Contents

The future of Large Language Models

Seven predictions for future LLMs

Large language models (LLMs) may already appear like coming from the future - but the fact is that in many ways they are still very rough and achieve some of their amazing performance via brute force: more computing power, more training data, bigger models. LLMs of the future will look quite different, and below I will make seven predictions about future developments. All seven aspects are already heavily researched and are partially already surfacing in recent LLMs, and may arrive sooner than you think - the majority perhaps already in 2024.

1. Smaller LLMs

Yes, that’s right. The recent trend has been towards bigger LLMs: GPT-4 allegedly has about one trillion parameters while in 2018, a model with 100 million parameters was considered huge! However, this trend cannot continue: training and using such models is very expensive, to the degree that OpenAI is actually losing money by offering their services (in return for securing reputation and market share). Even bigger models are simply not economically sensible. And even if anyone were willing to pay the price: Considering that training GPT-4 required about 7.5 MWh of energy, the monthly consumption of 10000 US households, there simply would not be enough energy available to scale that up by another factor 1000.

Recent models like Llama2-7B or Mistral-7B which are more than 100 times smaller than GPT-4 show remarkable performance, for some applications even beating GPT-4 and rarely trailing far behind. This trend will continue, in particular since LLMs increasingly use external knowledge sources like web search, calculators, or databases: it is not necessary to save all the knowledge inside of an LLM anymore, but only the skill to be able to retrieve it when needed. This will make LLMs smaller, cheaper, and faster without sacrificing performance.

2. Specialized LLMs

The “G” in GPT is for “General”, and indeed this is the main difference to its predecessors: it is not designed to do well on just one specific task. While this is scientifically fascinating and by many seen as a step towards a general AI, it rarely makes sense commercially: companies who integrate LLMs into their product typically require it to perform well on only a few tasks, e.g., answering questions on its product, or translating languages. This can often be achieved by extracting these skills from an LLM to a model that may be 1000 times smaller - and hence, (almost) 1000 times faster and cheaper. And such a smaller model can then even run on your mobile phone instead of requiring a huge server farm. So specialization will also make LLMs smaller: by decreasing their skills on tasks they are not used for. This is a trend we already see realized by some small LLMs outperforming GPT-4 on specific tasks while performing very poorly on others.

3. Omni-modal LLMs

Recently, LLMs have become multi-modal, meaning that they accept different modalities as input: GPT-4 can handle both text and images. Future LLMs will accept speech and videos as input as well - in fact, some already do. Maybe some day LLMs will also smell and touch, who knows… but more importantly: Future LLMs will be able to integrate all these inputs seamlessly, just like humans do: to explore an object in the real world, we use all our senses and combine the information. Or could you imagine understanding what a “car” is just by hearing its description, or judge the intentions of a person just by seeing them?

4. The decline of black box LLMs

Currently, the battle is on between “black box” LLMs like ChatGPT and Bard and more open models like Llama2 or Falcon. No one knows what data ChatGPT was trained with, nor how it works internally. To use it, you send your request to an OpenAI or Microsoft black box and get an answer back, without knowing anything about how the answer came about. Models like Llama2 can be downloaded by any user, inspected and improved in any way and then again shared with the community - and for many powerful LLMs, even used for commercial applications. Already now, hundreds of variants of Llama2 are available and the speed of innovation is incredible. While it costs millions of dollars and many months to train a model from scratch, producing a new model based on existing open models only takes a few hours and a few dollars. And this is something Google employees are very much aware of. I believe in the long run the agility of open models will make the black boxes obsolete.

5. Transformer Replacements

At the heart of every current LLM are so-called transformer blocks. Without going into details here, they enable fast training and high performance. However, there are drawbacks - no, they are not stupid, as Yann LeCun, chief scientist of Meta AI, claims. But they are slow when it comes to using them, and they cannot deal with very long inputs. This matters because a long input is required for an LLM to remember a longer discussion, or to be able to feed it a whole book or database as prompt input. It is a very active area of research to replace transformers with other architectures: recently, Microsoft published Retentive Network: A Successor to Transformer for LLMs while other people believe that diffusion models are the future. Diffusion models today dominate image generation (Stable Diffusion, Midjourney) and audio generation and while it is more difficult to apply them to text generation, this is a very promising area of research.

6. Running out of training data?

Recent studies show that we will soon run out of data to train our LLMs. Currently, we already use up to 10% of the high-quality training data available. Considering that we produce new data at a much lower rate than we increase our training data sets, we will likely be using all of it already by 2024. Is this the end of the current practice of improving models every generation by adding more data? Probably not yet: we may become more creative in using low-quality data which will not be exhausted until some time between 2030-2050, researchers estimate.

However, there is another problem: According to some experts, by 2025-2030, more than 99% of the content in the internet will be generated by AI. This may be speculation, but already today a large fraction of all content is created by AI. And the web is the main source of training material for LLMs. This means that current LLMs may be the last generation primarily trained on human-generated content - while future LLMs will be dominantly trained on content created by current LLMs! And there is little way to avoid this, as it is not possible to reliably separate human and AI content. There are claims that this will poison future AI models. In any case, the quality of the training data will decline in the future while the quantity will not increase significantly. Data quality will become an even greater focus in the near future.

7. LLMs and cyber security

Security is a topic rarely discussed in the context of LLMs - except for AIs enslaving mankind, of course. This will change. A recent study on prompt injection concludes that all traditional cyber security threats have counterparts which can affect LLM systems as well. This is because LLMs today are not isolated systems anymore - they can digest input (e.g., from web search) and produce output (e.g., write and send emails on behalf of the user). Phishing, scams, denial-of-service attacks and data leaks, among others, become possible at a new scale. And while traditional web applications are typically heavily guarded, there is no such thing as cyber security in the context of LLMs yet. This will change rapidly.