Unraveling the Mystery of In-Context Learning

Large language models like OpenAI’s GPT-3 have been making headlines lately for their ability to generate human-like text, from poetry to programming code. These models are trained on massive amounts of internet data and predict the text that is likely to come next after a small bit of input text. Researchers have also discovered a curious phenomenon called in-context learning, which allows these models to learn how to accomplish a task after seeing only a few examples, even though they weren’t trained for that specific task.
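As an illustration, an in-context prompt simply packs a few worked examples in front of a new query; nothing about the model’s weights changes. The sentiment-labeling task and the example reviews below are hypothetical, chosen only to show the shape of such a prompt.

```python
# A minimal illustration of an in-context learning prompt: the model is never
# retrained; it only sees a handful of worked examples followed by a new query.
# The task (sentiment labeling) and the examples are hypothetical.
examples = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I want those two hours of my life back.", "negative"),
    ("A warm, funny, beautifully acted film.", "positive"),
]
query = "The plot was thin and the pacing dragged."

prompt = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\nReview: {query}\nSentiment:"

print(prompt)  # This text would be sent to a large language model as-is.
```

The model is expected to continue the pattern and produce a label for the final review, even though it was never fine-tuned for sentiment classification.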
A recent study by researchers from MIT, Google Brain, Stanford University, and the University of Alberta points to one explanation. Large neural network models such as OpenAI’s GPT-3 can contain smaller, simpler linear models buried inside them, and the larger model can train these inner models to complete a new task using only information it already contains.
The research, which explores the phenomenon of in-context learning, has important implications for the field of artificial intelligence. By enabling large models to complete new tasks without the need for costly retraining, researchers could streamline the process of fine-tuning these models for specific applications.
Lead author Ekin Akyürek, a computer science graduate student, emphasizes the efficiency of in-context learning. Instead of collecting domain-specific data and doing complex engineering, models can be fed an input and a few examples to accomplish the desired task.
The research team includes Dale Schuurmans, a research scientist at Google Brain and professor of computing science at the University of Alberta, as well as senior authors Jacob Andreas, the X Consortium Assistant Professor in the MIT Department of Electrical Engineering and Computer Science and a member of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL); Tengyu Ma, an assistant professor of computer science and statistics at Stanford; and Danny Zhou, principal scientist and research director at Google Brain.
Overall, this research opens up exciting possibilities for large neural network models and could streamline the process of fine-tuning them for specific applications.
A model within a model
The study suggests that large language models can take on new tasks they were never explicitly trained for. The researchers believe that these models can perform in-context learning because of how they were trained: GPT-3, for instance, read vast amounts of text from the internet, so when someone shows the model examples of a new task, it has likely already seen something similar.
The researchers, led by Akyürek, hypothesized that in-context learners are not just matching previously seen patterns, but they are actually learning to perform new tasks. To test this hypothesis, they used a neural network model called a transformer, which has the same architecture as GPT-3 but has been specifically trained for in-context learning.
The researchers found that the transformer can effectively write a linear model within its hidden states. Their analysis shows that this linear model is encoded somewhere in the transformer’s earliest layers, and that the transformer can then train and update it by implementing simple learning algorithms. In essence, the large model simulates and trains a smaller version of itself.
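To make the claim concrete, here is a minimal, self-contained sketch of the kind of “simple learning algorithm” an inner linear model could be trained with, assuming a synthetic linear-regression setting; the dimensions, noise level, and the specific choices of least squares and gradient descent are illustrative assumptions, not the authors’ exact construction.

```python
import numpy as np

# Minimal sketch (with assumed dimensions and noise) of the inner linear model
# idea: the in-context examples alone pin down a small linear model via a
# simple learning algorithm. Least squares and gradient descent below are
# illustrative stand-ins for whatever algorithm the transformer implements.
rng = np.random.default_rng(0)
dim, n_examples = 8, 16

w_true = rng.normal(size=dim)                          # hidden linear task for this prompt
xs = rng.normal(size=(n_examples, dim))                # in-context inputs
ys = xs @ w_true + 0.1 * rng.normal(size=n_examples)   # noisy in-context targets

# Candidate "inner" learner 1: closed-form least squares on the examples.
w_ls, *_ = np.linalg.lstsq(xs, ys, rcond=None)

# Candidate "inner" learner 2: a few steps of gradient descent on squared error,
# the kind of simple update a transformer layer could emulate implicitly.
w_gd = np.zeros(dim)
for _ in range(50):
    grad = xs.T @ (xs @ w_gd - ys) / n_examples
    w_gd -= 0.1 * grad

x_query = rng.normal(size=dim)
print("true function output:    ", x_query @ w_true)
print("least-squares prediction:", x_query @ w_ls)
print("gradient-descent output: ", x_query @ w_gd)
```

If a transformer’s layers can reproduce computations like these from the prompt alone, the prompt effectively trains the inner model without any update to the transformer’s own weights.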
The study challenges the current understanding of how large language models learn. While these models have already been instrumental in many natural language processing tasks, this research shows that they might be capable of even more.
Large Language Models Can Perform In-Context Learning by Simulating Smaller Models, According to MIT Researchers.
Probing hidden layers
In the machine learning research community, large language models are often assumed to perform in-context learning simply because of their massive training datasets. The work by MIT’s Ekin Akyürek and his colleagues suggests instead that these models may actually be learning to perform new tasks rather than just repeating previously seen patterns. To test this hypothesis, the researchers examined the architecture of their in-context-trained transformer, which shares its basic design with GPT-3, and their mathematical evaluations demonstrated that a linear model is written somewhere in its earliest layers.
Using probing experiments, the researchers attempted to recover the actual solution computed by this linear model, and they showed that its parameters are indeed written in the hidden states. Building on this theoretical work, the researchers suggest it may be possible to enable a transformer to perform in-context learning by adding just two layers to the neural network.
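A rough sketch of what such a probing experiment involves: fit a simple linear readout from a transformer’s hidden states to the target linear model’s parameters and check how well it generalizes. The hidden states below are random placeholders standing in for real activations, and the ridge-regression probe is an assumption about the probe design, not the authors’ exact protocol.

```python
import numpy as np

# Illustrative probing sketch: given hidden states collected from a transformer
# at the query position of many prompts, fit a linear "probe" that tries to
# read out the weights of each prompt's in-context linear task. The hidden
# states here are random placeholders, not real model activations.
rng = np.random.default_rng(0)
n_prompts, hidden_dim, task_dim = 500, 64, 8

hidden_states = rng.normal(size=(n_prompts, hidden_dim))   # placeholder activations
target_weights = rng.normal(size=(n_prompts, task_dim))    # per-prompt linear-model solutions

# Ridge-regression probe: W_probe maps a hidden state to the task weights.
lam = 1e-2
H, W = hidden_states, target_weights
W_probe = np.linalg.solve(H.T @ H + lam * np.eye(hidden_dim), H.T @ W)

# Evaluate on held-out prompts; a high R^2 would indicate the linear model's
# parameters are linearly decodable from the hidden states.
H_test = rng.normal(size=(100, hidden_dim))
W_test = rng.normal(size=(100, task_dim))
pred = H_test @ W_probe
r2 = 1 - ((pred - W_test) ** 2).sum() / ((W_test - W_test.mean(0)) ** 2).sum()
print("held-out probe R^2 (near zero here, since the data are random):", round(r2, 3))
```

With real activations in place of the placeholders, a high held-out score would support the claim that the linear model’s parameters can be read off the hidden states.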
According to Mike Lewis, a research scientist at Facebook AI Research who was not involved with this work, “These results are a stepping stone to understanding how models can learn more complex tasks and will help researchers design better training methods for language models to further improve their performance.”
Moving forward, Akyürek plans to continue exploring in-context learning with more complex functions than the linear models they studied in this work. The researchers could also apply these experiments to large language models to see whether their behaviors are also described by simple learning algorithms.
The researchers’ findings could change the way people view in-context learning, as these models are capable of learning new tasks rather than just memorizing patterns. By digging deeper into the types of pretraining data that can enable in-context learning, the research team hopes to further develop models that can complete new tasks without the need for retraining with new data.
Source: MIT