Artificial intelligence companies like OpenAI are seeking to overcome unexpected delays and challenges in the pursuit of ever-bigger large language models by developing training techniques that use more human-like ways for algorithms to “think”.
A dozen AI scientists, researchers and investors told Reuters they believe that these techniques, which are behind OpenAI’s recently released o1 model, could reshape the AI arms race and have implications for the types of resources for which AI companies have an insatiable demand, from energy to types of chips.
OpenAI declined to comment for this story.
Since the release of the viral ChatGPT chatbot two years ago, technology companies, whose valuations have benefited greatly from the AI boom, have publicly maintained that “scaling up” current models by adding more data and computing power will consistently lead to improved AI models.
But now, some of the most prominent AI scientists are speaking out on the limitations of this “bigger is better” philosophy.
Ilya Sutskever, the co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, said recently that results from scaling up pre-training – the phase of training an AI model that uses a vast amount of unlabelled data to understand language patterns and structures – have plateaued.
Sutskever is widely credited as an early advocate of achieving massive leaps in generative AI advancement through the use of more data and computing power in pre-training, which eventually created ChatGPT. Sutskever left OpenAI earlier this year to found SSI.
“The 2010s were the age of scaling, now we’re back in the age of wonder and discovery once again. Everyone is looking for the next thing,” Sutskever said. “Scaling the right thing matters more now than ever.”
Sutskever declined to share more details on how his team is addressing the issue, other than saying SSI is working on an alternative approach to scaling up pre-training.
Behind the scenes, researchers at major AI labs have been running into delays and disappointing outcomes in the race to release a large language model that outperforms OpenAI’s GPT-4 model, which is nearly two years old, according to three sources familiar with private matters.
The so-called ‘training runs’ for large models can cost tens of millions of dollars, because they run hundreds of chips simultaneously. Given how complicated the system is, the runs are prone to hardware-induced failures, and researchers may not know the eventual performance of the models until the end of the run, which can take months.
Another problem is that large language models gobble up huge amounts of data, and AI models have exhausted all the easily accessible data in the world. Power shortages have also hindered the training runs.
To overcome these challenges, researchers are exploring “test-time compute,” a technique that enhances existing AI models during the so-called “inference” phase, or when the model is being used. For example, instead of immediately choosing a single answer, a model could generate and evaluate multiple possibilities in real-time, ultimately choosing the best path forward.
This method allows models to dedicate more processing power to challenging tasks like maths or coding problems or complex operations that demand human-like reasoning and decision-making.
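The idea of generating several candidate answers and keeping the best one can be illustrated with a toy "best-of-N" sketch in Python. This is only an illustration under stated assumptions, not a description of how o1 or any production model works: the function names (`best_of_n`, `propose_guess`, `score_guess`) and the square-root task are hypothetical stand-ins for a model's sampler and an answer evaluator.

```python
import random
from typing import Callable, List, Tuple


def best_of_n(
    propose: Callable[[random.Random], int],
    score: Callable[[int], float],
    n: int,
    seed: int = 0,
) -> Tuple[int, float]:
    """Sample n candidate answers and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    # Spend more compute at inference time by drawing more candidates.
    candidates: List[int] = [propose(rng) for _ in range(n)]
    best = max(candidates, key=score)
    return best, score(best)


# Hypothetical toy task: guess the integer whose square is 1764.
TARGET = 1764


def propose_guess(rng: random.Random) -> int:
    # Stand-in for a model sampling one possible answer.
    return rng.randint(1, 100)


def score_guess(guess: int) -> float:
    # Stand-in for an evaluator: 0 is a perfect answer, lower is worse.
    return -abs(guess * guess - TARGET)


if __name__ == "__main__":
    for n in (1, 10, 100):
        guess, quality = best_of_n(propose_guess, score_guess, n)
        print(f"n={n}: best guess={guess}, score={quality}")
```

With a fixed seed, the candidates drawn for a smaller budget are a prefix of those drawn for a larger one, so the best score in this sketch can only stay the same or improve as `n` grows. The trade-off is that each extra candidate costs additional computation every time the model is used.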
“It turned out that having a bot think for just 20 seconds in a hand of poker got the same boosting performance as scaling up the model by 100000x and training it for 100000 times longer,” said Noam Brown, a researcher at OpenAI.
OpenAI has embraced this technique in its newly released model “o1,” formerly known as Q* and Strawberry. The o1 model can “think” through problems in a multi-step manner, similar to human reasoning.
Reuters