OpenAI marks paradigm shift in artificial intelligence development

Author: Editorial
24.09.2024.
Photo: Shutterstock

Last year OpenAI introduced GPT-4, a major AI breakthrough achieved by scaling its models to enormous proportions. Now the company is announcing a different approach – a smarter model that can reason, but isn’t much larger than its predecessors.

The new model, called OpenAI o1, can solve problems that confound existing AI models, including GPT-4o, OpenAI’s most powerful model to date. Instead of answering in one step, as large language models tend to do, it practically “thinks out loud” about the problem, as a human would, before arriving at an answer.
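To make the difference concrete, the sketch below is a hypothetical illustration (not OpenAI’s code) comparing a conventional one-step request to GPT-4o with a request to the o1 reasoning model, assuming the publicly available OpenAI Python SDK and the publicly documented model names; the prompt is invented for the example.

```python
# Hypothetical comparison of a one-step answer vs. a reasoning model,
# using the OpenAI Python SDK (pip install openai). Model names are the
# publicly documented "gpt-4o" and "o1-preview"; the puzzle is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

puzzle = (
    "A farmer has chickens and rabbits: 35 heads and 94 legs in total. "
    "How many of each animal are there?"
)

# GPT-4o produces its answer directly, in a single generation pass.
direct = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": puzzle}],
)
print("GPT-4o:", direct.choices[0].message.content)

# o1 first spends extra "thinking" tokens on an internal chain of reasoning,
# then returns only the final answer to the caller.
reasoned = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": puzzle}],
)
print("o1:", reasoned.choices[0].message.content)
```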

“This is what we consider the new paradigm in these models,” Mira Murati, OpenAI’s chief technology officer, told WIRED. “It is much better at tackling very complex reasoning tasks.” Within OpenAI the new model was known under the code name Strawberry, and the company says it is not a successor to GPT-4o but rather a complement to it.

New paradigm in AI development

Murati says OpenAI is currently working on the next major model, GPT-5, which will be significantly larger than its predecessor. While the company still believes that scaling up will help unlock new AI capabilities, GPT-5 will likely include the reasoning technology unveiled on September 12.

There are two paradigms. The scaling paradigm and this new paradigm. We expect that we will bring them together.

Mira Murati, CTO, OpenAI

Large language models (LLMs) typically generate answers with huge neural networks trained on vast amounts of data. They can show outstanding language and logical abilities, but traditionally struggle with seemingly simple problems, such as basic math questions that require step-by-step reasoning.

Reinforcement learning

OpenAI o1 uses reinforcement learning, which involves giving the model positive feedback when it gets a correct answer and negative feedback when it fails, to improve its reasoning process. Murati claims that “the model sharpens its thinking and fine-tunes the strategies that it uses to get to the answer.” Reinforcement learning has enabled computers to play games with superhuman skill (OpenAI beat the world champions in Dota 2 back in 2019) and perform useful tasks like designing computer chips. The technique is also a key ingredient in turning large language models into useful, intuitive chatbots.
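As a rough illustration of that feedback loop, the toy sketch below is purely hypothetical (not OpenAI’s training method): a learner is rewarded when its final answer is correct and penalized when it is not, so a step-by-step “strategy” gradually earns a higher preference score than answering in one shot. All names and numbers in it are invented for the example.

```python
import random

# Toy sketch of reward-driven learning: +1 for a correct answer, -1 otherwise,
# with the feedback used to update a preference score per reasoning strategy.

STRATEGIES = ["answer_directly", "work_step_by_step"]  # hypothetical names
scores = {s: 0.0 for s in STRATEGIES}  # running value estimate per strategy
LEARNING_RATE = 0.1


def solve(a: int, b: int, c: int, strategy: str) -> int:
    """Stand-in for the model answering 'a * b + c'; working step by step
    is simulated as more reliable than answering in one shot."""
    if strategy == "work_step_by_step":
        partial = a * b        # explicit intermediate step
        return partial + c
    # Direct answering sometimes drops the intermediate product.
    return a * b + c if random.random() < 0.6 else a + b + c


def pick_strategy() -> str:
    """Epsilon-greedy choice: usually exploit the best-scoring strategy."""
    if random.random() < 0.2:
        return random.choice(STRATEGIES)
    return max(scores, key=scores.get)


for _ in range(500):
    a, b, c = (random.randint(2, 9) for _ in range(3))
    strategy = pick_strategy()
    reward = 1.0 if solve(a, b, c, strategy) == a * b + c else -1.0
    # Positive feedback pulls the score up, negative feedback pulls it down.
    scores[strategy] += LEARNING_RATE * (reward - scores[strategy])

print(scores)  # "work_step_by_step" typically ends up with the higher score
```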

Mark Chen, vice president of research at OpenAI, demonstrated the new model to WIRED, using it to solve several problems that its prior model, GPT-4o, could not. These included an advanced chemistry question and a complicated mathematical puzzle.

The [new] model is learning to think for itself, rather than kind of trying to imitate the way humans would think.

OpenAI says its new model performs significantly better on a number of problem sets, including tasks focused on coding, math, physics, biology and chemistry.

What’s happening on the other side of the fence?

Improving the reasoning ability of large language models has been a hot topic in research circles for some time, and competitors are pursuing similar lines of research. In July, Google announced AlphaProof, a project that combines language models with reinforcement learning to solve difficult mathematical problems.

AlphaProof was able to learn how to reason logically about math problems by looking at the correct answers. A key challenge in expanding this type of learning is that there are no correct answers for everything a model might encounter. Chen says that OpenAI has been able to build a reasoning system that is much more general. “I do think we have made some breakthroughs there; I think it is part of our edge,” says Chen. “It’s actually fairly good at reasoning across all domains.”

Noah Goodman, a Stanford professor who has published work on improving the reasoning abilities of LLMs, says the key to more generalized training may involve using a “carefully prompted language model and handcrafted data.” He adds that being able to consistently trade the speed of results for greater accuracy would be a “nice advance.”

OpenAI’s Chen says the new reasoning approach developed by the company shows that advances in artificial intelligence don’t have to come at the cost of enormous amounts of computing power.

One of the exciting things about the paradigm is we believe that it’ll allow us to ship intelligence cheaper […] and I think that really is the core mission of our company.

Mark Chen, VP Research, OpenAI