Why Smaller AI Models Are Becoming a Big Deal


In the ever-evolving world of artificial intelligence, size matters—but not always in the way you think. When OpenAI unveiled GPT-3 in 2020, it was the largest language model ever built, and its unprecedented scale propelled AI performance to new heights. This breakthrough marked the beginning of an era where “bigger is better” became the mantra for AI development. As OpenAI researcher Noam Brown remarked at TEDAI San Francisco in October, “The incredible progress in AI over the past five years can be summarized in one word: scale.”

However, the landscape is shifting. As the performance gains from scaling up massive models begin to plateau, researchers are pivoting to a new frontier: doing more with less. Smaller, purpose-built models are proving to be just as effective as their larger counterparts for certain tasks, especially when trained on highly specific datasets. This shift is a game-changer for businesses looking to leverage AI in targeted ways. For instance, you don’t need a model trained on the entirety of the internet if your needs center around repetitive, domain-specific queries.

Tech giants have taken note. OpenAI, for example, now offers “fun-size” versions of its flagship models, such as GPT-4o and GPT-4o mini. Google DeepMind has followed suit with Gemini Ultra and Gemini Nano, while Anthropic provides Claude 3 in options ranging from the expansive Opus to the compact Haiku. Microsoft is also leading in this space, developing a suite of small language models under the Phi label.

It’s not just the tech titans embracing this trend. Startups like Writer are making headlines with compact models that rival the performance of top-tier behemoths despite having as few as one-twentieth of the parameters. Parameters, the numerical values a model learns during training, determine its behavior, and smaller models demonstrate that efficiency, not sheer scale, can drive exceptional outcomes.
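To get a feel for what these parameter counts mean, a common back-of-envelope approximation for decoder-style transformers puts the non-embedding parameter count at roughly 12 × layers × width², where width is the model's hidden dimension. The sketch below is illustrative only: the 12× rule of thumb ignores embeddings and other details, and the GPT-3 hyperparameters used (96 layers, hidden size 12,288) are the publicly reported ones, while the "small model" configuration is a hypothetical example chosen for contrast.

```python
def approx_transformer_params(n_layers: int, d_model: int) -> int:
    """Rough non-embedding parameter count for a decoder-only transformer.

    Rule of thumb: each layer holds ~4*d^2 attention weights plus ~8*d^2
    feed-forward weights, i.e. about 12 * d_model^2 parameters per layer.
    """
    return 12 * n_layers * d_model ** 2


# GPT-3-scale configuration (96 layers, hidden size 12,288, as reported):
large = approx_transformer_params(n_layers=96, d_model=12288)
print(f"{large / 1e9:.0f}B")  # ≈ 174B, close to GPT-3's reported 175B

# A hypothetical compact model (24 layers, hidden size 2,048):
small = approx_transformer_params(n_layers=24, d_model=2048)
print(f"{small / 1e9:.1f}B")  # ≈ 1.2B

# Ratio between the two illustrative configurations:
print(f"~{large // small}x smaller")
```

The point of the exercise is that parameter count grows quadratically with model width, so even modest reductions in width and depth shrink a model dramatically, which is what makes the cost and efficiency gains described below possible.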

The advantages of smaller models are compelling. They are faster to train and deploy, significantly reducing costs—a boon for businesses and startups alike. Their energy efficiency offers environmental benefits, using a fraction of the computational power required by larger models, which translates to a smaller carbon footprint. Moreover, these compact models are portable, capable of running on local devices without relying on cloud infrastructure, bringing AI closer to users than ever before.

As the AI industry matures, the focus on scale is giving way to a more nuanced understanding of performance and utility. Small is no longer a compromise—it’s an innovation. The era of giant models dominating the field may be giving way to a future where smaller, smarter models lead the charge.