The Case for Smaller Models
Dave Findlay · Sep 2 · 3 min read
LLMs keep getting bigger, and so do the problems.

This year alone, companies will pour over $40 billion into AI infrastructure. McKinsey estimates the total cost of compute scaling could reach $1 trillion by 2030, and $3.3 trillion if everyone tries to build their own stack.
That’s not just unsustainable. It’s unnecessary.
Because while large models are impressive, they’re not always practical. And for most real-world use cases, they might not even be the right tool for the job.
The Limits of Scale
As LLMs scale into the hundreds of billions (or even trillions) of parameters, several hard problems emerge:
Soaring Cost of Compute
Inference and fine-tuning on large models demand massive GPU clusters and enormous energy budgets, making real-time deployment cost-prohibitive for most organizations.
Latency
Big models are slow. Even with quantization and batching, response time often lags, which makes them less useful in interactive or real-time scenarios.
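As a reference point for what quantization involves: weights are loaded at reduced precision, trading a little accuracy for a large drop in memory use and latency. Here is a minimal sketch of 4-bit loading, assuming the Hugging Face transformers and bitsandbytes libraries; the model ID is an illustrative small model, and real-world latency depends heavily on hardware:

```python
# 4-bit quantized loading with Hugging Face transformers + bitsandbytes.
# Requires a CUDA GPU; the model ID is illustrative, not a recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3-mini-4k-instruct"

# Store weights in 4-bit precision; compute in fp16 to keep quality reasonable.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available devices
)

inputs = tokenizer("Small models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same few lines apply to a frontier-scale model; the difference is the cluster required to hold it, which is what drives the latency and cost described above.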
Data Center Bottlenecks
We’re hitting physical and economic limits in the availability of GPUs, networking, and power. Scaling from here requires massive investment, often with diminishing returns.
Accessibility Gaps
Startups, researchers, and smaller teams get priced out. Innovation becomes concentrated in the hands of those with the deepest pockets.
Environmental Impact
More tokens, more watts. Large-scale training and inference contribute significantly to energy consumption, and those costs will only grow with demand.
If this sounds unsustainable, that’s because it is.
So what’s the alternative?
Smaller Models. Smarter Use.
Small language models (SLMs), typically under 10B parameters, are having a moment. And for good reason.
Benefits of SLMs:
Lower cost — Train and run on consumer GPUs or small clusters
Faster inference — Suitable for real-time and edge applications
Easier to deploy — More portable, less dependent on proprietary infrastructure
Fine-tunable — Adapt to specific domains without retraining giants (see the sketch below)
Lower energy footprint — A meaningful step toward greener AI
And when paired with the right architecture and high-quality data, SLMs can punch well above their weight.
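To make "fine-tunable" concrete: parameter-efficient methods such as LoRA adapt a model by training only small adapter matrices on top of a frozen base, which is what makes single-GPU domain adaptation feasible. A minimal sketch using the Hugging Face transformers and peft libraries, with an illustrative model ID and hyperparameters rather than recommended settings:

```python
# Minimal LoRA fine-tuning setup: train small adapter matrices while the
# base model's weights stay frozen. Model ID and hyperparameters are
# illustrative; target_modules vary by architecture.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

lora_config = LoraConfig(
    r=8,                                  # adapter rank: lower = fewer trainable params
    lora_alpha=16,                        # scaling applied to adapter updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total weights
# From here, train on domain data with a standard transformers Trainer.
```

Because only the adapters are trainable, the memory and compute needed for a domain-adaptation run fit comfortably on hardware most teams already have.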
Small Models Are Making Big Moves
The belief that “bigger is better” in AI is being challenged. In 2025, we’ve seen remarkable progress in compact, efficient SLMs: models that are not only cheaper to run but surprisingly powerful.
Here are some of the most promising developments:
Microsoft Phi-4 Series
The Phi family continues to impress, especially the new Phi-4-Mini-Flash, which delivers up to 10× faster inference and excels at reasoning, all in a footprint small enough for local or edge deployment.
Google Gemma 3 & 3n
The Gemma 3 family spans multiple sizes (1B–27B), with Gemma 3n optimized specifically for devices like laptops and tablets, enabling safer, fine-tunable models at the edge.
OLMo-2 (AI2)
One of the most transparent models to date, fully open, with shared training data, logs, and evaluation tooling. The 32B variant is setting new standards in open research and reproducibility.
Mistral Small & Magistral
These European-built models offer 128K-token context windows, reasoning capabilities, and a strong open-source roadmap, proving that high performance doesn’t have to mean high resource demands.
Energy-Efficient Research
Studies suggest that architectural changes can cut the energy consumption of small models by up to 90% without compromising performance, making them a natural fit for sustainable AI.
Final Thought
The age of ever-larger models is giving way to something more practical, more efficient, and more accessible.
It’s not about squeezing a trillion parameters into your stack.
It’s about building fit-for-purpose models that are:
Light enough to run on real infrastructure
Smart enough to deliver real results
Transparent enough to build trust
And efficient enough to scale sustainably
In the end, smaller doesn’t mean weaker.
It means focused.
Purposeful.
And ready for the real world.
At Fuse, we believe a great data strategy only matters if it leads to action.
If you’re ready to move from planning to execution — and build solutions your team will actually use — let’s talk.