The Case for Smaller Models
Dave Findlay · Sep 2 · 3 min read
LLMs keep getting bigger, and so do the problems.

This year alone, companies will pour over $40 billion into AI infrastructure. McKinsey estimates the total cost of compute scaling could reach $1 trillion by 2030, and $3.3 trillion if everyone tries to build their own stack.
That’s not just unsustainable. It’s unnecessary.
Because while large models are impressive, they’re not always practical. And for most real-world use cases, they might not even be the right tool for the job.
The Limits of Scale
As LLMs scale into the hundreds of billions (or even trillions) of parameters, several hard problems emerge:
Soaring Cost of Compute
Inference and fine-tuning on large models demand massive GPU clusters and enormous energy budgets, making real-time deployment cost-prohibitive for most organizations.
Latency
Big models are slow. Even with quantization and batching, response time often lags, which makes them less useful in interactive or real-time scenarios.
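As a reference point for what quantization involves: weights are loaded at reduced precision, trading a little accuracy for a large drop in memory use and latency. Here is a minimal sketch of 4-bit loading, assuming the Hugging Face transformers and bitsandbytes libraries; the model ID is an illustrative small model, and real-world latency depends heavily on hardware:

```python
# 4-bit quantized loading with Hugging Face transformers + bitsandbytes.
# Requires a CUDA GPU; the model ID is illustrative, not a recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3-mini-4k-instruct"

# Store weights in 4-bit precision; compute in fp16 to keep quality reasonable.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available devices
)

inputs = tokenizer("Small models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same few lines apply to a frontier-scale model; the difference is the cluster required to hold it, which is what drives the latency and cost described above.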
Data Center Bottlenecks
We’re hitting physical and economic limits in the availability of GPUs, networking, and power. Scaling from here requires massive investment, often with diminishing returns.
Accessibility Gaps
Startups, researchers, and smaller teams get priced out. Innovation becomes concentrated in the hands of those with the deepest pockets.
Environmental Impact
More tokens, more watts. Large-scale training and inference contribute significantly to energy consumption, and those costs will only grow with demand.
If this sounds unsustainable, that’s because it is.
So what’s the alternative?
Smaller Models. Smarter Use.
Small language models (SLMs), typically under 10B parameters, are having a moment. And for good reason.
Benefits of SLMs:
Lower cost — Train and run on consumer GPUs or small clusters
Faster inference — Suitable for real-time and edge applications
Easier to deploy — More portable, less dependent on proprietary infrastructure
Fine-tunable — Adapt to specific domains without retraining giants (see the sketch below)
Lower energy footprint — A meaningful step toward greener AI
And when paired with the right architecture and high-quality data, SLMs can punch well above their weight.
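To make "fine-tunable" concrete: parameter-efficient methods such as LoRA adapt a model by training only small adapter matrices on top of a frozen base, which is what makes single-GPU domain adaptation feasible. A minimal sketch using the Hugging Face transformers and peft libraries, with an illustrative model ID and hyperparameters rather than recommended settings:

```python
# Minimal LoRA fine-tuning setup: train small adapter matrices while the
# base model's weights stay frozen. Model ID and hyperparameters are
# illustrative; target_modules vary by architecture.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

lora_config = LoraConfig(
    r=8,                                  # adapter rank: lower = fewer trainable params
    lora_alpha=16,                        # scaling applied to adapter updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total weights
# From here, train on domain data with a standard transformers Trainer.
```

Because only the adapters are trainable, the memory and compute needed for a domain-adaptation run fit comfortably on hardware most teams already have.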
Small Models Are Making Big Moves
The belief that “bigger is better” in AI is being challenged. In 2025, we’ve seen remarkable progress in compact, efficient SLMs: models that are not only cheaper to run but surprisingly powerful.
Here are some of the most promising developments:
Microsoft Phi-4 Series
The Phi family continues to impress, especially the new Phi-4-Mini-Flash, which delivers up to 10× faster inference and excels at reasoning, all in a footprint small enough for local or edge deployment.
Google Gemma 3 & 3n
The Gemma 3 family spans multiple sizes (1B–27B), with Gemma 3n optimized specifically for devices like laptops and tablets, enabling safer, fine-tunable models at the edge.
OLMo-2 (AI2)
One of the most transparent models to date, fully open, with shared training data, logs, and evaluation tooling. The 32B variant is setting new standards in open research and reproducibility.
Mistral Small & Magistral
These European-built models offer 128K-token context windows, reasoning capabilities, and a strong open-source roadmap, proving that high performance doesn’t have to mean high resource demands.
Energy-Efficient Research
Studies suggest that architectural changes can cut the energy consumption of small models by up to 90% without compromising performance, making them a natural fit for sustainable AI.
Final Thought
The age of ever-larger models is giving way to something more practical, more efficient, and more accessible.
It’s not about squeezing a trillion parameters into your stack.
It’s about building fit-for-purpose models that are:
Light enough to run on real infrastructure
Smart enough to deliver real results
Transparent enough to build trust
And efficient enough to scale sustainably
In the end, smaller doesn’t mean weaker.
It means focused.
Purposeful.
And ready for the real world.
At Fuse, we believe a great data strategy only matters if it leads to action.
If you’re ready to move from planning to execution — and build solutions your team will actually use — let’s talk.