Google’s Gemma 4 Goes Apache 2.0, and That Actually Matters


Google’s been iterating on its Gemini models like crazy, but if you want to run something on your own hardware without begging for API credits, you’ve been stuck with the Gemma line. Gemma 3 launched over a year ago, and in AI time that’s practically ancient. Today they’re shipping Gemma 4, and the real news isn’t just the model specs — it’s that they finally ditched the custom Gemma license for Apache 2.0.

That license change is overdue. Developers have been grumbling about the restrictive Gemma license for a while, and Google is smart enough to listen. Apache 2.0 is about as close to a gold standard as open-source licensing gets: it includes an explicit patent grant, allows commercial use, and doesn’t have weird clauses about “acceptable use” that leave lawyers guessing. That puts Gemma 4 on the same footing as Mistral’s Apache-licensed releases, and on cleaner terms than Llama, which still ships under Meta’s own community license. That’s exactly where Gemma needs to be if Google wants developers to actually build on top of it.

As for the models themselves: four sizes, all designed for local deployment. The headline numbers are a 26B Mixture of Experts variant and a 31B Dense model. The MoE one is the interesting case. It only activates 3.8 billion of its 26 billion parameters for each token during inference, so the per-token compute looks more like a 4B model’s and tokens per second should be fast as hell. The catch is that all 26 billion parameters still have to sit in memory. Google claims they optimized for latency specifically, and that makes sense if you’re trying to run this on a single 80GB H100. Sure, that’s a roughly $20,000 card, but it’s still local. Not everyone’s running a datacenter.
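To put rough numbers on the MoE speed argument, here is a back-of-the-envelope sketch. It uses the parameter counts quoted above and assumes the common rule of thumb of about 2 FLOPs per active parameter per token. That rule is an approximation I'm bringing in, not something Google published, and it ignores attention cost, routing overhead, and memory bandwidth.

```python
# Rough MoE vs. dense compute comparison.
# Parameter counts come from the article; the "2 FLOPs per active
# parameter per token" rule is an assumed approximation.

MOE_TOTAL_PARAMS_B = 26.0   # total parameters in the MoE variant (billions)
MOE_ACTIVE_PARAMS_B = 3.8   # parameters activated per token (billions)
DENSE_PARAMS_B = 31.0       # dense variant: every parameter is active


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's weights that participate in each token."""
    return active_b / total_b


def flops_per_token(active_params_b: float) -> float:
    """Approximate forward-pass FLOPs per token (assumed 2 per parameter)."""
    return 2.0 * active_params_b * 1e9


moe_fraction = active_fraction(MOE_ACTIVE_PARAMS_B, MOE_TOTAL_PARAMS_B)
compute_ratio = flops_per_token(DENSE_PARAMS_B) / flops_per_token(MOE_ACTIVE_PARAMS_B)

print(f"MoE active fraction: {moe_fraction:.1%}")
print(f"Dense model needs about {compute_ratio:.1f}x the per-token compute")
```

Treat the ratio as an upper bound on the speedup, not a benchmark. Real throughput depends on batch size, kernels, and how quickly the hardware can stream the expert weights.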

The 31B Dense is the quality play. It’s slower but more capable, and Google clearly expects people to fine-tune it for specific use cases. That’s the right call — raw model quality matters less than how well you can adapt it to your data.

Both big variants run unquantized in bfloat16 on that single H100. At 2 bytes per parameter, that works out to roughly 52GB of weights for the 26B model and 62GB for the 31B, which leaves some headroom under 80GB. If you quantize them down, they’ll fit on consumer GPUs. I’d like to see real-world benchmarks on an RTX 4090 or even a 3090 before getting too excited, but the direction is right. Local AI that actually works on hardware people own is the goal, and Gemma 4 seems aimed squarely at that.
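If you want to sanity-check which cards these models could fit on, a minimal sketch of the weight-memory arithmetic looks like this. It counts weights only. The KV cache, activations, and framework overhead are ignored, so real usage will be higher. The VRAM figures are the standard ones for each card (80GB H100, 24GB RTX 4090 and 3090), and the int8 and int4 entries are generic quantization levels I picked for illustration, not formats Google has announced.

```python
# Weight-only memory estimate at different precisions.
# Assumption: 1 GB = 1e9 bytes, and nothing but weights is counted.

BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

MODELS_B = {"26B MoE": 26.0, "31B Dense": 31.0}   # parameters, in billions
GPUS_GB = {"H100 80GB": 80.0, "RTX 4090": 24.0, "RTX 3090": 24.0}


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the weights in GB for a given precision."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, vram_gb: float) -> bool:
    """True if the weights alone fit in VRAM (no room reserved for KV cache)."""
    return weight_gb(params_billion, precision) <= vram_gb


for model, params in MODELS_B.items():
    for precision in BYTES_PER_PARAM:
        size = weight_gb(params, precision)
        cards = [gpu for gpu, vram in GPUS_GB.items() if fits(params, precision, vram)]
        print(f"{model} @ {precision}: {size:.1f} GB -> fits on {cards or 'none listed'}")
```

Note that the MoE model is sized by its full 26B parameters here, not the 3.8B active ones, because every expert has to be resident in memory even if only a few are used per token. By this estimate both big models need something like 4-bit quantization before a 24GB card is even in play, and that is before the KV cache takes its cut.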

There are also smaller models in the family that Google didn’t talk up as much, presumably for edge devices or laptops. The full lineup should be in the model card on Hugging Face by now.

My take: the Apache 2.0 switch is the bigger story here. Models come and go, but licensing determines whether anyone can actually build a business on top of yours. Google’s been playing catch-up with Meta on open-weight developer adoption, and this move signals they’re serious about closing that gap. The model quality will speak for itself once people start testing, but at least now nobody has to parse a custom license to figure out whether their use case is allowed.
