Ars Technica AI · 5 min read

Google announces Gemma 4 open AI models, switches to Apache 2.0 license

Pixelift editorial team

Photo: Google

The Apache 2.0 license officially replaces Google's restrictive terms, opening a new chapter in the development of open AI models with the launch of the Gemma 4 family. The Mountain View giant has released four open-weight model variants designed to challenge market leaders, offering performance comparable to closed systems with significantly fewer parameters. The largest variant, 31B Dense, debuted in third place on the prestigious Arena ranking, trailing only far larger models, which makes it one of the most cost-effective options on the market.

For developers and creators, the key point is optimization for local use. The 26B Mixture of Experts and 31B Dense models are designed to run smoothly on consumer graphics cards after quantization, while the smaller Effective 2B and 4B variants were created with mobile devices, the Raspberry Pi, and the Jetson Nano in mind, offering near-zero latency.

Gemma 4 also introduces native support for function calling, structured JSON generation, and advanced OCR and chart-analysis capabilities. In practice, this means that advanced agentic workflows and high-quality code generation become available in entirely offline environments, with no need to rely on paid cloud services. Abandoning the proprietary license in favor of the Apache 2.0 standard removes the final legal barriers to commercial innovation built on Google's technology.

The market for open language models has just undergone a significant reshuffle. A year after the release of the previous generation, Google has officially unveiled Gemma 4 — a new family of open-weights models designed to challenge the leaders of the open-source segment. However, the most important news is not the technical parameters themselves, but a radical change in the approach to licensing. The Mountain View giant is abandoning restrictive, proprietary terms in favor of the widely recognized Apache 2.0 license, opening a completely new chapter in building the ecosystem the company refers to as the "Gemmaverse."

Four variants for local performance

The new family of models was designed with local operation in mind, responding to the growing demand for privacy and lower cloud-infrastructure costs. Gemma 4 debuts in four sizes, each optimized for specific hardware. The most capable variants, 31B Dense and 26B Mixture of Experts (MoE), are configured to run in bfloat16 format on a single Nvidia H100 accelerator with 80 GB of VRAM. While that is professional hardware, after quantization these models will fit comfortably on high-end consumer graphics cards.
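The VRAM figures behind these claims can be sanity-checked with simple arithmetic. The sketch below is a back-of-the-envelope estimate: the 10% overhead factor for activations and KV-cache bookkeeping is an assumption for illustration, not an official specification.

```python
def model_vram_gb(params_billion: float, bits_per_weight: float,
                  overhead: float = 1.10) -> float:
    """Rough VRAM needed for the weights alone, padded by ~10% for
    activations and KV-cache bookkeeping (an assumed overhead)."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9  # decimal gigabytes

# 31B dense in bfloat16 (16 bits/weight): roughly 68 GB,
# which fits on a single 80 GB H100.
print(f"31B @ bf16 : {model_vram_gb(31, 16):.1f} GB")

# The same model quantized to 4 bits/weight: roughly 17 GB,
# within reach of a 24 GB consumer GPU.
print(f"31B @ 4-bit: {model_vram_gb(31, 4):.1f} GB")
```

This is why quantization is the bridge between data-center and consumer hardware: dropping from 16 to 4 bits per weight cuts the footprint by a factor of four at the cost of some accuracy.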

The 26B MoE model looks particularly interesting. Thanks to the Mixture of Experts architecture, it activates only 3.8 billion parameters from the total pool of 26 billion during response generation. This translates to a significantly higher number of tokens per second compared to models with a traditional dense structure. On the other hand, the 31B Dense variant focuses on quality and precision, serving as an ideal base for further fine-tuning by developers for specific business or scientific tasks.
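The speed advantage of the MoE design follows from how decoding is usually memory-bandwidth bound: per-token cost scales with the parameters actually read, not the total. The sketch below is an illustrative ceiling calculation only; the H100 bandwidth figure is approximate, and real throughput depends on kernels, batching, and caching.

```python
def tokens_per_sec(active_params_b: float, bandwidth_gb_s: float,
                   bytes_per_param: float = 2.0) -> float:
    """Upper-bound decode rate assuming every active weight is read
    once per generated token (bf16 = 2 bytes per parameter)."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

H100_BW = 3350  # GB/s, approximate HBM3 bandwidth of an 80 GB H100

dense = tokens_per_sec(31.0, H100_BW)  # all 31B weights touched per token
moe   = tokens_per_sec(3.8, H100_BW)   # only 3.8B of 26B parameters active

print(f"31B dense ceiling: ~{dense:.0f} tok/s")
print(f"26B MoE ceiling  : ~{moe:.0f} tok/s ({moe / dense:.1f}x)")
```

The ratio between the two ceilings is simply 31/3.8, roughly 8x, which is why MoE variants feel dramatically faster at generation time even though their total parameter count is similar.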

Graphic presenting the launch of the Gemma 4 models
Gemma 4 introduces four model variants optimized for local and mobile work.

Mobile revolution and edge optimization

The second pair of models, Effective 2B (E2B) and Effective 4B (E4B), targets mobile devices and the edge-computing segment directly. Google emphasizes that close cooperation with Qualcomm and MediaTek engineers was key during their design; the goal was to minimize RAM and energy consumption, which is critical for smartphones and devices like the Raspberry Pi and Jetson Nano.

These models are characterized by exceptionally low latency, which Google describes as "near-zero latency." Compared to the previous generation, Gemma 4 E2B and E4B offer not only better performance but also native support for speech recognition. Furthermore, their context window has been expanded to 128k tokens (while the larger 26B and 31B models offer 256k tokens). This is a huge leap, allowing for the processing of extensive documents directly on the user's device without the need to send data to the cloud.
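To get a feel for what those context windows mean in practice, the check below estimates whether a document fits before sending it to a model. The 4-characters-per-token ratio is a common rule of thumb for English text, not a property of Gemma's actual tokenizer.

```python
# Advertised context windows of the Gemma 4 family, in tokens.
CONTEXT = {"E2B": 128_000, "E4B": 128_000,
           "26B MoE": 256_000, "31B Dense": 256_000}

def fits(doc_chars: int, model: str, chars_per_token: float = 4.0) -> bool:
    """Estimate whether a document fits a model's context window,
    using a rough chars-per-token ratio (assumption, not tokenizer math)."""
    est_tokens = doc_chars / chars_per_token
    return est_tokens <= CONTEXT[model]

# A ~300-page book (~600k characters, ~150k estimated tokens):
book = 600_000
print(fits(book, "E4B"))        # -> False: exceeds the 128k window
print(fits(book, "31B Dense"))  # -> True: within the 256k window
```

In other words, the mobile-class models can already swallow very long reports on-device, while the larger variants handle book-length inputs.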

  • Gemma 31B Dense: Third place in the Arena ranking among open models, high quality of reasoning.
  • Gemma 26B MoE: High speed thanks to the activation of only 3.8B parameters during operation.
  • Gemma E4B: Optimized for mobile devices, high efficiency with low power consumption.
  • Gemma E2B: The lightest model, ideal for embedded systems and simple AI tasks on smartphones.

A new standard of openness: Apache 2.0

The biggest barrier to the adoption of previous versions of Gemma was Google's proprietary license. It contained restrictive clauses regarding acceptable use, which the company could change unilaterally, as well as controversial points regarding the ownership of models trained on synthetic data generated by Gemma. Switching to Apache 2.0 is a strategic move intended to build Google's credibility in the eyes of the open-source community.

The Apache 2.0 license is widely known, liberal, and does not impose burdensome commercial restrictions. Developers gain the certainty that the rules of the game will not change during the project's duration. Google hopes that greater freedom will encourage creators to build advanced "agentic workflow" applications—systems capable of performing tasks autonomously. Gemma 4 is ready for this, offering native support for function calling, structured JSON output, and optimization for code generation, matching the offline capabilities of services such as Claude Code or Gemini Pro.
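The dispatch side of such an agentic loop is straightforward to sketch. The JSON tool-call format below is an assumption for illustration; real Gemma 4 output will follow its own chat template, and `get_weather` is a hypothetical tool, not part of any API.

```python
import json

# Hypothetical local tools the model is allowed to call.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(model_output: str) -> str:
    """Parse a structured JSON tool call emitted by the model and
    execute the matching local function."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call.get("arguments", {}))

# Pretend the model produced this structured output:
raw = '{"name": "get_weather", "arguments": {"city": "Warsaw"}}'
print(dispatch(raw))  # -> Sunny in Warsaw
```

Native structured-JSON output matters precisely here: when the model reliably emits parseable calls, this loop can run entirely offline, with no cloud service in the middle.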

Gemma 4 on performance charts
New Gemma 4 models achieve high scores in Arena rankings with a significantly lower number of parameters than the competition.

The foundation for Gemini Nano 4

The premiere of the E2B and E4B models also sheds light on the future of artificial intelligence in the Android ecosystem. Google has officially confirmed that the upcoming update of the local Gemini Nano 4 model, present in Pixel phones, will be based on the Gemma 4 architecture. This is important information for developers, as systems prototyped today in the AI Core Developer Preview using E2B and E4B models will be fully compatible with future versions of the operating system.

Gemma 4 offers improved reasoning, mathematical abilities, and better instruction following, based on the same technology as the closed Gemini 3 models.

Support for over 140 languages and significant improvements in optical character recognition (OCR) and chart analysis make Gemma 4 one of the most versatile tools in the hands of programmers. The models are already available for download on Hugging Face, Kaggle, and Ollama, and can also be tested in AI Studio. Although Google promotes local work, it also provides the option to run the new models within the paid Google Cloud infrastructure.

It can be assumed that Google's move will force the competition to take similar steps toward license liberalization. Releasing such powerful tools under the aegis of Apache 2.0 means that the barrier to entry for advanced AI projects based on local hardware drops drastically. Gemma 4 is not just another technical update — it is an attempt to take control of the narrative in the world of open-source AI software, where Meta or smaller, agile startups have been calling the shots until now. Scaling performance while simultaneously reducing the number of parameters shows that Google has learned its lesson in optimization, and developers have just received one of the most powerful arguments for staying in the giant's ecosystem.
