Ars Technica AI · 5 min read

Google announces Gemma 4 open AI models, switches to Apache 2.0 license

Pixelift editorial team

Photo: Google

The Apache 2.0 license officially replaces Google's restrictive terms, opening a new chapter in the development of open AI models with the launch of the Gemma 4 family. The Mountain View giant has released four open-weight model variants designed to challenge market leaders, offering performance comparable to closed systems with significantly fewer parameters. The largest variant, 31B Dense, debuted in third place on the prestigious Arena ranking, trailing only far larger models, which makes it one of the most cost-effective options on the market.

For developers and creators, the key point is optimization for local use. The 26B Mixture of Experts and 31B Dense models are designed to run smoothly on consumer graphics cards after quantization, while the smaller Effective 2B and 4B variants were created with mobile devices, the Raspberry Pi, and the Jetson Nano in mind, offering near-zero latency.

Gemma 4 also introduces native support for function calling, structured JSON generation, and advanced OCR and chart-analysis capabilities. In practice, this means that advanced agentic workflows and high-quality code generation become available in entirely offline environments, with no need to rely on paid cloud services. Abandoning the proprietary license in favor of the Apache 2.0 standard removes the final legal barriers to commercial innovation built on Google's technology.

The market for open language models has just undergone a significant reshuffle. A year after the release of the previous generation, Google has officially unveiled Gemma 4 — a new family of open-weights models designed to challenge the leaders of the open-source segment. However, the most important news is not the technical parameters themselves, but a radical change in the approach to licensing. The Mountain View giant is abandoning restrictive, proprietary terms in favor of the widely recognized Apache 2.0 license, opening a completely new chapter in building the ecosystem the company refers to as the "Gemmaverse."

Four variants for local performance

The new family of models was designed with local operation in mind, responding to the growing demand for privacy and lower cloud-infrastructure costs. Gemma 4 debuts in four sizes, each optimized for specific hardware. The most capable variants, 31B Dense and 26B Mixture of Experts (MoE), are configured to run in bfloat16 format on a single Nvidia H100 accelerator with 80 GB of VRAM. While that is professional hardware, after quantization these models will fit comfortably on high-end consumer graphics cards.
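The VRAM figures behind these claims can be sanity-checked with simple arithmetic. The sketch below is a back-of-the-envelope estimate: the 10% overhead factor for activations and KV-cache bookkeeping is an assumption for illustration, not an official specification.

```python
def model_vram_gb(params_billion: float, bits_per_weight: float,
                  overhead: float = 1.10) -> float:
    """Rough VRAM needed for the weights alone, padded by ~10% for
    activations and KV-cache bookkeeping (an assumed overhead)."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9  # decimal gigabytes

# 31B dense in bfloat16 (16 bits/weight): roughly 68 GB,
# which fits on a single 80 GB H100.
print(f"31B @ bf16 : {model_vram_gb(31, 16):.1f} GB")

# The same model quantized to 4 bits/weight: roughly 17 GB,
# within reach of a 24 GB consumer GPU.
print(f"31B @ 4-bit: {model_vram_gb(31, 4):.1f} GB")
```

This is why quantization is the bridge between data-center and consumer hardware: dropping from 16 to 4 bits per weight cuts the footprint by a factor of four at the cost of some accuracy.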

The 26B MoE model looks particularly interesting. Thanks to the Mixture of Experts architecture, it activates only 3.8 billion parameters from the total pool of 26 billion during response generation. This translates to a significantly higher number of tokens per second compared to models with a traditional dense structure. On the other hand, the 31B Dense variant focuses on quality and precision, serving as an ideal base for further fine-tuning by developers for specific business or scientific tasks.
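The speed advantage of the MoE design follows from how decoding is usually memory-bandwidth bound: per-token cost scales with the parameters actually read, not the total. The sketch below is an illustrative ceiling calculation only; the H100 bandwidth figure is approximate, and real throughput depends on kernels, batching, and caching.

```python
def tokens_per_sec(active_params_b: float, bandwidth_gb_s: float,
                   bytes_per_param: float = 2.0) -> float:
    """Upper-bound decode rate assuming every active weight is read
    once per generated token (bf16 = 2 bytes per parameter)."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

H100_BW = 3350  # GB/s, approximate HBM3 bandwidth of an 80 GB H100

dense = tokens_per_sec(31.0, H100_BW)  # all 31B weights touched per token
moe   = tokens_per_sec(3.8, H100_BW)   # only 3.8B of 26B parameters active

print(f"31B dense ceiling: ~{dense:.0f} tok/s")
print(f"26B MoE ceiling  : ~{moe:.0f} tok/s ({moe / dense:.1f}x)")
```

The ratio between the two ceilings is simply 31/3.8, roughly 8x, which is why MoE variants feel dramatically faster at generation time even though their total parameter count is similar.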

Graphic presenting the launch of the Gemma 4 models
Gemma 4 introduces four model variants optimized for local and mobile work.

Mobile revolution and edge optimization

The second pair of models, Effective 2B (E2B) and Effective 4B (E4B), targets mobile devices and the edge-computing segment directly. Google emphasizes that close cooperation with Qualcomm and MediaTek engineers was key during their design; the goal was to minimize RAM and energy consumption, which is critical for smartphones and devices like the Raspberry Pi and Jetson Nano.

These models are characterized by exceptionally low latency, which Google describes as "near-zero latency." Compared to the previous generation, Gemma 4 E2B and E4B offer not only better performance but also native support for speech recognition. Furthermore, their context window has been expanded to 128k tokens (while the larger 26B and 31B models offer 256k tokens). This is a huge leap, allowing for the processing of extensive documents directly on the user's device without the need to send data to the cloud.
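To get a feel for what those context windows mean in practice, the check below estimates whether a document fits before sending it to a model. The 4-characters-per-token ratio is a common rule of thumb for English text, not a property of Gemma's actual tokenizer.

```python
# Advertised context windows of the Gemma 4 family, in tokens.
CONTEXT = {"E2B": 128_000, "E4B": 128_000,
           "26B MoE": 256_000, "31B Dense": 256_000}

def fits(doc_chars: int, model: str, chars_per_token: float = 4.0) -> bool:
    """Estimate whether a document fits a model's context window,
    using a rough chars-per-token ratio (assumption, not tokenizer math)."""
    est_tokens = doc_chars / chars_per_token
    return est_tokens <= CONTEXT[model]

# A ~300-page book (~600k characters, ~150k estimated tokens):
book = 600_000
print(fits(book, "E4B"))        # -> False: exceeds the 128k window
print(fits(book, "31B Dense"))  # -> True: within the 256k window
```

In other words, the mobile-class models can already swallow very long reports on-device, while the larger variants handle book-length inputs.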

  • Gemma 31B Dense: Third place in the Arena ranking among open models, high quality of reasoning.
  • Gemma 26B MoE: High speed thanks to the activation of only 3.8B parameters during operation.
  • Gemma E4B: Optimized for mobile devices, high efficiency with low power consumption.
  • Gemma E2B: The lightest model, ideal for embedded systems and simple AI tasks on smartphones.

A new standard of openness: Apache 2.0

The biggest barrier to the adoption of previous versions of Gemma was Google's proprietary license. It contained restrictive clauses regarding acceptable use, which the company could change unilaterally, as well as controversial points regarding the ownership of models trained on synthetic data generated by Gemma. Switching to Apache 2.0 is a strategic move intended to build Google's credibility in the eyes of the open-source community.

The Apache 2.0 license is widely known, liberal, and does not impose burdensome commercial restrictions. Developers gain the certainty that the rules of the game will not change during the project's duration. Google hopes that greater freedom will encourage creators to build advanced "agentic workflow" applications—systems capable of performing tasks autonomously. Gemma 4 is ready for this, offering native support for function calling, structured JSON output, and optimization for code generation, matching the offline capabilities of services such as Claude Code or Gemini Pro.
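The dispatch side of such an agentic loop is straightforward to sketch. The JSON tool-call format below is an assumption for illustration; real Gemma 4 output will follow its own chat template, and `get_weather` is a hypothetical tool, not part of any API.

```python
import json

# Hypothetical local tools the model is allowed to call.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(model_output: str) -> str:
    """Parse a structured JSON tool call emitted by the model and
    execute the matching local function."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call.get("arguments", {}))

# Pretend the model produced this structured output:
raw = '{"name": "get_weather", "arguments": {"city": "Warsaw"}}'
print(dispatch(raw))  # -> Sunny in Warsaw
```

Native structured-JSON output matters precisely here: when the model reliably emits parseable calls, this loop can run entirely offline, with no cloud service in the middle.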

Gemma 4 on performance charts
New Gemma 4 models achieve high scores in Arena rankings with a significantly lower number of parameters than the competition.

The foundation for Gemini Nano 4

The premiere of the E2B and E4B models also sheds light on the future of artificial intelligence in the Android ecosystem. Google has officially confirmed that the upcoming update of the local Gemini Nano 4 model, present in Pixel phones, will be based on the Gemma 4 architecture. This is important information for developers, as systems prototyped today in the AI Core Developer Preview using E2B and E4B models will be fully compatible with future versions of the operating system.

Gemma 4 offers improved reasoning, mathematical abilities, and better instruction following, based on the same technology as the closed Gemini 3 models.

Support for over 140 languages and significant improvements in optical character recognition (OCR) and chart analysis make Gemma 4 one of the most versatile tools in the hands of programmers. The models are already available for download on Hugging Face, Kaggle, and Ollama, and can also be tested in AI Studio. Although Google promotes local work, it also provides the option to run the new models within the paid Google Cloud infrastructure.

It can be assumed that Google's move will force the competition to take similar steps toward license liberalization. Releasing such powerful tools under the aegis of Apache 2.0 means that the barrier to entry for advanced AI projects based on local hardware drops drastically. Gemma 4 is not just another technical update — it is an attempt to take control of the narrative in the world of open-source AI software, where Meta or smaller, agile startups have been calling the shots until now. Scaling performance while simultaneously reducing the number of parameters shows that Google has learned its lesson in optimization, and developers have just received one of the most powerful arguments for staying in the giant's ecosystem.
