Google has introduced Gemma 4, a new family of open AI models designed for advanced reasoning and agentic workflows.

We just released Gemma 4 — our most intelligent open models to date.

Built from the same world-class research as Gemini 3, Gemma 4 brings breakthrough intelligence directly to your own hardware for advanced reasoning and agentic workflows.

Released under a commercially… pic.twitter.com/W6Tvj9CuHW

— Google (@Google) April 2, 2026

“Gemma 4 — our most intelligent open models to date. They deliver an unprecedented level of intelligence per parameter,” the announcement stated.

Since the launch of the first generation, developers have downloaded Gemma more than 400 million times and created more than 100,000 model variants within the Gemmaverse ecosystem. The latest version is built on the same research and technology as Gemini 3.

Different Sizes

The Gemma 4 family includes four versions: Effective 2B (E2B), Effective 4B (E4B), 26B Mixture of Experts (MoE), and 31B Dense.

The compact E2B and E4B models, with 2.3 billion and 4.5 billion active parameters respectively, focus on multimodality, low latency, and seamless integration. They can run on smartphones or standard laptops.

The 26B MoE and flagship 31B models, with 26 billion and 31 billion parameters, require an NVIDIA H100 GPU with 80 GB of memory. These models are optimized for researchers and developers.
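To show where an 80 GB requirement could come from, here is a rough back-of-the-envelope estimate of the memory needed just to hold the weights. It assumes 16-bit weights (2 bytes per parameter), which the article does not state; the helper name `weight_memory_gb` and the precision table are illustrative, not part of any Gemma tooling.

```python
# Rough estimate of the memory needed to hold model weights only.
# Assumption: the precision options below are illustrative; the article
# does not say which precision the 80 GB figure refers to.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str = "bf16") -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[precision] / 1e9


for name, params in [("26B MoE", 26), ("31B Dense", 31)]:
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights at bf16")
```

Under this assumption the weights come to roughly 52 GB and 62 GB, which leaves some headroom on an 80 GB card for activations and the attention cache. A mixture-of-experts model typically still needs all of its experts in memory, so the total parameter count is the relevant number here even though only some experts run per token.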

The larger versions perform exceptionally well in benchmarks. In the global ranking of open text models by Arena AI, the flagship 31B ranks third, while the 26B ranks sixth. According to the developers, the new lineup outperforms competitors' models that are 20 times larger.


Key Features

One of the main advantages of Gemma 4 is its advanced reasoning. The models can work through complex logic and plan multi-step tasks. They show significant progress on mathematical benchmarks and follow instructions accurately.

Other features include:

  • Agentic Workflows: Built-in support for function calling, structured JSON output, and system instructions enables the creation of autonomous assistants that interact with tools and APIs;
  • Code Generation: Gemma 4 generates high-quality code offline, turning a workstation into a local AI assistant;
  • Vision and Audio: All models process video and images at variable resolutions, recognize text, and analyze charts. E2B and E4B also support speech recognition and understanding;
  • Extended Context Window: The compact versions support 128,000 tokens, while the larger ones can handle up to 256,000. This is sufficient for processing entire repositories or large documents in a single query;
  • Multilingual Support: The model family works with over 140 languages.
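The function calling and structured JSON output mentioned in the list can be pictured with a minimal sketch of the application side: the model emits a JSON object naming a tool, and the host program parses it and runs the matching function. The JSON shape (`name`, `arguments`), the `get_weather` tool, and the `dispatch_tool_call` helper are all hypothetical; Gemma 4's actual tool-call format is not described in this article, and no model is invoked here.

```python
import json


# Hypothetical tool the assistant is allowed to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


# Registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(model_output: str) -> str:
    """Parse a JSON tool call and run the matching registered function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Stand-in for text a model might return (assumed format).
sample_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch_tool_call(sample_output))
```

Keeping an explicit registry means the model can only trigger functions the developer has chosen to expose, which is a common safeguard when building autonomous assistants.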

Gemma 4 is already available in Google AI Studio and Google AI Edge Gallery. Integration is also supported by popular third-party tools and frameworks, including Hugging Face, vLLM, llama.cpp, MLX, Ollama, NVIDIA NIM, and LM Studio.

Models can be customized through Google Colab, Vertex AI, or on local GPUs. For production, deployment is available on Google Cloud, including Cloud Run, GKE, and Sovereign Cloud.

As a reminder, in early April, Google introduced a new AI model for video generation — Veo 3.1 Lite.