Google's latest set of open-source models is here: the Gemma 4 family has arrived. Open-source models have grown popular recently thanks to privacy benefits and the flexibility to fine-tune them easily, and the Gemma 4 family now gives us four versatile open-source models that look very promising on paper. So, without further ado, let's decode them and see what the hype is all about.
Gemma is a family of lightweight, open-weight large language models developed by Google. It’s built using the same research and technology that powers Google’s Gemini models, but designed to be more accessible and efficient.
What this really means is: Gemma models are meant to run in more practical environments, like laptops, consumer GPUs and even mobile devices.
They come in both base (pre-trained) and instruction-tuned variants.
These are the models that come under the umbrella of the Gemma 4 family:
The E2B and E4B models feature a 128K context window, while the larger 26B and 31B feature a 256K context window.
Note: All the models are available both as base model and ‘IT’ (instruction-tuned) model.
Below are the benchmark scores for the Gemma 4 models:

Gemma 4 is released under the Apache 2.0 license, so you can freely build with the models and deploy them in any environment. The models can be accessed through Hugging Face, Ollama and Kaggle. Let's test 'Gemma 4 26B A4B IT' through the inference providers on Hugging Face to get a better picture of what the model can do.
Hugging Face Token:

I’ll be using Google Colab for the demo, feel free to use what you like.
from getpass import getpass
hf_key = getpass("Enter Your Hugging Face Token: ")
Paste the Hugging Face token when prompted:

Let’s try to create a frontend for an e-commerce site and see how the model performs.
prompt="""Generate a modern, visually appealing frontend for an e-commerce website using only HTML and inline CSS (no external CSS or JavaScript).
The page should include a responsive layout, navigation bar, hero banner, product grid, category section, product cards with images/prices/buttons, and a footer.
Use a clean modern design, good spacing, and laptop-friendly layout.
"""
Sending request to the inference provider:
from huggingface_hub import InferenceClient

# Authenticate with the token entered above
client = InferenceClient(api_key=hf_key)

completion = client.chat.completions.create(
    model="google/gemma-4-26B-A4B-it:novita",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
            ],
        }
    ],
)

# Print just the generated text, not the whole message object
print(completion.choices[0].message.content)
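Rather than copying the reply by hand, a small helper can save the page for you. This is just a sketch: it assumes the model wraps the HTML in a Markdown code fence, which may not always be the case.

```python
import re

def extract_html(reply: str) -> str:
    """Pull the HTML out of a model reply, stripping an optional ```html fence."""
    match = re.search(r"```(?:html)?\s*\n(.*?)```", reply, re.DOTALL)
    return match.group(1) if match else reply

# Save the generated page so it can be opened in a browser:
# with open("index.html", "w") as f:
#     f.write(extract_html(completion.choices[0].message.content))
```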

After copying the code and creating the HTML, this is the result I got:


The output looks good and the Gemma model seems to be performing well. What do you think?
The Gemma 4 family looks promising not only on paper but in practice too. With versatile capabilities and different models built for different needs, Gemma 4 gets a lot right. As open-source AI grows in popularity, it's good to have options to try, test and find the model that best suits our needs, and it will be interesting to see how devices like phones, Raspberry Pis and the like benefit from these increasingly memory-efficient models in the future.
Q. What does E2B mean?
A. E2B means 2.3B effective parameters, while total parameters, including embeddings, reach about 5.1B.
Q. Why don't embedding parameters count toward the effective size?
A. Large embedding tables are used mainly for lookup operations, so they increase total parameters but not the model's effective compute size.
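The lookup point above can be made concrete with a toy calculation. The numbers below are illustrative only, not Gemma 4's actual configuration:

```python
# Toy illustration: embedding parameters vs. per-token compute.
# vocab_size and hidden_dim are made-up numbers, not Gemma 4's real config.
vocab_size = 256_000
hidden_dim = 2_048

# The embedding table stores vocab_size * hidden_dim weights...
embedding_params = vocab_size * hidden_dim  # 524,288,000

# ...but a token lookup only reads a single row of that table.
params_used_per_token = hidden_dim  # 2,048

print(embedding_params, params_used_per_token)
```

So half a billion parameters sit in the table, yet each token touches only one row, which is why such tables inflate the total count without adding compute.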
Q. What is Mixture of Experts (MoE)?
A. Mixture of Experts activates only a small subset of specialized expert networks per token, improving efficiency while maintaining high model capacity. The Gemma 4 26B is an MoE model.
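To illustrate the idea, here is a toy top-k routing sketch. It shows the general MoE mechanism only; the sizes, router design and expert count are all made up and do not reflect Gemma 4's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, dim = 8, 2, 16

# Toy router and expert weights (illustrative only).
router = rng.normal(size=(dim, num_experts))
experts = rng.normal(size=(num_experts, dim, dim))

def moe_layer(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ router                    # score every expert for this token
    chosen = np.argsort(logits)[-top_k:]   # keep only the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()               # softmax over the chosen experts
    # Only top_k of num_experts expert networks actually run for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.normal(size=dim)
out = moe_layer(token)
print(out.shape)  # (16,)
```

Because only 2 of the 8 experts run per token here, compute stays close to a small dense model while the parameter count spans all eight experts.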