Most AI tools rely on the internet, sending your prompts to remote servers for processing before returning results. This process has always been invisible to users. Google changes that with Gemma 4, which, if configured properly, runs directly on your phone and eliminates the need for constant connectivity.
With a one-time download, everything runs locally on your device, keeping your data private. You can access it through the Google AI Edge Gallery app. In this article, we explore how to use the app and what you can build with it without an internet connection once it has been configured locally on your device.
The Gemma 4 family consists of four distinct models, each optimized by Google for different hardware requirements. The E2B version is designed for low-resource devices, while the E4B version is designed for higher throughput. The larger models are truly impressive; for example, the 31B dense model ranks #3 among all open-source models worldwide, while the 26B MoE model sits at #5, outperforming many larger models.

While these benchmarks are noteworthy, there are many other reasons to appreciate this new generation of AI models. The entire Gemma 4 family has been engineered to provide capabilities beyond simple chat: it can perform complex logic, facilitate agentic workflows, process text, video, and audio, and work in more than 140 languages.
For devices such as phones, the two edge variants of Gemma 4 (E2B and E4B) have been created specifically for low-resource hardware. These models handle vision, audio, and text data, support function calling, and are small enough to fit within the storage limits of mobile platforms.
Google has released the AI Edge Gallery application, which works on both Android and iOS. Your smartphone performs all processing without needing any cloud service. The application is open source.

The following features of AI Edge Gallery make it essential for our use case:
The Agent Skills feature stands out. It marks one of the earliest instances where consumers can use multi-step agentic AI that operates entirely offline on their mobile devices.
Running AI locally provides practical benefits that go beyond novelty. The three main advantages are:
Licensing is another advantage. Google released Gemma 4 under the Apache 2.0 license, which permits businesses to use, modify, and build on the models without usage restrictions.

This is where most people get confused. Model size alone does not determine value, because larger models do not always outperform smaller ones. Gemma 4 comes in four variants: Effective 2B (E2B), Effective 4B (E4B), 26B Mixture of Experts, and 31B Dense. For phones, you need E2B or E4B, according to Business Today.
The following provides an essential overview:
E2B is the better fit for basic operations where speed matters most. E4B performs better when a task involves complex function schemas or a choice among multiple functions.

Start with E2B. Switch to E4B if you find that E2B fails to handle multi-step reasoning tasks.
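The rule of thumb above can be written down as a tiny decision helper. This is only a sketch of the article's advice, not anything the app exposes: the `Task` class, its field names, and `pick_variant` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of what you plan to ask the model to do."""
    complex_function_schemas: bool = False  # many functions or nested arguments


def pick_variant(task: Task, e2b_failed_multistep: bool = False) -> str:
    """Encode the rule of thumb: default to E2B, escalate to E4B when needed.

    `e2b_failed_multistep` is something you observe yourself after trying E2B;
    nothing here measures it automatically.
    """
    if e2b_failed_multistep or task.complex_function_schemas:
        return "E4B"
    return "E2B"


print(pick_variant(Task()))                               # E2B
print(pick_variant(Task(complex_function_schemas=True)))  # E4B
print(pick_variant(Task(), e2b_failed_multistep=True))    # E4B
```

Keeping E2B as the default reflects the smaller download and lighter load on the device; the switch to E4B is driven by what you actually see fail.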
Step 1: Go to the Google Play Store (Android) or the App Store (iOS), search for Google AI Edge Gallery, and download the app.
Step 2: Open the app. You will be brought to the main menu and see all five modes that you can choose from (AI Chat, Ask an Image, Audio Scribe, Agent Skills, and Prompt Lab).
Step 3: Navigate to the Model Management section and download either Gemma 4 E2B or Gemma 4 E4B. The only time you need to be connected to the internet is when downloading these models, and you only need to do this once.
Step 4: After downloading, you can turn on airplane mode. From this point on, all functions will work without being connected to the internet.
Here, we’ll build a sudoku game using Gemma 4 in Google AI Edge Gallery by selecting the AI Chat feature:
Note: If you want cleaner code from the outset, try Gemma 4 E4B. Also, if a function that previously worked starts misbehaving, tell Gemma which function is giving you trouble and ask it to help repair it.
When I prompted the E2B model, it stopped mid-task, but the E4B model produced the output. The model gave us an HTML code file with thorough instructions, which is helpful for non-technical users. However, it did not show a rendered frontend interface, which was a little disappointing. Also, because it runs offline on the device, generation takes a lot of time, which shows a limitation of the model.
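When reviewing or repairing generated game code, it helps to know what the core rule check in a sudoku game looks like. The sketch below is not the code Gemma produced (that output was an HTML file); it is an independent Python illustration, and the function name `is_valid_move` and the convention that 0 marks an empty cell are assumptions made for the example.

```python
def is_valid_move(board, row, col, digit):
    """Return True if `digit` may be placed at (row, col) on a 9x9 board.

    The board is a list of 9 lists of 9 ints, with 0 meaning an empty cell.
    """
    if board[row][col] != 0:
        return False  # cell already filled
    if digit in board[row]:
        return False  # digit already in this row
    if any(board[r][col] == digit for r in range(9)):
        return False  # digit already in this column
    # Top-left corner of the 3x3 box containing (row, col).
    top, left = 3 * (row // 3), 3 * (col // 3)
    for r in range(top, top + 3):
        for c in range(left, left + 3):
            if board[r][c] == digit:
                return False  # digit already in this box
    return True


board = [[0] * 9 for _ in range(9)]
board[0][0] = 5
print(is_valid_move(board, 0, 8, 5))  # False: 5 is already in row 0
print(is_valid_move(board, 1, 1, 5))  # False: 5 is already in the same 3x3 box
print(is_valid_move(board, 4, 4, 5))  # True
```

If a generated game accepts an illegal move, a check like this is the function to point the model at when asking for a fix.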
Note: You can track precisely which skills the agent used after each step, so its actions are fully transparent to you.
Results varied across the agent skill types. For the first query, the Map skill generally returned a location that looked correct on the map, but it should have been able to detect my location on its own instead of explicitly asking me.
For the second query, it loaded the ‘send-email’ skill appropriately. After the skill executed, it reported that the message had been sent but gave no information about where it was sent, which is a major drawback. The response time and occasional failures to complete the task show that agentic AI on devices still has significant room for improvement.
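The skill tracking and the missing-recipient problem described above can be pictured with a small dispatcher that records every skill call. This is a conceptual sketch only and does not reflect how AI Edge Gallery implements Agent Skills: the `SKILLS` registry, `run_skill`, the `show-on-map` name, and both stand-in functions are hypothetical, and no email is actually sent.

```python
from typing import Callable, Dict, List


def show_on_map(location: str) -> str:
    # Stand-in skill: a real one would open a map view.
    return f"map centred on {location}"


def send_email(to: str, body: str) -> str:
    # Stand-in skill: nothing is sent. The recipient is echoed back so the
    # result says where the message went.
    return f"email queued for {to} ({len(body)} chars)"


# Hypothetical registry mapping skill names to plain Python functions.
SKILLS: Dict[str, Callable[..., str]] = {
    "show-on-map": show_on_map,
    "send-email": send_email,
}


def run_skill(name: str, trace: List[dict], **args) -> str:
    """Look up a skill by name, run it, and record the call in `trace`."""
    if name not in SKILLS:
        raise KeyError(f"unknown skill: {name}")
    result = SKILLS[name](**args)
    trace.append({"skill": name, "args": args, "result": result})
    return result


trace: List[dict] = []
run_skill("show-on-map", trace, location="Berlin")
run_skill("send-email", trace, to="me@example.com", body="Hello")
for step in trace:
    print(step["skill"], "->", step["result"])
```

Recording the arguments alongside the result is what makes a trace auditable: a "message sent" confirmation that also names the recipient would have addressed the drawback noted in the second query.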
Gemma 4 has some limitations as well:
For years, “AI on your phone” meant a basic interface that called remote cloud APIs. Your information took a circuitous route through a server outside your control.
Gemma 4 changes that relationship entirely.
The device in your pocket can now transcribe speech, analyse visual content, and solve difficult problems, all offline. Previously, this required a full server facility. Now it requires an app download.
The era of AI running silently on your pocket device, with no server involved, is no longer a research demo.
A. Gemma 4 runs directly on your phone, processing prompts locally after a one-time download, without sending data to external servers.
A. Use E2B for basic tasks with low RAM, and E4B for more complex reasoning and advanced functions on mobile devices.
A. It ensures privacy, works without the internet, and avoids ongoing costs like subscriptions, tokens, or cloud usage fees.