Several AI coding assistants that run directly in the terminal were released in 2025. Codex CLI, Gemini CLI, and Claude Code are among the most popular; each embeds a large language model into command-line workflows and can generate and fix code from natural-language prompts. We document our evaluation of all three across different tasks to determine which is most useful.
Each assistant is built on an advanced AI model: o4-mini, Gemini 2.5 Pro, or Claude Sonnet 4. We place each one in the same environment and test it against specific metrics on realistic programming tasks, ranging from web development to data analysis. Through this, we aim to make the strengths of each agent clear.
The command line is quickly becoming a battleground for the next generation of AI coding assistants. Companies including OpenAI, Google, and Anthropic have released advanced CLI-based AI coding assistants, each bringing powerful capabilities directly into the terminal. But what are the differences, and which is best for your workflow? Let’s go over the tools.
Codex CLI functions like a smart terminal assistant for coding. You describe the task in plain English, and the CLI proposes new code and files. Because it has access to your shell and file system, it can scaffold a project, write a function, or fix a bug. It uses OpenAI’s Codex models in the background and supports several languages, including Python, JavaScript, and Go.

Gemini CLI by Google combines the Gemini 2.5 Pro model with terminal and filesystem access to act as a coding and general-purpose utility assistant for developers. It goes well beyond simple code generation: it can complete tasks in real time, such as fetching live information or running shell commands. Built on Google infrastructure and integrated with tools such as VS Code AI, Gemini CLI is useful across terminals and IDEs.

Claude Code is a leading coding AI made for high-performance terminal workflows. It is based on Claude Sonnet 4 and can handle end-to-end software development tasks, from writing new modules and running tests to automatically creating pull requests. Claude Code aims to provide depth, consistency, and capable codebase navigation, although it is skill-based and closed-source. If you are a professional software developer looking for an AI that can understand and evolve large, complex projects, Claude Code is for you.

| Feature | Codex CLI | Gemini CLI | Claude Code |
|---|---|---|---|
| Model Backbone | OpenAI Codex (o4-mini) | Gemini 2.5 Pro | Claude Sonnet 4 |
| Context Window | 128K tokens | 1 million tokens | ~200K tokens |
| Installation | npm install -g @openai/codex | npm install -g @google/gemini-cli | npm install -g @anthropic-ai/claude-code |
| License Type | Commercial OpenAI terms | Open-source (Apache 2.0) | Commercial, subscription-based |
| Local File System Access | Yes | Yes | Yes |
| Shell Command Execution | Native via shell integration | Native | Native |
| Unique Capability | Fastest response time | Real-time web search + command | Full codebase mapping & PR generation |
| Ideal For | Developers needing rapid iteration | Balanced dev + utility workflows | Advanced team development |
| Web Integration | No live web search | Integrated Google Search | None – code-focused only |
Testbed & Environment: All the CLI-based AI coding assistants were tested on a local workstation running Ubuntu 24.04. The agents Codex CLI (based on OpenAI’s o4-mini), Gemini CLI (Gemini 2.5 Pro), and Claude Code (Claude Sonnet 4) were installed via npm or pip. Codex CLI and Claude Code required Node.js and valid API keys. Gemini CLI required a Google login for authentication.
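Before running a comparison like this, it can help to confirm that the prerequisites are actually visible from your shell. The sketch below is a minimal pre-flight check, not part of our test setup. The executable names (`codex`, `gemini`, `claude`) and the environment variable names (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) are assumptions; adjust them to match your installation.

```python
import os
import shutil

# Assumed executable names; adjust to match your installation.
REQUIRED_BINARIES = ["node", "npm", "codex", "gemini", "claude"]
# Assumed environment variables holding API keys.
REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]


def check_environment(binaries=REQUIRED_BINARIES, env_vars=REQUIRED_ENV_VARS, environ=None):
    """Return a dict mapping each prerequisite to True (found) or False (missing)."""
    environ = os.environ if environ is None else environ
    report = {}
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH.
        report[f"bin:{name}"] = shutil.which(name) is not None
    for var in env_vars:
        # An unset or empty variable counts as missing.
        report[f"env:{var}"] = bool(environ.get(var))
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {item}")
```

Gemini CLI authenticates through a Google login rather than an API key in our setup, so it has no entry in the environment variable list.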
Evaluation Metrics That Matter: We evaluated each agent based on five criteria:
These measures test not just performance, but how usable and reliable a developer can expect the agents to be in a real workflow.
Real-World Tasks Used in the Battle: Each agent was given three tasks to test its versatility:
Goal: Build a basic 2D Mario-style game
Prompt: “Create a basic 2D Super Mario-style platformer game. The game should feature a simple tile-based layout with Mario standing on ground blocks, a background sky with clouds, a question mark block above him, and a green pipe nearby. Include basic mechanics like left/right movement and jumping using keyboard arrow keys. Simulate gravity and collision with platforms. Use pixel-art style graphics with embedded or referenced local assets.”
Gemini CLI:
Codex CLI:
Claude Code:
Claude Code outperforms both Codex and Gemini in game logic. It shows consistent controls, gravity, and collision handling, and delivers the most immersive gameplay experience.
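For readers unfamiliar with what "gravity and collision" involves, here is a minimal sketch of one vertical physics step for a platformer. It is our own illustration in Python, not the output of any of the three agents, and the constants and class names are hypothetical.

```python
from dataclasses import dataclass

GRAVITY = 1800.0      # downward acceleration in px/s^2 (illustrative value)
JUMP_SPEED = -600.0   # initial jump velocity; negative because y grows downward


@dataclass
class Player:
    x: float
    y: float          # top edge of the sprite
    w: float = 16.0
    h: float = 16.0
    vy: float = 0.0
    on_ground: bool = False


@dataclass
class Platform:
    x: float
    y: float          # top edge of the platform
    w: float
    h: float


def step(player, platforms, dt, jump=False):
    """Advance the player's vertical motion by one frame of dt seconds."""
    if jump and player.on_ground:
        player.vy = JUMP_SPEED
    player.on_ground = False

    prev_bottom = player.y + player.h
    player.vy += GRAVITY * dt          # apply gravity
    player.y += player.vy * dt         # integrate position
    bottom = player.y + player.h

    for p in platforms:
        overlaps_x = player.x < p.x + p.w and player.x + player.w > p.x
        # Land only when falling and the feet crossed the platform top this frame.
        if overlaps_x and player.vy >= 0 and prev_bottom <= p.y <= bottom:
            player.y = p.y - player.h  # snap feet to the platform top
            player.vy = 0.0
            player.on_ground = True
            break
    return player
```

Calling `step` once per frame with `dt = 1 / 60` lets a player dropped above a platform fall, land, and stay grounded until `jump=True` is passed. Horizontal movement and side collisions are left out to keep the sketch short.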
Goal: Build a clock UI with live weather updates
Prompt: “Design and develop a visually rich weather-themed dynamic clock dashboard using only HTML, CSS, and JavaScript. The main goal is to create a real-time clock interface that not only displays the current time but also visually adapts to the time of day. Implement four animated background transitions representing sunrise, noon, sunset, and night, each with unique colors and animated elements like moving clouds, twinkling stars, or a rising/setting sun/moon, and offer a toggle between 12-hour and 24-hour time formats. For an added layer of interactivity, include a section that displays a rotating motivational or productivity quote based on the hour.”
Gemini CLI:
Codex CLI:
Claude Code:



To summarize, Claude Code was ahead in UI logic and overall user experience. It combined sound functionality, engaging visual transitions, interactive elements, and a smooth interface flow. Codex met the basic functional requirements but fell short on UX, and Gemini delivered a moderate visual design with very little dynamism.
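The core logic the prompt asks for is small: map the hour to one of four phases, format the time in 12-hour or 24-hour style, and pick a quote by hour. The sketch below shows that logic in Python for illustration only (the task itself required HTML, CSS, and JavaScript). The hour boundaries and placeholder quotes are our assumptions, not values from the prompt or from any agent's output.

```python
from datetime import time

# Hour boundaries for the phases are illustrative assumptions.
PHASES = [
    (5, 10, "sunrise"),
    (10, 17, "noon"),
    (17, 20, "sunset"),
]

# Placeholder quotes; a real dashboard would supply its own list.
QUOTES = ["Placeholder quote A", "Placeholder quote B", "Placeholder quote C"]


def phase_for_hour(hour):
    """Return the background phase for an hour in the range 0-23."""
    for start, end, name in PHASES:
        if start <= hour < end:
            return name
    return "night"  # everything outside the daytime ranges


def format_clock(t, use_24h):
    """Format a datetime.time in 24-hour or 12-hour (AM/PM) style."""
    if use_24h:
        return f"{t.hour:02d}:{t.minute:02d}:{t.second:02d}"
    hour12 = t.hour % 12 or 12  # 0 and 12 both display as 12
    suffix = "AM" if t.hour < 12 else "PM"
    return f"{hour12}:{t.minute:02d}:{t.second:02d} {suffix}"


def quote_for_hour(hour, quotes=QUOTES):
    """Rotate through the quote list based on the hour."""
    return quotes[hour % len(quotes)]
```

Keeping these three functions free of any rendering code makes them easy to test separately from the animations, which is where the agents' outputs differed most.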
Goal: Clean, analyze, and visualize a dataset
Prompt: “Perform Data Analysis and Exploratory Data Analysis (EDA) on the dataset provided in the same directory. The entire analysis should be implemented and stored in a Jupyter Notebook file named eda.ipynb. Begin by loading the dataset and inspecting its structure, including column names, data types, and summary statistics. Proceed to clean the data by handling missing values, correcting data types if necessary, and removing any duplicates. Conduct univariate analysis to understand individual features, and then perform bivariate and multivariate analysis to uncover relationships between variables. Use clear and relevant visualizations to support your insights. Organize the notebook with proper Markdown headings and explanations for each step. Conclude with at least three key observations or insights drawn from the data.”
Gemini CLI:
Codex CLI:
Claude Code:
Claude Code is the strongest choice for EDA and data analysis. It not only completes the full analytical workflow but also organizes the outputs well and delivers structured insights that are useful for both single-user data work and team-based environments. Codex could be a useful backup, but Gemini CLI is not well suited to this task.
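To make the cleaning steps in the prompt concrete, here is a minimal sketch of loading, de-duplicating, filling missing values, and summarizing a column. It uses only the Python standard library so that it runs without dependencies; in a real notebook, pandas would be the usual choice. The sample data and function names are hypothetical and are not taken from the dataset or from any agent's notebook.

```python
import csv
import io
import statistics


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_duplicates(rows):
    """Remove exact duplicate rows while preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_missing_with_median(rows, column):
    """Convert a column to float in place, replacing blanks with the median."""
    present = [float(r[column]) for r in rows if (r[column] or "").strip()]
    median = statistics.median(present)
    for r in rows:
        r[column] = float(r[column]) if (r[column] or "").strip() else median
    return rows


def summarize(rows, column):
    """Return basic summary statistics for a numeric column."""
    values = [r[column] for r in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


# Tiny made-up sample: one duplicate row and one missing score.
SAMPLE = """name,score
a,10
b,
a,10
c,30
"""

if __name__ == "__main__":
    rows = fill_missing_with_median(drop_duplicates(load_rows(SAMPLE)), "score")
    print(summarize(rows, "score"))
```

Median imputation is only one way to handle missing values; dropping the rows or flagging them may be more appropriate depending on the data.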
Claude Code produced clearly structured, well-documented output that ran well. It handled the game logic and error handling without issue. Codex CLI was fast and flexible but required some manual intervention. Gemini CLI gave a firm foundation and seemed fast, but its polish and documentation were lacking; it suffered the most in the EDA assignment, missing core outputs and structural completeness.
In speed, Codex CLI was fastest, followed by Gemini and then Claude. Claude was the easiest to prompt. Each CLI suited specific workflows: Claude was strong on logic-heavy work, Codex is best for speed-focused workflows, and Gemini was suitable for basic structured implementations, though they lacked refinement.
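If you want to reproduce a speed comparison on your own machine, one simple approach is to time each invocation with a wall-clock timer and compare medians. The sketch below is a generic harness and an assumption about how such a measurement could be done, not a description of how our timings were collected. It uses a trivial stand-in command so it runs anywhere; replace it with whatever agent invocation you want to measure.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=3, timeout=600):
    """Run argv several times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(argv, check=True, capture_output=True, timeout=timeout)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in command; swap in the agent invocation you want to measure.
    sample = time_command([sys.executable, "-c", "pass"], runs=3)
    print(f"median: {statistics.median(sample):.3f}s over {len(sample)} runs")
```

Model latency varies from run to run, so the median of several runs is usually more informative than a single measurement.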
Claude Code was the best across all tasks, providing the highest-quality code, the best user experience, and the most complete range of features. While it was not the fastest AI coding assistant, its finished products were polished, documented, and organized, making it ideal for professional workflows where trust in the output matters. Codex CLI was the fastest and is a great choice for creating quick prototypes or for working under a time constraint.
Gemini CLI was reasonable for basic builds but was not fast, polished, or organized enough for many kinds of work. It struggled with data analysis tasks that required organized or insightful content. Overall, each tool has a different fit, but Claude Code provides the most consistent depth as a command-line AI coding assistant.
A. A CLI (Command-Line Interface) AI assistant allows users to interact with an AI model directly through the terminal, automating tasks like coding, debugging, and content generation using natural language prompts.
A. Codex CLI offers the fastest response times, followed by Gemini CLI, with Claude Code being the slowest of the three. However, speed comes at the cost of polish and completeness in many cases.
A. Claude Code demonstrated superior development capabilities, creating the most playable and visually appealing Super Mario-style game with proper physics, collision detection, and interactive elements like mystery boxes.
A. Yes, all three tools have local file system access and can work with existing projects. Claude Code particularly excels at understanding and navigating large, complex codebases.
A. Claude Code offers the most balanced performance across tasks, especially for professional-grade projects, but it isn’t the fastest.