GLM-4.5: Is it China’s Best Agentic AI Model Till Date?

Sarthak Dogra Last Updated : 04 Aug, 2025
8 min read

Don’t count China out of the AI race just yet. While everyone’s been obsessing over ChatGPT and Grok, Chinese tech firms have been quietly cooking up some serious competition. First came Kimi’s K2 and Alibaba’s Qwen3-Coder. Now Z.ai just dropped their latest models: GLM 4.5 and its lighter GLM 4.5 Air version, and they’re packing some serious heat. Early tests put these new models at 3rd and 6th place worldwide, right up there with the big boys like OpenAI and Musk’s Grok. But here’s what really matters – these aren’t just chatbots. They’re built for “agentic” AI, meaning they can actually get stuff done on their own, not just talk about it. Can they actually outsmart the Western AI we’re all used to? The answers might surprise you. Read on to know more.

Meet Z.ai: The Chinese AI Powerhouse

Z.ai, formerly known as Zhipu AI, is a Beijing-based startup that has been building LLMs since 2019. The company has a long-term goal of aligning AGI (Artificial General Intelligence) with human intent. Born out of Tsinghua University, Z.ai is China’s first major player in open-weight LLMs, having released the GLM series (General Language Models) since its early days, which have now found widespread adoption across the world.

Just how wide? Today, more than 700,000 developers use Z.ai’s models. With such a growing presence in international benchmarks, Z.ai is shaping up to be a critical force in the next wave of global AI innovation.

In case the user base doesn’t make its dominance evident, know that Z.ai is backed by heavyweights like Tencent, Alibaba, and Hillhouse Capital, and is now valued at over $2 billion.

So, yes, it is not just another lab chasing benchmarks. It is an AI mammoth, and it now has two new tusks.

The new GLM-4.5 and GLM-4.5 Air

As the company puts it in its blog announcing the arrival of the new LLMs, these are “hybrid reasoning models.” This means they are capable of a “thinking mode for complex reasoning and tool using,” as well as a “non-thinking mode for instant responses.”

GLM 4.5 and GLM 4.5 Air now live on Z.ai
GLM 4.5 and GLM 4.5 Air now live on Z.ai

For context, know that the GLM 4.5 comes as the most potent offering by Z.ai till date, while GLM 4.5 Air is its lightweight sibling. Here is a quick description of the two.

GLM 4.5

With a 355 billion total parameter architecture and 32 billion active parameters, this flagship model is designed for large-scale deployment across reasoning, generation, and multi-agent tasks.

GLM 4.5 Air

A lightweight sibling with 106 billion total parameters and 12 billion active ones, this one is optimized for on-device and smaller-scale cloud inference without sacrificing core capabilities.

Together, these models are capable of handling complex reasoning, tool use, and coding, while being cost-efficient and open-weight. The models come as Z.ai’s answer to OpenAI’s GPT-4o and Anthropic’s Claude 3, and the benchmark scores make this quite evident.

However, just numbers are not what make this release special. It is the “openness and usability” of the new LLMs that is promised at least on paper. Unlike many closed APIs or restricted models, Z.ai has made GLM 4.5 open-source, fine-tunable, and available under flexible licenses (Apache/MIT). This allows companies and developers to own their LLM stack, run it locally, and even modify it for commercial use.

Result – A big hurrah from the dev community!

As for others, here are some key features of the GLM 4.5 family of LLMs to give you a glimpse of what they are capable of.

Key Features of the GLM 4.5 LLMs

A distinct design philosophy has been followed in the making of the new GLM 4.5 family of LLMs. Here is all that’s new they bring to the table.

  1. Dual Thinking Modes for Smarter Use: GLM-4.5 introduces two distinct modes: thinking and non-thinking. The thinking mode handles complex tasks like maths, coding, and logic. It takes time, but it reasons better. The non-thinking mode is fast, perfect for casual replies. This dual-mode setup makes the model flexible, capable of deep analysis when needed and quick answers when not.
  2. Built for Agentic Intelligence: Z.ai’s new models support multi-step reasoning, function calling, and external tool usage. That means they can browse the web, generate slides, or even build websites, all through natural language.
  3. Trained with slime: A Custom RL Engine: To teach real-world skills, Z.ai built slime, a powerful reinforcement learning (RL) system. It separates training from data generation, speeding up the process. Slime supports long, tool-based tasks like software dev and research. It even uses FP8 mixed-precision for faster rollouts. As per Z.ai, this makes GLM-4.5 smarter and more efficient.
  4. Full-Stack Creator: The new Z.ai model can design apps, generate code, and even build interactive games. It works with tools like Claude Code and takes instructions through simple chat. The result? A model that turns ideas into real products – web apps, posters, slides, you name it. It’s coding, simplified.

How to Access GLM 4.5?

How you can access the new GLM 4.5 family depends on how you wish to use it. Here are the 3 ways you can use and access these LLMs:

  1. Direct Access (as Chatbot): You can use the new Z.ai LLMs as chatbots directly on the Z.ai website. Simply select the model from the top-left corner and then enter your prompt to start using it.
  2. API Access: For API access, you can visit Z.ai API by clicking here and use the API guidelines as needed.
  3. Open-Weights: GLM 4.5 open-weight models are available at HuggingFace and ModelScope.

Once you have the access, you can start using GLM 4.5 for your required task. In case you wonder what the LLM has in store for you in terms of performance, here is a quick look at what it can do for content, image, and code generation.

Hands-on with GLM 4.5

To give you a hint of what Z.ai has really come up with, we tried our hands on its new LLMs. Here is what we found across use categories:

Content Generation

To test its content generation skills, I gave the following prompt to GLM 4.5 on Z.ai:

Prompt:Write a 100-word product description for a smart electric bicycle designed for city commuters. Highlight its eco-friendliness, smart features, and portability.

Output:

The LLM was able to generate a pretty decent output, based on the simple and straightforward content generation prompt. It managed to frame a good narrative for the description and even gave the product a name of its own. Hallucination or just a step-ahead, I’ll let you decide.

As a content expert, I would call it a “Good” result – not bad at all and nothing that screams extraordinary.

GLM 4.5 content generation hands-on

Reasoning

I tested the reasoning capabilities of Z.ai’s new model using my favourite, age-old math + physics problem that I first studied during my JEE preparation.

Prompt:Four people, standing on the corner of a square, look at the person on their right corner and move. if all of them are moving at the same speed “s”, will any of them ever meet? if yes, where? Explain your reasoning?

Output:

It failed at first. We fed the prompt to GLM 4.5 on multiple machines just to avoid any isolated issue, only to get the result – syntax error:

GLM 4.5 reasoning response (failure)

It was only when we signed in through one of the machines that the LLM was able to provide the right response, and it did so with complete reasoning, though it took notably long. I am not sure what causes that but apparently you may want to login and check for the ideal responses from GLM 4.5:

GLM 4.5 reasoning response (success)

On the contrary, my go-to LLM ChatGPT 4o was able to answer in under 2 seconds, even proceeding to make an explanatory diagram for it. Here is its output:

ChatGPT Reasoning response

Coding

I used the following prompt to test the coding capabilities of GLM 4.5.

Prompt: Code the Home Page of a website for a real estate developer based in Dubai. Keep it simple, elegant, with a colour theme of White and Beige across. List About Us and Contact Us as the clickable links to other pages on the website at the header

Output:

Fantastic job here by GLM 4.5. It was able to generate the entire home page without a single flaw to be found. It even accounted for the specificities in terms of the colour scheme and the page links at the footer. You can have a glimpse of the code and how the website looks here:

GLM 4.5 Agentic Task - Website design
GLM 4.5 Agentic Task - Website design
GLM 4.5 Agentic Task - Website design

GLM 4.5 Benchmarks

With the new models, Z.ai’s goal was to compete with the leading LLMs in the world, and while it does not lead, it does land a tough blow to the competition.

Here are some of the benchmark performances as proof:

Overall Performance

Based on a total of 12 benchmarks covering “agentic (3), reasoning (7), and Coding (2)” performances of LLMs, Z.ai states that the new GLM 4.5 is ranked 3rd, while its Air version is ranked 6th. This is mighty impressive, considering the list of competitors includes the likes of OpenAI, Anthropic, Google DeepMind, xAI, and other such bigwigs.

GLM 4.5 overall benchmark performance
GLM 4.5 Overall Benchmark Performance

Its benchmark performances are spread across use-cases, including:

Agentic Tasks

GLM 4.5 ‘s agent ability was measured on TAU-bench and BFCL-v3 (Berkeley Function Calling Leaderboard v3). On both benchmarks, GLM-4.5 matches the performance of Claude 4 Sonnet.

For web browsing, the new LLM was evaluated on the BrowseComp benchmark. GLM-4.5 outperformed Claude-4-Opus (18.8%) and came close to o4-mini-high (28.3%) in performance, giving correct answers for 26.4% of all questions.

Agentic performance of Z.ai' new models
GLM 4.5 agentic performance

Reasoning

As Z.ai puts it, its new models’ thinking mode allows them to “solve complex reasoning problems, including mathematics, science, and logical problems.” Here are its performance metrics across benchmarks like MMLU Pro, AIME24, MATH 500, SciCode, and others

Reasoning performance of Z.ai' new models
GLM 4.5 benchmark performance for reasoning

Coding

The GLM 4.5 family was evaluated on the SWE-bench Verified and Terminal Bench for its coding capabilities. It was found that both models excel at both building coding projects from scratch and agentically solving coding tasks in existing projects. A big plus- the LLMs can also be integrated into existing coding toolkits such as Claude Code, Roo Code, and CodeGeex.

You can have a look at their benchmark performances here:

GLM 4.5 benchmark performance for coding
GLM 4.5 benchmark performance for coding

Conclusion

The release of GLM 4.5 and GLM 4.5 Air seems like a brilliantly calculated strike at the heart of AI monopolies. Z.ai has made it clear that advanced performance and openness don’t have to be mutually exclusive. With open-weight models, powerful reasoning capabilities, tool-using intelligence, and robust agentic workflows, the GLM 4.5 family pushes the envelope on what practical LLMs can deliver today.

More importantly, Z.ai isn’t just chasing benchmarks. It’s building an ecosystem, complete with RL infrastructure like slime. That’s what makes GLM 4.5 more than just another number in a leaderboard. It’s a stepping stone toward sovereign AI stacks, something that every nation, enterprise, and builder desperately seeks today.

Technical content strategist and communicator with a decade of experience in content creation and distribution across national media, Government of India, and private platforms

Login to continue reading and enjoy expert-curated content.

Responses From Readers

Clear