Is there something Qwen models can’t do? So far, their text and coding models are topping most of the charts and arenas. Now Alibaba’s Qwen team has moved onto the “creative” side. They have just released “Qwen-Image”, an image generation model with native text rendering, designed to challenge the supremacy of GPT-4.1, DALL-E 2, and Midjourney. The best part? It’s free, and even better, it is accessible to everyone! In this blog, we will cover everything about Qwen-Image, including how to access it, its performance, its applications, and more.
Let’s check whether Qwen-Image is “Qwen-tastic” or not!
Qwen-Image is the latest image generation model from Alibaba’s Qwen team. It’s a 20B MMDiT image foundation model, meaning it has 20 billion parameters and is built on a multimodal diffusion transformer (MMDiT) architecture. Qwen-Image is an open-weight text-to-image model that currently ranks 5th on the Artificial Analysis Image Arena Leaderboard and is the only open-weight model in the top 10!
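To make “20 billion parameters” concrete for anyone planning to run the open weights locally, here is a back-of-the-envelope estimate of how much memory the weights alone occupy at common precisions. This is a rough sketch under stated assumptions: it counts only the 20B parameters mentioned above and ignores activations, the text encoder, and the VAE, so real requirements will be higher.

```python
# Rough weight-memory estimate for a 20B-parameter model (weights only).
# Assumption: activations, the text encoder, and the VAE are NOT included,
# so actual memory needs will be higher than these figures.
PARAMS = 20e9

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {
    "fp32": 4,
    "bf16": 2,
    "fp8": 1,
}


def weight_memory_gb(params: float, dtype: str) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(PARAMS, dtype):.0f} GB")
```

In other words, even at half precision (bf16) the weights alone need roughly 40 GB, which is why most people will try the model through a hosted interface first.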

Like OpenAI’s GPT-4o, Qwen-Image handles both image generation and image editing in a single model, though it does so with the diffusion transformer described above rather than an autoregressive architecture. To support this, the model takes a dual encoding approach:
You can read the full technical report of the Qwen-Image model here.
Some of the key highlights that make Qwen-Image stand apart are:
These features, along with the model’s strong results on various benchmarks, make Qwen-Image a formidable image generation model.
To access the Qwen-Image model through Qwen Chat:

3. Below the text box, in the middle of the screen, select “Image Generation”. Enter your prompt in the text box and get started!

You can also access the model in other ways, such as:
Now that we have covered the key details of Qwen-Image, let’s test it on 3 main tasks:
Let’s go through them one by one:
Prompt: “Create a visually engaging landing page for a shampoo product. Highlight the shampoo’s unique features (e.g., hydration, repair, or natural ingredients) with a clean and modern design. Include a hero section with the shampoo bottle image, a catchy headline like ‘Transform Your Hair Today,’ and a call-to-action button (‘Shop Now’ or ‘Learn More’). Add sections for benefits, key ingredients, customer testimonials, and a subscription option. Use soft, fresh colors, high-quality visuals, and ensure the layout is mobile-friendly and conversion-focused.”

The generated image was good: it included much of the text I had asked for, captured the essence of the prompt, and laid out the whole page appropriately. But there were a few misses. Although the spellings were correct, one word was cut off, and some of the words I had specified were missing. I liked the colour theme the model chose for this task.
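To put a number on misses like these, one rough approach is to run OCR on the generated image and check which of the requested phrases actually appear. The sketch below assumes you already have the OCR text as a plain string (the OCR step itself is out of scope), and the sample string is hypothetical, not real Qwen-Image output. The helper names `normalize` and `text_coverage` are our own illustrations.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def text_coverage(required_phrases: list[str], ocr_text: str) -> tuple[float, dict[str, bool]]:
    """Return the fraction of required phrases found in the OCR text, plus per-phrase hits."""
    # Pad with spaces so a phrase only matches on whole-word boundaries.
    haystack = f" {normalize(ocr_text)} "
    hits = {p: f" {normalize(p)} " in haystack for p in required_phrases}
    score = sum(hits.values()) / len(required_phrases) if required_phrases else 1.0
    return score, hits


# Phrases taken from the shampoo landing-page prompt above.
required = ["Transform Your Hair Today", "Shop Now", "Learn More"]

# Hypothetical OCR output, for illustration only (not from a real Qwen-Image run).
ocr = "TRANSFORM YOUR HAIR TODAY! Hydration & Repair ... Shop Now"

score, hits = text_coverage(required, ocr)
print(f"coverage = {score:.2f}")
for phrase, found in hits.items():
    print(f"  {'found' if found else 'MISSING'}: {phrase}")
```

This exact-match check treats a cut-off word as a missing phrase; a fuzzier comparison would be needed to distinguish “partially rendered” from “absent”.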
Prompt: “Design a clear, modern infographic that explains the image generation process of a 20B MMDiT foundation model in 3 steps:
Use icons, arrows, and short labels for each step. The flow should be visually logical and easy to follow, with a tech-inspired color palette.”
Output:

I did not like the output at all. Text was missing in some places and vague in others. The icons and overall layout felt disorganized. The flow from step 1 to 2 to 3 was there, but the image as a whole was quite unclear.
Input image:

Prompt: “Change the night into a sunny morning, replace the man’s clothes with an orange shirt and white shorts, and replace the cat with a small puppy.”

This result was just perfect. Literally perfect. Every change I asked for appeared in the image: the lighting suited a sunny morning, and both the clothes and the animal were replaced. One minor issue: while the model turned night into day, it didn’t remove the moon, though it did make it look like a round cloud. A very well-edited image that took just a few seconds to generate!
Overall, I really liked the editing capabilities of the model, but image generation, especially with a large amount of text or for infographics, is where Qwen-Image needs a lot of improvement going forward, particularly if it wants to compete with the likes of OpenAI, Google, or X.

But it has one really cool feature that most of the top models lack: you can select the frame size you want to work with right from the text box! If you are a content creator, this makes it easy to create the “right-sized” image for each of your social media platforms.
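The chat interface handles frame sizes for you, but if you run the open weights yourself you typically choose the output size by passing a width and height. The sketch below is our own illustration, not part of any Qwen tooling: it converts an aspect ratio into pixel dimensions. The default pixel budget (about a 1328×1328 square) and the snapping of both sides to a multiple of 16 are assumptions, chosen because latent diffusion models commonly require dimensions divisible by a small power of two.

```python
import math


def frame_size(aspect_w: int, aspect_h: int,
               target_pixels: int = 1328 * 1328,
               multiple: int = 16) -> tuple[int, int]:
    """Pick a width/height close to the requested aspect ratio and pixel budget.

    Assumptions (hypothetical defaults, not official Qwen-Image settings):
    - target_pixels: roughly the pixel count of a 1328x1328 square
    - multiple: both sides snapped to a multiple of 16
    """
    ratio = aspect_w / aspect_h
    # Solve width * height = target_pixels with width / height = ratio.
    width = math.sqrt(target_pixels * ratio)
    height = width / ratio

    def snap(x: float) -> int:
        # Round to the nearest allowed multiple, never below one block.
        return max(multiple, round(x / multiple) * multiple)

    return snap(width), snap(height)


# Common social-media aspect ratios (labels are illustrative).
PLATFORMS = {
    "square post (1:1)": (1, 1),
    "story / reel (9:16)": (9, 16),
    "video thumbnail (16:9)": (16, 9),
    "portrait post (4:5)": (4, 5),
}

for name, (w, h) in PLATFORMS.items():
    print(name, frame_size(w, h))
```

Keeping the total pixel count roughly constant means every aspect ratio gets a similar level of detail, and only the shape of the frame changes.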
Now that we have tested the model, let’s look at the benchmark results the Qwen team has released comparing Qwen-Image with its counterparts:
For Image Generation & Editing Benchmarks

For Text Rendering Benchmarks

Qwen models currently rule the leaderboards for text and coding tasks. Qwen-Image holds similar promise but is not quite there yet: the model follows prompts well but struggles with long, text-heavy prompts. Still, it’s a great gift to the open-source community, competing with top paid models while being completely open-weight. As more users and developers adopt Qwen-Image, we can expect it to lead the image generation leaderboards soon too!
My final thought: try the Qwen-Image model. It’s good; we are just surrounded by so many great models that its potential is easy to overlook.
You can also read about Finding the Best AI Image Generation Model.
If you want to read about other FREE image generation models, you can refer to the following blog: Top 7 AI Image Generators to Try in 2025.