The last time OpenAI introduced an image generation model in ChatGPT, it quickly went viral across the internet. People were captivated by the ability to create Ghibli-style portraits of themselves, turning personal memories into animated artwork. Now, OpenAI is taking things a step further with “gpt-image-1”, a natively multimodal model that powers image generation directly within ChatGPT and is now available via the API. In this article, we will explore the key features of gpt-image-1 and how to use it for image generation and editing.
gpt-image-1 is OpenAI’s latest and most advanced natively multimodal image model. It stands out for its ability to generate high-quality images while incorporating real-world knowledge into the visual content. While gpt-image-1 is recommended for its robust performance, the Image API also supports other specialized models like DALL·E 2 and DALL·E 3.
The Image API offers three key endpoints, each designed for a specific task: Generations, Edits, and Variations.
Also Read: Imagen 3 vs DALL-E 3: Which is the Better Model for Images?
gpt-image-1 offers several key features:
The OpenAI API enables users to generate and edit images from text prompts using the GPT Image or DALL·E models. At present, image generation is accessible exclusively through the Image API, though support for the Responses API is actively being developed.
To read more about gpt-image-1 click here.
Before diving into how to use and deploy the model, it’s important to understand its pricing to ensure effective and budget-conscious usage.
The gpt-image-1 model is priced per token, with different rates for text and image tokens:
In practical terms, this roughly equates to:
For more detailed pricing by image quality and resolution, refer to the official pricing page here.
Note: This model generates images by first creating specialized image tokens. Therefore, both latency and overall cost depend on the number of tokens used. Larger image dimensions and higher quality settings require more tokens, increasing both time and cost.
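As a rough illustration, here is a minimal sketch of how the size and quality parameters can be used to keep token usage, and therefore cost and latency, down. It relies on the same setup shown later in this article (installing the openai package and setting OPENAI_API_KEY), and the specific parameter values shown are assumptions that should be checked against the official documentation.

from openai import OpenAI
import base64

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Smaller dimensions and lower quality use fewer image tokens,
# which reduces both cost and generation time.
result = client.images.generate(
    model="gpt-image-1",
    prompt="A simple flat-style icon of a paper airplane",
    size="1024x1024",   # assumed supported values: "1024x1024", "1536x1024", "1024x1536", "auto"
    quality="low"       # assumed supported values: "low", "medium", "high", "auto"
)

with open("icon_low_cost.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))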
To generate the API key for gpt-image-1:
For this, first visit https://platform.openai.com/settings/organization/general. Then, click on “Verify Organization” to start the verification process. It is quite similar to a typical KYC check: depending on your country, you’ll be asked to upload a photo ID and then verify it with a selfie.
You may follow the documentation provided by OpenAI to better understand the verification process.
Also Read: How to Use DALL-E 3 API for Image Generation?
Finally, it’s time to see how we can generate images using the gpt-image-1 API.
We will be using the image generation endpoint to create images based on text prompts. By default, the API returns a single image, but we can set the n parameter to generate multiple images in a single request (a short sketch of this is included after the example below).
Before running our main code, we need to first run the code for installation and setting up the environment.
# Install the OpenAI Python SDK
!pip install openai

import os

# Set your OpenAI API key as an environment variable
os.environ['OPENAI_API_KEY'] = "<your-openai-api-key>"
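If you would rather not hard-code the key in your notebook, a minimal alternative (assuming an interactive environment such as Colab or Jupyter) is to prompt for it at runtime; the OpenAI client picks it up from the OPENAI_API_KEY environment variable:

import os
from getpass import getpass

# Prompt for the key so it never appears in the notebook source
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")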
Now, let’s try generating an image using this new model.
Input Code:
from openai import OpenAI
import base64
client = OpenAI()
prompt = """
A serene, peaceful park scene where humans and friendly robots are enjoying the
day together - some are walking, others are playing games or sitting on benches
under trees. The atmosphere is warm and harmonious, with soft sunlight filtering
through the leaves.
"""
result = client.images.generate(
    model="gpt-image-1",
    prompt=prompt
)

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Save the image to a file
with open("utter_bliss.png", "wb") as f:
    f.write(image_bytes)
Output:
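Before moving on to editing, here is a minimal sketch of the n parameter mentioned earlier, assuming gpt-image-1 supports n greater than 1 on the generations endpoint. It requests several images in one call and saves each to its own file (the prompt and file names are just illustrative):

from openai import OpenAI
import base64

client = OpenAI()

result = client.images.generate(
    model="gpt-image-1",
    prompt="A watercolor painting of a quiet mountain lake at sunrise",
    n=3  # request three candidate images in a single call
)

# Each element of result.data holds one base64-encoded image
for i, item in enumerate(result.data):
    with open(f"mountain_lake_{i}.png", "wb") as f:
        f.write(base64.b64decode(item.b64_json))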
gpt-image-1 offers a number of image editing options. The image edits endpoint lets us:
Let’s try editing an image using a mask. We’ll upload an image and provide a mask to specify which parts of it should be edited.
The transparent areas of the mask will be replaced based on the prompt, while the coloured areas will remain unchanged.
Now, let me ask the model to add Elon Musk to my uploaded image.
Input Code:
from openai import OpenAI
import base64

client = OpenAI()

result = client.images.edit(
    model="gpt-image-1",
    image=open("/content/analytics_vidhya_1024.png", "rb"),
    mask=open("/content/mask_alpha_1024.png", "rb"),
    prompt="Elon Musk standing in front of Company Logo"
)

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Save the image to a file
with open("Elon_AV.png", "wb") as f:
    f.write(image_bytes)
Output:
The mask used in the edit above was created from a black-and-white mask image by adding an alpha channel. Here’s how:
from PIL import Image
from io import BytesIO
# 1. Load your black & white mask as a grayscale image
mask = Image.open("/content/analytics_vidhya_masked.jpeg").convert("L")
# 2. Convert it to RGBA so it has space for an alpha channel
mask_rgba = mask.convert("RGBA")
# 3. Then use the mask itself to fill that alpha channel
mask_rgba.putalpha(mask)
# 4. Convert the mask into bytes
buf = BytesIO()
mask_rgba.save(buf, format="PNG")
mask_bytes = buf.getvalue()
# 5. Save the resulting file
img_path_mask_alpha = "mask_alpha.png"
with open(img_path_mask_alpha, "wb") as f:
    f.write(mask_bytes)
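Note that the edits endpoint expects the mask to have the same dimensions as the image being edited. The file names used earlier ("analytics_vidhya_1024.png", "mask_alpha_1024.png") suggest 1024x1024 inputs, so here is a hedged sketch of how you might resize the alpha mask to match such a base image; the target size is an assumption based on those file names:

from PIL import Image

# Resize the alpha mask so its dimensions match the image being edited
mask_rgba = Image.open("mask_alpha.png")
mask_1024 = mask_rgba.resize((1024, 1024))  # assumed target size, matching the base image
mask_1024.save("mask_alpha_1024.png")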
Here are some tips and best practices to follow while using gpt-image-1 for generating or editing images.
From creative designing and e-commerce to education, enterprise software, and gaming, gpt-image-1 has a wide range of applications.
The gpt-image-1 model is a powerful and versatile tool for image generation, but there are still a few limitations to keep in mind:
Here’s how OpenAI’s gpt-image-1 compares with the popular DALL·E models:
| Model | Endpoints | Features |
| --- | --- | --- |
| DALL·E 2 | Generations, Edits, Variations | Lower cost, supports concurrent requests, includes inpainting capability |
| DALL·E 3 | Generations only | Higher resolution and better image quality than DALL·E 2 |
| gpt-image-1 | Generations, Edits (Responses API coming soon) | Excellent instruction-following, detailed edits, real-world awareness |
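Because these models share the same generations endpoint, switching between them is mostly a matter of changing the model parameter. Here is a minimal sketch using DALL·E 3 for comparison; the response_format and size values shown are assumptions to be verified against the API reference:

from openai import OpenAI
import base64

client = OpenAI()

# Same generations endpoint as before, but with the DALL·E 3 model
result = client.images.generate(
    model="dall-e-3",
    prompt="A cozy reading nook by a rain-streaked window, digital art",
    size="1024x1024",
    response_format="b64_json"  # assumed: DALL·E models return URLs by default
)

with open("reading_nook_dalle3.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))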
OpenAI’s gpt-image-1 showcases powerful image generation capabilities with support for creation, editing, and variations, all from simple text prompts. With built-in customization options for size, quality, and output format, along with inpainting capabilities, gpt-image-1 gives developers fine-grained control over the desired output. While some might worry that this technology could replace human creativity, such tools aim to enhance creativity and serve as helpful aids for artists. We must find the right balance, where these tools help us innovate without taking away the value of authentic, human-made work.