How to Make Text-to-Image Conversion Faster with SDXL Turbo?

Ajay Kumar Reddy 30 Jan, 2024 • 8 min read

Introduction

Stability AI has been at the forefront of developing open-source diffusion models like Stable Diffusion and Stable Diffusion XL, which have revolutionized the field of text-to-image generation. Now, the world of text-to-image generation has received a major upgrade with the arrival of Stable Diffusion XL Turbo, or SDXL Turbo for short. This model from Stability AI promises lightning-fast image creation, pushing the boundaries to the next level. In this article, we will go through the process of setting up and working with this model.

Learning Objectives

  • Understand Stable Diffusion and diffusion models
  • Learn the key features and benefits of SDXL Turbo
  • Learn to use SDXL Turbo for your image generation projects

This article was published as a part of the Data Science Blogathon.

What are Stable Diffusion / Diffusion models?

Stable Diffusion is a powerful text-to-image generation model. During training, noise is gradually added to an image at every step, and the model learns to reverse that process. During generation, it starts with pure noise and progressively removes it, guided by a Text Prompt, so that the model "learns" to create images that match the textual description. Diffusion models like Stable Diffusion XL have a good number of advantages over older-generation methods, including high-quality image outputs, detailed control, and diverse artistic styles.
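To make the noising idea concrete, here is a minimal, purely illustrative sketch of a single forward diffusion step. The noise level beta and the random stand-in image are assumptions for illustration, not values from the actual Stable Diffusion noise schedule.

# Illustrative only: one forward ("noising") step of a diffusion process
import torch

beta = 0.02                      # assumed per-step noise level
x = torch.rand(3, 512, 512)      # stand-in for an image tensor (C, H, W)
noise = torch.randn_like(x)

# One DDPM-style forward step: blend the image with Gaussian noise
x_noisy = (1 - beta) ** 0.5 * x + beta ** 0.5 * noise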

But the problem comes during the generation process. The time it takes to generate these high-quality images through Stable Diffusion / Stable Diffusion XL is quite high. The diffusion model needs many iterations, typically ranging from 20 to 60, to produce good-quality images. Hence, a lot of research has gone into reducing the generation time, and this is how Stability AI came up with SDXL Turbo.


What is SDXL Turbo?

SDXL Turbo is a distilled version of Stable Diffusion XL, built using a novel method called Adversarial Diffusion Distillation (ADD). This method "tunes" the model for faster inference, drastically reducing the image generation time. Unlike traditional Stable Diffusion, which requires tens or hundreds of steps to produce a high-quality image, SDXL Turbo can get similar results in as few as four steps, and it can even produce good-quality images in a single iteration. This translates to real-time image generation, opening up a world of creative possibilities.

Adversarial Diffusion Distillation involves three networks: an ADD-Student, a Discriminator, and a DM-Teacher (Diffusion Model Teacher). First, a real image is converted into a noisy image. The ADD-Student then takes in this noisy image and tries to generate a good-quality image in at most 4 steps of the diffusion process, i.e., by denoising it. The Discriminator then tries to distinguish the Student's output from real images, judging whether it is fake or real.

In this process, the Student optimizes two losses. One is the adversarial loss: it tries to fool the Discriminator by generating images that look like the original image. The other is the distillation loss, where the Student tries to match the output of the DM-Teacher. Here, knowledge is distilled from the DM-Teacher to the ADD-Student: the Student uses the Teacher's denoised output as its prediction target to decrease the distillation loss. This way, the Student learns to generate good-quality images in just a few steps. The interplay of the two losses is sketched in the code below.
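To make the two objectives concrete, here is a minimal, purely illustrative PyTorch sketch of one training step. The student, teacher, and discriminator modules and the simple loss forms used here are assumptions for illustration; the actual ADD method uses a hinge adversarial loss and noise-level-dependent weighting.

import torch
import torch.nn.functional as F

def add_training_step(student, teacher, discriminator, real_image,
                      beta=0.5, lambda_distill=1.0):
    # Forward diffusion: corrupt the real image with Gaussian noise
    noisy = (1 - beta) ** 0.5 * real_image + beta ** 0.5 * torch.randn_like(real_image)

    # The ADD-Student denoises the corrupted image
    student_pred = student(noisy)

    # Adversarial loss: push the Discriminator to score the Student's
    # output as "real" (a non-saturating GAN loss, an assumed form)
    adv_loss = -torch.log(torch.sigmoid(discriminator(student_pred))).mean()

    # Distillation loss: match the frozen DM-Teacher's denoised output
    with torch.no_grad():
        teacher_pred = teacher(noisy)
    distill_loss = F.mse_loss(student_pred, teacher_pred)

    return adv_loss + lambda_distill * distill_loss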

Getting Started with SDXL Turbo

In this section, we will look into how to get started with SDXL Turbo through Hugging Face. To get started, we first install the necessary libraries.

!pip install diffusers transformers accelerate
  • diffusers is a Hugging Face library for Python that lets us work with different diffusion models like Stable Diffusion, Stable Diffusion XL, and Stable Diffusion XL Turbo
  • The transformers library from Hugging Face is a helper library for diffusers
  • The accelerate library helps load the model efficiently across system RAM and GPU RAM when we are running on low system RAM

Download the Model

# Import the necessary libraries
from diffusers import AutoPipelineForText2Image 
import torch 

# Load the pre-trained text-to-image diffusion model
pipe = AutoPipelineForText2Image.from_pretrained(
   "stabilityai/sdxl-turbo",  # Specify the model name from Hugging Face Hub
   torch_dtype=torch.float16,  # Set half-precision floating-point type for efficiency
   variant="fp16"  # Optimize model for half-precision calculations
)

# Move the model to the GPU for faster processing
pipe.to("cuda")
  • In the above code, we first import the AutoPipelineForText2Image class. This class automatically selects the right pipeline for many diffusion models
  • Then we create a pipeline object with this class
  • Here we mention the sdxl-turbo model from Stability AI, and we set the torch dtype to torch.float16 and the variant to fp16
  • This will download SDXL Turbo in the floating point 16 format
  • The final statement loads the Stable Diffusion XL Turbo model onto the GPU; a CPU fallback for machines without CUDA is sketched below
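If no CUDA GPU is available, the pipeline can still run on the CPU, though generation becomes much slower. This fallback is only a sketch; switching to float32 on CPU is an assumption, since half precision is poorly supported there.

import torch

# Pick the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

if device == "cpu":
    # Half precision is poorly supported on CPUs, so switch to float32
    pipe = pipe.to(torch.float32)

pipe.to(device)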

Now we have successfully downloaded the model and loaded it onto the GPU. Next, we will try giving a Prompt and observing the generated image.

# Define the prompt text for image generation
prompt = "A cinematic shot of a kitten walking down a lush green forest on \
a broad daylight"  

# Generate the image using the model
image = pipe(
    prompt=prompt,  # Pass the prompt to the model
    num_inference_steps=1,  # Set the number of diffusion steps 
    guidance_scale=0.0  # Disable guided diffusion for this generation
).images[0]  # Access the generated image
image
  • Here we provide the above Prompt. Then, to generate the image, we pass the Prompt to the pipeline object
  • Along with the Prompt, we pass the number of inference steps to go through for generating the image; here we give the value of 1
  • SDXL Turbo does not use a guidance_scale, hence to disable it, we pass the value of 0.0
  • The pipeline object will then produce an output of type StableDiffusionXLPipelineOutput, which contains the images. The images attribute is a list, and this list contains our image
  • The image is of type PIL.Image.Image, which we can view directly in a Jupyter Notebook or save to disk, as shown below
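Outside a notebook, the generated PIL image can be written to disk or opened in the system viewer; the filename here is arbitrary.

# Save the generated image to disk using the standard PIL API
image.save("kitten_forest.png")

# Or open it in the default system image viewer (useful outside notebooks)
image.show()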

The Generated Image

[Generated image: a kitten walking through a lush green forest]

The image took only a single inference step to generate, and the time taken is less than a second. This takes the existing SD and SDXL models to the next level; they usually take many seconds, and sometimes even minutes, to generate images. And even the generated image quality is good. We can verify the timing ourselves, as sketched below.
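A simple wall-clock measurement around the pipeline call is enough to check the sub-second claim on your own hardware. This sketch reuses the pipe and prompt defined above; exact timings depend on the GPU, and the very first call also includes one-time warm-up costs.

import time

# Time a single-step generation (run the pipeline once beforehand
# so that warm-up costs are excluded from the measurement)
start = time.time()
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
print(f"Image generated in {time.time() - start:.2f} seconds")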

Controlling the Generated Images

Sometimes, the generated image can be distorted or of poor quality. Think of a generated image containing a human with 3 eyes, 3 legs, or more than 5 fingers. This is not the image we want to generate. For such cases, we provide Negative Prompts. Here is an example of such an unusual generated image:

[Example of a distorted generated image]

Also, by default, the generated images are of size 512×512. We can generate an image of higher resolution by passing a width and height to the pipeline. Now let's try adding Negative Prompts and changing the resolution of the generated image.

# Define prompts for image generation
prompt = "A cinematic close-up shot of astronauts walking stepping \
down the spacecraft on Mars."
negative_prompt = "blurry image, distorted image, people, triple hands"

# Generate the image using the model, incorporating desired features
image = pipe(
    prompt=prompt,  # Provide the main prompt to guide image creation
    negative_prompt=negative_prompt,  # Specify elements to avoid in the image
    num_inference_steps=4,  # Set the number of diffusion steps
    guidance_scale=0.0,  # Disable guided diffusion for this generation
    width=1024,  # Set image width to 1024 pixels
    height=1024  # Set image height to 1024 pixels
).images[0]  # Access the generated image

image
  • Here, we give the Prompt to generate an image of astronauts walking on Mars
  • We also provide a Negative Prompt, where we mention things to avoid, like blurry images, distorted images, and triple hands
  • Then we pass both the Prompt and the Negative Prompt to the pipeline
  • Here we set the number of inference steps to 4
  • We want to generate an image of size 1024×1024, hence we provide these values to the height and width parameters of the pipeline itself

The image generated for the above Prompt can be seen below

[Generated image: astronauts stepping down from a spacecraft on Mars]

Compared to the first image, here we don't see distortions or an unusual number of body parts. The image faithfully follows the text we provided to SDXL Turbo. Since the step count is the main quality/speed trade-off, we can sweep over a few values and compare the results, as sketched below.
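This sketch reuses the pipe, prompt, and negative_prompt defined above; the output filenames are arbitrary.

# Generate the same prompt at several step counts to compare quality
for steps in (1, 2, 4):
    img = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=steps,
        guidance_scale=0.0,
        width=1024,
        height=1024
    ).images[0]
    img.save(f"astronauts_{steps}_steps.png")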

Applications and Use Cases

The potential applications and use cases of SDXL Turbo include:

  • Interactive Design & Editing: Unleash a Pixel Picasso

Gone are the days of painstaking changes in design software. With SDXL Turbo, visuals can be crafted in real time. Imagine sketching out a concept for a science fiction film and instantly conjuring shimmering alien cities or spaceships bursting with neon, guided by your every descriptive whim. We can leverage it to create vibrant posters from just a few phrases.

  • Rapid Prototyping: From Mindstorm to Mockup in Minutes

We can say goodbye to the frustration of slow, clunky prototyping tools. SDXL Turbo lets our ideas materialize at the speed of imagination, which makes it a great aid for brainstorming. Write down a few lines about a product's features, and watch SDXL Turbo create breathtaking mockups with realistic user interfaces.

It will also be helpful in physical product design. Simply describe the shape, materials, and functionality, and witness a virtual prototype materialize before our eyes, ready for immediate tweaks and changes. With SDXL Turbo, iteration cycles become lightning-fast, reducing design time and propelling projects from concept to reality in record time.

  • Live Presentations & Storytelling

Captivate our audience like never before with presentations that go beyond simple static slides. SDXL Turbo transforms our stories into living images, based on our Prompts and audience interaction. Think of telling a story and watching the scenes change with every word we speak. With SDXL Turbo, presentations become immersive journeys, leaving our listeners spellbound.

Conclusion

SDXL Turbo marks a thrilling evolution in the realm of text-to-image generation, paving the way for artists and creators to materialize their visions with unprecedented speed. While it does not yet match the intricate detail of slower diffusion models, its real-time capabilities unlock a myriad of possibilities for rapid prototyping, collaborative exploration, and captivating live performances. In this article, we have taken a practical look at how to get started with SDXL Turbo.

Key Takeaways

  • SDXL Turbo represents a new way forward in the image generation space, bringing real-time image creation from Text Prompts
  • The model’s speed and efficiency make it a good choice for rapid prototyping, iterative design, and live creative performances
  • Its availability through different platforms and tools democratizes AI art creation, opening up avenues for artists, designers, and anyone with an imagination to explore
  • Despite its limitations in image complexity compared to slower diffusion models, SDXL Turbo excels in generating diverse and inspiring visual concepts in near real-time

Frequently Asked Questions

Q1. What are Diffusion Models?

A. Diffusion models are generative models trained by gradually adding noise to images and learning to reverse the process. At generation time, they start from pure noise and denoise it, guided by a Text Prompt, to produce high-quality outputs.

Q2. What led to the development of SDXL Turbo?

A. SDXL Turbo addresses slow image generation in diffusion models by using Adversarial Diffusion Distillation for real-time results.

Q3. How does SDXL Turbo achieve faster image generation?

A. SDXL Turbo is created through Adversarial Diffusion Distillation, allowing it to produce good results in as few as one to four steps, compared to the 20 to 60 steps of traditional diffusion models.

Q4. Can SDXL Turbo generate Custom Image Sizes and Negative Prompts?

A. Yes, SDXL Turbo gives us the option to edit the Image Size and lets us provide Negative Prompts to avoid distortions or unwanted features in generated images.

Q5. From where can we download and try SDXL Turbo?

A. SDXL Turbo is readily available on Hugging Face. We can use the existing diffusers library from Hugging Face to download and work with the SDXL Turbo model.

Q6. What are the potential limitations of SDXL Turbo in image complexity?

A. SDXL Turbo may have limitations in handling highly complex image details compared to slower models. The quality of the generated image will also be slightly lower compared to the full SDXL models.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
