Stable Diffusion models use recent generative techniques to create images and audio. Stable Diffusion transforms input data under the guidance of a text prompt to produce new, creative output. In this article, we will generate new images from a given input image using a depth-to-image diffusion model on the PyTorch backend with a Hugging Face pipeline. We use Hugging Face because it provides an easy-to-use Stable Diffusion pipeline for image generation.
This article was published as a part of the Data Science Blogathon.
Stable Diffusion models are latent diffusion models. They learn the latent structure of the input by modeling how the data's attributes diffuse through the latent space, and they belong to the family of deep generative neural networks. The generation is stable in the sense that we guide the results with original images, text, and other inputs; an unguided diffusion would be unpredictable.
Stable Diffusion uses a latent diffusion model (LDM), which is a probabilistic model. These models are trained like other deep learning models, but the objective is to remove noise that has been applied to the training images in successive steps. The noise is Gaussian, meaning its probability density function is the normal distribution. Denoising is carried out by a sequence of denoising autoencoders (DAEs). A DAE differs from a standard autoencoder in its reconstruction criterion: noise is added to the input, and the network is trained to reconstruct the clean original.
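To make the noising side of this concrete, here is a minimal sketch of how Gaussian noise can be mixed into data over many steps. It uses only the Python standard library and a toy four-value "image". The function names, the linear schedule, and its values (1,000 steps, variances from 1e-4 to 0.02) are illustrative assumptions in the style of common diffusion tutorials, not the exact settings Stable Diffusion uses.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances, one per diffusion step (assumed values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """Running product of (1 - beta): the share of the original signal left."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(pixels, t, alpha_bars, rng):
    """Jump to step t: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)                      # fixed seed for repeatability
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
pixels = [0.8, -0.2, 0.5, 0.1]              # toy "image" scaled to [-1, 1]
slightly_noisy = add_noise(pixels, 10, alpha_bars, rng)
mostly_noise = add_noise(pixels, 999, alpha_bars, rng)
print(slightly_noisy)
print(mostly_noise)
```

Early steps leave the values close to the original, while the last step is almost pure noise. The denoising network is trained to undo this process one step at a time.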
In more detail, Stable Diffusion consists of three essential parts. The first is the variational autoencoder (VAE), a neural network that, in simple terms, behaves like a probabilistic graphical model. The second is the U-Net, a convolutional neural network (CNN) originally developed for image segmentation. The third is the text encoder, which transforms text prompts into an embedding space. Stable Diffusion v1 uses a pretrained CLIP ViT-L/14 text encoder for this; the v2 checkpoints, including the depth model used below, use an OpenCLIP encoder instead.
The VAE encoder compresses the image from pixel space into a lower-dimensional latent space, where the diffusion is carried out. Working in this compact space keeps computation manageable while preserving the important details of the image. The VAE decoder then converts the result back into pixel space.
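As a rough illustration of how much smaller the latent space is, the sketch below computes the latent shape for a 512x512 RGB image. It assumes the commonly cited Stable Diffusion VAE settings: a downsampling factor of 8 per side and 4 latent channels. The function names are hypothetical helpers written for this example.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (channels, height, width) of the latent for a given image size."""
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the downsample factor")
    return (latent_channels, height // downsample, width // downsample)

def compression_ratio(height, width, image_channels=3):
    """How many pixel-space values correspond to one latent value."""
    channels, h, w = latent_shape(height, width)
    return (image_channels * height * width) / (channels * h * w)

shape = latent_shape(512, 512)
ratio = compression_ratio(512, 512)
print(shape, ratio)
```

Under these assumptions, the U-Net works on a tensor with 48 times fewer values than the original image, which is a large part of why latent diffusion is practical on a single GPU.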
Let us quickly look at three common areas where diffusion models can be applied:
Diffusers can generate original images free of charge, providing content for your projects, materials, and even marketing brands. Instead of hiring a painter or photographer, you can generate your own images; instead of a voice-over artist, you can create your own audio. Now let’s look at image-to-image generation.
This task requires a GPU and a development environment suited to image and graphics processing. Make sure you have a GPU available if you want to follow along with this project. Google Colab is a convenient choice since it provides both a suitable environment and a GPU. Follow the steps below to engage the available GPU:
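Once the runtime is set up, a quick check like the following can confirm whether a CUDA GPU is visible to PyTorch. This is a minimal sketch; pick_device is a hypothetical helper name, and it falls back to "cpu" when PyTorch is missing or no GPU is found.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(f"Running on: {device}")
```

If this prints "cpu" in Colab, the GPU runtime is probably not enabled yet, and the rest of the walkthrough is likely to be very slow or fail.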
You can find all the code on GitHub.
The Hugging Face pipeline has several dependencies. We will start by bringing them into our project environment.
Some libraries are not preinstalled in Colab. We need to start by installing them before importing from them.
# Installing required libraries
%pip install --quiet --upgrade diffusers transformers scipy ftfy
# Installing required libraries
%pip install --quiet --upgrade accelerate
Let us explain the installations above. The first command installs diffusers, transformers, scipy, and ftfy; the second installs accelerate. SciPy is a scientific computing library and ftfy repairs broken Unicode text; both act as supporting dependencies here. Accelerate is a Hugging Face library that helps load and run models efficiently on the available hardware. We explain the two major libraries below.
Diffusers: Diffusers is a Hugging Face library that provides pretrained diffusion models, including image-to-image Stable Diffusion models, for generating images. We use it to access our pipeline and related components.
Transformers: Transformers provides pretrained models, tools, and APIs, which saves us the cost of training a model from scratch.
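After installing, it can help to confirm that each package is actually present in the environment. The sketch below uses only the standard library's importlib.metadata; the REQUIRED list and the installed_versions helper are names made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages installed by the two pip commands in this walkthrough
REQUIRED = ["diffusers", "transformers", "accelerate", "scipy", "ftfy"]

def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

versions = installed_versions(REQUIRED)
for name, ver in versions.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as not installed should be installed again before moving on to the imports.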
# Backend
import torch
# Internet access
import requests
# Regular Python library for Image processing
from PIL import Image
# Hugging face pipeline
from diffusers import StableDiffusionDepth2ImgPipeline
StableDiffusionDepth2ImgPipeline is the pipeline class that keeps our code short. All we need to do is pass an image and a prompt describing what we expect.
Next, we create an instance of the pretrained pipeline imported above and move it to our GPU, which is the CUDA device here.
# Creating a variable instance of the pipeline
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
)
# Assigning to GPU
pipe.to("cuda")
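Let’s define a function that checks whether a string is a valid URL, so we can load images from the web. If you would rather use a local image, you can skip this step; in Colab, mount your drive to access your files.
# Accessing images from the web
import urllib.parse as parse
import os
import requests
# Verify URL
def check_url(string):
    try:
        result = parse.urlparse(string)
        # A usable image URL needs a scheme, a host, and a path
        return all([result.scheme, result.netloc, result.path])
    except (ValueError, TypeError, AttributeError):
        # Malformed or non-string input is not a valid URL
        return False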
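To see what this check accepts and rejects, here is a small self-contained sketch that runs the same logic on a few sample strings. The example.com addresses are placeholders, not real image locations.

```python
from urllib.parse import urlparse

def check_url(string):
    """Same check as in the walkthrough: scheme, host, and path must all be present."""
    try:
        result = urlparse(string)
        return all([result.scheme, result.netloc, result.path])
    except ValueError:
        return False

samples = [
    "https://example.com/images/tomato.jpg",  # full URL with a path
    "https://example.com",                    # no path component
    "images/tomato.jpg",                      # local relative path
]
results = {s: check_url(s) for s in samples}
for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

Note that a bare domain without a path is rejected, because the check requires all three parts. For image loading this is reasonable, since an image URL always points at a specific file.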
We can define another function to use the check_url function for loading an image.
# Load an image
def load_image(image_path):
    if check_url(image_path):
        # Stream the remote file and let PIL decode it
        return Image.open(requests.get(image_path, stream=True).raw)
    elif os.path.exists(image_path):
        return Image.open(image_path)
    # Fail loudly instead of silently returning None
    raise FileNotFoundError(f"Could not load image from: {image_path}")
Now we need an image to diffuse into another image. You can use your own photo. In this example, we use an online image for convenience. Feel free to substitute your own URL or image.
# Loading an image URL
img = load_image("https://img.freepik.com/free-photo/stacked-tomatoes_1353-262.jpg?w=740&t=st=1683821147~exp=1683821747~hmac=708f16371d1e158d76c8ea5e8b9790fb68dc75009750b8328e17c21f16d36468")
# Displaying the Image
img
Now we have a usable image. Let’s demonstrate some image-to-image Stable Diffusion on it. To do this, we pair the picture with a prompt: a short text whose keywords describe what we expect from the diffusion. Instead of generating a random new image, the prompt guides the model’s output.
Note that we set the strength to 0.7, a moderately high value on the scale from 0 to 1. Also note that negative_prompt is set to None; we will look at it later.
# Setting Image prompt
prompt = "Some sliced tomatoes mixed"
# Assigning to pipeline
pipe(prompt=prompt, image=img, negative_prompt=None, strength=0.7).images[0]
Now we can repeat this step on new images. The method remains the same:
Loading the image to be diffused, and
Creating a text description of the target image.
You can create some examples on your own.
Another approach is to supply a negative prompt describing what we do not want in the output. This makes the pipeline more flexible. We do this by passing the text to the negative_prompt parameter.
# Loading an image URL
img = load_image("https://img.freepik.com/free-photo/stacked-tomatoes_1353-262.jpg?w=740&t=st=1683821147~exp=1683821747~hmac=708f16371d1e158d76c8ea5e8b9790fb68dc75009750b8328e17c21f16d36468")
# Displaying the Image
img
# Setting Image prompt
prompt = ""
n_prompt = "rot, bad, decayed, wrinkled"
# Assigning to pipeline
pipe(prompt=prompt, image=img, negative_prompt=n_prompt, strength=0.7).images[0]
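One way to picture what the negative prompt does is through classifier-free guidance: at each denoising step the model makes one noise prediction conditioned on the prompt and one conditioned on the negative prompt, then extrapolates away from the negative one. The sketch below shows that combination on plain Python lists. The guided_noise name, the toy numbers, and the guidance scale of 7.5 are illustrative assumptions, not values read from the pipeline.

```python
def guided_noise(noise_negative, noise_prompt, guidance_scale=7.5):
    """Move from the negative-prompt prediction toward the prompt prediction.

    A scale of 1.0 returns the prompt prediction unchanged; larger values
    push further away from whatever the negative prompt describes.
    """
    return [
        neg + guidance_scale * (pos - neg)
        for neg, pos in zip(noise_negative, noise_prompt)
    ]

# Toy two-value "noise predictions" standing in for full latent tensors
negative = [0.0, 1.0]
positive = [1.0, 1.0]
print(guided_noise(negative, positive, guidance_scale=1.0))
print(guided_noise(negative, positive, guidance_scale=2.0))
```

Where the two predictions agree, guidance changes nothing; where they differ, the result is pushed past the prompt prediction, away from the negative one. This is why an empty prompt combined with a negative prompt can still steer the image.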
You may wonder how to control how much the new image differs from the original. We do this by changing the strength. Let’s observe the effect of different strength levels on the previous image.
At strength = 0.1
# Setting Image prompt
prompt = ""
n_prompt = "rot, bad, decayed, wrinkled"
# Assigning to pipeline
pipe(prompt=prompt, image=img, negative_prompt=n_prompt, strength=0.1).images[0]
At strength = 0.4
# Setting Image prompt
prompt = ""
n_prompt = "rot, bad, decayed, wrinkled"
# Assigning to pipeline
pipe(prompt=prompt, image=img, negative_prompt=n_prompt, strength=0.4).images[0]
At strength = 1.0
# Setting Image prompt
prompt = ""
n_prompt = "rot, bad, decayed, wrinkled"
# Assigning to pipeline
pipe(prompt=prompt, image=img, negative_prompt=n_prompt, strength=1.0).images[0]
The strength parameter controls how strongly the diffusion affects the generated image, which makes the output more flexible and adjustable.
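A useful mental model for strength is that it decides how many of the scheduled denoising steps are actually run on the input image. The sketch below mirrors the step calculation that image-to-image pipelines in Diffusers are generally understood to use. The denoising_steps helper and the default of 50 inference steps are assumptions made for illustration.

```python
def denoising_steps(strength, num_inference_steps=50):
    """Number of denoising steps applied to the input image for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)

for strength in (0.1, 0.4, 0.7, 1.0):
    steps = denoising_steps(strength)
    print(f"strength={strength}: {steps} of 50 steps")
```

A low strength adds little noise and runs only a few steps, so the output stays close to the original. A strength of 1.0 runs every step, so the original image's appearance has very little influence on the result.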
Before we wrap up, note that these pipelines come with some limitations and challenges, as is common with new technology.
In conclusion, while the concept of diffusers is cutting-edge, the Hugging Face pipeline makes it easy to integrate into our projects with short, direct code. Prompts let us describe an imagined picture and steer the diffusion toward it. The strength parameter is another critical setting: it controls the degree of diffusion. We have seen how to generate new images from existing images.
Key Takeaways
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
A. Stable Diffusion allows users to generate high-quality images by iteratively refining them through diffusion processes. This technique enhances image quality and realism over time, making it suitable for various creative and artistic applications.
A. Yes, Stable Diffusion is open-source and available for free. Users can access and utilize the model without any cost, facilitating experimentation and development in the field of image generation and enhancement.
A. Yes, Stable Diffusion can generate NSFW (Not Safe For Work) content as it allows users to control and manipulate image generation processes. However, ethical considerations and guidelines should be followed when creating such content.
A. To begin working with Stable Diffusion, install the necessary libraries and dependencies, such as PyTorch and the Hugging Face Diffusers library. Then explore the tutorials and documentation available online to understand its functionality and start experimenting with image generation tasks.