We have all enjoyed comics at some point, be it superhero comic books, comics in newspapers, or manga from Japan. Comics are brief, expressive, and encapsulate storytelling within just a few frames. But here’s a new twist: what if you could use a comic generator to turn a short video clip into a four-panel comic strip, complete with speech bubbles, expressive caricatures, and humour?
This is the idea behind Comic War. It’s not just another content generator, but a system I designed that takes a video clip and a short creative idea and turns them into a finished comic strip image. It’s best to think of it as an imaginative partnership between two minds: one “writing the screenplay” and the other “drawing the comic.”
In this article, I will guide you through the journey of Comic War: how it works, what components it requires, the code behind it, the challenges I encountered along the way, and where the project can go from here.
All creative applications hinge on a standard formula: input + transformation = output.
For Comic War, the formula looks like: video clip + one-line idea → four-panel comic strip.
What makes this fun? It’s personalized. Instead of random comics, you receive a reinterpretation of the very clip you just selected, tailored around your one-line idea.
Consider a fight scene from a movie morphed into a goofy classroom battle about homework. This concoction of relatable visuals, a familiar scene with a surprising, personalised comic rewrite, is what makes Comic War addictive.
The pipeline breaks down as follows:
The process begins with two simple inputs: a short video link and a one-line creative idea.
Example:
Video URL: https://www.youtube.com/shorts/xQPAegqvFVs
Idea: Instead of violence, replace it with exams, like Yash saying
“Violence, violence, I don’t like violence, I avoid… but violence likes me.”
That’s all the user has to provide: no complex settings, no sliders.
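In code, these two inputs are nothing more than a pair of strings, taken straight from the example above:

video_url = "https://www.youtube.com/shorts/xQPAegqvFVs"
user_input = "Instead of violence, replace it with exams"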
The first part of the pipeline is what I refer to as the Storyteller. This is where the raw inputs, a YouTube video link and the brief idea you typed in, get transformed into something structured and usable.
When you paste a video URL, Gemini looks at the clip and extracts its key details.
Then it takes your one-liner (for example, “replace violence with exams”) and expands it into a comic script.
Now, this script isn’t just random text. It’s a screenplay for four panels that follows a strict set of rules. These rules were explicitly written into the system prompt that guides Gemini. They include:
- exactly four panels, each with a scene description and speech-bubble dialogue
- no brand or character names, and no real-person likenesses
- a consistent, stylized comic look across all panels
By baking these constraints into the system prompt, I made sure the Storyteller always produces a clean, reliable screenplay. So instead of asking the image generator to “just make a comic,” Gemini prepares a fully structured plan that the next step can follow without guesswork.
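To make this concrete, here is a minimal sketch of what such a system prompt might look like. The wording below is illustrative (the actual prompt in Comic War is longer), but it encodes the same constraints:

# Illustrative sketch of the Storyteller's system prompt; the exact
# wording is an assumption, the constraints come from the rules above.
enhancement_prompt = """
You are a comic screenplay writer. Watch the attached video and read the
user's one-line idea, then write a screenplay for EXACTLY four panels.
Rules:
1. Each panel has a short scene description and one speech bubble.
2. Remove all brand and character names; avoid real-person likenesses.
3. Keep a consistent, stylized comic look across all four panels.
4. Land the punchline in panel four.
"""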
Once the script is ready, it’s passed on to the Illustrator.
This part doesn’t have to interpret anything; its single responsibility is to draw exactly what the Storyteller described.
The Illustrator role is handled by an image generation model. In my setup, OpenAI’s GPT-Image-1 is the first choice, with Google’s Imagen as a secondary fallback if the first fails.
Here is what it looks like in practice: the Storyteller hands over a finished screenplay, and the Illustrator renders it panel by panel.
This separation is the key to making Comic War reliable.
That’s why the output doesn’t feel messy or random. Each comic comes out as a proper four-panel strip, styled like a meme, and matches your idea almost one-to-one.
The result is a four-panel comic strip image. And best of all, it feels like a finished comic that could be published online.
Here’s what powers the system:
- Gemini 2.0 Flash as the Storyteller, handling video understanding and script writing
- GPT-Image-1 as the primary Illustrator
- Imagen 4.0 as the fallback Illustrator
This dual approach ensures both creativity (from the storyteller) and visual consistency (from the illustrator).
Now, let’s look into the actual implementation.
from dataclasses import dataclass

@dataclass
class ComicGenerationConfig:
    primary_service: str = "openai"
    fallback_service: str = "imagen"
    output_filename: str = "images/generated_comic.png"
    openai_model: str = "gpt-image-1"
    imagen_model: str = "imagen-4.0-generate-preview-06-06"
    gemini_model: str = "gemini-2.0-flash"
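Because everything lives in one dataclass, swapping the two illustrators around is a one-line change rather than a hunt through the codebase; for example:

# Use Imagen first and fall back to OpenAI instead.
config = ComicGenerationConfig(primary_service="imagen", fallback_service="openai")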
The models are used in the following manner:
def extract_comic_prompt_and_enhance(video_url, user_input):
    # The Storyteller: Gemini watches the clip and expands the user's
    # one-liner into a structured four-panel screenplay.
    response = gemini_client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[
            # Fold the user's idea into the system prompt so Gemini sees both.
            Part(text=enhancement_prompt + "\n\nUser idea: " + user_input),
            Part(file_data={"file_uri": video_url, "mime_type": "video/mp4"}),
        ],
    )
    return response.text
This step rewrites a vague input into a detailed comic prompt.
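For context, the gemini_client above is a client from Google’s google-genai SDK, created once and reused across calls. A minimal setup might look like this (the environment-variable name is my assumption):

import os
from google import genai
from google.genai.types import Part

# One shared client serves both the Storyteller and the Imagen fallback.
gemini_client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])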
OpenAI (primary):
result = openai_client.images.generate(
    model="gpt-image-1",
    prompt=enhanced_prompt,
)
image_bytes = base64.b64decode(result.data[0].b64_json)
Imagen (fallback):
response = gemini_client.models.generate_images(
    model="imagen-4.0-generate-preview-06-06",
    prompt=enhanced_prompt,
)
image_data = response.generated_images[0].image.image_bytes
Fallback ensures reliability; if one illustrator fails, the other takes over.
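The generate_image_with_fallback function used later isn’t shown above; here is a minimal sketch of how the two snippets can be wrapped together (the error-handling details are my assumption):

import base64

def generate_image_with_fallback(enhanced_prompt):
    # Try the primary illustrator (GPT-Image-1) first.
    try:
        result = openai_client.images.generate(
            model="gpt-image-1",
            prompt=enhanced_prompt,
        )
        return base64.b64decode(result.data[0].b64_json)
    except Exception as err:
        # Any failure triggers the Imagen fallback.
        print(f"Primary illustrator failed ({err}); falling back to Imagen.")
        response = gemini_client.models.generate_images(
            model="imagen-4.0-generate-preview-06-06",
            prompt=enhanced_prompt,
        )
        return response.generated_images[0].image.image_bytes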
from io import BytesIO
from PIL import Image as PILImage

def save_image(image_data, filename="generated_comic.png"):
    # Decode the raw bytes into a PIL image and write it to disk.
    img = PILImage.open(BytesIO(image_data))
    img.save(filename)
    return filename
This function writes the comic strip to disk in PNG format.
def generate_comic(video_url, user_input):
    # Orchestrate the pipeline: Storyteller, then Illustrator, then save.
    enhanced_prompt = extract_comic_prompt_and_enhance(video_url, user_input)
    image_data = generate_image_with_fallback(enhanced_prompt)
    return save_image(image_data)
All the steps tie together here: the video is analyzed, the idea is expanded into a screenplay, the comic is drawn (with fallback), and the final image is saved to disk.
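Using the example inputs from earlier, one call produces the finished strip:

comic_path = generate_comic(
    "https://www.youtube.com/shorts/xQPAegqvFVs",
    "Instead of violence, replace it with exams",
)
print(f"Comic saved to {comic_path}")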
Let’s see this in action.
Input: the YouTube Short and the one-line exam idea from the example above.
Generated screenplay: a four-panel script produced by the Storyteller, following the rules described earlier.
Output: the finished four-panel comic strip image.
No project is without hurdles. The biggest ones here were keeping the screenplay format consistent across runs and handling image-generation failures gracefully, which is exactly why the fallback illustrator exists.
Comic War is just one use case. The same storyteller-plus-illustrator engine could power many other formats.
In short, anything that combines humor, visuals, and personalization could benefit from this approach.
Comic War started as one of our proposals during DHS, and it is something very personal to me. I worked with my colleagues, Mounish and Badri, and we spent hours thinking together, tossing ideas around, rejecting them, and laughing at the things we came up with, until we finally landed on one we thought we could really run with: “How about we take a short video and make a comic strip?”

We submitted our idea, not knowing what would happen… and we were surprised when it got selected. Then we had to build it, piece by piece. That meant many long nights, lots of debugging, and plenty of excitement every time something worked the way we wanted it to. Seeing our idea go from a proposal to something real was honestly one of the best feelings ever.
When we finally let it loose, the response made it all worth it. People kept telling me it was great, and that they were intrigued both by the idea itself and by how we arrived at it and made it happen.

Perhaps the most surprising part for me was how people began to use it in ways I never considered. Parents made comics for their children, turning mundane little stories into something special and visual. Others kept exploring and experimenting, dreaming up the most amazing prompts just to see what would happen next.
For me, that was the most exciting part: seeing people get excited about something we created and then go and create something even cooler. Watching this little idea turn into Comic War was amazing.
Building Comic War was a lesson in orchestration, splitting the job between a storyteller and an illustrator.
Instead of hoping a single model “figures everything out,” we gave each part a clear role: the Storyteller plans the screenplay, and the Illustrator draws exactly what it is told.
The result is something that feels polished, personal, and fun.
And that’s the point: with just a short video and a silly idea, anyone can create a comic that looks like it belongs on the internet’s front page.
Q. What do I need to provide?
A. A YouTube Short link (~30–40 sec) and a one-line idea. The system analyzes the clip with Gemini, expands your idea into a 4-panel screenplay, and then the image model draws it.
Q. Which models do the work?
A. Gemini drafts the 4-panel script. GPT-Image-1 draws it. If OpenAI fails, Imagen is used automatically. This separation keeps results consistent.
Q. How does it handle copyright and likeness concerns?
A. The screenplay removes brand and character names, avoids likenesses, and keeps a stylized comic look. You supply videos that you have the right to use.