Imagine an AI application that processes your voice, analyzes the camera feed, and holds real-time, human-like conversations. Until recently, building such a multimodal application meant wrestling with asynchronous operations, juggling multiple API calls, and stitching together code that later proved difficult to maintain or debug. Enter GenAI Processors.
This open-source Python library from Google DeepMind gives developers a far simpler way to build AI applications, replacing ad-hoc glue code with a clean, composable structure. In this blog, we will look at how GenAI Processors makes complex AI workflows more accessible, and then use it to build a live AI agent.
GenAI Processors is a new open-source Python library from Google DeepMind that brings structure and simplicity to these development challenges. It provides an abstraction layer: a common Processor interface that covers everything from input handling and pre-processing to model calls and output processing.
Think of GenAI Processors as a common language for AI workflows. Rather than writing custom code from scratch for every component in your AI pipeline, you work with standardized "Processor" units that are easy to combine, test, and maintain. At its core, GenAI Processors treats all input and output as asynchronous streams of ProcessorParts (bidirectional streaming): standardized data parts (e.g., audio chunks, text transcriptions, image frames) flow through the pipeline with accompanying metadata.
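To make this concrete, here is a minimal text-only sketch of that idea: a stream of ProcessorParts flows into a model processor, and the response streams back out. The GenaiModel arguments mirror the ones used later in this post; streams.stream_content and content_api.ProcessorPart follow the library's documented helpers, so double-check the names against the version you install.

import asyncio
from genai_processors import content_api, streams
from genai_processors.core import genai_model

async def hello_processors():
    # Wrap plain text into ProcessorParts and turn them into an async stream
    input_stream = streams.stream_content([
        "Hello, ",
        content_api.ProcessorPart("World!"),
    ])

    # A Processor that calls a Gemini model; chain more processors with "+"
    pipeline = genai_model.GenaiModel(
        api_key="your-api-key",
        model_name="gemini-pro",
    )

    # The output is itself an asynchronous stream of ProcessorParts
    async for part in pipeline(input_stream):
        print(part.text)

asyncio.run(hello_processors())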

The key concepts in GenAI Processors are:
1. Processor: a reusable unit of work with a common interface, whether it handles input, calls a model, or produces output.
2. ProcessorPart: the standardized chunk of data (an audio chunk, a text transcription, an image frame) plus metadata that flows between processors.
3. Asynchronous streams: every processor consumes and emits ProcessorParts as async streams, enabling bidirectional, real-time flow.
4. Composition: processors are chained with the + operator to form complete pipelines.

Getting started with GenAI Processors is pretty straightforward:
1. Install the library:
pip install genai-processors
2. Set up authentication:
# For Google AI Studio
export GOOGLE_API_KEY="your-api-key"
# Or for Google Cloud
gcloud auth application-default login
3. Check the installation:
import genai_processors
print(genai_processors.__version__)
4. Development setup (optional):
# Clone for examples or contributions
git clone https://github.com/google-gemini/genai-processors.git
cd genai-processors
pip install -e .
GenAI Processors uses a stream-based processing model in which data flows along a pipeline of connected processors. Each processor receives an asynchronous stream of ProcessorParts, transforms or enriches those parts, and emits a new stream for the next processor to consume:
Audio Input → Speech to Text → LLM Processing → Text to Speech → Audio Output
↓ ↓ ↓ ↓ ↓
ProcessorPart → ProcessorPart → ProcessorPart → ProcessorPart → ProcessorPart
The core components of GenAI Processors are:
1. Input Processors
2. Processing Processors
3. Output Processors
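As a rough sketch of how these three roles compose, the snippet below builds a voice-only pipeline from the same classes the full agent later in this post uses (audio_io and live_model from genai_processors.core); the AudioConfig values mirror the config.py shown further down.

import os
from genai_processors.core import audio_io, live_model

# Audio settings, matching the config.py used later in this post
audio_config = audio_io.AudioConfig(
    sample_rate=16000,
    channels=1,
    chunk_size=1024,
    format="int16"
)

mic = audio_io.PyAudioIn(config=audio_config)        # 1. input processor
gemini = live_model.LiveProcessor(                   # 2. processing processor
    api_key=os.getenv("GOOGLE_API_KEY"),
    model_name="gemini-2.0-flash-exp"
)
speaker = audio_io.PyAudioOut(config=audio_config)   # 3. output processor

# The "+" operator chains the three stages into a single pipeline
voice_agent = mic + gemini + speaker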
GenAI Processors is designed to maximize concurrent execution. Any part of the execution flow can run as soon as all of its ancestors in the processing graph have produced their output. In practice, your application processes multiple data streams at once, which shortens response times and improves the user experience.
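To see why this matters, here is a tiny plain-asyncio illustration (not the library's internals): with a queue between two stages, capture of the next part overlaps with processing of the previous one, which is the kind of overlap GenAI Processors automates across a whole pipeline.

import asyncio

async def producer(queue: asyncio.Queue):
    for i in range(3):
        await asyncio.sleep(0.1)              # simulate audio/video capture
        print(f"captured part {i}")
        await queue.put(i)
    await queue.put(None)                     # sentinel: stream finished

async def consumer(queue: asyncio.Queue):
    while (part := await queue.get()) is not None:
        await asyncio.sleep(0.2)              # simulate model processing
        print(f"processed part {part}")

async def main():
    queue = asyncio.Queue()
    # Both stages run at the same time: capture of part N+1 overlaps
    # with processing of part N, reducing end-to-end latency.
    await asyncio.gather(producer(queue), consumer(queue))

asyncio.run(main())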
Now, let's build a complete live AI agent that combines the camera and audio streams, sends them to the Gemini Live API for processing, and plays back the audio responses.
Note: If you wish to learn all about AI agents, join our complete AI Agentic Pioneer program here.
This is how our Project structure would look:
live_agent/
├── main.py
├── config.py
└── requirements.txt
config.py
import os
from genai_processors.core import audio_io

# API configuration
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("Please set GOOGLE_API_KEY environment variable")

# Audio configuration
AUDIO_CONFIG = audio_io.AudioConfig(
    sample_rate=16000,
    channels=1,
    chunk_size=1024,
    format="int16"
)

# Video configuration
VIDEO_CONFIG = {
    "width": 640,
    "height": 480,
    "fps": 30
}
main.py
import asyncio
from genai_processors.core import (
    audio_io,
    live_model,
    video,
    streams
)
from config import AUDIO_CONFIG, VIDEO_CONFIG, GOOGLE_API_KEY


class LiveAgent:
    def __init__(self):
        self.setup_processors()

    def setup_processors(self):
        """Initialize all processors for the live agent"""
        # Input processor: combines camera and microphone
        self.input_processor = (
            video.VideoIn(
                device_id=0,
                width=VIDEO_CONFIG["width"],
                height=VIDEO_CONFIG["height"],
                fps=VIDEO_CONFIG["fps"]
            ) +
            audio_io.PyAudioIn(
                config=AUDIO_CONFIG,
                device_index=None  # Use default microphone
            )
        )

        # Gemini Live API processor
        self.live_processor = live_model.LiveProcessor(
            api_key=GOOGLE_API_KEY,
            model_name="gemini-2.0-flash-exp",
            system_instruction="You are a helpful AI assistant. Respond naturally to user interactions."
        )

        # Output processor: handles audio playback with interruption support
        self.output_processor = audio_io.PyAudioOut(
            config=AUDIO_CONFIG,
            device_index=None,  # Use default speaker
            enable_interruption=True
        )

        # Complete agent pipeline
        self.agent = (
            self.input_processor +
            self.live_processor +
            self.output_processor
        )

    async def run(self):
        """Start the live agent"""
        print("🤖 Live Agent starting...")
        print("🎥 Camera and microphone active")
        print("🔊 Audio output ready")
        print("💬 Start speaking to interact!")
        print("Press Ctrl+C to stop")

        try:
            async for part in self.agent(streams.endless_stream()):
                # Process different types of output
                if part.part_type == "text":
                    print(f"🤖 AI: {part.text}")
                elif part.part_type == "audio":
                    print(f"🔊 Audio chunk: {len(part.audio_data)} bytes")
                elif part.part_type == "video":
                    print(f"🎥 Video frame: {part.width}x{part.height}")
                elif part.part_type == "metadata":
                    print(f"📊 Metadata: {part.metadata}")
        except KeyboardInterrupt:
            print("\n👋 Live Agent stopping...")
        except Exception as e:
            print(f"❌ Error: {e}")


# Advanced agent with custom processing
class CustomLiveAgent(LiveAgent):
    def __init__(self):
        super().__init__()
        self.conversation_history = []
        self.user_emotions = []

    def setup_processors(self):
        """Enhanced setup with custom processors"""
        from genai_processors.core import (
            speech_to_text,
            text_to_speech,
            genai_model,
            realtime
        )

        # Custom input processing with STT
        self.input_processor = (
            audio_io.PyAudioIn(config=AUDIO_CONFIG) +
            speech_to_text.SpeechToText(
                language="en-US",
                interim_results=True
            )
        )

        # Custom model with conversation memory
        self.genai_processor = genai_model.GenaiModel(
            api_key=GOOGLE_API_KEY,
            model_name="gemini-pro",
            system_instruction="""You are an empathetic AI assistant.
            Remember our conversation history and respond with emotional intelligence.
            If the user seems upset, be supportive. If they're excited, share their enthusiasm."""
        )

        # Custom TTS with emotion
        self.tts_processor = text_to_speech.TextToSpeech(
            voice_name="en-US-Neural2-J",
            speaking_rate=1.0,
            pitch=0.0
        )

        # Audio rate limiting for smooth playback
        self.rate_limiter = audio_io.RateLimitAudio(
            sample_rate=AUDIO_CONFIG.sample_rate
        )

        # Complete custom pipeline
        self.agent = (
            self.input_processor +
            realtime.LiveModelProcessor(
                turn_processor=self.genai_processor + self.tts_processor + self.rate_limiter
            ) +
            audio_io.PyAudioOut(config=AUDIO_CONFIG)
        )


if __name__ == "__main__":
    # Choose your agent type
    agent_type = input("Choose agent type (1: Simple, 2: Custom): ")

    if agent_type == "2":
        agent = CustomLiveAgent()
    else:
        agent = LiveAgent()

    # Run the agent
    asyncio.run(agent.run())
Let's add emotion detection and response customization:
class EmotionAwareLiveAgent(LiveAgent):
    def __init__(self):
        super().__init__()
        self.emotion_history = []

    async def process_with_emotion(self, text_input):
        """Process input with emotion awareness"""
        # Simple emotion detection (in practice, use more sophisticated methods)
        emotions = {
            "happy": ["great", "awesome", "fantastic", "wonderful"],
            "sad": ["sad", "disappointed", "down", "upset"],
            "excited": ["amazing", "incredible", "wow", "fantastic"],
            "confused": ["confused", "don't understand", "what", "how"]
        }

        detected_emotion = "neutral"
        for emotion, keywords in emotions.items():
            if any(keyword in text_input.lower() for keyword in keywords):
                detected_emotion = emotion
                break

        self.emotion_history.append(detected_emotion)
        return detected_emotion

    def get_emotional_response_style(self, emotion):
        """Customize response based on detected emotion"""
        styles = {
            "happy": "Respond with enthusiasm and positivity!",
            "sad": "Respond with empathy and support. Offer help.",
            "excited": "Match their excitement! Use energetic language.",
            "confused": "Be patient and explanatory. Break down complex ideas.",
            "neutral": "Respond naturally and helpfully."
        }
        return styles.get(emotion, styles["neutral"])
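A quick way to exercise these helpers, assuming a machine with a camera and microphone available (constructing the agent also wires up the live pipeline inherited from LiveAgent):

import asyncio

async def emotion_demo():
    agent = EmotionAwareLiveAgent()
    emotion = await agent.process_with_emotion("Wow, this is amazing!")
    style = agent.get_emotional_response_style(emotion)
    print(f"Detected emotion: {emotion}")   # excited
    print(f"Response style: {style}")       # Match their excitement! ...

asyncio.run(emotion_demo())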
requirements.txt
genai-processors>=0.1.0
google-generativeai>=0.3.0
pyaudio>=0.2.11
opencv-python>=4.5.0
Commands to run the agent:
pip install -r requirements.txt
python main.py
GenAI Processors marks a real shift in how AI applications are developed, turning complex, disconnected workflows into elegant, maintainable pipelines. With a common interface for multimodal AI processing, developers can focus on building features instead of fighting infrastructure.
If streaming, multimodal, responsive experiences are the future of AI applications, GenAI Processors lets you build them today. Whether you want to create the next big customer service bot, an educational assistant, or a creative tool, GenAI Processors gives you a solid foundation.
Frequently Asked Questions

Q: Is GenAI Processors free to use?
A: GenAI Processors is completely open-source and free to use. However, you'll incur costs for the underlying AI services you integrate with, such as Google's Gemini API, speech-to-text services, or cloud computing resources. These costs depend on your usage volume and the specific services you choose to integrate into your processors.
Q: Can GenAI Processors work with AI providers other than Google?
A: Yes, while GenAI Processors is optimized for Google's AI ecosystem, its modular architecture allows integration with other AI providers. You can create custom processors that work with OpenAI, Anthropic, or any other AI service by implementing the processor interface, though you may need to handle additional configuration and API management yourself.
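As a rough illustration, a custom processor wrapping another provider might look something like the sketch below. The base class name (processor.Processor) and the async call() signature are assumptions based on the library's documented interface, and client.complete() stands in for whichever SDK call your provider actually exposes; verify the details against the library's docs.

from typing import AsyncIterable
from genai_processors import content_api, processor

class OtherProviderProcessor(processor.Processor):
    """Forwards text parts to a non-Google LLM and yields its replies."""

    def __init__(self, client):
        self.client = client  # e.g. an OpenAI or Anthropic SDK client

    async def call(
        self, content: AsyncIterable[content_api.ProcessorPart]
    ) -> AsyncIterable[content_api.ProcessorPart]:
        async for part in content:
            # Hypothetical call: replace with your provider's actual API
            reply = await self.client.complete(part.text)
            yield content_api.ProcessorPart(reply)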
Q: What are the system requirements?
A: You need Python 3.8+, sufficient RAM for your specific use case (minimum 4GB recommended for basic applications, 8GB+ for video processing), and a stable internet connection for API calls. For real-time video processing, a dedicated GPU can significantly improve performance, though it's not strictly required for all use cases.
Q: How does GenAI Processors handle data privacy?
A: GenAI Processors processes data according to your configuration: you control where data is sent and stored. When using cloud AI services, data privacy depends on your chosen provider's policies. For sensitive applications, you can implement local processing or use on-premises AI models, though this may require additional setup and custom processor development.
Q: Is GenAI Processors ready for production use?
A: Absolutely! GenAI Processors is designed for production use with its asynchronous architecture and efficient resource management. However, you'll need to consider factors like error handling, monitoring, scaling, and rate limiting based on your specific requirements. The library provides the building blocks, but production deployment requires additional infrastructure such as load balancing and monitoring systems.