Black Box to Production: Measure, Test & Automate AI Agent Reliability with ADK

  • Aug 08, 2026
  • 9:30 AM – 5:30 PM

About the Workshop

Building AI agents is easy — building reliable AI agents is the hard part. This hands-on workshop is designed for data scientists, ML engineers, and AI practitioners who want to learn how to properly evaluate, test, debug, and deploy production-ready AI agents. 

Throughout the workshop, participants will work with a realistic customer-service AI agent using Google’s Agent Development Kit (ADK) to understand how to inspect agent reasoning, validate decision-making, detect failures, and automate regression testing workflows. Rather than evaluating only the final response, the workshop focuses on measuring the full sequence of actions, tool usage, and reasoning steps taken by an agent. 
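The trajectory-focused evaluation described above can be sketched, independent of any specific framework, as a comparison of the agent's recorded tool calls against an expected sequence. All names below (`evaluate_trajectory`, the tool names, the call format) are illustrative, not part of the ADK API:

```python
# Minimal sketch of trajectory evaluation: score the agent's recorded
# tool calls against an expected sequence, not just its final answer.
# All names here are illustrative, not a specific ADK schema.

def evaluate_trajectory(expected_calls, actual_calls):
    """Return a match score in [0, 1] plus the first point of divergence."""
    matches = 0
    divergence = None
    for i, expected in enumerate(expected_calls):
        actual = actual_calls[i] if i < len(actual_calls) else None
        if actual == expected:
            matches += 1
        elif divergence is None:
            divergence = (i, expected, actual)
    score = matches / len(expected_calls) if expected_calls else 1.0
    return score, divergence

# Example: the agent should look up the order before issuing a refund.
expected = [("lookup_order", {"order_id": "A123"}),
            ("issue_refund", {"order_id": "A123"})]
actual = [("lookup_order", {"order_id": "A123"}),
          ("send_email", {"to": "customer"})]

score, divergence = evaluate_trajectory(expected, actual)
print(score)       # 0.5 — only the first call matched
print(divergence)  # step 1 expected issue_refund, got send_email
```

Scoring the whole sequence, rather than only the last message, is what surfaces failures like a refund issued without a prior order lookup.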

By the end of the session, participants will build a complete automated evaluation pipeline — including verified test cases, bulk regression testing, debugging workflows, and deployment checks that prevent broken agent logic from reaching production. 
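The bulk regression testing and deployment-check steps mentioned above can be sketched as a simple gate: run every stored test case through the agent and block the release if the pass rate drops below a threshold. Here `run_agent` is a stand-in for invoking a real agent, and the test-case format is an assumption for illustration:

```python
# Sketch of a bulk regression gate: run all stored test cases through the
# agent and fail the release if the pass rate falls below a threshold.
# `run_agent` is a placeholder for a real agent call; the case format is
# illustrative, not a specific ADK schema.

PASS_THRESHOLD = 0.9

def run_agent(query):
    # Placeholder agent: returns a canned answer per known query.
    canned = {"Where is my order?": "Your order ships tomorrow."}
    return canned.get(query, "I don't know.")

def run_regression(test_cases):
    results = []
    for case in test_cases:
        response = run_agent(case["query"])
        results.append({"query": case["query"],
                        "passed": case["expected"] in response})
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return pass_rate, [r for r in results if not r["passed"]]

cases = [
    {"query": "Where is my order?", "expected": "ships tomorrow"},
    {"query": "Cancel my subscription", "expected": "cancelled"},
]
pass_rate, failures = run_regression(cases)
print(f"pass rate: {pass_rate:.0%}")  # pass rate: 50%
if pass_rate < PASS_THRESHOLD:
    print("Blocking deployment:", [f["query"] for f in failures])
```

Wiring a check like this into CI is one way to keep broken agent logic from reaching production, which is the pattern the workshop builds out end to end.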

The workshop is heavily hands-on, with live coding exercises, realistic datasets, and end-to-end workflows that participants can directly apply to their own AI agents after the session.

Key Learning Outcomes 

  • Understand how to evaluate AI agents beyond final answers  
  • Build structured test cases from real agent interactions  
  • Run automated regression tests and debug failures  
  • Detect common production agent failure modes  
  • Set up deployment checks for reliable agent releases 

Prerequisites

  • Intermediate Python proficiency is required.
  • A basic understanding of how AI language models work is assumed.