Evaluating GenAI Models: Case Studies in Enterprise and Healthcare

About

Generative AI is driving the biggest platform shift since the advent of the internet, transforming every industry by reshaping customer service, software development, marketing, HR, and beyond. However, many organizations face a gap between GenAI’s promise and its actual performance. Unlike traditional ML models, GenAI systems are harder to evaluate because their outputs are subjective, often multimodal, and assessed with humans in the loop. This session explores the critical need for robust GenAI evaluation frameworks across technical aspects (such as prompt evaluation, red teaming, and reproducibility), observability (including production logging and cost monitoring), and business metrics (such as ROI, service improvements, and responsible AI measures).

We’ll contrast GenAI and traditional ML evaluation methods and introduce a holistic framework that includes ground-truth creation via gold and silver datasets. Through real-world case studies in Enterprise and HealthTech—including recommender systems, auto form filling, de-identification, and structured note generation—we’ll show how to evaluate GenAI systems effectively both pre- and post-production. The session will highlight key tools and techniques that make GenAI evaluation more practical, especially for complex tasks like summarization and compliance.
