DataHour: Data Curation and Reliability for LLM and GenAI Applications

08 Mar 2024 15:03pm - 08 Mar 2024 16:03pm

About the Event

In this talk, Curtis Northcutt (MIT PhD and CEO of Cleanlab) explains how to systematically engineer data to build better AI systems using "Confident Learning," an increasingly adopted approach that automatically quantifies, finds, and fixes errors in data, yielding cleaner data for improved analytics and more reliable ML models. The approach is used today to significantly reduce the time and cost of building reliable AI and GenAI systems by automating data curation and adding reliability to every datapoint fed into and out of AI models.

Whereas typical machine learning talks focus on model improvements or prompt engineering techniques, in real-world applications 80% of the time and cost of improving AI systems is spent on improving the data fed into them. Here you will learn how confident learning is used to automatically curate data fed into LLMs, improving the test accuracy of fine-tuned OpenAI GPT models by more than 30%. You'll see how confident learning can be practically applied to your own datasets (text, image, video, tabular, audio, etc.) and how data errors are detected automatically in ten popular datasets.
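To make the idea of automatic error detection concrete, here is a minimal sketch of the core confident learning heuristic in plain Python. It is a simplified illustration, not Cleanlab's implementation: the function name `find_label_issues`, the toy labels, and the toy predicted probabilities are all assumptions made for this example. It assumes you already have out-of-sample predicted class probabilities from some classifier. Each class gets a threshold equal to the average predicted probability of that class among examples carrying that label, and an example is flagged when it is confidently predicted as a class other than its given label.

```python
def find_label_issues(labels, pred_probs):
    """Return indices of examples whose given label is likely wrong.

    labels: list of integer class labels (possibly noisy).
    pred_probs: list of per-class probability lists, one per example.
    Hypothetical helper for illustration only.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: mean predicted probability of class j
    # over the examples that are labeled j.
    thresholds = []
    for j in range(n_classes):
        probs_j = [p[j] for p, y in zip(pred_probs, labels) if y == j]
        # If a class never appears as a label, make it unreachable.
        thresholds.append(sum(probs_j) / len(probs_j) if probs_j else 1.0)

    issues = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes the model is "confident" about for this example.
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # no confident class, so make no claim
        best = max(confident, key=lambda j: p[j])
        if best != y:
            issues.append(i)
    return issues


# Toy data (assumed): example 2 is labeled 0 but looks like class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
flagged = find_label_issues(labels, pred_probs)
print(flagged)
```

Per-class thresholds, rather than a single global cutoff, let the check adapt to classes the model is systematically more or less confident about. In practice the flagged examples would be reviewed or relabeled before training.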

About the Speaker

Curtis Northcutt

CEO & Co-Founder at Cleanlab

Curtis Northcutt is an American computer scientist, artist, and entrepreneur focusing on using machine learning and artificial intelligence to empower people. Curtis completed his PhD at MIT, where he invented Cleanlab’s algorithms for automatically finding issues in most datasets. He is the CEO and Co-Founder of Cleanlab, used by 80+ of the top Fortune 500 companies to “reduce the time and cost to improve analytics, LLM, and machine learning model performance by automatically increasing the accuracy and dollar-value of every datapoint used in analytics/AI solutions.” Curtis is the recipient of the MIT Morris Levin Thesis Award, the NSF Fellowship, and the Goldwater Scholarship, and has worked at several leading AI research groups, including Google, Oculus, Amazon, Facebook, Microsoft, and NASA.

Registration Details

5348

Registered
