Ruchi Awasthi is a seasoned Machine Learning Scientist at Pure Storage, where she works in the GenAI R&D team within the CTO Office, building scalable Generative AI products. She holds a Bachelor’s degree from IIT Roorkee and has published research in Biomedical Signal Processing and Control on attention-based deep learning for skin lesion segmentation. Previously, Ruchi was a Senior Data Scientist at Unacademy, leading efforts to deliver personalized recommendations to over 250,000 users daily. She has also held roles at JP Morgan Chase & Co., MakeMyTrip, and FlyNava, working on a range of data science problems across text, image, and statistical modeling.
Her diverse experience spans early-stage startups to large multinational firms, with projects in recommendation systems, ranking algorithms, and infrastructure migration. Beyond her industry impact, Ruchi actively mentors over 40,000 followers on Instagram, sharing insights and career guidance in data science and Generative AI. With a strong foundation in machine learning and hands-on experience in deploying AI at scale, Ruchi is a leading voice driving innovation in AI applications across domains.
Imagine a world where AI is as eco-friendly as it is intelligent. This session is for anyone who wants to make artificial intelligence more practical and less expensive. As the computational demands of Large Language Models (LLMs) continue to grow, so do the challenges of deploying them: cost, energy consumption, and hardware requirements. This session aims to address these challenges by exploring a range of effective model compression techniques that reduce the size and computational overhead of LLMs without compromising their performance.
In this presentation, we will touch on the following high-level concepts of LLM compression; a brief illustrative sketch of each follows the list:
1. Pruning: Removing redundant or less important parameters from the model.
2. Knowledge Distillation: Training a smaller model (student) to replicate the behavior of a larger model (teacher).
3. Low-Rank Factorization: Decomposing large weight matrices into products of smaller matrices, reducing the number of parameters and computations.
4. Quantization: Reducing the numerical precision of the model parameters (e.g., from 32-bit floats to 8-bit integers).
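As a rough illustration of pruning, the minimal sketch below zeroes out the smallest-magnitude weights of a single linear layer. The layer size and the 50% sparsity target are arbitrary assumptions for the example, not values tied to any particular model.

```python
# Minimal sketch: unstructured magnitude pruning of one linear layer (hypothetical sizes).
import torch
import torch.nn as nn

layer = nn.Linear(1024, 1024)   # stand-in for one weight matrix inside an LLM
sparsity = 0.5                  # assumed target: remove half of the weights

with torch.no_grad():
    magnitudes = layer.weight.abs().flatten()
    threshold = torch.quantile(magnitudes, sparsity)      # magnitude cutoff
    mask = (layer.weight.abs() > threshold).float()       # keep only large-magnitude weights
    layer.weight.mul_(mask)                                # zero out the rest

print(f"zeroed fraction: {(layer.weight == 0).float().mean().item():.2f}")
```

In practice the surviving weights are usually fine-tuned afterwards so the model recovers any accuracy lost to the removed parameters.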
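For knowledge distillation, here is a minimal sketch of the standard soft-target loss in which the student is trained to match the teacher's softened output distribution. The temperature T and mixing weight alpha are illustrative hyperparameters, and the random logits simply stand in for real student and teacher outputs.

```python
# Minimal sketch: blending a soft-target KL loss (teacher -> student) with the hard-label loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale so gradients stay comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: a batch of 4 examples, 10 classes, random logits as placeholders.
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```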
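For low-rank factorization, the following sketch approximates one weight matrix with two thin factors obtained from a truncated SVD. The 1024 x 1024 shape and the target rank of 64 are arbitrary choices for illustration.

```python
# Minimal sketch: replace W (m x n) with A (m x r) @ B (r x n) via truncated SVD.
import torch

m, n, rank = 1024, 1024, 64            # assumed dimensions and target rank
W = torch.randn(m, n)                  # stand-in for a pretrained weight matrix

U, S, Vh = torch.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * S[:rank]             # m x rank
B = Vh[:rank, :]                       # rank x n

# W @ x is approximated by A @ (B @ x): rank * (m + n) parameters instead of m * n.
print("original params:", m * n)              # 1,048,576
print("factored params:", rank * (m + n))     # 131,072
```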
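Finally, a minimal sketch of symmetric per-tensor int8 quantization, again on a stand-in random weight matrix. Production libraries add per-channel scales and calibration, but the core trade of precision for memory is the same.

```python
# Minimal sketch: symmetric per-tensor int8 quantization of a weight matrix.
import torch

W = torch.randn(1024, 1024)                   # stand-in fp32 weights

scale = W.abs().max() / 127.0                 # map the largest magnitude into int8 range
W_int8 = torch.clamp((W / scale).round(), -127, 127).to(torch.int8)
W_dequant = W_int8.float() * scale            # dequantized values used at inference time

print("max abs error:", (W - W_dequant).abs().max().item())
print("fp32 bytes:", W.numel() * 4, "| int8 bytes:", W_int8.numel())
```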
Join us to explore simple, effective ways to reduce the size of these models using techniques like pruning, quantization, knowledge distillation, and low-rank factorization. We'll break down each method with easy-to-understand explanations and infographics, covering what these techniques do, why they are beneficial, the different categories within each of them, and how they can be applied in real-life scenarios.