Efficient Deployment of GPT/T5 Models: Leveraging Faster Transformer and Load Balancing Techniques

In this session, we will explore the intricacies of deploying large AI models such as GPT-3 and T5 in production. Key areas of focus include using Faster Transformer to improve inference performance, load balancing to distribute computational and memory load evenly, and optimization techniques for speed and memory efficiency. We will also discuss best practices for efficient inference. The session offers practical insights and skills for data scientists, machine learning engineers, and AI enthusiasts alike.

Saurav Agarwal

Solutions Architecture and Engineering Manager

Generative AI
