Distributed Deep Learning Acceleration Framework Optimization for Large Generative Models

“Harnessing compute for 175-billion- or trillion-parameter generative LLMs like GPT-4 or ChatGPT requires extensive distributed partitioning of tensors, models and pipelines in different acceleration frameworks. Frameworks such as PyTorch or TensorFlow provide only basic distributed features, which are insufficient for training larger models efficiently. In this session, we will see how acceleration frameworks such as IPEX or DeepSpeed apply tensor slicing and partition the transformer architecture across multiple GPU cards through different mechanisms, combined with distributed data parallelism, to efficiently train models like ChatGPT. We will also look at how distributed parallelism on GPU clusters varies for RLHF-tuned generative LLMs.

Key Takeaways:

Importance of distributed partitioning in handling very large generative LLMs like GPT-4 or ChatGPT.
Limitations of PyTorch and TensorFlow in efficiently training large models.
Benefits of using advanced acceleration frameworks such as IPEX or DeepSpeed for efficient tensor slicing and transformer architecture partitioning.
Understanding the role of distributed data parallelism in enhancing model training.
Insights into how distributed parallelism varies for RLHF-tuned generative LLMs.”
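As a rough illustration of why such partitioning is needed, here is a back-of-the-envelope estimate of training-state memory for a 175-billion-parameter model. It assumes mixed-precision training with the Adam optimizer and uses the per-parameter accounting from the ZeRO paper: 2 bytes of fp16 weights, 2 bytes of fp16 gradients, and 12 bytes of fp32 optimizer state. Activations, buffers and fragmentation are ignored. The 80 GB per-GPU capacity is an assumed example figure and does not describe any specific cluster.

```python
import math

# Assumed per-parameter byte counts for mixed-precision Adam (ZeRO-style accounting).
BYTES_FP16_WEIGHTS = 2
BYTES_FP16_GRADS = 2
BYTES_FP32_OPTIMIZER = 12  # fp32 master weights (4) + momentum (4) + variance (4)


def weights_only_gb(num_params: float) -> float:
    """fp16 weights alone, in GB (roughly what inference must hold)."""
    return num_params * BYTES_FP16_WEIGHTS / 1e9


def training_state_gb(num_params: float) -> float:
    """Weights + gradients + optimizer state in GB; activations excluded."""
    bytes_per_param = BYTES_FP16_WEIGHTS + BYTES_FP16_GRADS + BYTES_FP32_OPTIMIZER
    return num_params * bytes_per_param / 1e9


def min_gpus(total_gb: float, gpu_mem_gb: float) -> int:
    """Lower bound on GPU count if the state were split perfectly evenly."""
    return math.ceil(total_gb / gpu_mem_gb)


if __name__ == "__main__":
    params = 175e9        # parameter count taken from the abstract
    gpu_mem = 80          # assumed GPU memory in GB
    print(weights_only_gb(params), min_gpus(weights_only_gb(params), gpu_mem))
    print(training_state_gb(params), min_gpus(training_state_gb(params), gpu_mem))
```

Under these assumptions, the fp16 weights alone take about 350 GB (at least 5 such GPUs). The full training state comes to about 2,800 GB, so even a perfectly even split, which is what ZeRO stage 3 aims for, needs at least 35 GPUs before activations are counted.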
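Tensor slicing in the Megatron-LM style splits a layer's weight matrix across devices. This toy sketch assumes a column-wise split of a linear layer's weight W. Each simulated device multiplies the full input by its own column slice, and concatenating the partial outputs reproduces the unsplit result; on real hardware this concatenation is an all-gather. Plain Python lists stand in for GPU tensors and collectives, and all function names are hypothetical rather than IPEX or DeepSpeed APIs.

```python
"""Toy column-parallel tensor slicing with plain Python lists.

Hypothetical helper names; real frameworks shard GPU tensors and use
collective communication instead of list concatenation.
"""
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense (rows x inner) @ (inner x cols) product."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Slice W into `parts` equal column blocks, one per simulated device."""
    cols = len(w[0])
    if cols % parts:
        raise ValueError("number of columns must be divisible by parts")
    step = cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def column_parallel_matmul(x: Matrix, w: Matrix, parts: int) -> Matrix:
    """Each 'device' computes x @ W_i independently; outputs are joined column-wise."""
    partials = [matmul(x, shard) for shard in split_columns(w, parts)]
    out: Matrix = []
    for r in range(len(x)):
        row: List[float] = []
        for p in partials:  # stands in for an all-gather across devices
            row.extend(p[r])
        out.append(row)
    return out


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, 2, 1], [0, 1, 1, 3]]
    print(column_parallel_matmul(x, w, parts=2))                 # [[1, 2, 4, 7], [3, 4, 10, 15]]
    print(column_parallel_matmul(x, w, parts=2) == matmul(x, w))  # True
```

A column split keeps each device's partial output independent, so communication is only needed when results are gathered. In the Megatron-LM design, this split is paired with a row-wise split of the following linear layer, so a transformer MLP block needs only one all-reduce in the forward pass.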

Abhilash Majumder

Senior HPC/Deep Learning Engineer

Generative AI
