“Harnessing compute for 175-billion- or trillion-parameter generative LLMs such as GPT-4 or ChatGPT requires extensive distributed partitioning of tensors, models, and pipelines across different acceleration frameworks. Frameworks such as PyTorch or TensorFlow provide only some of the basic features, which are insufficient to train larger models efficiently. In this session, we will see how acceleration frameworks such as IPEX (Intel Extension for PyTorch) or DeepSpeed control tensor slicing and partition the transformer architecture across multiple GPU cards through different mechanisms. They combine this with distributed data parallelism to train models like ChatGPT efficiently. We will also see how distributed parallelism on GPU clusters varies for RLHF-tuned generative LLMs.
Key Takeaways: