This session provides an in-depth exploration of the inner workings of ChatGPT-like models and the development of a GPT model from scratch.
The focus will be on understanding the architecture of transformer models, which form the basis of GPT models.
Through practical examples, participants will train a small character-based language model and gain a working understanding of how these models operate.
The session will systematically define and dissect the building blocks of transformer-based models and their training pipeline, including tokenization, the encoder, the decoder, self-attention, multi-head self-attention, and fine-tuning.
By working through these pieces, attendees will develop a rigorous understanding of the mechanisms and concepts behind transformer-based models.
The knowledge gained from this hack session will help participants understand the underlying principles of ChatGPT and similar models, and will give them a foundation for further exploration and research in natural language processing.
Key takeaways: