Demystifying GPT Models: Building Transformers from Scratch

This session offers an in-depth exploration of the inner workings of ChatGPT-like models and walks through building a GPT model from scratch.

The focus will be on understanding the architecture of transformer models, which form the basis of GPT models.

Through hands-on examples, participants will train a small character-based language model and see how these models work in practice.
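
As a rough illustration of what "character-based" means, here is a minimal Python sketch, assuming a toy corpus, of character-level tokenization and of turning the resulting IDs into next-character training examples. The names `build_char_vocab`, `encode`, `decode`, `make_examples`, and the context-length parameter `block_size` are hypothetical and chosen for illustration; this is not the session's own code.

```python
# Hypothetical helpers illustrating character-level tokenization (not the session's code).

def build_char_vocab(text):
    """Map every distinct character in the corpus to an integer ID and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}  # character -> ID
    itos = {i: ch for ch, i in stoi.items()}      # ID -> character
    return stoi, itos


def encode(s, stoi):
    """Turn a string into a list of token IDs, one per character."""
    return [stoi[ch] for ch in s]


def decode(ids, itos):
    """Turn a list of token IDs back into a string."""
    return "".join(itos[i] for i in ids)


def make_examples(ids, block_size):
    """Build (context, next-token) pairs: the model learns to predict each
    character from at most `block_size` preceding characters.
    `block_size` is an assumed, illustrative name for the context length."""
    examples = []
    for i in range(1, len(ids)):
        context = ids[max(0, i - block_size):i]
        examples.append((context, ids[i]))
    return examples


corpus = "hello world"
stoi, itos = build_char_vocab(corpus)
ids = encode("hello", stoi)
print(ids)                   # [3, 2, 4, 4, 5]
print(decode(ids, itos))     # hello
print(make_examples(ids, 3))
# [([3], 2), ([3, 2], 4), ([3, 2, 4], 4), ([2, 4, 4], 5)]
```

Because the vocabulary is just the set of characters in the corpus, it stays tiny (8 IDs here), at the cost of longer sequences than a subword tokenizer would produce.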

The session will systematically define and dissect the components of the transformer model (tokenization, encoder, decoder, self-attention, and multi-head self-attention) and then cover fine-tuning.
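
To make self-attention and multi-head self-attention more concrete, here is a small pure-Python sketch of causal (masked) scaled dot-product attention, the decoder-style variant used in GPT models. It assumes toy list-of-lists matrices instead of a tensor library, and every function name is hypothetical. A real multi-head layer also applies a learned output projection after concatenating the heads; that step is omitted here.

```python
import math

# Illustrative pure-Python sketch; function names are hypothetical, not the session's code.

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (both lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def causal_self_attention(x, w_q, w_k, w_v):
    """Single attention head.
    x: T x d_model token vectors; w_q, w_k, w_v: d_model x d_head projections."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(w_q[0])
    out = []
    for t in range(len(x)):
        # Causal mask: token t may only attend to tokens 0..t (no peeking ahead).
        scores = [sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d_head)
                  for s in range(t + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(d_head)])
    return out


def multi_head_attention(x, heads):
    """Run several heads and concatenate their outputs per token.
    heads: list of (w_q, w_k, w_v) tuples. A real layer would follow this
    with a learned output projection, omitted here for brevity."""
    per_head = [causal_self_attention(x, *h) for h in heads]
    return [sum((h[t] for h in per_head), []) for t in range(len(x))]


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token embeddings
eye = [[1.0, 0.0], [0.0, 1.0]]            # identity projections for the demo
print(causal_self_attention(x, eye, eye, eye)[0])                    # [1.0, 0.0]
print(len(multi_head_attention(x, [(eye, eye, eye), (eye, eye, eye)])[0]))  # 4
```

Changing the third token leaves the outputs for the first two tokens untouched, which is exactly the guarantee the causal mask provides for left-to-right generation.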

By working through these components, attendees will develop a principled understanding of the mechanisms and concepts behind transformer-based models.

The knowledge gained in this hack session will help participants understand the principles underlying ChatGPT and similar models, laying the groundwork for further exploration and research in natural language processing.

Key takeaways:

Gain an in-depth understanding of transformer models.
Learn to develop a GPT model from scratch, unraveling its inner workings.
Comprehend the architecture of transformers and their significance in natural language processing.
Train a small character-based language model to see how transformer models work in practice.
Explore the tokenization, encoder, decoder, self-attention, and multi-head self-attention components of transformers.
Understand fine-tuning and its role in adapting pretrained transformer models to specific tasks.
Acquire foundational knowledge to delve further into research and advancements in natural language processing.


Anand Mishra

CTO

Generative AI
