Mastering Language Models: From Concepts to Code in PyTorch

10 AUGUST 2024 | 09:30AM - 05:30PM

About the workshop

Right now, people all over the world are going bonkers over something called ChatGPT. In this workshop, we'll learn the basic concepts behind how ChatGPT works and then code, train, and use our own version of it from scratch in PyTorch. From this coding experience, we'll learn about the strengths and weaknesses of models like ChatGPT and discuss alternative design strategies. Then we'll learn how to fine-tune a production language model on a custom dataset, which gives us more control over how the model behaves and can make it more reliable.

NOTE: This workshop will be done in “StatQuest Style,” meaning every little detail will be clearly explained. We’ll also start each module with a silly song.


Instructor


Joshua Starmer, PhD

Founder and CEO


Modules

In this module, we’ll cover the basic concepts of neural networks and transformers, which provide the backbone for ChatGPT-style language models. Specifically, we will discuss:

  • The basics of how neural networks can fit any shape to any dataset.
  • The basics of how neural networks are trained with backpropagation.
  • The basics of how transformers work, including:
    • Word Embedding
    • Position Encoding
    • Attention
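
The attention step listed above can be sketched in a few lines of PyTorch. This is a minimal illustration with made-up sizes and variable names, not the workshop's actual code:

```python
import torch
import torch.nn.functional as F

# Toy scaled dot-product attention: the heart of a transformer.
torch.manual_seed(0)
d_model = 4
tokens = torch.randn(3, d_model)        # embeddings for 3 tokens

W_q = torch.randn(d_model, d_model)     # learned Query weights
W_k = torch.randn(d_model, d_model)     # learned Key weights
W_v = torch.randn(d_model, d_model)     # learned Value weights

q, k, v = tokens @ W_q, tokens @ W_k, tokens @ W_v
scores = q @ k.T / (d_model ** 0.5)     # how similar is each token to the others?
weights = F.softmax(scores, dim=-1)     # attention percentages (each row sums to 1)
output = weights @ v                    # weighted sum of the Values
print(output.shape)                     # torch.Size([3, 4])
```

Each output row is a mix of all the Value vectors, weighted by how much each token "attends" to the others.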

In this module, we’ll cover the essential matrix algebra that is required when coding neural networks in PyTorch. Specifically, we will discuss:

  • Matrix addition and multiplication.
  • Why matrix multiplication is so funky.
  • Matrix concepts that help us read PyTorch documentation and error messages.
  • A walkthrough of all the matrix math required to code a transformer.

In this module, we will code a ChatGPT-like language model from scratch. Specifically, we will:

  • Code Position Encoding.
  • Code Attention.
  • Code a Decoder-Only Transformer from scratch.
  • Train our model.
  • Use our model.
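
The key ingredient that makes a transformer "Decoder-Only" is the causal mask, which lets each token attend only to itself and the tokens before it. A minimal sketch of that idea (our own toy example, with arbitrary sizes):

```python
import torch

# Masked self-attention: each token can only "look at" earlier tokens.
torch.manual_seed(0)
seq_len, d_model = 4, 8
x = torch.randn(seq_len, d_model)

scores = x @ x.T / (d_model ** 0.5)
mask = torch.tril(torch.ones(seq_len, seq_len)).bool()  # lower triangle = allowed
scores = scores.masked_fill(~mask, float('-inf'))       # block future tokens
weights = torch.softmax(scores, dim=-1)
print(weights[0])  # tensor([1., 0., 0., 0.]) -- the first token sees only itself
```

Masking the future is what lets the model generate text one token at a time, ChatGPT-style.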

Training a large-scale language model from scratch is crazy expensive and time-consuming, so pretty much nobody does it. Instead, people take a pre-trained model and fine-tune it to perform specific tasks. In this module, we’ll learn how to fine-tune a production-grade large language model ourselves, using GPUs in the cloud. Specifically, we will:

  • Load and use a large language model in the cloud and run it on a GPU.
  • Fine-tune a large language model on a custom dataset.
  • Use the fine-tuned model.
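
Fine-tuning is just "keep training, but on our data." The sketch below shows the loop in miniature with a tiny stand-in model and made-up data; in the module we'll do the real thing with a production language model on a cloud GPU:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

# A tiny stand-in for a pre-trained model (real fine-tuning starts
# from downloaded pre-trained weights instead of random ones).
model = nn.Linear(8, 8).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

custom_x = torch.randn(16, 8, device=device)   # our "custom dataset"
custom_y = torch.randn(16, 8, device=device)

losses = []
for step in range(50):                          # the fine-tuning loop
    loss = nn.functional.mse_loss(model(custom_x), custom_y)
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(losses[-1] < losses[0])   # the loss went down on our data
```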

*Note: These are tentative details and are subject to change.
