Nano Course: Building Large Language Models for Code

  • Intermediate Level

  • 807+ Students Enrolled

  • 1 Hr Duration

  • 4.7 Average Rating

About this Course

  • Learn how to train LLMs for code from scratch, covering training data curation, data preparation, model architecture, training, and evaluation frameworks.
  • Explore each step in depth, including the algorithms and techniques used to create StarCoder, a 15B-parameter code generation model trained on 80+ programming languages.
  • Learn the best practices for training your own StarCoder-style model on your data.

Learning Outcomes

LLM Training

Master training Large Language Models for code from the ground up.

StarCoder Insight

Take a deep dive into the StarCoder development process.

Techniques Mastery

Learn the techniques used in StarCoder through hands-on exercises.

Essential Frameworks

Explore code LLM architecture and evaluation frameworks.

Who Should Enroll

  • Developers interested in training custom Large Language Models for code from scratch through hands-on practice.
  • Machine learning engineers and practitioners looking to gain expertise in code generation using large language models.
  • Researchers exploring applications of Large Language Models in software development.

Course Curriculum

Learn the complete process of building Large Language Models for code, covering data curation, model training, fine-tuning, evaluation, and deployment strategies.

  1. Introduction

  2. Agenda of the course

  3. BigCode Community

  4. Training LLMs for Code from Scratch: Training Data Curation

  5. Training Data Formatting and Preprocessing

  6. Model Architecture

  7. BigCode Ecosystem

  8. Training Frameworks

  9. Model Evaluation

  10. Tools and Descendants of StarCoder

Meet the instructor

Our instructor and mentors bring years of experience in the data industry.

Loubna Ben Allal

ML Engineer at Hugging Face

Loubna Ben Allal is a Machine Learning Engineer at Hugging Face. She has been working on LLMs for code. She is part of the core team of BigCode that released The Stack dataset, SantaCoder, and StarCoder models.

Get this Course Now

With this course you’ll get

  • 1 Hour

    Duration

  • Loubna Ben Allal

    Instructor

  • Intermediate

    Level

Certificate of completion

Earn a professional certificate upon course completion

  • Globally recognized certificate
  • Verifiable online credential
  • Enhances professional credibility

Frequently Asked Questions

Looking for answers to other questions?

Foundational machine learning and deep learning knowledge is essential. Proficiency in Python and in software development practices is also crucial.

It's an intermediate-level course, so beginners should build foundational knowledge beforehand. It requires prior experience with ML and coding.

Yes, you'll learn practical insights and best practices for training similar models. The training pipeline, from data curation through model evaluation, will be thoroughly covered.

The training process encompasses data curation, preprocessing, model architecture, and evaluation. Specific techniques and frameworks used in StarCoder's development are explored in detail.

The core instructional content spans 1 hour, but additional time for practice and exploration is recommended. Learning is self-paced, so you can dig into topics as needed.

Yes, you will receive a certificate of completion after successfully finishing the course and assessments.

Popular free courses

Discover our most popular courses to boost your skills

  • Demystifying OpenAI Agents SDK (30 Minutes, 6 Lessons), rating 4.7
  • Tableau for Beginners (15 Minutes, 7 Lessons), rating 4.7
  • GenAI Landscape (1 Hour, 2 Lessons), rating 4.6
  • Getting Started with Tableau (2 Hours, 2 Lessons), rating 4.5
  • Getting Started with OpenAI o3-mini (1 Hour 30 Minutes, 3 Lessons), rating 4.8
  • Understanding Linear Regression (1 Hour 20 Minutes, 1 Lesson), rating 4.7
  • Introduction to Python (1 Hour, 20 Lessons), rating 4.9
  • Nano Course: Cutting Edge LLM Tricks (38 Minutes, 1 Lesson), rating 4.6
  • xAI Grok 3: Smartest AI on Earth (20 Minutes, 6 Lessons), rating 4.5
  • Deep Dive Into QwQ-32B (1 Hour, 1 Lesson), rating 4.8
  • Naive Bayes from Scratch (30 Minutes, 2 Lessons), rating 4.5
  • Introduction to Data Visualization (19 Minutes, 1 Lesson), rating 4.9
  • Fundamentals of Regression Analysis (1 Hour 30 Minutes, 9 Lessons), rating 4.9
  • Time Series Forecasting using Python (30 Minutes, 4 Lessons), rating 4.7
  • Getting Started with DeepSeek-AI (34 Minutes, 2 Lessons), rating 4.9
  • Generative AI on AWS (1 Hour, 6 Lessons), rating 4.7
  • Exploring Stability AI (1 Hour, 1 Lesson), rating 4.9
  • Building Text Classification Models in NLP (1 Hour 10 Minutes, 2 Lessons), rating 4.8
  • Building Data Stories using Excel and Tableau (9 Hours 30 Minutes, 5 Lessons), rating 4.7
  • Introduction to AI & ML (1 Hour, 3 Lessons), rating 4.9
  • Getting Started With Large Language Models (1 Hour 20 Minutes, 6 Lessons), rating 4.6
  • Guide to Vibe Coding in Windsurf (40 Minutes, 1 Lesson), rating 4.5
  • Introduction to Cloud (1 Hour, 1 Lesson), rating 4.7
  • DeepSeek from Scratch (1 Hour, 1 Lesson), rating 4.6
  • Big Mart Sales Prediction Using R (30 Minutes, 1 Lesson), rating 4.6
  • A Complete MLOps Journey (2 Hours, 1 Lesson), rating 4.6
  • Generative AI - A Way of Life (4 Hours, 3 Lessons), rating 4.5

Contact Us Today

Take the first step towards a future of innovation & excellence with Analytics Vidhya
