{"id":1099,"date":"2023-04-27T13:20:34","date_gmt":"2023-04-27T13:20:34","guid":{"rendered":"https:\/\/www.analyticsvidhya.com\/datahack-summit-2023\/?page_id=1099"},"modified":"2023-08-11T18:31:38","modified_gmt":"2023-08-11T13:01:38","slug":"build-scalable-machine-learning-model","status":"publish","type":"page","link":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/","title":{"rendered":"Build Scalable Machine Learning Models"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">In today&#8217;s data-driven world, the ability to build scalable machine learning models has become increasingly important. With the exponential growth of data, traditional machine learning approaches are often not sufficient to handle the large datasets that many organizations are dealing with. This is where Apache Spark comes in, providing a powerful distributed computing framework that allows you to build and train machine learning models at scale.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">During this workshop, you will gain hands-on experience using Spark ML in Apache Spark to build and test different machine learning models. 
You will learn about the unique challenges and opportunities that arise when working with big data, including data preparation, feature engineering, and model selection.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Workshop Highlights:<\/span><\/p>\n<h4><span style=\"font-weight: 400;\">Module 0: Introduction to Spark<\/span><\/h4>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Why do we need distributed systems?<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">What is Apache Spark?<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Understanding Spark Architecture<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Installing and setting up PySpark<\/span><\/li>\n<\/ul>\n<h4><span style=\"font-weight: 400;\">Module 1: Getting familiar with Spark<\/span><\/h4>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Understanding RDDs<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Learn to create RDDs and get familiar with RDD operations<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Handle structured data with Spark DataFrames<\/span><\/li>\n<\/ul>\n<h4><span style=\"font-weight: 400;\">Module 2: Brushing up on ML<\/span><\/h4>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">What is Machine Learning?<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Types of ML: Supervised, Unsupervised, Reinforcement<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Types of ML problems: Regression, Classification<\/span><\/li>\n<\/ul>\n<h4><span style=\"font-weight: 400;\">Module 3: Spark ML<\/span><\/h4>\n<ul>\n<li
style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Understanding the problem statement<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">EDA<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Encoding categorical variables<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Understanding Vector Assembler<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Model Building with Spark ML<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Evaluating Models<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Fine-tune models<\/span><\/li>\n<\/ul>\n<h4><span style=\"font-weight: 400;\">Module 4: Building ML Pipelines<\/span><\/h4>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Understand Transformers<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Understand Estimators<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">Build Pipelines in Spark ML<\/li>\n<\/ul>\n<p><b>Pre-requisites:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Laptop with minimum 8 GB of RAM<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Knowledge of Python<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Basic understanding of Machine Learning<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Note: These are tentative details and are subject to change.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In today&#8217;s data-driven world, the ability to build scalable machine learning models has become increasingly important.
With the exponential growth of data, traditional machine learning approaches are often not sufficient to handle the large datasets that many organizations are dealing with. This is where Apache Spark comes in, providing a powerful distributed computing framework that [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1614,"parent":890,"menu_order":1,"comment_status":"closed","ping_status":"closed","template":"workshop-detail.php","meta":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.7 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Build Scalable Machine Learning Models - DataHack Summit 2023<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Build Scalable Machine Learning Models - DataHack Summit 2023\" \/>\n<meta property=\"og:description\" content=\"In today&#8217;s data-driven world, the ability to build scalable machine learning models has become increasingly important. With the exponential growth of data, traditional machine learning approaches are often not sufficient to handle the large datasets that many organizations are dealing with. 
This is where Apache Spark comes in, providing a powerful distributed computing framework that [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/\" \/>\n<meta property=\"og:site_name\" content=\"DataHack Summit 2023\" \/>\n<meta property=\"article:modified_time\" content=\"2023-08-11T13:01:38+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-content\/uploads\/2023\/06\/wbuild-scalable-mlm.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"500\" \/>\n\t<meta property=\"og:image:height\" content=\"250\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/\",\"url\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/\",\"name\":\"Build Scalable Machine Learning Models - DataHack Summit 
2023\",\"isPartOf\":{\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/#website\"},\"datePublished\":\"2023-04-27T13:20:34+00:00\",\"dateModified\":\"2023-08-11T13:01:38+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Workshop\",\"item\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Build Scalable Machine Learning Models\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/#website\",\"url\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/\",\"name\":\"DataHack Summit 2023\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Build Scalable Machine Learning Models - DataHack Summit 2023","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/","og_locale":"en_US","og_type":"article","og_title":"Build Scalable Machine Learning Models - DataHack Summit 2023","og_description":"In today&#8217;s data-driven world, the ability to build scalable machine learning models has become increasingly important. With the exponential growth of data, traditional machine learning approaches are often not sufficient to handle the large datasets that many organizations are dealing with. This is where Apache Spark comes in, providing a powerful distributed computing framework that [&hellip;]","og_url":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/","og_site_name":"DataHack Summit 2023","article_modified_time":"2023-08-11T13:01:38+00:00","og_image":[{"width":500,"height":250,"url":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-content\/uploads\/2023\/06\/wbuild-scalable-mlm.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/","url":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/","name":"Build Scalable Machine Learning Models - DataHack Summit 2023","isPartOf":{"@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/#website"},"datePublished":"2023-04-27T13:20:34+00:00","dateModified":"2023-08-11T13:01:38+00:00","breadcrumb":{"@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/build-scalable-machine-learning-model\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/"},{"@type":"ListItem","position":2,"name":"Workshop","item":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/"},{"@type":"ListItem","position":3,"name":"Build Scalable Machine Learning Models"}]},{"@type":"WebSite","@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/#website","url":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/","name":"DataHack Summit 2023","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages\/1099"}],"collection":[{"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/comments?post=1099"}],"version-history":[{"count":21,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages\/1099\/revisions"}],"predecessor-version":[{"id":3329,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages\/1099\/revisions\/3329"}],"up":[{"embeddable":true,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages\/890"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/media\/1614"}],"wp:attachment":[{"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/media?parent=1099"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}