{"id":1656,"date":"2023-06-07T11:46:28","date_gmt":"2023-06-07T06:16:28","guid":{"rendered":"https:\/\/www.analyticsvidhya.com\/datahack-summit-2023\/?page_id=1656"},"modified":"2023-07-19T19:05:46","modified_gmt":"2023-07-19T13:35:46","slug":"distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models","status":"publish","type":"page","link":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/","title":{"rendered":"Distributed Deep Learning Acceleration Framework optimization for Large Generative Models"},"content":{"rendered":"<p>&#8220;Leveraging compute for generative LLMs with 175 billion or even a trillion parameters, such as GPT-4 or ChatGPT, requires extensive distributed partitioning of tensors, models and pipelines for different acceleration frameworks. Frameworks such as PyTorch or TensorFlow provide some of the basic features, which on their own are insufficient to train larger models efficiently. In this session, we will see how acceleration frameworks such as IPEX or DeepSpeed can combine tensor slicing and transformer architecture partitioning across multiple GPU cards, through different mechanisms, with distributed data parallelism to efficiently train models like ChatGPT. 
We will also see how distributed parallelism varies on GPU clusters in the case of RLHF-based generative LLMs.<\/p>\n<p><strong>Key Takeaways:<\/strong><\/p>\n<ol>\n<li>Importance of distributed partitioning in handling high-parameter generative LLMs like GPT-4 or ChatGPT.<\/li>\n<li>Limitations of PyTorch and TensorFlow in efficiently training large models.<\/li>\n<li>Benefits of using advanced acceleration frameworks such as IPEX or DeepSpeed for efficient tensor slicing and transformer architecture partitioning.<\/li>\n<li>Understanding the role of distributed data parallelism in enhancing model training.<\/li>\n<li>Insights into how distributed parallelism varies for RLHF-based generative LLMs.&#8221;<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;Leveraging compute for generative LLMs with 175 billion or even a trillion parameters, such as GPT-4 or ChatGPT, requires extensive distributed partitioning of tensors, models and pipelines for different acceleration frameworks. Frameworks such as PyTorch or TensorFlow provide some of the basic features, which on their own are insufficient to train larger models efficiently. 
In this session, we will see [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1657,"parent":1126,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"session-details.php","meta":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.7 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Distributed Deep Learning Acceleration Framework optimization for Large Generative Models - DataHack Summit 2023<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Distributed Deep Learning Acceleration Framework optimization for Large Generative Models - DataHack Summit 2023\" \/>\n<meta property=\"og:description\" content=\"&#8220;Leveraging compute for generative LLMs with 175 billion or even a trillion parameters, such as GPT-4 or ChatGPT, requires extensive distributed partitioning of tensors, models and pipelines for different acceleration frameworks. Frameworks such as PyTorch or TensorFlow provide some of the basic features, which on their own are insufficient to train larger models efficiently. 
In this session, we will see [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/\" \/>\n<meta property=\"og:site_name\" content=\"DataHack Summit 2023\" \/>\n<meta property=\"article:modified_time\" content=\"2023-07-19T13:35:46+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-content\/uploads\/2023\/06\/Deep-Learning-Acceleration.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"500\" \/>\n\t<meta property=\"og:image:height\" content=\"250\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/\",\"url\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/\",\"name\":\"Distributed Deep Learning Acceleration Framework optimization for Large Generative Models - DataHack Summit 
2023\",\"isPartOf\":{\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/#website\"},\"datePublished\":\"2023-06-07T06:16:28+00:00\",\"dateModified\":\"2023-07-19T13:35:46+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Session\",\"item\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Distributed Deep Learning Acceleration Framework optimization for Large Generative Models\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/#website\",\"url\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/\",\"name\":\"DataHack Summit 2023\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Distributed Deep Learning Acceleration Framework optimization for Large Generative Models - DataHack Summit 2023","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/","og_locale":"en_US","og_type":"article","og_title":"Distributed Deep Learning Acceleration Framework optimization for Large Generative Models - DataHack Summit 2023","og_description":"&#8220;Leveraging compute for generative LLMs with 175 billion or even a trillion parameters, such as GPT-4 or ChatGPT, requires extensive distributed partitioning of tensors, models and pipelines for different acceleration frameworks. Frameworks such as PyTorch or TensorFlow provide some of the basic features, which on their own are insufficient to train larger models efficiently. In this session, we will see [&hellip;]","og_url":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/","og_site_name":"DataHack Summit 2023","article_modified_time":"2023-07-19T13:35:46+00:00","og_image":[{"width":500,"height":250,"url":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-content\/uploads\/2023\/06\/Deep-Learning-Acceleration.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/","url":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/","name":"Distributed Deep Learning Acceleration Framework optimization for Large Generative Models - DataHack Summit 2023","isPartOf":{"@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/#website"},"datePublished":"2023-06-07T06:16:28+00:00","dateModified":"2023-07-19T13:35:46+00:00","breadcrumb":{"@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/distributed-deep-learning-acceleration-framework-optimization-for-large-generative-models\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/"},{"@type":"ListItem","position":2,"name":"Session","item":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/session\/"},{"@type":"ListItem","position":3,"name":"Distributed Deep Learning Acceleration Framework optimization for Large Generative Models"}]},{"@type":"WebSite","@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/#website","url":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/","name":"DataHack Summit 
2023","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages\/1656"}],"collection":[{"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/comments?post=1656"}],"version-history":[{"count":3,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages\/1656\/revisions"}],"predecessor-version":[{"id":2107,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages\/1656\/revisions\/2107"}],"up":[{"embeddable":true,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages\/1126"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/media\/1657"}],"wp:attachment":[{"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/media?parent=1656"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}