{"id":1190,"date":"2023-05-03T13:40:47","date_gmt":"2023-05-03T13:40:47","guid":{"rendered":"https:\/\/www.analyticsvidhya.com\/datahack-summit-2023\/?page_id=1190"},"modified":"2023-08-04T10:42:46","modified_gmt":"2023-08-04T05:12:46","slug":"solving-real-world-problems-using-reinforcement-learning","status":"publish","type":"page","link":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/","title":{"rendered":"Solving real world problems using Reinforcement Learning"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">The revolutionary generative model ChatGPT uses Reinforcement Learning under the hood. Reinforcement Learning from Human Feedback (RLHF) is the core working principle behind ChatGPT and similar technologies: it is used to align Large Language Models (LLMs) with human preferences. It\u2019s evident that Reinforcement Learning has a lot of potential to solve real-world problems.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In this workshop, you will learn Reinforcement Learning from the basics to advanced topics and understand how to apply it to real-world problems. You will also learn about RLHF and its significance in Large Language Models. Whether you\u2019re a seasoned AI practitioner or just starting out, this workshop will equip you with the tools and knowledge to tackle real-world challenges using Reinforcement Learning. 
Join us and discover how Reinforcement Learning can transform the way you approach problem solving!\u00a0<\/span><\/p>\n<h4><span style=\"font-weight: 400;\"><br \/>\nModule 1: Mathematical Prerequisites for Reinforcement Learning\u00a0<\/span><\/h4>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Markov Decision Processes<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Bellman equation and Dynamic Programming<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Value Iteration<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Policy Iteration<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Hands-on experience &#8211; Jupyter notebook with a simple NumPy-based tutorial, with solutions, for Value Iteration and Policy Iteration<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Introduction to Partially Observable Markov Decision Processes and Games<\/span><\/li>\n<\/ul>\n<h4><span style=\"font-weight: 400;\"><br \/>\nModule 2: Simple Reinforcement Learning\u00a0<\/span><\/h4>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Temporal difference (TD) learning and Monte Carlo (MC) methods<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">RL framework: OpenAI Gym Environment<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Exploration vs Exploitation in RL<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Actor-Only, Critic-Only and Actor-Critic Algorithms<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Q-learning<\/span><\/li>\n<li 
style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">SARSA<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">REINFORCE<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Jupyter notebook tutorial with solutions for TD, MC, Q-learning, SARSA and REINFORCE<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Discussion of online vs offline RL<\/span><\/li>\n<\/ul>\n<h4><span style=\"font-weight: 400;\"><br \/>\nModule 3: Reinforcement Learning with Function Approximation\u00a0<\/span><\/h4>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Basic Introduction to Linear Function Approximation<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Deadly triad of Deep RL &#8211; function approximation, bootstrapping and off-policy learning<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">DQN and variants<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">OpenAI Spinning Up-based tutorial on DQN, with solution<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Stochastic Policy Gradient Theorem<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">PPO and variants<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">OpenAI Spinning Up-based tutorial on PPO, with solution<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Deterministic Policy Gradient Theorem<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">DDPG, TD3, SAC<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 
400;\">OpenAI Spinning Up-based tutorial on TD3, with solution<\/span><\/li>\n<\/ul>\n<h4><span style=\"font-weight: 400;\"><br \/>\nModule 4: RLHF for LLMs\u00a0<\/span><\/h4>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">LLM Basics<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Types of human feedback<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Supervised Fine-Tuning &#8211; Basics<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Reward Model from Human Feedback<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">RL-based LLM fine-tuning with PPO<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">RL-based LLM fine-tuning with ILQL<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">trlX-based tutorial on fine-tuning GPT-2 with PPO and ILQL*<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Discussion of other open-source RLHF libraries*<\/span><\/li>\n<\/ul>\n<p><strong>Prerequisites:<\/strong><\/p>\n<ul>\n<li>System Requirements and Setup\n<ul>\n<li>Laptop with 4-8 GB of RAM or more<\/li>\n<li>We will use a GPU-powered cloud Jupyter notebook for the workshop<\/li>\n<\/ul>\n<\/li>\n<li>Offline Setup [Optional]\n<ul>\n<li>A GPU is good to have!<\/li>\n<li>Install Python 3.9 or higher (<a href=\"https:\/\/www.python.org\/downloads\/\" target=\"_blank\" rel=\"noopener\">Resource<\/a>)<\/li>\n<li>Install Jupyter Notebook (<a href=\"https:\/\/jupyter.org\/install\" target=\"_blank\" rel=\"noopener\">Resource<\/a>)<\/li>\n<\/ul>\n<\/li>\n<li>Pre-reads\n<ul>\n<li>Programming knowledge in Python (<a 
href=\"https:\/\/www.analyticsvidhya.com\/blog\/2016\/01\/complete-tutorial-learn-data-science-python-scratch-2\/\" target=\"_blank\" rel=\"noopener\">Resource<\/a>)<\/li>\n<li>Familiarity with the Jupyter Notebook environment (<a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2018\/05\/starters-guide-jupyter-notebook\/\" target=\"_blank\" rel=\"noopener\">Resource<\/a>)<\/li>\n<li>Basics of Machine Learning and Deep Learning (<a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2017\/09\/common-machine-learning-algorithms\/\" target=\"_blank\" rel=\"noopener\">Resource 1<\/a>, <a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2020\/07\/neural-networks-from-scratch-in-python-and-r\/\" target=\"_blank\" rel=\"noopener\">Resource 2<\/a>)<\/li>\n<li>Familiarity with PyTorch\/TensorFlow (<a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2020\/07\/how-to-train-an-image-classification-model-in-pytorch-and-tensorflow\/\" target=\"_blank\" rel=\"noopener\">Resource 1<\/a>, <a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2020\/03\/tensorflow-2-tutorial-deep-learning\/\" target=\"_blank\" rel=\"noopener\">Resource 2<\/a>)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Note: These are tentative details and are subject to change. Items marked * will be covered if time permits.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The revolutionary generative model ChatGPT uses Reinforcement Learning under the hood. Reinforcement Learning from Human Feedback (RLHF) is the core working principle behind ChatGPT and similar technologies: it is used to align Large Language Models (LLMs) with human preferences. 
It\u2019s evident that Reinforcement Learning has a lot of potential to solve real world problems.\u00a0 In this [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1613,"parent":890,"menu_order":2,"comment_status":"closed","ping_status":"closed","template":"workshop-detail.php","meta":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.7 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Solving real world problems using Reinforcement Learning - DataHack Summit 2023<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Solving real world problems using Reinforcement Learning - DataHack Summit 2023\" \/>\n<meta property=\"og:description\" content=\"The revolutionary generative model ChatGPT uses Reinforcement Learning under the hood. Reinforcement Learning from Human Feedback (RLHF) is the core working principle behind ChatGPT and similar technologies: it is used to align Large Language Models (LLMs) with human preferences. 
It\u2019s evident that Reinforcement Learning has a lot of potential to solve real world problems.\u00a0 In this [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"DataHack Summit 2023\" \/>\n<meta property=\"article:modified_time\" content=\"2023-08-04T05:12:46+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-content\/uploads\/2023\/06\/w_reforchment-learning.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"500\" \/>\n\t<meta property=\"og:image:height\" content=\"250\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/\",\"url\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/\",\"name\":\"Solving real world problems using Reinforcement Learning - DataHack Summit 
2023\",\"isPartOf\":{\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/#website\"},\"datePublished\":\"2023-05-03T13:40:47+00:00\",\"dateModified\":\"2023-08-04T05:12:46+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Workshop\",\"item\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Solving real world problems using Reinforcement Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/#website\",\"url\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/\",\"name\":\"DataHack Summit 2023\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.analyticsvidhya.com\/dhs-2023\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Solving real world problems using Reinforcement Learning - DataHack Summit 2023","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/","og_locale":"en_US","og_type":"article","og_title":"Solving real world problems using Reinforcement Learning - DataHack Summit 2023","og_description":"The revolutionary generative model ChatGPT uses Reinforcement Learning under the hood. Reinforcement Learning from Human Feedback (RLHF) is the core working principle behind ChatGPT and similar technologies: it is used to align Large Language Models (LLMs) with human preferences. It\u2019s evident that Reinforcement Learning has a lot of potential to solve real world problems.\u00a0 In this [&hellip;]","og_url":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/","og_site_name":"DataHack Summit 2023","article_modified_time":"2023-08-04T05:12:46+00:00","og_image":[{"width":500,"height":250,"url":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-content\/uploads\/2023\/06\/w_reforchment-learning.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/","url":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/","name":"Solving real world problems using Reinforcement Learning - DataHack Summit 2023","isPartOf":{"@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/#website"},"datePublished":"2023-05-03T13:40:47+00:00","dateModified":"2023-08-04T05:12:46+00:00","breadcrumb":{"@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/solving-real-world-problems-using-reinforcement-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/"},{"@type":"ListItem","position":2,"name":"Workshop","item":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/workshop\/"},{"@type":"ListItem","position":3,"name":"Solving real world problems using Reinforcement Learning"}]},{"@type":"WebSite","@id":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/#website","url":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/","name":"DataHack Summit 2023","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages\/1190"}],"collection":[{"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/comments?post=1190"}],"version-history":[{"count":24,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages\/1190\/revisions"}],"predecessor-version":[{"id":3321,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages\/1190\/revisions\/3321"}],"up":[{"embeddable":true,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/pages\/890"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/media\/1613"}],"wp:attachment":[{"href":"https:\/\/www.analyticsvidhya.com\/dhs-2023\/wp-json\/wp\/v2\/media?parent=1190"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}