7 Amazing NLP Hack Sessions to Watch out for at DataHack Summit 2019

Sneha Jain 09 Jan, 2020 • 8 min read

Picture a world where:

  • Machines are able to have human-level conversations with us
  • Computers understand the context of the conversation without having to be told what the subject is
  • These machines can even write full-blown essays after being given the theme of the topic

This isn’t a movie script or a futuristic scenario – this is all happening right now thanks to the power of Natural Language Processing (NLP)! Here’s the incredible rise charted by Google Trends in the last decade:

Awesome, right? I honestly feel the number of breakthroughs happening in this field is unparalleled. The past two years have been a blur – the Transformer architecture, introduced in 2017, has truly transformed the NLP space.

From the super-efficient ULMFiT framework to Google’s BERT, NLP is truly in the midst of a golden era. Are you ready to be part of this revolution?

Then join us at DataHack Summit 2019, India’s largest applied Artificial Intelligence and Machine Learning conference, from 13-16 November 2019 at the NIMHANS Convention Center in Bengaluru!

Reserve Your Seat TODAY!

I am sure you are already eager to learn more about these latest NLP frameworks. So why wait? Let me take you through the exciting hack sessions we have in store for you presented by top NLP experts.

Hack Sessions are one of the most in-demand and popular features of DataHack Summit. They are essentially hour-long live interactive coding sessions presented by the top data scientists from around the globe – a dream for all machine learning professionals!

 

Here’s the List of Power-Packed NLP Hack Sessions at DHS 2019

  1. Comparison of Transfer Learning Models in NLP
  2. Synthetic Text Data Generation using RNN based Deep Learning Models
  3. Identifying security vulnerabilities in software using Deep Transfer Learning for NLP
  4. Deep Learning for Search in E-Commerce
  5. Intent Identification for Indic Languages
  6. Interpreting State-of-the-Art NLP Models
  7. Automatic Subtitle Generation using NLP and Deep Learning

 

Comparison of Transfer Learning Models in NLP by Sudalai Rajkumar (SRK)

Have you come across the term Transfer Learning yet? If you haven’t, you need to get up to speed quickly! Almost every recent breakthrough in NLP and computer vision leverages transfer learning, and it has democratized these techniques for the masses.

Training a complex supervised model from scratch can demand hundreds of GBs of memory. State-of-the-art NLP frameworks like Google’s BERT, OpenAI’s GPT-2, Transformer-XL, and XLNet are excellent in theory – but they require a ton of compute power, and not everyone has access to GPUs!

This is where transfer learning has been a game-changer. It has enabled us to use the latest NLP frameworks on our local machines, without having to shell out money on GPUs and computational resources.

In this hack session, our eminent speaker and leading data scientist Sudalai Rajkumar will guide you through comparing the performance of these different pre-trained NLP models, along with pre-trained word vector models, on text classification tasks. It’s going to be one incredible hack session!
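To give you a feel for what such a comparison looks like in code, here is a minimal sketch of fine-tuning a pre-trained transformer for text classification using the Hugging Face transformers library and PyTorch. The model name and the toy two-sentence dataset are stand-ins for illustration, not SRK’s actual notebook:

```python
# A minimal sketch, assuming the Hugging Face `transformers` library and PyTorch.
# The checkpoint and toy data below are placeholders, not the session's actual setup.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

texts = ["the movie was wonderful", "a complete waste of time"]   # toy data
labels = torch.tensor([1, 0])                                     # 1 = positive, 0 = negative

model_name = "bert-base-uncased"     # swap in "xlnet-base-cased" or an XLM checkpoint to compare
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize the batch and fine-tune the whole network for a couple of steps
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(2):                   # a real run would loop over a DataLoader for a few epochs
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds)                         # predicted class ids for the toy batch
```

Swapping the checkpoint name is all it takes to compare different pre-trained models on the same task – which is exactly the kind of head-to-head the session will explore in depth.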

 

Key Takeaways from this Hack Session

  • Build text classification models using pre-trained word embeddings
  • Use state-of-the-art NLP models like BERT, XLNet, and XLM
  • Fine-tune pre-trained language models for text classification tasks

Here are a few resources I recommend going through to brush up your transfer learning and NLP knowledge:

 

Intent Identification for Indic Languages by Krupal Modi

मुझे बुखार है, मेरा शरीर गर्म है (“I have a fever”, “my body is hot”) – if you understand Hindi, you would have instantly understood what they meant. The two sentences convey the same meaning, but recognizing that is incredibly difficult for machines.

In fact, one of the biggest challenges in NLP right now is building models for non-English languages. We felt we really had to include this topic at DataHack Summit 2019.

In the booming age of smart devices, accurately detecting a user’s intent from a natural language utterance is one of the fundamental problems we need to solve to truly move from clicks to conversations.

This hack session by Haptik’s Director of Machine Learning Krupal Modi will focus on solving this problem for low resource languages.
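Just to give a flavour of the problem, here is a minimal sketch of intent identification for Hindi utterances using a multilingual BERT encoder and a simple classifier. The tiny dataset and intent labels are made up for illustration, and this is not necessarily the approach Krupal will present:

```python
# A minimal sketch, assuming Hugging Face `transformers` (bert-base-multilingual-cased)
# and scikit-learn. The Hindi utterances and intent labels are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

utterances = [
    "मुझे बुखार है",           # "I have a fever"         -> report_symptom
    "मेरा शरीर गर्म है",        # "my body is hot"         -> report_symptom
    "डॉक्टर से मिलना है",       # "I want to see a doctor" -> book_appointment
    "अपॉइंटमेंट बुक करो",       # "book an appointment"    -> book_appointment
]
intents = ["report_symptom", "report_symptom", "book_appointment", "book_appointment"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed(sentences):
    """Mean-pool the encoder's last hidden states into one vector per sentence."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state        # (batch, seq_len, 768)
    mask = batch["attention_mask"].unsqueeze(-1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

clf = LogisticRegression(max_iter=1000).fit(embed(utterances), intents)
print(clf.predict(embed(["मुझे तेज़ बुखार है"])))            # expected: ['report_symptom']
```

With only a handful of labeled utterances per intent, multilingual pre-trained encoders like this are one common starting point for low-resource languages – the session will dig into what works and what breaks at scale.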

 

Key Takeaways from this Hack Session

  • Understanding the granular problems and challenges of intent identification
  • Different approaches to solve the problem
  • Exposure to available public datasets for Indic languages and their utility

 

Identifying Security Vulnerabilities in Software using Deep Transfer Learning for NLP by Dipanjan Sarkar

I feel security is a topic not spoken about often enough in this space. When was the last time you heard of deep learning being used to prevent attacks on software?

Vulnerabilities are quite common in software systems and can potentially cause a plethora of problems including deadlock, information loss, or system failure. The challenge lies in sufficiently capturing both semantic and syntactic representations of source code for building accurate prediction models.

Dipanjan Sarkar, one of the most popular speakers in the data science community, will show you how to leverage state-of-the-art deep transfer learning models for NLP on GitHub events data to predict probable vulnerabilities, achieving decent precision/recall rates on data tested up to 2018.

He will also share some unique use cases where deep transfer learning can be applied to text data, covering interesting models including stacked bi-directional GRUs, pre-trained embeddings, and transformer models like BERT.
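For a taste of what a stacked bi-directional GRU classifier looks like, here is a minimal Keras sketch. The vocabulary size, sequence length, and binary label are placeholders, not the session’s actual GitHub-events pipeline:

```python
# A minimal sketch of a stacked bi-directional GRU text classifier, assuming TensorFlow/Keras.
# Vocabulary size, sequence length, and the vulnerable/not-vulnerable label are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20000      # tokens in the source-code / report vocabulary (assumed)
MAX_LEN = 300           # tokens per snippet or report (assumed)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),                            # could be initialised with pre-trained embeddings
    layers.Bidirectional(layers.GRU(64, return_sequences=True)),  # first bi-GRU layer
    layers.Bidirectional(layers.GRU(32)),                         # second (stacked) bi-GRU layer
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),                        # vulnerable vs. not vulnerable
])

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
model.summary()
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=5)  # with your own tokenized data
```

Tracking precision and recall rather than plain accuracy matters here because vulnerable samples are typically rare – exactly the kind of trade-off the session will dissect.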

 

Deep Learning for Search in E-Commerce by Sonu Sharma and Atul Agarwal

Machine learning is used in almost every part of the system at major search engines like Google and Bing. E-commerce websites, too, are powered by their search engines, which provide excellent ROI and help retain and ultimately convert users into sales.

And the fact remains – improving search results offers a huge return on investment for retailers. It’s a topic I feel everyone should at least be aware of in their organization.

In this hack session at DataHack Summit 2019, Atul Agarwal and Sonu Sharma, software engineers at Walmart Labs, will come together and share insights on how to use NLP based Deep Learning models to effectively design search platforms with a focus on the e-commerce use case.

 

Key Takeaways from this Hack Session

  • Learn about Natural Language Processing techniques like contextual word embeddings (BERT / ELMo), Bi-LSTM networks, etc.
  • Application of Named Entity Recognition (NER) and seq2seq modeling in the search domain (see the sketch below)
  • Get to know various multi-class/multi-label classification problems in the search domain
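To make the NER takeaway concrete, here is a minimal sketch of tagging a shopping query with spaCy’s stock English model. A production search stack like the one discussed in the session would train a custom NER with e-commerce labels (brand, colour, size, product type) rather than rely on the off-the-shelf model:

```python
# A minimal sketch, assuming spaCy and its small English model
# (install with: python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
query = "nike running shoes under 5000 rupees size 9"
doc = nlp(query)

# The stock model only tags generic entities (money, quantities, etc.);
# exact output depends on the model version.
print([(ent.text, ent.label_) for ent in doc.ents])
```

Extracting structured attributes like this from free-form queries is what lets a search platform match intent to inventory instead of just matching keywords.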

BERT is the hottest trending NLP framework and you can get a complete understanding of it in this article:

 

Interpreting State-of-the-Art NLP Models by Logesh Kumar Umapathi

Building a complex, dense machine learning model may get us to our desired accuracy – but does it make sense? Can you open up the black box and explain how the final results were derived?

This question is at the heart of machine learning – the ability to interpret and explain your model’s performance result is a critical requirement for stakeholders and clients.

Recent progress in NLP, with the advent of attention-based models, has made it easier for us to interpret and understand a model’s decisions. Here is a fantastic hack session on interpreting sequence-to-sequence models with our amazing speaker Logesh Kumar Umapathi.

His hack session will cover techniques that can be used to interpret the decisions of Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Transformer models, as well as model-agnostic techniques for interpretation.
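As a small preview of the model-agnostic side, here is a minimal sketch using LIME to explain a simple text classifier. The TF-IDF plus logistic regression pipeline and the toy movie reviews are stand-ins; the same explainer works with any classifier that exposes prediction probabilities:

```python
# A minimal sketch of model-agnostic interpretation with LIME, assuming the `lime`
# and scikit-learn packages. The tiny sentiment dataset is illustrative only.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great acting and a gripping plot", "dull, predictable and far too long",
         "an absolute delight to watch", "terrible pacing ruined the film"]
labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(
    "gripping plot but terrible pacing",
    pipeline.predict_proba,          # any black-box probability function works here
    num_features=4,
)
print(explanation.as_list())         # words with their contribution to the prediction
```

Attention-based interpretation, the other family of techniques in the session, instead reads the explanation straight out of the model’s own attention weights.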

 

Key Takeaways from this Hack Session

  • Learn how to leverage attention models and layers for interpretation
  • Learn model-agnostic techniques for interpretation of NLP models

Here are a few awesome articles to get up to date with the topics being covered in this hack session:

 

Synthetic Text Data Generation using RNN based Deep Learning Models by Raghav Bali

Handwritten text recognition requires a large number of labeled samples, which are really costly to produce. Yes, there is that cost factor again. How cool would it be if we could build a handwritten text generation model without splurging out of our own pockets?

Well, you don’t have to wait long to find the answer!

Raghav Bali, a Senior Data Scientist at UnitedHealth Group, will facilitate a hands-on code walkthrough which will enable you to prepare a simple deep learning model to generate text using Recurrent Neural Networks (RNNs).

You will also gain insights on use cases where deep learning techniques are being utilized to generate data using some interesting architectures. Here’s a quick summary of what Raghav will be covering:

  • A quick overview of RNNs and different DL architectures for such a use case
  • A brief introduction to some interesting research into this domain
  • Hands-on code walkthrough to prepare a simple DL model to generate text
  • Model fine-tuning and results
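To give you a flavour of such a walkthrough, here is a minimal character-level RNN text generator. It assumes TensorFlow/Keras, and the toy corpus and hyperparameters are placeholders, not Raghav’s actual code:

```python
# A minimal character-level RNN text generator sketch, assuming TensorFlow/Keras.
# The toy corpus, sequence length, and layer sizes are illustrative assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

corpus = "hello world " * 200                      # toy corpus; use real text in practice
chars = sorted(set(corpus))
char2idx = {c: i for i, c in enumerate(chars)}

SEQ_LEN = 10
X = np.array([[char2idx[c] for c in corpus[i:i + SEQ_LEN]] for i in range(len(corpus) - SEQ_LEN)])
y = np.array([char2idx[corpus[i + SEQ_LEN]] for i in range(len(corpus) - SEQ_LEN)])

model = tf.keras.Sequential([
    layers.Embedding(len(chars), 16),
    layers.LSTM(64),                               # swap in GRU/SimpleRNN to experiment
    layers.Dense(len(chars), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=3, verbose=0)

# Generate text by repeatedly predicting the next character
seed = list(corpus[:SEQ_LEN])
for _ in range(30):
    x = np.array([[char2idx[c] for c in seed[-SEQ_LEN:]]])
    seed.append(chars[int(model.predict(x, verbose=0).argmax())])
print("".join(seed))
```

The same next-token idea, scaled up and conditioned on handwriting styles, is what makes synthetic handwritten text generation possible – which is where the session goes from here.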

 

Key takeaways from this Hack Session

  • Understand the use of DL models to generate data (this talk is not about GANs!)
  • Build a DL model to generate synthetic handwritten text to solve real-world problems

You can take this quick tutorial on building a Recurrent Neural Network from Scratch in Python:

 

Automatic Subtitle Generation using NLP and Deep Learning by Prateek Joshi and Mohd Sanad Zaki Rizvi

Ever watched videos on YouTube or movies on Netflix and wondered how they generated such accurate subtitles? Manually doing that is a thankless and impossible job. Imagine the scale at which these platforms operate – they need a machine learning-powered solution.

That’s exactly what we will learn through this hack session by Analytics Vidhya’s two outstanding data scientists – Prateek Joshi and Mohd Sanad Zaki Rizvi.

In this exciting hack session, they will combine NLP and audio processing to automatically generate subtitles from a video. Here is the structure of the session they’re planning:

  • About Speech-to-Text Conversion
    • History
    • Use Cases
    • Challenges
  • Dataset and Approaches
  • Pretrained model vs. Model built from scratch
  • Python notebook walkthrough
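As a small teaser for the notebook walkthrough, here is a minimal sketch of the audio-to-text step. It assumes ffmpeg is installed and the SpeechRecognition package is available; the video filename is a placeholder, and the free Google Web Speech API is just one convenient backend, not necessarily the model the speakers will use:

```python
# A minimal sketch of the audio-to-text step, assuming ffmpeg and the
# `SpeechRecognition` package. "lecture.mp4" is a placeholder filename.
import subprocess
import speech_recognition as sr

# 1. Pull a mono 16 kHz WAV track out of the video with ffmpeg
subprocess.run(["ffmpeg", "-y", "-i", "lecture.mp4", "-ac", "1", "-ar", "16000", "audio.wav"],
               check=True)

# 2. Transcribe the audio
recognizer = sr.Recognizer()
with sr.AudioFile("audio.wav") as source:
    audio = recognizer.record(source)             # read the entire file

text = recognizer.recognize_google(audio)         # send to the free Google Web Speech API
print(text)                                       # raw transcript; chunk and timestamp it for subtitles
```

Turning that raw transcript into proper subtitles still needs chunking, punctuation, and timestamps aligned to the video – exactly the pieces Prateek and Sanad will walk through live.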

 

Key Takeaways from this Hack Session

  • Working with Sequence Data
  • Processing Raw Audio
  • Converting Audio to Text

If you are interested in learning how to build your own speech-to-text model, then here is a fantastic guide for you:

 

End Notes

So are you ready to broaden your horizons and expand your skillset? This is the best time to get involved in the world of NLP – the hottest area in data science right now.

So why wait? Seats for DataHack Summit 2019 are filling up fast and we have just a few tickets left, so:

Reserve Your Seat TODAY!

It will be great to network with you at DataHack Summit 2019 – see you soon!
