7 Amazing NLP Hack Sessions to Watch out for at DataHack Summit 2019

Sneha Jain 09 Jan, 2020 • 8 min read

Picture a world where:

  • Machines are able to have human-level conversations with us
  • Computers understand the context of the conversation without having to be told what the subject is
  • These machines can even write full-blown essays after being given the theme of the topic

This isn’t a movie script or a futuristic scenario – this is all happening right now thanks to the power of Natural Language Processing (NLP)! Here’s the incredible rise charted by Google Trends in the last decade:

Awesome, right? I honestly feel the number of breakthroughs happening in this field is unparalleled. The past two years have been a blur – the Transformer architecture, introduced in 2017, has truly transformed the NLP space.

From the super-efficient ULMFiT framework to Google’s BERT, NLP is truly in the midst of a golden era. Are you ready to be part of this revolution?

Then join us at DataHack Summit 2019, India’s largest applied Artificial Intelligence and Machine Learning conference, from 13-16 November 2019 at the NIMHANS Convention Center in Bengaluru!

Reserve Your Seat TODAY!

I am sure you are already eager to learn more about these latest NLP frameworks. So why wait? Let me take you through the exciting hack sessions we have in store for you presented by top NLP experts.

Hack Sessions are one of the most in-demand and popular features of DataHack Summit. They are essentially hour-long live interactive coding sessions presented by the top data scientists from around the globe – a dream for all machine learning professionals!

 

Here’s the List of Power-Packed NLP Hack Sessions at DHS 2019

  1. Comparison of Transfer Learning Models in NLP
  2. Synthetic Text Data Generation using RNN based Deep Learning Models
  3. Identifying security vulnerabilities in software using Deep Transfer Learning for NLP
  4. Deep Learning for Search in E-Commerce
  5. Intent Identification for Indic Languages
  6. Interpreting State-of-the-Art NLP Models
  7. Automatic Subtitle Generation using NLP and Deep Learning

 

Comparison of Transfer Learning Models in NLP by Sudalai Rajkumar (SRK)

Have you come across the term Transfer Learning yet? If you haven’t, you need to get up to speed quickly! Almost every recent breakthrough in NLP and computer vision leverages transfer learning, and it has democratized these techniques for the masses.

Training a complex supervised model from scratch can demand hundreds of GBs of memory. State-of-the-art NLP frameworks like Google’s BERT, OpenAI’s GPT-2, Transformer-XL, and XLNet are excellent in theory – but they require a ton of compute power, and not everyone has access to GPUs!

This is where transfer learning has been a game-changer. It has enabled us to use the latest NLP frameworks on our local machines, without having to shell out money on GPUs and computational resources.

In this hack session, our eminent speaker and leading data scientist Sudalai Rajkumar will guide you through comparing the performance of these different pre-trained NLP models, along with pre-trained word vector models, on text classification tasks. It’s going to be one incredible hack session!
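To give you a feel for what such a comparison looks like in code, here is a minimal sketch of fine-tuning a pre-trained transformer for text classification using the Hugging Face transformers library and PyTorch. The model name and the toy two-sentence dataset are stand-ins for illustration, not SRK’s actual notebook:

```python
# A minimal sketch, assuming the Hugging Face `transformers` library and PyTorch.
# The checkpoint and toy data below are placeholders, not the session's actual setup.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

texts = ["the movie was wonderful", "a complete waste of time"]   # toy data
labels = torch.tensor([1, 0])                                     # 1 = positive, 0 = negative

model_name = "bert-base-uncased"     # swap in "xlnet-base-cased" or an XLM checkpoint to compare
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize the batch and fine-tune the whole network for a couple of steps
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(2):                   # a real run would loop over a DataLoader for a few epochs
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds)                         # predicted class ids for the toy batch
```

Swapping the checkpoint name is all it takes to compare different pre-trained models on the same task – which is exactly the kind of head-to-head the session will explore in depth.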

 

Key Takeaways from this Hack Session

  • Build text classification models using pre-trained word embeddings
  • Use state-of-the-art NLP models like BERT, XLNet, and XLM
  • Fine-tune pre-trained language models for text classification tasks

Here are a few resources I recommend going through to brush up your transfer learning and NLP knowledge:

 

Intent Identification for Indic Languages by Krupal Modi

मुझे बुखार है, मेरा शरीर गर्म है (“I have a fever”, “my body is hot”) – if you understand Hindi, you would have instantly understood what they meant. The two sentences convey the same meaning, but recognizing that is incredibly difficult for machines.

In fact, one of the biggest challenges in NLP right now is building models for non-English languages. We felt we really had to include this topic at DataHack Summit 2019.

In the booming age of smart devices, accurately detecting a user’s intent from a natural language utterance is one of the fundamental problems we need to solve to truly move from clicks to conversations.

This hack session by Haptik’s Director of Machine Learning Krupal Modi will focus on solving this problem for low resource languages.
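Just to give a flavour of the problem, here is a minimal sketch of intent identification for Hindi utterances using a multilingual BERT encoder and a simple classifier. The tiny dataset and intent labels are made up for illustration, and this is not necessarily the approach Krupal will present:

```python
# A minimal sketch, assuming Hugging Face `transformers` (bert-base-multilingual-cased)
# and scikit-learn. The Hindi utterances and intent labels are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

utterances = [
    "मुझे बुखार है",           # "I have a fever"         -> report_symptom
    "मेरा शरीर गर्म है",        # "my body is hot"         -> report_symptom
    "डॉक्टर से मिलना है",       # "I want to see a doctor" -> book_appointment
    "अपॉइंटमेंट बुक करो",       # "book an appointment"    -> book_appointment
]
intents = ["report_symptom", "report_symptom", "book_appointment", "book_appointment"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed(sentences):
    """Mean-pool the encoder's last hidden states into one vector per sentence."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state        # (batch, seq_len, 768)
    mask = batch["attention_mask"].unsqueeze(-1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

clf = LogisticRegression(max_iter=1000).fit(embed(utterances), intents)
print(clf.predict(embed(["मुझे तेज़ बुखार है"])))            # expected: ['report_symptom']
```

With only a handful of labeled utterances per intent, multilingual pre-trained encoders like this are one common starting point for low-resource languages – the session will dig into what works and what breaks at scale.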

 

Key Takeaways from this Hack Session

  • Understanding the granular problems and challenges of intent identification
  • Different approaches to solve the problem
  • Exposure to available public datasets for Indic languages and their utility

 

Identifying Security Vulnerabilities in Software using Deep Transfer Learning for NLP by Dipanjan Sarkar

I feel security is a topic not spoken about often enough in this space. When was the last time you heard of deep learning being used to prevent attacks on software?

Vulnerabilities are quite common in software systems and can potentially cause a plethora of problems including deadlock, information loss, or system failure. The challenge lies in sufficiently capturing both semantic and syntactic representations of source code for building accurate prediction models.

Dipanjan Sarkar, one of the most popular speakers in the data science community, will show you how to leverage state-of-the-art deep transfer learning models for NLP on GitHub events data to predict probable vulnerabilities, achieving decent precision/recall rates on data tested up to 2018.

He will also share some unique use cases where deep transfer learning can be applied to text data, covering interesting models including stacked bi-directional GRUs, pre-trained embeddings, and transformer models like BERT.
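For a taste of what a stacked bi-directional GRU classifier looks like, here is a minimal Keras sketch. The vocabulary size, sequence length, and binary label are placeholders, not the session’s actual GitHub-events pipeline:

```python
# A minimal sketch of a stacked bi-directional GRU text classifier, assuming TensorFlow/Keras.
# Vocabulary size, sequence length, and the vulnerable/not-vulnerable label are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20000      # tokens in the source-code / report vocabulary (assumed)
MAX_LEN = 300           # tokens per snippet or report (assumed)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),                            # could be initialised with pre-trained embeddings
    layers.Bidirectional(layers.GRU(64, return_sequences=True)),  # first bi-GRU layer
    layers.Bidirectional(layers.GRU(32)),                         # second (stacked) bi-GRU layer
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),                        # vulnerable vs. not vulnerable
])

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
model.summary()
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=5)  # with your own tokenized data
```

Tracking precision and recall rather than plain accuracy matters here because vulnerable samples are typically rare – exactly the kind of trade-off the session will dissect.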

 

Deep Learning for Search in E-Commerce by Sonu Sharma and Atul Agarwal

Machine learning is used in almost every part of the system at major search engines like Google and Bing. E-commerce websites, too, are powered by their search engines, which provide excellent ROI and help retain and ultimately convert users into sales.

And the fact remains – improving search results offers a huge return on investment for retailers. It’s a topic I feel everyone should at least be aware of in their organization.

In this hack session at DataHack Summit 2019, Atul Agarwal and Sonu Sharma, software engineers at Walmart Labs, will come together and share insights on how to use NLP based Deep Learning models to effectively design search platforms with a focus on the e-commerce use case.

 

Key Takeaways from this Hack Session

  • Learn about Natural Language Processing techniques like contextual word embeddings (BERT / ELMo), Bi-LSTM networks, etc.
  • Application of Named Entity Recognition (NER) and seq2seq modeling in the search domain (see the sketch below)
  • Get to know various multi-class/multi-label classification problems in the search domain
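To make the NER takeaway concrete, here is a minimal sketch of tagging a shopping query with spaCy’s stock English model. A production search stack like the one discussed in the session would train a custom NER with e-commerce labels (brand, colour, size, product type) rather than rely on the off-the-shelf model:

```python
# A minimal sketch, assuming spaCy and its small English model
# (install with: python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
query = "nike running shoes under 5000 rupees size 9"
doc = nlp(query)

# The stock model only tags generic entities (money, quantities, etc.);
# exact output depends on the model version.
print([(ent.text, ent.label_) for ent in doc.ents])
```

Extracting structured attributes like this from free-form queries is what lets a search platform match intent to inventory instead of just matching keywords.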

BERT is the hottest trending NLP framework and you can get a complete understanding of it in this article:

 

Interpreting State-of-the-Art NLP Models by Logesh Kumar Umapathi

Building a complex, dense machine learning model may get us to our desired accuracy – but does it make sense? Can you open up the black box and explain how the final results were derived?

This question is at the heart of machine learning – the ability to interpret and explain your model’s performance result is a critical requirement for stakeholders and clients.

Recent progress in NLP, with the advent of attention-based models, has made it easier for us to interpret and understand a model’s decisions. Here is a fantastic hack session on interpreting sequence-to-sequence models with our amazing speaker Logesh Kumar Umapathi.

His hack session will cover techniques that can be used to interpret the decisions of Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Transformer models, as well as model-agnostic techniques for interpretation.
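As a small preview of the model-agnostic side, here is a minimal sketch using LIME to explain a simple text classifier. The TF-IDF plus logistic regression pipeline and the toy movie reviews are stand-ins; the same explainer works with any classifier that exposes prediction probabilities:

```python
# A minimal sketch of model-agnostic interpretation with LIME, assuming the `lime`
# and scikit-learn packages. The tiny sentiment dataset is illustrative only.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great acting and a gripping plot", "dull, predictable and far too long",
         "an absolute delight to watch", "terrible pacing ruined the film"]
labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(
    "gripping plot but terrible pacing",
    pipeline.predict_proba,          # any black-box probability function works here
    num_features=4,
)
print(explanation.as_list())         # words with their contribution to the prediction
```

Attention-based interpretation, the other family of techniques in the session, instead reads the explanation straight out of the model’s own attention weights.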

 

Key Takeaways from this Hack Session

  • Learn how to leverage attention models and layers for interpretation
  • Learn model-agnostic techniques for interpretation of NLP models

Here are a few awesome articles to get up to date with the topics being covered in this hack session:

 

Synthetic Text Data Generation using RNN based Deep Learning Models by Raghav Bali

Handwritten text recognition requires a large number of labeled samples, which are really costly to produce. Yes, there is that cost factor again. How cool would it be if we could build a handwritten text generation model without splurging out of our own pockets?

Well, you don’t have to wait long to find the answer!

Raghav Bali, a Senior Data Scientist at UnitedHealth Group, will facilitate a hands-on code walkthrough which will enable you to prepare a simple deep learning model to generate text using Recurrent Neural Networks (RNNs).

You will also gain insights on use cases where deep learning techniques are being utilized to generate data using some interesting architectures. Here’s a quick summary of what Raghav will be covering:

  • A quick overview of RNNs and different DL architectures for such a use case
  • A brief introduction to some interesting research into this domain
  • Hands-on code walkthrough to prepare a simple DL model to generate text
  • Model fine-tuning and results
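To give you a flavour of such a walkthrough, here is a minimal character-level RNN text generator. It assumes TensorFlow/Keras, and the toy corpus and hyperparameters are placeholders, not Raghav’s actual code:

```python
# A minimal character-level RNN text generator sketch, assuming TensorFlow/Keras.
# The toy corpus, sequence length, and layer sizes are illustrative assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

corpus = "hello world " * 200                      # toy corpus; use real text in practice
chars = sorted(set(corpus))
char2idx = {c: i for i, c in enumerate(chars)}

SEQ_LEN = 10
X = np.array([[char2idx[c] for c in corpus[i:i + SEQ_LEN]] for i in range(len(corpus) - SEQ_LEN)])
y = np.array([char2idx[corpus[i + SEQ_LEN]] for i in range(len(corpus) - SEQ_LEN)])

model = tf.keras.Sequential([
    layers.Embedding(len(chars), 16),
    layers.LSTM(64),                               # swap in GRU/SimpleRNN to experiment
    layers.Dense(len(chars), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=3, verbose=0)

# Generate text by repeatedly predicting the next character
seed = list(corpus[:SEQ_LEN])
for _ in range(30):
    x = np.array([[char2idx[c] for c in seed[-SEQ_LEN:]]])
    seed.append(chars[int(model.predict(x, verbose=0).argmax())])
print("".join(seed))
```

The same next-token idea, scaled up and conditioned on handwriting styles, is what makes synthetic handwritten text generation possible – which is where the session goes from here.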

 

Key takeaways from this Hack Session

  • Understand the use of DL models to generate data (this talk is not about GANs!)
  • Build a DL model to generate synthetic handwritten text to solve real-world problems

You can take this quick tutorial on building a Recurrent Neural Network from Scratch in Python:

 

Automatic Subtitle Generation using NLP and Deep Learning by Prateek Joshi and Mohd Sanad Zaki Rizvi

Ever watched videos on YouTube or movies on Netflix and wondered how they generated such accurate subtitles? Manually doing that is a thankless and impossible job. Imagine the scale at which these platforms operate – they need a machine learning-powered solution.

That’s exactly what we will learn through this hack session by Analytics Vidhya’s two outstanding data scientists – Prateek Joshi and Mohd Sanad Zaki Rizvi.

In this exciting hack session, they will combine NLP and audio processing to automatically generate subtitles from a video. Here is the structure of the session they’re planning:

  • About Speech-to-Text Conversion
    • History
    • Use Cases
    • Challenges
  • Dataset and Approaches
  • Pretrained model vs. Model built from scratch
  • Python notebook walkthrough
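As a small teaser for the notebook walkthrough, here is a minimal sketch of the audio-to-text step. It assumes ffmpeg is installed and the SpeechRecognition package is available; the video filename is a placeholder, and the free Google Web Speech API is just one convenient backend, not necessarily the model the speakers will use:

```python
# A minimal sketch of the audio-to-text step, assuming ffmpeg and the
# `SpeechRecognition` package. "lecture.mp4" is a placeholder filename.
import subprocess
import speech_recognition as sr

# 1. Pull a mono 16 kHz WAV track out of the video with ffmpeg
subprocess.run(["ffmpeg", "-y", "-i", "lecture.mp4", "-ac", "1", "-ar", "16000", "audio.wav"],
               check=True)

# 2. Transcribe the audio
recognizer = sr.Recognizer()
with sr.AudioFile("audio.wav") as source:
    audio = recognizer.record(source)             # read the entire file

text = recognizer.recognize_google(audio)         # send to the free Google Web Speech API
print(text)                                       # raw transcript; chunk and timestamp it for subtitles
```

Turning that raw transcript into proper subtitles still needs chunking, punctuation, and timestamps aligned to the video – exactly the pieces Prateek and Sanad will walk through live.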

 

Key Takeaways from this Hack Session

  • Working with Sequence Data
  • Processing Raw Audio
  • Converting Audio to Text

If you are interested in learning how to build your own speech-to-text model, then here is a fantastic guide for you:

 

End Notes

So are you ready to broaden your horizons and expand your skillset? This is the best time to get involved in the world of NLP – the hottest area in data science right now.

So why wait? Seats for DataHack Summit 2019 are filling up fast and we have just a few tickets left, so:

Reserve Your Seat TODAY!

It will be great to network with you at DataHack Summit 2019 – see you soon!
