Getting Started With Natural Language Processing

Interacting with artificially intelligent systems can feel stilted at times. This is because the way humans converse with one another is quite different from the way we usually interact with AI systems. Thankfully, there is a great deal of active research aimed at bridging this gap in conversational AI. In this 8-hour workshop, you will learn about natural language processing, create word embeddings and build models to perform NLP tasks such as sentiment analysis, auto-correction and much more.

Prerequisites for the workshop:

Python programming experience
Basic knowledge of machine learning
Participation in data science competitions or experience with real-life data science projects

Structure Of The Workshop

Introduction to Natural Language Processing
Text pre-processing and Wrangling
- Removing HTML tags/noise
- Removing accented characters
- Removing special characters/symbols
- Handling contractions
- Stemming
- Lemmatization
- Stop word removal
Project: Build a duplicate character removal module
Project: Build a spell-check and correction module
Project: Build an end-to-end text pre-processor
Text Understanding
- POS (Part-of-Speech) Tagging
- Text Parsing
  - Shallow Parsing
  - Dependency Parsing
  - Constituency Parsing
- NER (Named Entity Recognition) Tagging
Project: Build your own POS Tagger
Project: Build your own NER Tagger
Text Representation – Feature Engineering
- Traditional Statistical Models – BOW, TF-IDF
- Newer Deep Learning Models for word embeddings – Word2Vec, GloVe, FastText
Project: Similarity and Movie Recommendations
Project: Interactive exploration of Word Embeddings
Case Studies for other common NLP Tasks
- Project: Sentiment Analysis using unsupervised learning and supervised learning (machine and deep learning)
- Project: Text Clustering (grouping similar movies)
- Project: Text Summarization and Topic Models
Promise of Deep Learning for NLP, Transfer and Generative Learning
Final words and where to go from here?
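
To give a flavour of the text pre-processing steps listed in the outline, here is a minimal sketch using only the Python standard library. It is not the workshop's own material: the function names, the regular expressions and the tiny stop-word list are all assumptions made for illustration, and the workshop itself lists libraries such as nltk and spacy for this kind of work.

```python
import re
import unicodedata

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "was", "of", "and", "to", "in"}


def strip_html(text):
    """Replace anything that looks like an HTML tag with a space (crude regex, not a parser)."""
    return re.sub(r"<[^>]+>", " ", text)


def remove_accents(text):
    """Decompose characters and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_special_characters(text):
    """Replace everything except ASCII letters, digits and whitespace with a space."""
    return re.sub(r"[^A-Za-z0-9\s]", " ", text)


def remove_stop_words(tokens):
    """Drop tokens that appear in the illustrative stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def preprocess(text):
    """Chain the steps: HTML tags, accents, special characters, lowercasing, stop words."""
    text = strip_html(text)
    text = remove_accents(text)
    text = remove_special_characters(text)
    tokens = text.lower().split()
    return remove_stop_words(tokens)


if __name__ == "__main__":
    print(preprocess("<p>The café was in a <b>résumé</b>!</p>"))
```

Contractions, stemming, lemmatization and spell correction are left out of this sketch because they need dictionaries or trained resources that go beyond a few lines of standard-library code.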

Key Takeaways:

Learn and understand popular NLP workflows with interactive examples
Work through concepts and interactive projects on cleaning and handling noisy, unstructured text data, including duplicate checks, spelling correction and text wrangling
Build your own POS and NER taggers and parse text data to understand it better
Understand, build and explore text semantics and representations with traditional statistical models and newer word embedding models
Work on projects covering popular NLP tasks, including text classification, sentiment analysis, text clustering, summarization, topic models and recommendations
Get a brief overview of the promise of deep learning for NLP

System requirements: Standard system with 4-8 GB RAM and a 2-4 core processor (i5/i7/AMD). A GPU is preferred for some deep learning tasks but is not essential. Windows, Linux or Mac OS. Cloud-based services like AWS EC2 also work fine.

Notes: Participants need to carry their own laptop to the workshop. The Anaconda distribution (Python 3.6) is preferred, with the following libraries pre-installed: nltk, spacy, TextBlob, scikit-learn, numpy, pandas, keras, tensorflow. If you install Python 3.7, remember that a stable build of keras + tensorflow may not be available.
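
One way to confirm that the listed libraries are available before the workshop is a short check script like the sketch below. It is not an official setup script: the `find_missing` helper and the `REQUIRED_MODULES` list are names made up for this example, and the check only confirms that each module can be located, not that its version is compatible.

```python
import importlib.util

# Import names for the libraries listed above. Some differ from the package
# names: scikit-learn is imported as "sklearn" and TextBlob as "textblob".
REQUIRED_MODULES = [
    "nltk", "spacy", "textblob", "sklearn",
    "numpy", "pandas", "keras", "tensorflow",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All listed libraries were found.")
```

Run it with the same Python interpreter you plan to use at the workshop, for example from inside the Anaconda environment.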

INSTRUCTORS

Dipanjan Sarkar

Dipanjan (DJ) Sarkar is a Data Scientist, published author, consultant and trainer. He has consulted and worked with several startups as well as Fortune 500 companies like Intel. He primarily works on leveraging data science, advanced analytics, machine learning and deep learning to build large-scale intelligent systems. He holds a master of technology degree with specializations in Data Science and Software Engineering. He is also an avid supporter of self-learning and massive open online courses. He plans to venture soon into the world of open-source products to improve the productivity of developers across the world. Dipanjan has been an analytics practitioner for several years now, specializing in machine learning, natural language processing, statistical methods and deep learning. With a passion for data science and education, he also acts as an AI Consultant and Mentor at various organizations like Springboard, where he helps people build their skills in areas like Data Science and Machine Learning. He is also a key contributor and editor for Towards Data Science, a leading online journal focusing on Artificial Intelligence and Data Science. Dipanjan has also authored several books on R, Python, Machine Learning, Social Media Analytics, Natural Language Processing and Deep Learning.

Raghav Bali

Raghav Bali is a Data Scientist at one of the world’s largest healthcare organizations. His work involves research and development of enterprise-level solutions based on Machine Learning, Deep Learning and Natural Language Processing for healthcare and insurance related use cases. In his previous role at Intel, he was involved in enabling proactive, data-driven initiatives using Natural Language Processing, Deep Learning and traditional statistical methods. He has also worked in the financial domain with American Express, solving digital engagement and customer retention use cases. Raghav has also authored multiple books with leading publishers, the most recent on the latest advancements in Transfer Learning research. Raghav has a master’s degree (gold medalist) in Information Technology from the International Institute of Information Technology, Bangalore. Raghav loves reading and is a shutterbug, capturing moments when he isn’t busy solving problems.

Workshop Date:

24th and 25th November, 2018

Workshop Venue:

Hotel Royal Orchid Bangalore
