DataHour: Distinguishing Bot Text From Human Text Corpus

10 Dec 2022 07:12am - 10 Dec 2022 08:12am

About the Event

In this DataHour, Sumeet will give a practical walkthrough of collecting a bilingual (English and Hindi) human text corpus and applying preprocessing techniques to clean it.

He will be covering the following topics in detail:

  • Generation of word vectors using TF-IDF, and use of language modeling techniques (n-grams) to obtain paragraph vectors of the text.
  • Generation of bilingual (English and Hindi) bot text by passing a subsample of the human text corpus to an LSTM text-generation neural network (encoder-decoder architecture).
  • Generation of vectors for the bot text using a preprocessing pipeline similar to the one used for the human text.
  • Clustering of human and bot text using density-based clustering.
  • Computing heuristics for the generated clusters and comparing them with non-parametric hypothesis tests to check whether the difference between the two clusters is statistically significant.
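
The vectorizing, clustering, and testing steps listed above can be pictured with a small standard-library Python sketch. It is not the speaker's implementation: the toy sentences, the `eps` and `min_pts` values, the token-count heuristic, and all function names are assumptions made for illustration, DBSCAN is one possible density-based method, Mann-Whitney U is one possible non-parametric test, and the LSTM generation step is left out entirely.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Lowercased whitespace tokens plus word n-grams up to n_max."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf_vectors(docs):
    """L2-normalised TF-IDF vectors stored as sparse dicts (smoothed IDF)."""
    term_lists = [ngrams(d) for d in docs]
    df = Counter(t for terms in term_lists for t in set(terms))
    n = len(docs)
    vectors = []
    for terms in term_lists:
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in Counter(terms).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are L2-normalised."""
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())

def dbscan(vectors, eps, min_pts):
    """Minimal DBSCAN; returns one label per vector, -1 means noise."""
    n = len(vectors)
    neighbours = [[j for j in range(n)
                   if cosine_distance(vectors[i], vectors[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point, keep expanding
    return labels

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, tie-corrected normal approximation
    without continuity correction (rough for very small samples)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))

# Toy placeholder sentences, not drawn from any real human or bot corpus.
docs = ["the cat sat on the mat", "the cat sat on the rug", "the cat lay on the mat",
        "buy cheap pills online now", "buy cheap pills online today",
        "buy cheap meds online now"]
labels = dbscan(tfidf_vectors(docs), eps=0.8, min_pts=2)

# Hypothetical per-document heuristic: token count, grouped by cluster label.
lengths = {c: [len(d.split()) for d, l in zip(docs, labels) if l == c]
           for c in set(labels) if c != -1}
if len(lengths) == 2:
    u_stat, p_value = mann_whitney_u(*lengths.values())
    print(labels, u_stat, p_value)
```

On a real corpus the cubic-free but still quadratic neighbour search here would be too slow, and an exact test is preferable to the normal approximation for small clusters; the sketch only shows how the three stages hand data to each other.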


Prerequisites:
 An interest in emerging technologies and a basic understanding of NLP, neural networks, Python, statistical hypothesis testing, and clustering.

About the Speaker

Sumeet Lalla

Data Scientist at Cognizant

Sumeet Lalla completed his Master's in Data Science at the Higher School of Economics, Moscow, and his Bachelor of Engineering in Computer Engineering at Thapar University. With 5.5 years of experience in data science and software engineering, he currently works as a Data Scientist at Cognizant and has previously worked as a Software Developer at Siemens Technology And Services and as a Technology Analyst at Deloitte Consulting and Pvt Ltd.

Registration Details

6706 Registered
