DataHour: Distinguishing Bot Text From Human Text Corpus
10 Dec 2022 07:12am - 10 Dec 2022 08:12am
About the Event
In this DataHour, Sumeet will give a practical walkthrough of collecting a bilingual (English and Hindi) human text corpus and applying preprocessing techniques to clean it.
He will be covering the following topics in detail:
- Generating word vectors using TF-IDF, and using language modeling techniques (n-grams) to obtain paragraph vectors of the text.
- Generating bilingual (English and Hindi) bot text by passing a subsample of the human text corpus to an LSTM text-generation neural network (encoder-decoder architecture).
- Generating vectors for the bot text using a preprocessing pipeline similar to the one used for the human text.
- Clustering human and bot text using density-based clustering.
- Computing heuristics for the generated clusters and using non-parametric hypothesis tests to check whether the difference between the two clusters is statistically significant.
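To make the first topic concrete, here is a minimal sketch of TF-IDF weighting over word n-grams, written with the Python standard library only. It is not the speaker's pipeline: the function names, the smoothed IDF formula, and the toy sentences are assumptions chosen for illustration. In practice a library implementation such as scikit-learn's `TfidfVectorizer` (with its `ngram_range` parameter) would normally be used instead.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from whitespace-split text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_max=2):
    """Build one TF-IDF vector per document over a shared n-gram vocabulary."""
    grams_per_doc = [word_ngrams(d, n_max) for d in docs]
    vocab = sorted({g for grams in grams_per_doc for g in grams})
    n_docs = len(docs)
    # Document frequency: in how many documents each n-gram appears.
    df = Counter(g for grams in grams_per_doc for g in set(grams))
    # Smoothed inverse document frequency (an assumed, commonly used variant).
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1.0 for g in vocab}
    vectors = []
    for grams in grams_per_doc:
        tf = Counter(grams)
        total = len(grams) or 1  # avoid division by zero for empty documents
        vectors.append([tf[g] / total * idf[g] for g in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy bilingual corpus (invented sentences, not from the talk).
docs = [
    "the weather is nice today",
    "the weather is nice in delhi today",
    "आज मौसम बहुत अच्छा है",
]
vocab, vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

The two English sentences share several unigrams and bigrams, so their vectors are closer to each other than either is to the Hindi sentence, which shares no n-grams with them. Vectors of this kind are what a density-based clustering step would then operate on.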
Prerequisites: An interest in emerging and trending technologies, and a basic understanding of NLP, neural networks, Python, statistical hypothesis testing, and clustering.
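For readers brushing up on the hypothesis-testing prerequisite, the sketch below shows one common non-parametric test, the Mann-Whitney U test, applied to two samples of a per-text score. The event description does not say which test or which heuristic is used, so the choice of test, the function names, and the numbers are all assumptions. The p-value uses the plain normal approximation without tie or continuity correction, which is rough for small samples; `scipy.stats.mannwhitneyu` is the usual choice for real work.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    r1 = sum(ranks[:n1])                 # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)            # smaller of the two U statistics
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, p


# Hypothetical heuristic scores for texts in a human cluster and a bot cluster.
human_scores = [0.42, 0.47, 0.51, 0.55, 0.58, 0.61, 0.66, 0.70]
bot_scores = [0.12, 0.15, 0.19, 0.22, 0.25, 0.28, 0.31, 0.35]

u_stat, p_value = mann_whitney_u(human_scores, bot_scores)
print(u_stat, p_value)
```

A small p-value here would suggest that the two samples come from different distributions. The test makes no normality assumption about the scores themselves, which is why rank-based tests suit heuristics with unknown distributions.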
- Best articles get published on Analytics Vidhya’s Blog Space
Who is this DataHour for?
About the Speaker
Sumeet Lalla completed his Master's in Data Science at the Higher School of Economics, Moscow, and his Bachelor of Engineering in Computer Engineering at Thapar University. With 5.5 years of experience in data science and software engineering, he currently works as a Data Scientist at Cognizant and has previously worked as a Software Developer at Siemens Technology And Services and as a Technology Analyst at Deloitte Consulting and Pvt Ltd.
Registration Details
6706
Registered
