Analytics Vidhya — June 29, 2015

Introduction

Twitter has become an unavoidable channel for brand management. It has compelled brands to become more responsive to their customers. On the other hand, any damage it causes to a brand can’t be undone. The 140-character tweet has become a powerful tool for customers and users to convey messages directly to brands.

For companies, these tweets carry a lot of information, such as sentiment, engagement, reviews and product features. However, mining these tweets isn’t easy, because the data needs a lot of cleaning before it can be mined. Extracted tweets can come with unwanted HTML characters, bad grammar and poor spelling, all of which make mining very difficult.

The infographic below shows the steps for cleaning tweet data before mining it. While the example uses Twitter, you can of course apply these methods to any text mining problem. We’ve used Python to execute these cleaning steps.

[Infographic: text mining using Python]
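
The infographic's own code is not reproduced in this text, so what follows is only a minimal sketch of the kind of cleaning described above, not the article's actual steps. It assumes a typical order: decode HTML entities, normalize Unicode, drop URLs and @mentions, keep hashtag words, squeeze exaggerated character repeats, then tidy whitespace and case. The function name `clean_tweet` and the regular expressions are illustrative assumptions, and only the Python standard library is used.

```python
import html
import re
import unicodedata

# Illustrative patterns (assumptions, not taken from the infographic).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # the same character three or more times in a row


def clean_tweet(text: str) -> str:
    """Hypothetical helper: apply a few common tweet-cleaning steps."""
    text = html.unescape(text)                  # "&amp;" -> "&", "&lt;" -> "<"
    text = unicodedata.normalize("NFKC", text)  # fold compatibility characters, e.g. full-width letters
    text = URL_RE.sub(" ", text)                # drop links
    text = MENTION_RE.sub(" ", text)            # drop @mentions
    text = text.replace("#", "")                # keep the hashtag word, drop the symbol
    text = REPEAT_RE.sub(r"\1\1", text)         # squeeze runs of 3+ identical characters down to 2
    text = re.sub(r"\s+", " ", text).strip()    # collapse whitespace
    return text.lower()


if __name__ == "__main__":
    raw = "I luv my &lt;3 iphone &amp; you&#39;re awsm!!! @apple http://t.co/abc #Display"
    print(clean_tweet(raw))
```

Spelling correction, slang lookup and stemming are left out on purpose, since they usually need a dictionary or a third-party library, and which one to use depends on the project.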

Download the PDF version of this infographic and refer to the Python code to perform text mining and follow your ‘Next Steps…’ -> Download Here

To view the complete article on effective steps to perform data cleaning using Python -> visit here

If you like what you just read and want to continue your analytics learning, subscribe to our emails, follow us on Twitter or like our Facebook page.

About the Author

Analytics Vidhya

This is the official account of the Analytics Vidhya team.


6 thoughts on "Quick Guide: Steps To Perform Text Data Cleaning in Python"

Indraneel Pise says: June 30, 2015 at 4:37 am
Stemming is also an important step in text mining. You could include that too.

Rohit says: June 30, 2015 at 12:22 pm
remember to deal with character encodings.

Rohit Shetty says: June 30, 2015 at 12:25 pm
Thou r awwssomee

Teresa Rothaar says: June 30, 2015 at 2:49 pm
This is great. I'm doing a project on social media data mining for my capstone class for my MS in MIS, and this is very helpful. Thanks so much!

Whaly says: July 20, 2015 at 1:35 am
This is awesome, can you check the urls in the last step? They seem not working :)

mzkarim says: October 30, 2015 at 6:09 am
Great resource. Thanks.
