Quick Guide: Steps To Perform Text Data Cleaning in Python

Analytics Vidhya, Last Updated: 12 Jul, 2020


Introduction

Twitter has become an unavoidable channel for brand management. It has compelled brands to become more responsive to their customers. On the other hand, the damage a tweet can do to a brand can't easily be undone. The 140-character tweet has become a powerful tool for customers and users to convey messages directly to brands.

For companies, these tweets carry a lot of information, such as customer sentiment, engagement, reviews and opinions on product features. However, mining these tweets isn't easy, because the data needs a lot of cleaning first. Once extracted, tweets can contain unwanted HTML characters, bad grammar and poor spelling, all of which make mining very difficult.
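As a minimal illustration of the "unwanted HTML characters" problem, the sketch below uses Python's standard-library `html.unescape` to turn escaped entities such as `&amp;` back into the characters they stand for, then drops any character that cannot be encoded as ASCII. The function name `unescape_and_decode`, the sample tweet and the decision to discard non-ASCII text are assumptions for illustration only; discarding non-ASCII also removes emoji and accented letters, which you may prefer to keep.

```python
import html


def unescape_and_decode(tweet: str) -> str:
    """Hypothetical helper: replace HTML entities (e.g. &amp;) with their
    characters, then drop anything outside ASCII (an illustrative choice)."""
    unescaped = html.unescape(tweet)
    # Encode to ASCII, silently discarding characters it cannot represent,
    # then trim any whitespace left behind at either end.
    return unescaped.encode("ascii", errors="ignore").decode("ascii").strip()


raw = "I &lt;3 this phone &amp; its camera \u2764"
print(unescape_and_decode(raw))  # I <3 this phone & its camera
```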

Below is the infographic, which shows the steps for cleaning tweet data before mining it. Although the example uses Twitter, you can apply these methods to any text mining problem. We've used Python to implement these cleaning steps.
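The infographic's exact steps are not reproduced in this text, so what follows is only a minimal sketch of a tweet-cleaning pipeline in Python. It assumes a handful of common steps: unescaping HTML entities, removing URLs and @mentions, lowercasing, shortening characters repeated three or more times (a crude fix for elongated spellings), stripping punctuation and collapsing whitespace. The function name `clean_tweet`, the regular expressions and the sample tweets are hypothetical choices, not the method used in the infographic.

```python
import html
import re
import string

# Illustrative patterns (assumptions); real tweet data may need other rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# A character followed by two or more copies of itself, e.g. "oooo" in "soooo"
REPEAT_RE = re.compile(r"(.)\1{2,}")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(tweet: str) -> str:
    """Hypothetical cleaning pipeline; each step is one common choice."""
    text = html.unescape(tweet)          # &amp; -> &
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user handles
    text = text.lower()                  # case-fold for consistent tokens
    text = REPEAT_RE.sub(r"\1\1", text)  # "soooo" -> "soo"
    text = text.translate(PUNCT_TABLE)   # remove ASCII punctuation
    return " ".join(text.split())        # collapse runs of whitespace


tweets = [
    "@BrandX your app is soooo slow!!! &amp; it crashed https://t.co/abc123",
    "Loving the new update :) www.example.com",
]
for t in tweets:
    print(clean_tweet(t))
```

For the two sample tweets this prints `your app is soo slow it crashed` and `loving the new update`. Note that removing punctuation also destroys emoticons such as `:)`, which can carry sentiment; if sentiment is your goal, you may want to map emoticons to words before that step.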

Download the PDF version of this infographic, refer to the Python code to perform text mining, and follow your ‘Next Steps…’ -> Download Here

To view the complete article on effective steps to perform data cleaning using Python -> visit here

If you like what you just read and want to continue your analytics learning, subscribe to our emails, follow us on Twitter or like our Facebook page.

Analytics Vidhya

Analytics Vidhya Content team

Responses From Readers

Indraneel Pise

Stemming is also an important step in text mining. You could include that too.

Rohit

remember to deal with character encodings.

Rohit Shetty

Thou r awwssomee
