Analytics Vidhya — October 6, 2015

Introduction

If you wish to build an impeccable predictive model, trust me: no programming language or machine learning algorithm can give you one unless you perform data exploration.

Just as a baby learns to walk before it runs, every data scientist should learn to explore data before getting accustomed to algorithms. Data exploration is of paramount importance in predictive modeling.

Data exploration not only uncovers hidden trends and insights but also lets you take the first steps toward building a highly accurate model. Considering the popularity of R and its fervent use in data science, I’ve created a cheat sheet of the data exploration stages in R. It is highly recommended for beginners, who can use these handy code snippets to perform data exploration faster. All you need to do is customize the code to your needs.
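As a rough illustration of what the first stages of exploration look like in R, here is a minimal sketch of a few common base R calls. It assumes the built-in iris data set as a stand-in for your own data frame, and these calls are typical first steps rather than necessarily the exact steps shown in the cheat sheet.

```r
# A minimal sketch of first-pass exploration in base R.
# Assumption: the built-in iris data set stands in for your own data frame.
data(iris)

dim(iris)            # number of rows and columns
str(iris)            # column types and a preview of the values
summary(iris)        # min, quartiles, mean, max for numeric columns; level counts for factors
head(iris, 5)        # first five rows

# Count missing values in each column
colSums(is.na(iris))

# Mean of every numeric column
num_cols <- sapply(iris, is.numeric)
col_means <- colMeans(iris[, num_cols])
print(col_means)
```

Replacing `iris` with your own data frame is usually all that is needed; `sapply(..., is.numeric)` keeps the column means step from failing on factor or character columns.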

Note: This cheat sheet is also available for download as a PDF below.

[Infographic: cheat sheet of data exploration steps in R]

Download Here


About the Author

Analytics Vidhya

This is the official account of the Analytics Vidhya team.



8 thoughts on "Cheatsheet – 11 Steps for Data Exploration in R (with codes)"

Inderdeep says: October 06, 2015 at 6:03 am
table() doesn't serve the purpose for continuous random variables, so it is of limited use!
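The point about table() is fair: it counts each distinct value, so a continuous column yields nearly one cell per observation. A common workaround, sketched below under the assumption that equal-width bins are acceptable for your data, is to bin the variable with cut() first and then tabulate the bins. The built-in iris data set and the choice of 4 bins are illustrative only.

```r
# table() on a continuous column: many distinct values, mostly small counts.
# Binning with cut() first gives a readable frequency table.
data(iris)

# Number of cells in the raw frequency table (one per distinct value)
length(table(iris$Sepal.Length))

# Split into 4 equal-width intervals, then count observations per interval
sepal_bins <- cut(iris$Sepal.Length, breaks = 4)
freq <- table(sepal_bins)
print(freq)

# Relative frequencies (proportions summing to 1)
prop.table(freq)
```

Passing an explicit vector of break points to `breaks` (instead of a bin count) gives interval boundaries that are easier to read and compare across data sets.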
gokul says: October 06, 2015 at 11:36 am
Awesome information.
The.R.Enthusiast says: October 06, 2015 at 2:56 pm
In "how to generate frequency tables", there is no need to prefix the columns with "iris$..." if you have attached the iris data set.
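To make the attach() point concrete, the sketch below builds the same frequency table two ways on the built-in iris data set: once after attach(), and once with with(), which evaluates an expression inside the data frame without touching the search path. Using the Species column is an illustrative choice.

```r
data(iris)

# Option 1: attach() adds a copy of iris to the search path,
# so its columns can be referenced without the iris$ prefix.
attach(iris)
freq_attached <- table(Species)
detach(iris)   # remove it again to avoid masking other objects

# Option 2: with() evaluates the expression using iris's columns,
# leaving the search path unchanged.
freq_with <- with(iris, table(Species))

# Both approaches produce the same table
identical(freq_attached, freq_with)
```

with() is often preferred in scripts because a forgotten detach() can leave attached columns silently masking (or being masked by) other objects.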
Nhan Vu says: October 10, 2015 at 8:43 am
The commands to sort a data.frame should be as follows, if var1 and var2 are its columns:
newdata <- old[with(old, order(var1)), ]
newdata2 <- old[with(old, order(var1, -var2)), ]
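The sorting commands from this comment can be run as-is on a small data frame. In the sketch below, `old`, `var1` and `var2` are the placeholder names from the comment, filled with hypothetical illustrative values.

```r
# Hypothetical data frame using the placeholder names from the comment
old <- data.frame(
  var1 = c("b", "a", "b", "a"),
  var2 = c(1, 3, 2, 4)
)

# Ascending by var1 (ties keep their original order)
newdata <- old[with(old, order(var1)), ]

# Ascending by var1, then descending by var2 within each var1 group
newdata2 <- old[with(old, order(var1, -var2)), ]
print(newdata2)
```

The unary minus in `-var2` only works for numeric columns; for a character or factor column, `-xtfrm(var2)` is one way to get a descending sort key.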
Kevin says: October 23, 2015 at 9:42 pm
I have registered for this site and want to download the data exploration guide, but my login has not been approved yet - very frustrating.
Analytics Vidhya Content Team says: October 24, 2015 at 4:43 am
Hi Kevin, sincere apologies for the inconvenience. Your login has been approved. Best, Manish
TheDataScientist says: October 24, 2015 at 2:28 pm
Thanks, Manish. This is especially helpful for me, as I need a good guide for handling my data processing flow. Keep up the good work!
Arindam says: January 04, 2016 at 9:10 am
But why is generating distributions from the mean and SD important in data preparation, when we have to work on actual data? I couldn't get that part; could you please explain? The rnorm() you are using will only generate random samples from a given mean and SD, so how is that useful?
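For readers with the same question, the sketch below shows what rnorm() does and one possible way to use it alongside real data. This use is an assumption, not necessarily the cheat sheet's intent: draw a normal sample with the same mean and SD as an actual column (here `Sepal.Length` from the built-in iris data set), then compare quantiles to see how far the real data departs from a normal shape.

```r
# rnorm(n, mean, sd) draws n random values from a normal distribution.
data(iris)
x <- iris$Sepal.Length

set.seed(42)   # make the random draw reproducible
simulated <- rnorm(length(x), mean = mean(x), sd = sd(x))

# Compare selected quantiles of the real column and the simulated normal sample
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
comparison <- rbind(
  actual    = quantile(x, probs),
  simulated = quantile(simulated, probs)
)
print(round(comparison, 2))
```

Large gaps between the two rows, especially in the tails, suggest that assuming normality for that column may be questionable.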
