
Five data science projects to learn data science


Nothing beats the learning which happens on the job!

Whether it is the challenge of collecting the data or of cleaning it up, you can only appreciate the effort involved once you have gone through the process yourself.

Hence, the best way to learn data science is to do data science. There is no substitute for it.

It doesn’t matter whether you are using R, Python or Weka: the best approach is to learn the basics of the tool you are using (for example, how data is stored, how to access specific data points, and how to manipulate data) and then start working on a data science problem or project.

Five useful projects to learn data science

To help you learn data science, I have listed some of the datasets I recommend, along with the reasons I have included them in the mix. All of these datasets are available for free on the internet and provide a glimpse of how data science is changing the world we live in.

These datasets should appeal to you whether you are a newbie or a pro. Here are 5 datasets and the reasons why I recommend them:

  1. Titanic dataset from Kaggle: This is the first dataset I recommend to any starter, and for a good reason: the problem looks simple at the outset, yet it provides a good understanding of what a typical data science project involves. Starters can work on the dataset in Excel, while pros can use advanced tools to extract hidden information and algorithms to substitute some of the missing values in the dataset. Another cool aspect is that you can rank yourself against other data scientists on Kaggle to see where you stand. This dataset is just the introduction you need before you delve into the world of Kaggle.
  2. Learning to mine Twitter on a topic: This project is included in the list so that beginners can relate to the power of data science. With the help of Twitter and a good data science tool, you can find out what the world is saying about a particular topic. I was mesmerized when I did this for the first time. Be it reviews of movies, sentiment about elections or any hot topic off the press, you can find out for yourself what people are saying. Performing this exercise not only helps you understand some of the challenges in mining social media (especially if you are interested in text mining), it also helps you understand how easy it is to integrate an API in your scripts to access the information available on social media.
  3. Human activity recognition using smartphone dataset: This problem makes it into the list because it is a segmentation problem (different from the previous 2 problems) and there are various solutions available on the internet to aid your learning. It is an interesting application if you have ever wondered how your smartphone knows what you are doing right now. Another reason to solve this problem is that it helps you understand a different kind of problem: one where there are no missing values (because the collection happens in an automated manner), so the focus is on data munging and learning.
  4. Hubway Visualization challenge: This problem focuses on data visualization rather than explicitly on prediction or machine learning (though no one stops you from applying those). The questions mentioned in the challenge help you understand the problems a business can solve with the help of Business Intelligence tools. Again, there are a bunch of interesting visualizations available on the internet to see what some of the best minds have produced.
  5. MovieLens data: I couldn’t have left this dataset out. It is bigger than some of the other datasets mentioned in the article, but it provides a lot of fun. The dataset is sufficient to build a recommender system and see which movies are liked by what kind of audience.
    1. Readme file: http://files.grouplens.org/papers/ml-1m-README.txt
    2. Dataset: http://www.grouplens.org/system/files/ml-1m.zip
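
For the Titanic item above, substituting missing values can start very simply. The following is only a minimal sketch, not the approach used by any particular Kaggle solution: it fills a missing age with the median age of passengers in the same class. The rows, the lowercase field names and the function name impute_age_by_class are made up for illustration; the real Kaggle file has its own column names and many more columns.

```python
from statistics import median

# Made-up rows that mimic the shape of the Titanic data (not real passengers).
# None marks a missing age.
passengers = [
    {"name": "A", "pclass": 1, "age": 38.0},
    {"name": "B", "pclass": 1, "age": None},
    {"name": "C", "pclass": 1, "age": 54.0},
    {"name": "D", "pclass": 3, "age": 22.0},
    {"name": "E", "pclass": 3, "age": None},
    {"name": "F", "pclass": 3, "age": 26.0},
]


def impute_age_by_class(rows):
    """Return copies of rows with missing ages replaced by the class median.

    Hypothetical helper; assumes at least one row has a known age.
    """
    # Collect the known ages for each passenger class.
    ages_by_class = {}
    for row in rows:
        if row["age"] is not None:
            ages_by_class.setdefault(row["pclass"], []).append(row["age"])

    class_medians = {pclass: median(ages) for pclass, ages in ages_by_class.items()}
    # Fallback for a class in which every age is missing.
    overall_median = median(a for ages in ages_by_class.values() for a in ages)

    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left untouched
        if new_row["age"] is None:
            new_row["age"] = class_medians.get(new_row["pclass"], overall_median)
        filled.append(new_row)
    return filled


filled = impute_age_by_class(passengers)
for row in filled:
    print(row["name"], row["pclass"], row["age"])
```

Grouping by class is just one reasonable choice; a model-based substitution, as hinted at in the list, would replace the median with a prediction.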
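
To give a feel for the recommender system mentioned for the MovieLens data, here is a rough sketch of item-to-item similarity using only the Python standard library. It assumes the UserID::MovieID::Rating::Timestamp layout described in the readme file for the ratings; the sample lines are made up, and the function names (parse_ratings, cosine_similarity, most_similar) are my own. A real recommender would need far more care, for example with rating normalization and sparsity.

```python
from collections import defaultdict
from math import sqrt

# Made-up lines in the assumed layout: UserID::MovieID::Rating::Timestamp
SAMPLE_RATINGS = """\
1::10::5::978300760
1::20::4::978302109
1::30::1::978301968
2::10::4::978300275
2::20::5::978824291
3::10::1::978302268
3::30::5::978302039
4::20::4::978300719
4::30::2::978302268
"""


def parse_ratings(text):
    """Build {movie_id: {user_id: rating}} from '::'-separated lines."""
    by_movie = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        user_id, movie_id, rating, _timestamp = line.split("::")
        by_movie[int(movie_id)][int(user_id)] = float(rating)
    return dict(by_movie)


def cosine_similarity(a, b):
    """Cosine similarity of two sparse rating vectors keyed by user id."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[user] * b[user] for user in shared)
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)


def most_similar(by_movie, movie_id, top_n=2):
    """Return up to top_n (movie_id, similarity) pairs, most similar first."""
    scores = [
        (other, cosine_similarity(by_movie[movie_id], ratings))
        for other, ratings in by_movie.items()
        if other != movie_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


ratings_by_movie = parse_ratings(SAMPLE_RATINGS)
similar_to_10 = most_similar(ratings_by_movie, 10)
print(similar_to_10)
```

To try this on the actual data, you would read the ratings file from the download and pass its contents to parse_ratings in place of SAMPLE_RATINGS.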

These are the five datasets I recommend to people starting in the industry. They provide a healthy mix of the different types of challenges you face as a data scientist. Each of these datasets provides a bunch of learning and will probably leave you wanting more.

If you are aware of other open datasets that you recommend to people starting their journey in data science, please feel free to suggest them along with the reasons why they should be included. If the reason is good, I’ll include them in the list.

If you like what you just read and want to continue your analytics learning, subscribe to our emails, follow us on Twitter or like our Facebook page.

8 Comments

  • Sudhindra says:

    Awesome post…for any starter like me!!

  • Love Shah says:

    Awesome stuff!
    Thanks for sharing.

  • atul says:

    Hi Kunal,

    I am a big fan of your work and I am very grateful to you & Analytics Vidhya for sharing all this useful information.

    Also please start something on recommender system or if you have any music related content dataset to make a content recommending system. It will be great help.

    thanks and regards,
    Atul Rawat

  • Akash says:

    Can someone name the best universities in world that teach the BA program?

  • Jigar says:

    Hi Guys,

    I recently found Analytics Vidhya, and immediately loved your articles, tutorials, and all the effort you guys are putting into educating in the field of analytics.

    Another dataset that I’d recommend is the Fuel Economy dataset from the site below. This is a car lover’s dream. This dataset (although a small one) lists 37000 cars/trucks/etc. with their emission rating, mpg value, drive, etc. from 1984 to 2016.
    ——————————————————
    http://fueleconomy.gov/feg/download.shtml
    —————————————————–

    A lot of correlation, exploration, visualizations are possible, linking fuel type, with mpg, performance improvement over various car models, emission ratings, etc.

    Check it out !

    Thanks,
    Jigar

  • nagarjuna says:

    Are there any open datasets from Facebook? If so, please share.

  • Catherina says:

    I am a beginner studying data science. For a college term project, roughly how much time will it take to build a mini movies/songs recommender system ?
    Please let me know of any other useful project ideas with the estimated completion time.

    Thanks
