A Complete Data Science Guide for People Beginning Their Data Science Journey
Hello there, this is Sion. I am writing this article because I know how stressful and confusing it can be to start a new career or switch from an old one. Most of us have been through that phase in life when nothing makes sense and even giving it our 100% doesn’t seem to be enough. The thing to remember is that, no matter how things seem right now, it’s not permanent. Nothing is permanent; change is the only constant. There is a famous quote, often attributed to Winston Churchill, which helped me a lot during my own share of gloomy days. It goes something like…
| “If you’re going through hell, keep going” |
That makes sense when you think about it: why would you stop in hell? Anyway, let us get started with Data Science.
Harvard Business Review dubbed data scientist the sexiest job of the 21st century in 2012, and nine years later it still holds the title. The purpose of a data scientist is to “create value from raw data through analytics”. Today, data scientists are recruited by top-tier companies around the globe to get elbow-deep into their data and draw valuable insights and conclusions that help the company progress and outpace its competition.
| “A data scientist can predict the future with the necessary data” |
1. Tools and Skills for People Beginning Their Data Science Journey
To work their magic, data scientists need tools that help them bend the data to their will and draw out the stories and secrets the data has to tell. Mastering these tools is achievable if you put your mind to it, but it requires consistency and perseverance. A Data Science project has different stages, and each stage requires different types of tools. There are many tools out there that you can learn, but the ones I mention in this article are the ones I personally found easy to use and that help you get started in the world of Data Science.
The stages in a Data Science process are :
- Data Storage
- Exploratory Data Analysis
- Data Modeling
- Data Visualization
A. Data Storage
A typical successful company can generate millions, if not billions, of data records every single day. All this data has to be stored somewhere secure and easy to access, a place where a data scientist can get the data they need whenever they want. This place is called a database. A database needs to be monitored regularly so that data keeps flowing in and being stored smoothly.
SQL is a very handy tool for bending this data to your will. SQL stands for ‘Structured Query Language’ and is a language used mainly to work with relational databases. It is used to create and delete databases and tables, fetch rows, modify rows, and so on. Here is a nice Analytics Vidhya article if you want to get a glimpse of SQL.
Image Source: SQL Basics (slideshare.net)
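To give a feel for the operations mentioned above, here is a minimal sketch that runs a few SQL statements from Python using the built-in `sqlite3` module and a throwaway in-memory database. The `customers` table, its columns, and the sample rows are made up purely for illustration.

```python
import sqlite3

# Open a temporary database that lives only in memory (nothing is written to disk).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table (hypothetical schema, for illustration only).
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Insert a few made-up rows.
cur.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Asha", "Pune"), ("Ben", "London"), ("Chen", "Pune")],
)

# Fetch rows: which customers live in Pune?
cur.execute("SELECT name FROM customers WHERE city = ? ORDER BY name", ("Pune",))
pune_customers = [row[0] for row in cur.fetchall()]
print(pune_customers)

# Modify a row, then delete another one.
cur.execute("UPDATE customers SET city = ? WHERE name = ?", ("Mumbai", "Asha"))
cur.execute("DELETE FROM customers WHERE name = ?", ("Ben",))

# Count the rows that are left.
cur.execute("SELECT COUNT(*) FROM customers")
remaining = cur.fetchone()[0]
print(remaining)

conn.close()
```

The same `CREATE`, `INSERT`, `SELECT`, `UPDATE` and `DELETE` statements carry over, with small dialect differences, to larger database systems.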
B. Exploratory Data Analysis
Exploratory Data Analysis (EDA) is an approach to analyzing datasets in order to summarize their main characteristics. In the long run, it really helps a data scientist to be familiar with the data they are working on. Gathering initial insights will help you understand the problem better and ask the right questions, which you can then answer with Data Science. EDA deserves a read of its own, so here is another helpful article from Towards Data Science that will help you understand EDA in much greater depth.
One of the basic skills for a data scientist of the 21st century is knowing the programming language Python. Even if you have no coding or computer science background, Python is easy to learn. It’s very intuitive, and you can pick up the basics in a matter of weeks, even on your own! Here is how the basic Hello World! program looks in Python:
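A minimal version, assuming Python 3 (where `print` is a function), looks like this:

```python
# Store the greeting in a variable, then print it to the screen.
message = "Hello World!"
print(message)
```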
As easy as that! I learned Python on my own using various free online resources. The best one so far, in my opinion, is a website called SoloLearn. It’s totally free and is designed to teach you in a very interactive way. But practice makes perfect. You can apply what you’ve learned and practice coding on the website HackerRank. HackerRank provides questions of all difficulty levels on various topics, which will help you feel confident as a programmer and get familiar with how a computer works. I’ll link both websites below.
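Coming back to EDA, here is a minimal sketch of what “summarizing the main characteristics” of a dataset can look like, using only Python’s standard library. The records, the column names and the helper `summarize_numeric` are made up for illustration; on real projects a dedicated library such as pandas is commonly used for this step.

```python
from collections import Counter
import statistics

# A tiny made-up dataset; None marks a missing value.
rows = [
    {"age": 23, "city": "Pune"},
    {"age": 35, "city": "Delhi"},
    {"age": 31, "city": "Pune"},
    {"age": None, "city": "Mumbai"},
    {"age": 46, "city": "Pune"},
]

def summarize_numeric(values):
    """Return basic summary statistics for a numeric column, ignoring missing values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
    }

# Summarize the numeric column and count how often each category appears.
age_summary = summarize_numeric([row["age"] for row in rows])
city_counts = Counter(row["city"] for row in rows)

print(age_summary)
print(city_counts.most_common())
```

Even on five rows, the summary already raises the kind of questions EDA is meant to surface: why is one age missing, and why do most records come from one city?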
C. Data Modeling
The next step is to create a predictive equation, commonly referred to as a model, which uses previous data to predict what ‘might happen’ or ‘might come’ in the future. This is usually done using Machine Learning, which is one of the advanced topics of Data Science. Anyone can do machine learning at a basic level, but if you want to become a robust and badass data scientist in the future, I recommend you first learn the math behind machine learning. The topics usually required are Linear Algebra, Multivariate Calculus, and Statistics.
Image Source: A Complete Tour Of Data Science Project Life Cycle (analyticsindiamag.com)
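To make the idea of a “predictive equation” concrete, here is a minimal sketch that fits a straight line, y = slope * x + intercept, to a few made-up data points using ordinary least squares, and then uses the line to predict a value it has not seen. The function names `fit_line` and `predict` and the numbers are illustrative assumptions; in practice a machine learning library such as scikit-learn would usually handle this step.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Use the fitted line to estimate y for a new x."""
    return slope * x + intercept

# Made-up "previous data": months of experience vs. tasks completed per week.
months = [1, 2, 3, 4, 5]
tasks = [5, 7, 9, 11, 13]

slope, intercept = fit_line(months, tasks)
forecast = predict(6, slope, intercept)
print(slope, intercept, forecast)
```

The slope formula here is where the statistics and calculus mentioned above come in: it is the value that minimizes the squared prediction error on the training data.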
D. Data Visualization
Python provides really nice packages for making basic as well as advanced visualizations, which are very helpful in conveying the story the data reveals. (A package is a third-party bundle of scripts that provides additional functionality. The existence of these packages is one of the reasons why Python is so widely used these days.) Once again, I’d like to mention that this article covers the basic knowledge a data science aspirant or beginner should have, so that they don’t feel lost and can get started. There are more advanced visualization tools out there (like Tableau), but learning them at this stage might feel overwhelming.
Back to Python packages for data visualization. The two most popular packages are Matplotlib and Seaborn. Visualizations such as pie charts, bar graphs, and histograms give a pictorial representation of data that helps technical and non-technical people alike understand important insights. You can learn more about them at the following links:
Image Source: Data Literacy on Twitter: “Here are 13 common ways to visualize data, plus a bonus that’s not a visualization. How comfortable are you reading and interpreting these charts? How about creating them from raw data? https://t.co/5qTpkrG9Iq” / Twitter
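As a small illustration of the bar graphs mentioned above, here is a sketch that counts made-up product orders and draws them as a bar chart with Matplotlib. The data, the helper names `count_by_category` and `save_bar_chart`, and the output file name are assumptions for illustration. Matplotlib is imported inside the plotting function so that the counting part still runs on a machine where it is not installed yet.

```python
from collections import Counter

def count_by_category(labels):
    """Count how many times each label occurs."""
    return dict(Counter(labels))

def save_bar_chart(counts, path):
    """Draw the counts as a bar chart and save it as an image (needs Matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no window required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(counts.keys()), list(counts.values()))
    ax.set_xlabel("Product")
    ax.set_ylabel("Number of orders")
    ax.set_title("Orders per product")
    fig.savefig(path)
    plt.close(fig)

# Made-up order data.
orders = ["laptop", "phone", "phone", "tablet", "phone", "laptop"]
counts = count_by_category(orders)
print(counts)
```

With Matplotlib installed, calling `save_bar_chart(counts, "orders.png")` writes the chart to an image file. Seaborn builds on Matplotlib, so the same figure can later be restyled with it.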
2. Creating a Data Science Portfolio
To be successful at what you do nowadays, having good references and connections is very important. Knowing and learning from people who are already good at what you want to achieve will take you a long way, and in this age of the internet, connecting with new people is as easy as breathing. At the same time, you have to create an online image of yourself where you showcase your skills and talents, so people know what they’ll be getting in return.
| “Connections can open doors which a degree can’t” |
Keeping a regularly updated GitHub profile is the easiest way of showing that you are serious about your work. Another great place to connect is LinkedIn. It is a business and employment-oriented social network that helps companies recruit staff and helps individuals showcase their professional achievements and secure employment. I highly recommend that every one of you keep an active GitHub and LinkedIn account and engage with the community regularly. I’ll link my socials at the end of the article in case some of you are interested in corresponding with me.
But arguably the most important online community for Data Science is Kaggle. It is made particularly for data scientists and has a very active and engaging community. Kaggle offers a wide range of datasets from every domain for you to practice on, and provides powerful tools and resources, all for free, to help you achieve your data science goals. Kaggle also hosts regular contests, some with very generous cash prizes. Once you have grown your skills and feel confident, you can take part in these free contests and, if you are good enough, win a hefty sum of money!
3. Some more tips
- Another popular programming language used by data scientists across the world is R. It is mainly used for statistical computing and graphics, and it can be used for statistics, data analysis, and machine learning too. Since I do not have much experience with R myself, I don’t think I’m qualified to tell you about it, but if you are interested, you can always Google it, which brings me to my next tip.
- THE INTERNET IS YOUR BEST FRIEND. Any programmer or data scientist, be it a novice or a senior at a respectable firm, will tell you the same.
- A tip for learning Python better is to always read the official documentation for the package or library you are going to use.
- Read online blogs and keep yourself updated with the latest advances in your field. Since you are already reading this article, I suggest you read one article a day to keep up with the changing technologies. Analytics Vidhya is a very rich source of such awesome blogs and posts.
- Last but not least, choose your niche. Data science is a tool that can be used in numerous fields, like Medicine, Finance, Business, Science and Engineering, and every other field you can think of. What you have to decide is which field you are passionate about. Going into a field you are passionate about will help you look at the data in a way that no one else can, and it’ll give you a sense of fulfillment from doing something you love.
In the end, I just want to say that if you are beginning your journey in Data Science, I hope this article helps you get started. If you are still feeling lost and want to discuss your next step, feel free to connect with me on LinkedIn and I’ll be more than happy to help you out. This was my first article, so I chose a topic that I thought would have helped me when I first got started with Data Science. All the best for your journey, and remember, you set your own limits. Cheers!
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.