Every career today benefits from a community: a group of people with whom we can talk about work, errors, and ideas, and from whom we can learn. Kaggle is the world’s largest and most popular data science community. Having such a community helps us feel that “we belong”, which is a crucial feeling for our social interaction and health.
In this article, we will look at Kaggle both as a community and as a platform: all the different tools, services, and resources available for us to learn as well as practice data science.
Let’s look at the interface that we get when we visit Kaggle for the first time.
Before we start using Kaggle, we need to create an account and sign in. You can see both options in the top right corner. Once you are done with that, this is what it might look like.
Some of the things visible here might be different for you because the interface is personalized based on how I have used Kaggle since I registered.
Navbar and all the things available to us on Kaggle:
Once I click on ‘more’, these are all the things I can access from my Kaggle account.
In my opinion, there are 4 major things that make Kaggle “THE BEST”.
1. Free courses and free certificates
There are many courses available across multiple domains of machine learning and data science. Not only are the courses available, but after each lesson there are also practice (exercise) notebooks for getting hands-on with the topic. To earn your free Kaggle certificate, you need to complete all the tasks and exercises.
There are a few more courses, but with this I wanted to show you that the topics are diverse enough that you don’t have to go anywhere else: whenever you feel lost on some topic or problem, you can get help here.
Let me show you what these courses look like with one example:
At the end of each course, there is one additional lesson, which differs in content but is similar in use case and ties into what the course teaches. Mostly these lessons cover some famous and/or powerful topic. Here we have AutoML (by Google) to automate machine learning.
2. A huge collection of publicly available, contributed datasets to practice and work on
For any data science, machine learning, or deep learning task, we need data, and most of the time a whole lot of it. Instead of making us browse different sites for datasets of different kinds and sizes, Kaggle provides a common place for a huge collection of them. You are one click away from using them, and they are extremely easy to use.
Once you click on “Datasets” on the Navbar, this is what you will see. You can search for a specific dataset, contribute your own dataset to the community, or start working on one of the datasets shown on this page (Trending Datasets, Popular Datasets, Recently Viewed Datasets).
For demonstration, I will search for a specific dataset (“Sunspots dataset”). Let us see what it looks like.
The number in the red selection is the number of upvotes people gave each result, which helps surface the most relevant and best-liked option. Let’s explore this dataset in detail.
There are a lot of things that we can use to know more about this data and immediately start working.
- Download the dataset.
- Create a new Kaggle Notebook with this dataset already loaded in it.
- See some details about the columns in the data.
- See the activity that involves this data.
- Last but not least, browse all the notebooks created and publicly shared to date that use this data.
3. Data Science / Machine Learning / Deep Learning competitions
Although I haven’t participated in any of them, I love how we compete in real time on a problem along with the Kaggle community and can win amazing cash prizes (if that particular competition offers them). I definitely want to participate someday soon, and I hope the images motivate you. Hosting a competition is not limited to big companies or wealthy enterprises, either. You can do that as well. There are certain protocols that need to be followed and voila, you have your own hosted competition.
I have sorted the competitions completed to date by their reward value. Look closely.
4. Kaggle Notebooks (Code)
For any data science or computer science task, we have to write at least some code. Kaggle provides its own Notebook environment, with limits on how much we can store (collectively per account), how many hours of GPU are available, and how many hours of TPU are available. The notebooks are completely integrated with all of Kaggle’s services, and they can also be used independently like any other notebook environment (Datalore, Google Colab, Jupyter, etc.). This means you can use them for your practice, Kaggle competitions, Kaggle courses, analyzing a Kaggle or non-Kaggle dataset, and much more. You must check them out.
By clicking on that black button, you can create your own notebook or open someone else’s notebook that you want to read, learn from, or compare against. All the notebooks visible here were explicitly shared publicly, which means your notebooks won’t be visible to anyone unless you choose to share them.
To switch from CPU to GPU or TPU, follow this:
These are most of the functional options available to you for this notebook:
Let us see how to use notebooks with data (imported, taken directly from Kaggle, downloaded from a URL, etc.) and start working on your data science tasks.
Here I will show you how to use that “Sunspots” dataset that we saw earlier. Start by searching.
Now the data is successfully loaded. The selection in the image above is the directory in which it is stored. Let us look at a short pandas snippet for importing the dataset.
The last thing that you can do after you complete your project is to share it with the community on Kaggle. This is an important step because by sharing our ideas and our work, we expand the resources available to the community and support each other. We grow because of each other.
To the left of the big blue button in the top right, you will see a “share” button. Click on that and select Public from the dropdown menu.
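As a minimal sketch of that import step (the original snippet was shown as an image), the code below searches a directory for CSV files and loads the first one with pandas. It assumes that datasets attached to a Kaggle notebook appear under /kaggle/input, and the helper names find_csv_files and load_first_csv are hypothetical, introduced only for illustration. The exact file name and columns of the Sunspots dataset are not assumed here.

```python
from pathlib import Path

import pandas as pd


def find_csv_files(input_dir):
    """Return all CSV files found under input_dir, sorted by path."""
    return sorted(Path(input_dir).rglob("*.csv"))


def load_first_csv(input_dir):
    """Load the first CSV file found under input_dir into a DataFrame."""
    csv_files = find_csv_files(input_dir)
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found under {input_dir}")
    return pd.read_csv(csv_files[0])


if __name__ == "__main__":
    # Assumption: attached datasets are mounted under /kaggle/input.
    kaggle_input = Path("/kaggle/input")
    if kaggle_input.exists() and find_csv_files(kaggle_input):
        df = load_first_csv(kaggle_input)
        print(df.head())  # quick look at the first few rows
```

If you already know the exact path shown in the notebook's data panel, you can pass it straight to pd.read_csv instead of searching the directory.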
I hope you liked what you saw in this guide, and feel eager to start using Kaggle.
B.Tech Computer Science 3rd year
Specialized in Data Science and Deep Learning
Data Scientist Intern at Upswing Cognitive Hospitality Solutions
For more info, check out my GitHub home page.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.