6 Open Source Data Science Projects to Try at Home!
- Work on your data science skills using these open source projects
- These open-source data science projects cover a broad range of topics, from computer vision to web analytics
Have you found learning at home difficult? Most of us are in the same boat – there are too many things to juggle during these tumultuous times and learning has, contrary to our initial expectations, taken a back seat.
So how can we get back on track? How can we combine our data science learning with practical experience?
One key thing that has helped me immensely is picking an open-source data science project and running with it. This not only helps me understand the key areas I need to improve on but also shows me the way forward.
And these projects aren’t your run-of-the-mill data science projects. Each one tackles a specific data science subfield, such as computer vision or web analytics. A project could be a dataset, a state-of-the-art library that has moved the field forward, or even an open-source analytics tool.
So, pick a project that intrigues you and start working on it today!
You can check out our entire archive of open source data science projects here.
6 Open-Source Data Science Projects to Try During this Lockdown Period
Open Source Computer Vision Projects
Thanks to the power of PyTorch, we’re seeing a slew of awesome use cases in the computer vision space this year. Here, I have picked out a few outstanding computer vision projects you’ll love exploring and diving into.
And if you’re new to this field and are looking to get started, check out these resources:
- A Beginner-Friendly Guide to PyTorch and How it Works from Scratch
- Introduction to PyTorch for Deep Learning (Free Course)
This is an exquisite use case of computer vision. Converting an image into a three-dimensional photo once required sophisticated, in-depth knowledge of tools such as Photoshop. Now, thanks to advances in deep learning and computer vision, we can perform this transformation in just a few lines of code!
This project, open-sourced on GitHub, does exactly that. It takes a single RGB-D input image and converts it into a 3D photo. In deep learning terms, it produces “a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view”.
Check out an example of what you can do using this framework:
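The “D” in an RGB-D image is a per-pixel depth value, and that extra channel is what makes a 3D effect possible in the first place. As a rough illustration (not the project’s own code), here is a minimal Python sketch of the standard pinhole-camera back-projection that lifts each pixel into a 3D point. The depth map and the camera intrinsics `fx`, `fy`, `cx`, `cy` are made-up values. The actual project goes much further, building a layered depth representation and inpainting the regions hidden in the original view.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject(depth: List[List[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point3D]:
    """Lift each pixel (u, v) with depth z to a 3D point via the pinhole model.

    fx, fy are focal lengths in pixels; (cx, cy) is the principal point.
    Pixels with non-positive depth are treated as missing and skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):      # v: row index (image y)
        for u, z in enumerate(row):      # u: column index (image x)
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Hypothetical 2x2 depth map (in metres) and made-up camera intrinsics.
depth = [[2.0, 2.0],
         [0.0, 4.0]]
print(backproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5))
# [(-1.0, -1.0, 2.0), (1.0, -1.0, 2.0), (2.0, 2.0, 4.0)]
```

The two pixels at depth 2.0 land on the same plane while the 4.0 pixel sits twice as far away. Rendering such points from a slightly shifted camera exposes gaps behind nearer surfaces, which is exactly the kind of occluded region the project has to fill in.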
This is a sweet side project to work on if you don’t have a lot of time on your hands. It does what it says on the box – you give the model an input image, and it’ll transform that into a cartoon version:
Can you guess which concept is behind this project? Yes – Generative Adversarial Networks (GANs). I am truly amazed at the rapid advances we’ve seen in GANs since they were introduced to the community in 2014. From CycleGAN to StarGAN, there’s no shortage of frameworks you can pick up and work on.
The developers behind this photo-to-cartoon project have open-sourced a pretrained model to help you quickly load and run it on your machine. I have seen a few attempts at this before, but this is the most realistic transformation I’ve come across.
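If you want the core GAN idea in a single line: the original GAN paper (Goodfellow et al., 2014) frames training as a two-player minimax game between a generator \(G\), which maps random noise \(z\) to samples, and a discriminator \(D\), which outputs the probability that its input came from the real data. The photo-to-cartoon project builds on later GAN variants with additional loss terms, so treat this as the general objective rather than that project’s exact loss:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to maximize \(V\) by scoring real images high and generated ones low, while the generator tries to minimize it by producing images the discriminator mistakes for real ones.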
Here are a few resources to help you understand GANs:
Object detection frameworks have seen remarkable progress in recent years. We have gone from generating simple bounding boxes on static images to tracking dynamic objects in videos. That’s the power of computer vision.
However, progress in uniting the concepts of object detection and re-identification has been slow (to say the least!). In this fascinating study, the researchers present a simple baseline to address this gap using one-shot multi-object tracking.
Check out their model in action:
The baseline model they have open-sourced outperforms the previous state of the art on public datasets while running at 30 fps. You can find both the code and the research paper at the link mentioned above.
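To make “uniting detection and re-identification” concrete, here is a minimal Python sketch of the association step a tracker performs on each frame: every detection carries a bounding box plus a re-ID embedding, and it is matched to an existing track using a cost that mixes appearance (cosine similarity of embeddings) and overlap (IoU). This is a simplified greedy stand-in written for illustration, not the released model’s tracker, which is more involved (trackers of this kind typically add a motion model and optimal assignment such as the Hungarian algorithm). The weight `w_app` and threshold `max_cost` are arbitrary illustrative values.

```python
import math
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Obj = Tuple[Box, Sequence[float]]        # (bounding box, re-ID embedding)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def cosine(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm > 0 else 0.0


def associate(tracks: Dict[int, Obj], detections: List[Obj],
              w_app: float = 0.5, max_cost: float = 0.7) -> Dict[int, int]:
    """Greedily match detections to tracks, cheapest pair first.

    Cost blends appearance distance (1 - cosine) and overlap distance
    (1 - IoU); pairs costlier than max_cost are left unmatched.
    Returns {track_id: detection_index}.
    """
    pairs = []
    for tid, (t_box, t_emb) in tracks.items():
        for j, (d_box, d_emb) in enumerate(detections):
            cost = (w_app * (1 - cosine(t_emb, d_emb))
                    + (1 - w_app) * (1 - iou(t_box, d_box)))
            pairs.append((cost, tid, j))

    matches: Dict[int, int] = {}
    used_dets = set()
    for cost, tid, j in sorted(pairs):
        if cost > max_cost or tid in matches or j in used_dets:
            continue
        matches[tid] = j
        used_dets.add(j)
    return matches


# Two existing tracks and two new detections (all values hypothetical).
tracks = {1: ((0, 0, 10, 10), [1.0, 0.0]),
          2: ((20, 20, 30, 30), [0.0, 1.0])}
detections = [((21, 21, 31, 31), [0.1, 0.9]),
              ((1, 1, 11, 11), [0.9, 0.1])]
print(associate(tracks, detections))  # {1: 1, 2: 0}
```

Each track picks the detection that both overlaps it and looks like it; a detection that matches neither would be left unmatched and could start a new track.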
I recommend going through the tutorials below if you’re looking to learn object detection:
- A Step-by-Step Guide to Core Object Detection Algorithms
- All Analytics Vidhya’s Tutorials on Object Detection
Other Awesome Open Source Data Science Projects
I have curated a list of miscellaneous open source data science projects here, from audio generation to sports analytics. Have a crack at your favorite and enjoy the learning experience!
I clicked on this project as soon as I saw OpenAI in the headline. I’m a big fan of their work, and I appreciate their stance on open-sourcing the major developments to the general data science community. Who doesn’t love GPT-2?
Jukebox, as music fans will intuitively understand, is a neural network model that generates music with singing in the raw audio domain. OpenAI has open-sourced the model weights and code, along with a tool to explore the generated samples.
Here’s how Jukebox works – we provide the genre, artist, and lyrics as input, and the neural network gives us a new music sample produced from scratch. The range of music Jukebox can generate is staggering. This is a fascinating project to work on!
You can see (and hear) Jukebox in action on OpenAI’s site. And you can also check out Analytics Vidhya’s articles on working with audio data:
- Getting Started with Audio Data Analysis using Deep Learning
- Generate your own Music using Deep Learning
Do you use web analytics tools like Google Analytics to track your site’s performance? The issue with these tools is that your site’s data is collected and stored by a third party, so your organization has limited control over privacy. Additionally, you might need to fork out some money if you want the premium features. Not ideal for everyone, then.
These are the key gaps ShyNet aims to bridge. Here’s how the developers put it:
“You host it yourself, so the data is yours. It works without cookies, so you don’t need any intrusive cookie notices. It collects just enough data to be useful, but not enough to be creepy. It’s open source and intended to be self-hosted. And you may even find the interface easy to use.”
Here’s a sample screenshot of ShyNet’s default homepage:
And if you’re wondering what key metrics ShyNet can give you, your wait is over:
- Page load time
- Bounce rate
- Operating system
- Geographic location & network
- Device type
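ShyNet computes these metrics for you, but it helps to know what a number like bounce rate actually measures. Below is a minimal Python sketch over hypothetical session records (the field names are invented for illustration and are not ShyNet’s data model), using the common definition of bounce rate as the share of sessions that viewed only one page. ShyNet’s exact definitions may differ.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical session records: one entry per visit, with the device type
# and the page-load time (ms) of every page viewed during that visit.
sessions: List[Dict] = [
    {"device": "desktop", "page_loads_ms": [800, 600, 700]},
    {"device": "mobile", "page_loads_ms": [1500]},
    {"device": "mobile", "page_loads_ms": [1200, 1000]},
    {"device": "desktop", "page_loads_ms": [1200]},
]


def bounce_rate(sessions: List[Dict]) -> float:
    """Fraction of sessions that viewed exactly one page."""
    if not sessions:
        return 0.0
    bounces = sum(1 for s in sessions if len(s["page_loads_ms"]) == 1)
    return bounces / len(sessions)


def avg_page_load_ms(sessions: List[Dict]) -> float:
    """Mean load time across every page view in every session."""
    loads = [t for s in sessions for t in s["page_loads_ms"]]
    return sum(loads) / len(loads) if loads else 0.0


def device_breakdown(sessions: List[Dict]) -> Counter:
    """Number of sessions per device type."""
    return Counter(s["device"] for s in sessions)


print(bounce_rate(sessions))       # 0.5
print(avg_page_load_ms(sessions))  # 1000.0
print(device_breakdown(sessions))  # Counter({'desktop': 2, 'mobile': 2})
```

Note that bounce rate is computed per session while average load time is computed per page view, so a single slow landing page can weigh more heavily in one metric than the other.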
Keep in mind that ShyNet in its current form is great for small or medium-sized businesses. It might not be ideal if you work at a large firm. The GitHub repository I have linked above contains a comprehensive run-through of how ShyNet works and how you can start using it.
I recommend going through the in-depth guide below to learn about the world of digital marketing (of which web analytics is a part):
This is a personal favorite. I’m a huge football fan and have been delving into the world of sports analytics for quite some time now. Progress in this field has been far slower than in other industries, but in the last couple of years, teams and franchises have been waking up to the power of analytics and data science.
American sports leagues are well ahead of those in other countries in terms of progress and adoption, but European football clubs are finally starting to play ball. Liverpool, for example, relies heavily on a data-driven approach from top to bottom, including in planning its recruitment strategy.
So, if you’re a sports fan and want to dabble in the world of analytics, this is the perfect open source project for you.
The GitHub repository contains a plethora of resources to get you started, including:
- Resources and suggestions for technical skills worth having for work in football analytics
- A collection of Python tutorials that showcase how to work with football datasets
- Research papers and articles about state-of-the-art developments in football analytics
So, which open-source data science project will you work on in May? I tried to cover a broad range of domains here, so there is plenty to choose from. I’m personally very excited to dive into the football analytics handbook project and see how I can further my knowledge of the subject.
If you have any other open-source projects to share with us, feel free to drop the name and link in the comments section below. Let’s make this a super productive learning month!