Abhiraj Suresh — Published On February 19, 2021 and Last Modified On March 12th, 2021


“I understand the concepts well. Why should I focus on data science projects in my data science journey?”

I have been in the data science industry for more than a year now, and this is one of the questions I hear most often, especially from people who are at the beginning of their journey. Personally speaking, the existence of this question is plainly immoral.

In the 21st century, there is not a single domain in the world that does not expect the candidate to have some form of self-practice that portrays his/her interest, understanding, and skill. The same is true for data science.


Data Science projects are the best way to showcase to the world your understanding of the topic. The projects you do are a manifestation of your programming skills, knowledge acquired and structured thinking. And let me tell you a little secret- “The data science projects you do serve as the key to unlock the tricky door, called the interview.”

With the importance of data science peaking like never before, we bring you 6 open source data science projects published last month that can give your portfolio an edge over the others.

The best way to make the most of your data science journey is to choose the right course, find the right kind of mentorship, and work on industry-relevant projects that make you industry-ready. Check out our well-curated Certified AI & ML BlackBelt Plus Program.


Open Source Data Science Projects to Enhance your Portfolio

Let us divide the projects into categories.

Open Source Computer Vision Projects


FaceX-Zoo

FaceX-Zoo has to be one of the most impressive projects of the month. With face recognition becoming more and more relevant in the realm of computer vision, FaceX-Zoo is an open-source data science project you do not want to miss.

FaceX-Zoo is a PyTorch toolbox for face recognition. Its training module offers a variety of supervisory heads and backbones aimed at state-of-the-art face recognition, and its standardized evaluation module lets you evaluate models on most of the popular benchmarks just by editing a simple configuration.

A simple yet fully functional face SDK is also provided for the validation and primary application of the trained models. FaceX-Zoo is designed to be easy to upgrade and extend as face-related domains develop.



Bottleneck Transformer – Pytorch

Another mind-blowing project in computer vision, Bottleneck Transformer looks like a very good project to add to your data science portfolio.

The paper says-

“It is simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks including image classification, object detection, and instance segmentation”

Baseline models improve significantly when the spatial convolutions in the last three bottleneck blocks of a ResNet are replaced with global self-attention, with no other changes. Sounds promising, doesn’t it?

The Bottleneck transformer has all the potential to serve as a strong baseline for future research in self-attention models for vision.
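
To make the idea of self-attention over an image concrete, here is a minimal sketch in plain Python. It is not the project's code: the function name is hypothetical, and the sketch assumes a single attention head, uses the raw feature vectors as queries, keys, and values (no learned projections), and leaves out positional encodings. It only shows how every spatial position of a feature map can attend to every other position.

```python
import math


def global_self_attention(feature_map):
    """Single-head global self-attention over an H x W x C feature map.

    Assumptions of this sketch: queries, keys, and values are the raw
    feature vectors, and no positional encoding is applied.
    """
    height, width = len(feature_map), len(feature_map[0])
    # Flatten the spatial grid into a sequence of H*W tokens.
    tokens = [feature_map[i][j] for i in range(height) for j in range(width)]
    channels = len(tokens[0])
    scale = 1.0 / math.sqrt(channels)

    attended = []
    for query in tokens:
        # Scaled dot-product score between this position and every position.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        # Numerically stable softmax over all positions.
        max_score = max(scores)
        weights = [math.exp(s - max_score) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Each output vector is a weighted average of all value vectors.
        attended.append(
            [sum(w * value[c] for w, value in zip(weights, tokens)) for c in range(channels)]
        )

    # Restore the H x W x C layout.
    return [attended[i * width:(i + 1) * width] for i in range(height)]
```

The output has the same shape as the input, which is what allows an attention layer of this kind to stand in for a spatial convolution inside a residual block.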



StyleGAN2-ADA — Official PyTorch implementation


When generative adversarial networks are trained on too little data, the discriminator may overfit, causing training to diverge. This project addresses the problem with an adaptive discriminator augmentation mechanism that stabilizes training in limited-data regimes.
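
The adaptive part of this mechanism can be sketched in a few lines of plain Python. This is an illustrative sketch, not the official implementation: the function name, the step size, and the upper clamp at 1.0 are assumptions made here, and the target value of 0.6 is used only as an example setting. The idea is to measure how confidently the discriminator scores real images, and to raise or lower the augmentation probability accordingly.

```python
def update_augment_probability(p, real_logits, target=0.6, step=0.05):
    """Nudge the augmentation probability p based on discriminator overfitting.

    real_logits: discriminator outputs on a batch of real training images.
    target and step are illustrative values chosen for this sketch.
    """
    # Overfitting indicator: mean sign of the discriminator outputs on reals.
    # Values near 1 mean the discriminator is very sure about the training set.
    signs = [1.0 if x > 0 else (-1.0 if x < 0 else 0.0) for x in real_logits]
    overfit = sum(signs) / len(signs)

    # Augment more when overfitting exceeds the target, less when below it.
    direction = (overfit > target) - (overfit < target)
    p = p + direction * step

    # Keep p a valid probability (the upper clamp is an assumption of this sketch).
    return min(1.0, max(0.0, p))
```

Calling this function every few training steps makes the augmentation strength track the state of the discriminator instead of being fixed in advance.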

The project comes with a lot of promises, including-

  • Full support for all primary training configurations.
  • Extensive verification of image quality, training curves, and quality metrics against the TensorFlow version.
  • Results are expected to match in all cases, excluding the effects of pseudo-random numbers and floating-point arithmetic.

With increased speed and efficiency as compared to other projects, StyleGAN2-ADA is a nice open-sourced project to add to your portfolio.


Open Source Natural Language Processing Projects



Trankit

The fascinating world of NLP is not far behind when it comes to impressive open-source data science projects. Trankit is another popular project released last month.

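Trankit is a lightweight, transformer-based Python toolkit for multilingual natural language processing. Its 2 main constituents are a trainable pipeline for fundamental NLP tasks in over 100 languages and 90 pretrained pipelines covering 56 languages.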

Another impressive thing about Trankit is that it beats the current state-of-the-art multilingual toolkit, Stanza (StanfordNLP), in many tasks across 90 Universal Dependencies v2.5 treebanks of 56 different languages. It does so without losing efficiency in memory usage and speed, which makes it usable by a larger audience.


EasyNMT – Easy to use, state-of-the-art Neural Machine Translation

With easy installation, simple usage, and automatic download of pre-trained machine translation models, EasyNMT will easily make your NLP portfolio stand out.

It supports translation between 150+ languages and automatic language detection for 170+ languages, and it handles both sentence and document translation.


At present, the project provides the following models: Opus-MT from Helsinki-NLP, and mBART50 many-to-many and M2M-100 from Facebook Research.


Open Source Machine Learning Project


SeaLion

SeaLion is a brilliant machine learning project created to teach the concepts more easily, using concise algorithms that do their tasks efficiently.


SeaLion is designed to teach aspiring ML engineers today's popular machine learning concepts in a way that gives both intuition and ways of application.

It is beginner-friendly when it comes to working with standard datasets such as iris, breast cancer, swiss roll, moons, and MNIST. The algorithms in SeaLion include-

  1. Deep Neural Networks
  2. Regression
  3. Dimensionality Reduction
  4. Unsupervised Clustering
  5. Naive Bayes
  6. Trees
  7. Ensemble Learning
  8. Nearest Neighbors
  9. Utils
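
To give a feel for how concise such an algorithm can be, here is a from-scratch sketch of one item from the list, nearest neighbors, in plain Python. It does not use SeaLion's API; the function name and the toy data in the test are hypothetical and only illustrate the concept.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        ((math.dist(point, query), label)
         for point, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

With two well-separated clusters of points, a query is assigned the label of the cluster it sits closest to.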


End Notes

Wow, that’s a lot of projects. My aim, as always, was to keep the projects as diverse as possible so you can pick the ones that fit into your data science journey. If you’re just beginning, I would suggest starting with the SeaLion project. It is a great way to get a head start.

I would love to hear your thoughts on which open source project you found the most useful. Or let me know if you want me to feature any other data science projects here or in next month’s edition.
