10 Amazing Open Source Projects for Machine Learning Enthusiasts

Gaurav Sharma 14 Jun, 2021

This article was published as a part of the Data Science Blogathon


Open source refers to something that people can modify and share because it is accessible to everyone. You can use the work in new ways, integrate it into a larger project, or create a new work based on the original. Open source promotes the free exchange of ideas within a community to build creative and technological innovations. Programmers should consider contributing to open source projects for the following reasons:

1. It helps you write cleaner code.

2. You gain a better understanding of technology.

3. It helps you gain visibility and recognition, which can boost your career.

4. Adding open source contributions to your resume makes it stronger.

5. It improves your coding skills.

6. It helps improve software at both the user and the business level.

Before you start contributing to open source projects, there are a few prerequisites:

1. Learn a programming language: Contributing to open source usually means writing code, so you need to know at least one programming language. Any language will do; it is easy to learn another one later, depending on the needs of the project.

2. Get familiar with version control systems: These are software tools that keep a record of every change made to the source code over time, so you can review or restore earlier versions when needed. Popular version control systems include Git, Mercurial, and CVS. Of these, Git is the most popular and the most widely used in the industry.

Now we will look at some of the amazing Open Source Projects you can contribute to.

So, let’s get started!

1. Caliban

This is a machine learning project from Google. It is used for developing machine learning research workflows and notebooks in an isolated and reproducible computing environment. It solves a real problem: when developers build data science projects, it is often difficult to set up a test environment that reflects how the project will behave in a real-life situation, and it is not possible to predict every edge case. Caliban makes it easy to develop ML models locally, run the code on your own machine, and then run exactly the same code in a cloud environment on bigger machines. In short, it makes Dockerized research workflows easy, both locally and in the cloud.

Github Link: https://github.com/google/caliban

2. Kornia

Kornia is a differentiable computer vision library for PyTorch. It is used to solve generic computer vision problems. Kornia uses PyTorch as its backend, both for efficiency and to take advantage of reverse-mode automatic differentiation when computing the gradients of complex functions. It provides a set of routines and differentiable modules that can be used when training neural network models, covering image transformation, image filtering, edge detection, epipolar geometry, depth estimation, and more.

Github Link: https://github.com/kornia/kornia

3. Analytics Zoo

Analytics Zoo is a unified data analytics and AI platform that unites TensorFlow, Keras, PyTorch, Spark, Flink, and Ray programs into an integrated pipeline. The pipeline can scale efficiently from a laptop to a large cluster to process big data in production. The project is maintained by intel-analytics.

Analytics Zoo helps you build an AI solution in the following ways:

  • It helps you prototype AI models easily.
  • It manages scaling efficiently.
  • It helps you automate parts of your ML pipeline, such as feature engineering and model selection.

Github link: https://github.com/intel-analytics/analytics-zoo

4. MLJAR: Automated Machine Learning for Humans

Mljar is a platform for prototyping models and deploying them as services. To find the best model, Mljar searches over different algorithms and tunes their hyperparameters. It returns results quickly by running all computation in the cloud, and it finally builds ensemble models. It then generates a report of the AutoML training for you. Isn't this cool?

Mljar efficiently trains models for binary classification, multi-class classification, and regression.

It provides two kinds of interfaces:

  • A web interface for running ML models in your browser
  • A Python wrapper over the Mljar API

The report from Mljar contains a table with each model's score and the time needed to train it. Performance is also shown as scatter and box plots, so it is easy to check visually which algorithms perform best.

[Figure: AutoML leaderboard]

Documentation: https://supervised.mljar.com/

Source Code: https://github.com/mljar/mljar-supervised


5. DeepDetect

DeepDetect is a machine learning API and server written in C++. If you want to work with state-of-the-art machine learning algorithms and integrate them into existing applications, DeepDetect is for you. It supports a wide variety of tasks, such as classification, segmentation, regression, object detection, and autoencoders, and it supports both supervised and unsupervised deep learning on images, time series, text, and other types of data. DeepDetect relies on external machine learning libraries, such as:

  • Deep learning libraries: TensorFlow, Caffe2, Torch.
  • Gradient boosting library: XGBoost.
  • Clustering with t-SNE.

Github link: https://github.com/jolibrain/deepdetect

6. Dopamine

Dopamine is an open-source project from Google, written in Python. It is a research framework for fast prototyping of reinforcement learning algorithms.

Dopamine's design principles are:

  • Easy experimentation: Dopamine makes it easy for new users to run experiments.
  • Compact and reliable: the framework is small and dependable.
  • Reproducible: it facilitates reproducibility in results.
  • Flexible development: it makes it easy for new users to try out new research ideas.

Note: Check the Colaboratory notebooks in the Dopamine repository to learn how to use Dopamine.

Github link: https://github.com/google/dopamine

7. TensorFlow

TensorFlow is one of the most famous and popular machine learning open source projects on GitHub. It is an open-source software library for numerical computation using data flow graphs. It has an easy-to-use Python interface for building and executing computational graphs, and it provides stable Python and C++ APIs. TensorFlow has some amazing use cases, such as:

  • Voice/sound recognition
  • Text-based applications
  • Image recognition
  • Video detection

…and many more!
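In TensorFlow 2.x, code runs eagerly by default, and `tf.function` is the API that traces Python code into the kind of data flow graph described above. The following is a minimal sketch, assuming TensorFlow 2.x is installed; the function `affine` and its input values are illustrative.

```python
import tensorflow as tf

# tf.function traces this Python function into a TensorFlow data flow graph.
@tf.function
def affine(x, w, b):
    return tf.matmul(x, w) + b

x = tf.constant([[1.0, 2.0]])    # shape (1, 2)
w = tf.constant([[3.0], [4.0]])  # shape (2, 1)
b = tf.constant([0.5])           # broadcast over the (1, 1) product

y = affine(x, w, b)
print(y.numpy())  # [[11.5]]  since 1*3 + 2*4 + 0.5 = 11.5

# Inspect the operations (graph nodes) recorded for these input shapes and dtypes.
graph = affine.get_concrete_function(x, w, b).graph
op_types = [op.type for op in graph.get_operations()]
print(op_types)
```

Each entry in `op_types` is a node of the traced graph; TensorFlow can then optimize and execute that graph without re-running the Python function body.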

GitHub Link: https://github.com/tensorflow/tensorflow

8. PredictionIO

Apache PredictionIO is a machine learning server built on top of a state-of-the-art open-source stack. It is designed for data scientists to create predictive engines for any ML task. Some of its amazing features are:

  • It helps you quickly build and deploy an engine as a web service in production, using customizable templates.
  • Once deployed as a web service, the engine responds to dynamic queries in real time.
  • It supports machine learning and data processing libraries such as Spark MLlib and OpenNLP.
  • It also simplifies data infrastructure management.

GitHub link: https://github.com/apache/predictionio


9. Scikit-learn

Scikit-learn is a free, Python-based machine learning library. It provides various classification, regression, and clustering algorithms, including random forests, gradient boosting, and DBSCAN. It is built on NumPy and SciPy, which must be installed before you can use scikit-learn. It also provides tools for:

  • Ensemble methods
  • Feature extraction
  • Parameter tuning
  • Manifold learning
  • Feature selection
  • Dimensionality reduction
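Several of the tools listed above (preprocessing, feature selection, parameter tuning, and a random forest ensemble) can be combined in a single scikit-learn `Pipeline` searched with `GridSearchCV`. The following is a minimal sketch on the bundled iris dataset; the grid values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling -> univariate feature selection -> random forest, fitted as one estimator.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("model", RandomForestClassifier(random_state=0)),
])

# "<step name>__<parameter>" addresses a parameter of a pipeline step.
param_grid = {
    "select__k": [2, 4],
    "model__n_estimators": [50, 100],
}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Keeping the scaler and selector inside the pipeline means they are refitted on each training fold, so the cross-validation scores are not leaked from the validation data.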

Note: To learn scikit-learn, follow the documentation: https://scikit-learn.org/stable/

GitHub Link: https://github.com/scikit-learn

10. Pylearn2

Pylearn2 is a machine learning research library for Python developers. Most of its functionality is built on top of Theano, so you can write Pylearn2 plugins (new models, algorithms, etc.) using mathematical expressions, while Theano takes care of optimizing and stabilizing them. It has some awesome features, such as:

  • A “default training algorithm” to train the model itself
  • Model Estimation Criteria
    • Score Matching
    • Cross-entropy
    • Log-likelihood
  • Dataset pre-processing
    • Contrast normalization
    • ZCA whitening
    • Patch extraction (for implementing convolution-like algorithms)
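To make one of these preprocessing steps concrete, here is a minimal NumPy sketch of ZCA whitening. It is written from the standard definition and is not Pylearn2's own implementation; the function name `zca_whiten`, the synthetic data, and the `epsilon` regularizer value are assumptions for illustration.

```python
import numpy as np

def zca_whiten(X, epsilon=1e-5):
    """ZCA-whiten the rows of X (n_samples, n_features); return (X_white, W)."""
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = Xc.T @ Xc / Xc.shape[0]            # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # cov = U diag(eigvals) U^T
    # W = U diag(1 / sqrt(eigvals + epsilon)) U^T: rotate, rescale, rotate back.
    # epsilon keeps tiny eigenvalues from blowing up the rescaling.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + epsilon)) @ eigvecs.T
    return Xc @ W, W

rng = np.random.default_rng(0)
# Correlated 3-D data: independent normals mixed through a fixed matrix.
A = np.array([[2.0, 0.0, 0.0], [1.5, 0.5, 0.0], [0.3, 0.2, 1.0]])
X = rng.standard_normal((5000, 3)) @ A.T

X_white, W = zca_whiten(X)
# After whitening, the feature covariance is (approximately) the identity matrix.
print(np.round(np.cov(X_white, rowvar=False, bias=True), 3))
```

Unlike plain PCA whitening, ZCA rotates the data back into the original feature axes (W is symmetric), so whitened image patches still look like images, which is why it is popular as a preprocessing step for convolution-like models.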

GitHub Link: https://github.com/lisa-lab/pylearn2

End Notes:

Contributing to open source has many benefits, and these are some good open-source projects to contribute to.

Thanks for reading if you reached here 🙂

Let’s connect on LinkedIn.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

Love Programming, Blog writing and Poetry
