MLflow – An Open Source Machine Learning Platform that works with any Library, Algorithm and Tool!

Pranav Dar 10 May, 2019


Overview

MLflow is an open source machine learning platform that aims to unify ML and AI workflows
It is designed to work with any ML library, algorithm, language and deployment tool
The tool is currently in alpha and has 3 components – tracking, projects and models

Introduction

Anyone involved in the data science development process knows how difficult it can be to get your model into production. It’s all well and good to have achieved a benchmark solution but if you can’t get your code into production, it essentially becomes meaningless. There are multiple challenges in machine learning development.

Databricks, founded by the creators of Apache Spark, have released MLflow, a tool that aims to address these challenges in one place. It is an open source machine learning platform that manages the entire ML lifecycle (from start to production) and is designed to work with any ML library.

In a blog post announcing the release of MLflow, Databricks have listed the reasons why they decided to develop this tool. They have seen companies struggle to manage ML workflows in several ways. From data preparation to training the model, data scientists prefer using a myriad of tools to validate how good their system is. This requires productionizing a lot of libraries, something that is beyond most organizations. Also, reproducing the steps of a workflow is critical but can often be difficult to do without detailed tracking. And of course, getting the model into production is the hardest part. There are potentially multiple tools and environments for deploying, and there is no standard way to move models from any library to any of these tools.

MLflow can work with any ML library, algorithm, deployment tool or language. Other advantages it offers are:

Designed to work with any cloud
Integrated with a number of open-source machine learning frameworks, including Apache Spark, TensorFlow and scikit-learn
Scales to big data with Apache Spark

If you have existing code, MLflow can be used with that as well! Since it is open source, you can even share your framework and models across organizations (assuming you also want to open source your code, obviously).

The current version of MLflow has three components:

Tracking: For querying and recording data on experiments
Projects: Provides a simple format for reproducing code
Models: For managing and deploying models into production

The team is working on adding more components, such as one for monitoring the progress of your model. You can install MLflow right now using pip:

pip install mlflow
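Once MLflow is installed, the Tracking component can be tried out in a few lines. The snippet below is a minimal sketch, not an official example: train_toy_model and tracked_training are hypothetical names, the "training" is a fake loss curve standing in for a real model, and the parameter and metric names are made up for illustration. It relies only on mlflow.start_run, mlflow.log_param and mlflow.log_metric, and since the project is in alpha the API may differ between releases.

```python
import random

try:
    import mlflow  # installed with `pip install mlflow`
except ImportError:
    # Assumption for this sketch: without MLflow, training still runs
    # but nothing is recorded.
    mlflow = None


def train_toy_model(learning_rate, epochs, seed=0):
    """Hypothetical stand-in for real training: returns a fake loss per epoch."""
    rng = random.Random(seed)
    loss = 1.0
    history = []
    for _ in range(epochs):
        # Shrink the loss by a random factor that depends on the learning rate.
        loss *= 1.0 - learning_rate * rng.uniform(0.5, 1.0)
        history.append(loss)
    return history


def tracked_training(learning_rate=0.1, epochs=5):
    """Run the toy training and record its inputs and result with MLflow."""
    history = train_toy_model(learning_rate, epochs)
    if mlflow is None:
        return history
    with mlflow.start_run():
        # Parameters are the inputs of the run.
        mlflow.log_param("learning_rate", learning_rate)
        mlflow.log_param("epochs", epochs)
        # Metrics are the numeric results of the run.
        mlflow.log_metric("final_loss", history[-1])
    return history
```

Calling tracked_training() a few times with different learning rates records one run per call. Running mlflow ui from the same directory should then let you compare those runs in the browser.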

The project is currently in alpha but the developers feel it’s already good enough to be integrated into an organisation’s current environment. You can check out and follow their repository on GitHub.

Our take on this

The likes of Facebook, Google and Uber have their own internal frameworks for machine learning workflows, but even these platforms are limited in their own way. Most of them support only built-in algorithms and are tied to the infrastructure in place at each organization. Not the most flexible way to work.

Some of the alternatives to MLflow you can check out are SageMaker, Sacred and FGLab. I feel MLflow has better options than these but you are free to make up your own mind!

I like the concept and am looking forward to them adding the aforementioned components like monitoring the progress of your models. This is another example of the ML community giving back to everyone by making such a breakthrough tool open source. If you try it out, do let us know in the comments below!



Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.
