Mohammad Ahmad — July 13, 2021

This article was published as a part of the Data Science Blogathon

Overview

  • An introduction to MLOps.
  • The MLRun library and its features.
  • The architecture of the MLRun framework, with examples of each component.

Introduction

Like DevOps, MLOps (machine learning operations) is a set of practices that aims to make developing and maintaining production machine learning seamless and efficient. MLOps seeks to increase automation and improve the quality of production models while also addressing business and regulatory requirements. A common MLOps architecture includes data science platforms, where models are built, and analytical engines, where computations are performed. The MLOps tool orchestrates the movement of machine learning models, data, and outcomes between these systems. Goals that enterprises want to achieve through MLOps systems include rapid deployment, pipeline automation, feature and log management, and reproducibility of models and predictions.

MLRun

MLRun is an open-source MLOps framework that provides seamless and efficient management of your machine learning pipelines from early development to full production deployment.

Key benefits provided by the MLRun framework include:

  • Rapid development of code from early stage to production.
  • Elastic scaling of batch and real-time workloads.
  • Feature management – preparation and monitoring of logs.
  • Works anywhere —  IDE, multi-cloud, etc.

MLRun is composed of several convenient abstraction layers that work with a wide variety of technologies and automate the build process, execution, data movement, scaling, versioning, parameterization, output tracking, and more. In every ML experiment we want to save our code, configuration, results, logs, inputs, and outputs so that the experiment can be reproduced in a different development environment. MLRun helps to manage, save, and reproduce experiments without hassle.

MLRun is composed of the following layers:

  • Feature and Artifact Store: handles the ingestion, processing, metadata, and storage of data and features across multiple repositories and technologies.
  • Elastic Serverless Runtimes: converts simple code to scalable and managed microservices with workload-specific runtime engines (such as Kubernetes jobs, Nuclio, Dask, Spark, and Horovod).
  • ML Pipeline Automation: automates data preparation, model training and testing, deployment of real-time production pipelines, and end-to-end monitoring.
  • Central Management: provides a unified portal for managing the entire MLOps workflow. The portal includes a UI, a CLI, and an SDK, which are accessible from anywhere.

Read more about the MLRun framework in its GitHub repository.

The architecture of the MLRun framework

The architecture consists of a few basic components. Combining these components creates a pipeline.
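
To make the idea of combining components into a pipeline concrete, here is a minimal plain-Python sketch. It does not use MLRun: the step names fetch, transform, and train and the run_pipeline helper are hypothetical, and the data is a small in-memory list. In MLRun, each step would instead be a function that is executed as a run.

```python
from typing import Any, Callable, Dict, List

# A step receives the state produced so far and returns new entries for it.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


def fetch(state: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical data source: an in-memory list instead of a database.
    return {"raw": [1.0, 2.0, 3.0, 4.0]}


def transform(state: Dict[str, Any]) -> Dict[str, Any]:
    # Center the values around their mean.
    raw = state["raw"]
    mean = sum(raw) / len(raw)
    return {"features": [x - mean for x in raw]}


def train(state: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for model training: keep the largest absolute feature value.
    return {"model": max(abs(x) for x in state["features"])}


def run_pipeline(steps: List[Step]) -> Dict[str, Any]:
    # Run the steps in order, passing the accumulated state to each one.
    state: Dict[str, Any] = {}
    for step in steps:
        state.update(step(state))
    return state


result = run_pipeline([fetch, transform, train])
print(result["model"])
```

Each step only depends on what earlier steps produced, which is what allows a framework to run, track, and scale the steps independently.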

Let’s discuss the main components of MLRun with examples.

To install MLRun on your device, run the following command in your terminal:

pip install mlrun
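
After installing, it can be useful to confirm that the package is visible to the Python interpreter you are using. The sketch below relies only on the standard library; installed_version is a hypothetical helper name, not part of MLRun.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("mlrun")
print(f"mlrun version: {version}" if version else "mlrun is not installed")
```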

1. Project

A project is a container for all of your source code, metadata, artifacts, logs, models, etc. It helps in organizing all of your activities regarding the ML experiment.

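Define a project name and pass it to mlrun.set_environment, which returns the resolved project name and the artifact path. With user_project=True, your username is appended to the project name so that the project is unique per user.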

from os import path
import mlrun

project_name_base = 'Project_name' # Mention Your Project Name Here

project_name, artifact_path = mlrun.set_environment(project=project_name_base, user_project=True)

print(f'Project name: {project_name}')
Output:
Project name: Project_name-<username>

2. Function

Functions are small packages of code that execute the individual steps of our pipeline. These steps include, but are not limited to, fetching data, transforming data, training multiple models, and testing. Below is a simple example of a function that fetches data from MongoDB Atlas.

A function can be created in four different ways:

  • mlrun.new_function
  • mlrun.code_to_function
  • mlrun.import_function
  • mlrun.function_to_module

We define a simple Python function, store it in a source file, and use mlrun.code_to_function to create a function object from it.

from os import path

import pandas as pd
import pymongo
from mlrun.datastore import DataItem
from mlrun.execution import MLClientCtx


def fetch_data(context: MLClientCtx, data_path: DataItem):
    context.logger.info('Reading data from {}'.format(data_path))
    # Connect to MongoDB and select the database and collection to read
    m_client = pymongo.MongoClient("Mention The Link of Your MongoDB Client Here")
    m_db = m_client["DB_name"]
    db_cm = m_db["DB_name"]
    # Load every document in the collection into a DataFrame
    suicide_dataset = pd.DataFrame.from_records(db_cm.find())
    target_path = path.join(context.artifact_path, 'data')
    context.logger.info('Saving datasets to {} ...'.format(target_path))
    # Store the data sets in your artifacts database
    context.log_dataset('suicide_dataset', df=suicide_dataset, format='csv',
                        index=False, artifact_path=target_path)
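
Because a handler like this talks to MLRun only through its context argument, the saving logic can be tried out locally with a stand-in object. The sketch below is an illustration under stated assumptions: StubContext and save_records are hypothetical names, the stub mimics only the attributes used above (logger, artifact_path, and log_dataset), and the records are hard-coded rather than read from MongoDB.

```python
import logging
from os import path
from typing import Any, Dict, List


class StubContext:
    """Hypothetical stand-in for the MLRun context; records what gets logged."""

    def __init__(self, artifact_path: str) -> None:
        self.artifact_path = artifact_path
        self.logger = logging.getLogger("stub-context")
        self.datasets: Dict[str, Dict[str, Any]] = {}

    def log_dataset(self, key: str, df: Any, **kwargs: Any) -> None:
        # Remember the call instead of writing anything to storage.
        self.datasets[key] = {"df": df, **kwargs}


def save_records(context: Any, records: List[Dict[str, Any]]) -> None:
    """Same shape as fetch_data, with the database read replaced by an argument."""
    target_path = path.join(context.artifact_path, "data")
    context.logger.info("Saving datasets to %s ...", target_path)
    context.log_dataset("suicide_dataset", df=records, format="csv",
                        index=False, artifact_path=target_path)


ctx = StubContext(artifact_path="artifacts")
save_records(ctx, [{"country": "A", "year": 2000}])
print(ctx.datasets["suicide_dataset"]["artifact_path"])
```

Separating the database read from the logging step in this way keeps the part that depends on external services small, so the rest can be checked without a MongoDB instance.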

3. Run

When a function is executed, all information about the execution is stored in an object known as the run object. It is created whenever you run a function, and it stores the function attributes (such as arguments, inputs, and outputs) together with the results and logs of the executed function.

We first define the function object, which can be used to execute any of the functions defined in the source code:

func_obj = mlrun.code_to_function(name='f_obj', kind='job', filename='Path of the Source code')
fetch_data_run_obj = func_obj.run(handler='fetch_data',
                                  inputs={'data_path': 'Mention Path of the DATA CSV'},
                                  local=True)

We use this object to run our function: handler takes the name of the function to run, and inputs takes the function's input arguments.

fetch_data_run_obj.outputs

This returns the outputs of the run as a dictionary, which in this case contains an entry for the fetched dataset.
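
As a mental model of what a run keeps track of, the following sketch uses a small dataclass. RunRecord is a hypothetical, heavily simplified structure written for this article, not MLRun's run class, and the values (the row count and the file paths) are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class RunRecord:
    """Hypothetical, simplified picture of what a run keeps track of."""

    name: str
    handler: str
    inputs: Dict[str, str] = field(default_factory=dict)
    results: Dict[str, Any] = field(default_factory=dict)
    artifacts: Dict[str, str] = field(default_factory=dict)  # key -> stored path

    @property
    def outputs(self) -> Dict[str, Any]:
        # Merge scalar results with the locations of the logged artifacts.
        return {**self.results, **self.artifacts}


record = RunRecord(
    name="f_obj",
    handler="fetch_data",
    inputs={"data_path": "data.csv"},  # illustrative value
    results={"rows": 100},  # illustrative value
    artifacts={"suicide_dataset": "artifacts/data/suicide_dataset.csv"},
)
print(record.outputs)
```

Keeping inputs, results, and artifact locations together in one record is what makes it possible to look up and reproduce an earlier execution.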

4. Artifact

Artifacts are data objects (such as data sets, graphs, pickle files, and models) that are produced or used by functions, runs, and workflows. We pass an artifact directory name, which is the directory where you want to store your data. The directory structure is given below:

─── Artifact directory
    ├── Data
        ├── data (All your datasets)
        ├── model (saved model and model config)
    ├── artifacts/project_name-username (Contain all your artifact data)
    ├── functions/project_name-username (Contain all your function data)
    ├── runs/project_name-username (Contain all your run object data)

Conclusion

One of the most difficult parts of machine learning development is deploying models to production and managing them there. MLOps defines a set of practices that aim to make developing and maintaining production machine learning seamless and efficient.

Several frameworks exist for machine learning operations. In this article we learned about one of them, MLRun.

MLRun is an open-source MLOps framework that provides seamless and efficient management of your machine learning pipelines from early development to full production deployment. MLRun offers a lot of functionality, which you can read about in detail in its GitHub repository.

Follow the official example and tutorials here

I hope you have learned something from this blog. Do share it with others. Check out my personal machine learning blog (https://code-ml.com/) for new and exciting content on different domains of ML and AI.

About the Author

Mohammad Ahmad - Research Engineer
LinkedIn - https://www.linkedin.com/in/mohammad-ahmad-ai/
Personal Blog - https://code-ml.com/
GitHub - https://github.com/ahmadkhan242
Twitter - https://twitter.com/ahmadkhan_242

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
