How to use MLflow to Track and Structure Machine Learning Projects?

What is Experiment Tracking, and Why is it important?

Experiment tracking is the practice of recording the relevant information about each experiment you run while developing a machine learning model. For example:

  • Hyperparameters
  • Configuration files for the environment
  • Data versions used for training and evaluation
  • Performance visualizations and a lot more

What is MLflow?

MLflow is an open-source platform that helps developers and data scientists manage the entire machine learning lifecycle: experimentation, reproducibility, deployment, and a model registry. Let’s look at some of the features MLflow offers before moving on to the key components.

  • It packages an ML model in a standard format that downstream tools can consume.
  • It provides a model store with APIs and a user interface for managing the MLOps lifecycle.
Graphic shows Key Components of MLflow (Source: medium.com/pytorch)

MLflow Tracking

The MLflow Tracking component offers an API and a UI for recording parameters, code versions, metrics, and output files when you run your machine learning code, and for viewing the results later. You can log and query runs through the Python API or the REST API.

mlflow.start_run() -- starts a new run.
mlflow.end_run() -- ends the currently active run.
mlflow.log_artifacts() -- logs all the files in a given local directory as artifacts.
....

MLflow Projects

An MLflow Project is a convention-based framework for packaging data science code in a reusable and repeatable workflow. The Projects component also offers an API and command-line utilities for executing projects, allowing you to create workflows by chaining projects together.

The image represents how MLflow Projects works: Image Source: infoq.com

MLflow Models

A machine learning model is packaged as an MLflow Model, which may be utilised in several downstream tools, such as real-time serving over a REST API or batch inference on Apache Spark. The format establishes a standard that allows you to store a model in many “flavors”. MLflow makes it easy to package models from various popular machine learning libraries in MLflow Model format, with tons of customization options.

Model Registry

The MLflow Model Registry component provides a centralized model store, a set of APIs, and a UI for collaboratively managing the lifecycle of an MLflow Model. It provides model lineage, versioning, and annotations.

Benefits Of Using MLflow

Let’s take a look at some of MLflow’s benefits.

  • Supports many Tools and Frameworks
  • Highly Customizable
  • It’s ideal for data science projects.
  • Focuses on the entire Machine learning lifecycle.
  • Works with any ML library.
  • Custom Visualization

Tracking ML Experiments using MLflow

This section covers the basic process of integrating MLflow into your machine learning project. Let’s look at how you can log a run and then use the MLflow UI to visualize the results.

UI Workflow

Installing MLflow:

pip install mlflow

Logging from your training script:

import mlflow
import mlflow.sklearn

mlflow.set_experiment(experiment_name="MLflow demo")
mlflow.log_metric("accuracy", model_accuracy)  # metric logging
mlflow.log_metric("precision", precision)  # metric logging
mlflow.sklearn.log_model(model, "model")  # model logging
mlflow.log_param("max_depth", max_depth)  # hyperparameter logging
...

Launching the UI:

mlflow ui
The image shows MLflow UI

API Workflow

MLflow also provides a lower-level Tracking Service API for managing experiments and runs directly, available through the client SDK in the mlflow.tracking module. This allows you to query data from previous runs, log extra information about them, create experiments, tag runs, and more.

from mlflow.tracking import MlflowClient

client = MlflowClient()
# search_experiments() replaces list_experiments(), which was removed in MLflow 2.0
experiments = client.search_experiments()  # returns a list of mlflow.entities.Experiment
run = client.create_run(experiments[0].experiment_id)  # returns mlflow.entities.Run
client.log_param(run.info.run_id, "hello", "world")
client.set_terminated(run.info.run_id)

Some Highlights of MLflow

The MLflow API is well-designed, and new features are released regularly. It’s important to keep up with new features and updates by monitoring the API. However, I’d like to draw attention to a few noteworthy characteristics of MLflow.

  • A number of task orchestration platforms are available, but MLflow is designed particularly to enhance the machine learning lifecycle. This means that MLflow can conduct experiments and track their outcomes, as well as train and deploy machine learning models.
  • Deep learning models benefit from auto-logging. Training a deep learning model involves many parameters and hyperparameters, and auto-logging captures them without explicit logging calls.
  • You can customize MLflow to meet your specific requirements. It can also handle massive volumes of data.
  • MLflow API supports not just Python but also Java and R programming languages.
  • It is open-source, so you can get good community support.
  • It can be used to deploy many kinds of machine learning models, each of which is saved as a directory containing any number of files.
  • With MLflow, data scientists will no longer need to manually monitor the parameters they use in each run.

Conclusion

We’ve seen MLflow’s potential and learned how it can help you with experiment tracking and monitoring. We also discussed what MLflow is and how it fits into your machine learning lifecycle. With only a few lines of code, MLflow gives you a solid way to handle model tracking, packaging, and reproducibility. It is a must-have tool in the machine learning arsenal.

Harshil Patel

Software Developer and Technical Writer.