MLflow is a tool for managing the training and deployment of machine learning models.
At OSC, MLflow is available to help researchers and developers efficiently track training runs and manage models. This guide explains how to access MLflow at OSC, run example notebooks, and visualize your experiment data using the MLflow UI. MLflow is available on OSC clusters as part of the PyTorch module, or it can be installed into your virtual environment with a package manager such as pip, conda, or uv.
We provide a repository with marimo notebooks demonstrating how to integrate MLflow into your training and inference codes on UCR.
To run them at OSC:
Working Directory or Notebook: specify the path to one of the notebooks in the repo.
Sandbox environment checkbox.
Running the code in the notebooks will create an mlruns/ subdirectory in your local copy of the repository, which contains all of the logged training run data and any registered models. As described in the notebooks, this tracking data can be accessed via the Python API. It is also possible to use the MLflow UI, which is available via the MLflow OnDemand app, to graphically view the data collected while executing the notebook. To view the data generated by these notebooks, set the Tracking URI directory to your local copy of the repository.
For more information about how to use MLflow, see the MLflow documentation.
Note that MLflow offers several options for deploying MLflow servers, as described in the MLflow docs. No servers have been deployed at OSC; if one is necessary for your research, please submit a ticket.