Session

Model Experiments Tracking and Registration using MLflow on Databricks

Machine learning models are only as good as the quality and size of the datasets used to train them. Surveys have shown that data scientists spend around 80% of their time preparing and managing data for analysis, and that 57% of data scientists regard cleaning and organizing data as the least enjoyable part of their work. This reinforces the case for MLOps and for close collaboration between data scientists and data engineers.

During the crucial phase of data acquisition and preparation, data scientists identify what types of (trusted) datasets are needed to train models and work closely with data engineers to acquire data from viable data sources.

Another important aspect of the ML lifecycle is experimentation, where data scientists take sufficient subsets of (trusted) datasets and create several models in a rapid, iterative manner. Without proper industry standards, data scientists have to rely on manual tracking of models, inputs, hyperparameters, outputs, and other such artifacts throughout the model experimentation and development process.

In this talk, you will learn how to automate these crucial tasks using StreamSets and MLflow on Databricks.

Dash D

Director of Platform and Technical Evangelism

San Francisco, California, United States
