MLOps for Mission-Critical Applications: Lessons from Building and Scaling Production ML Systems

Building production machine learning systems is hard. Even more challenging is building scalable infrastructure for the iterative development, deployment and management of machine learning models. A simple ML system typically consists of a model or two, a development pipeline with all the data transformations, a deployment pipeline, and a VM or server for hosting the model and running inference. Such a basic system is usually designed not for scale but as a proof of concept or MVP to validate a business case.

Once a business case is established and there's executive buy-in, it becomes crucial to build and operationalize a more efficient and scalable ML system. This is especially important for mission-critical applications, such as personalisation (recommendation, ad targeting) or risk and compliance (verification, fraud detection), where system fidelity and fault tolerance are critical for product success. Building such ML systems for scale requires an end-to-end process called Machine Learning Operations (MLOps).

In this session, we will explore the key components of an MLOps infrastructure, from tooling to best practices. We will begin by examining how MLOps shares its lineage with DevOps yet is fundamentally different. Then we will discuss why ML system design is a crucial first step in a successful MLOps implementation. We will cover the key elements of an ML system, including development, experiment tracking, model registry, feature store, orchestration, versioning, and deployment.

We will also talk about the 20% of MLOps that drives scalability for the other 80% of the ML system. This includes deployment strategies, online and offline evaluation, online experimentation, observability, and continuous retraining. We will discuss the practical challenges of implementing these MLOps components, drawing on lessons from building ML systems for high-risk applications such as consumer lending, compliance, and fraud detection. We will emphasise the importance of long-term execution in MLOps: avoiding the pitfalls of short-term solutions, and setting milestones that deliver visible short-term wins while scaling sustainably.

This session is targeted at data scientists, data engineers, analytics engineers, ML(Ops) engineers, backend/DevOps engineers, engineering managers and data leaders. The only prerequisite is a basic understanding of machine learning. By the end of this session, participants will be better equipped to approach model development and deployment through the lens of ML systems. They will gain an intuitive understanding of ML system design and implementation, as well as the fundamentals of setting up and executing MLOps roadmaps for mission-critical applications.

Zion Pibowei

Head of Data Science & AI, Periculum

Lagos, Nigeria
