Speaker

Lisa N. Cao

Product Manager at Datastrato

Lisa is a data engineer turned product manager interested in observability, validation, and reliability in data systems. At Datastrato, she is developing new and improved ways to leverage metadata in AI stacks for DataOps and Data Fabric integrations. Her background spans start-ups, nonprofits, consulting firms, GovTech, and biotechnology. She is a Google Women Techmakers Ambassador, a Linux Foundation LiFT recipient for Women in Open Source, founder and chair of the Vancouver Datajam, and lead maintainer of the BiocSwirl project.

Area of Expertise

  • Business & Management
  • Information & Communications Technology

Finding product-market fit as an open source company

Does being an open source company make it easier, harder, or just different to find product-market fit (PMF)? What is the relationship between product-market fit and project-market fit? In this session, we'll cover the basics of product management for engineering-driven startups and the considerations involved in pursuing PMF in the open source space. The session will also include an open discussion and case studies.

Open Source DataOps and MLOps Strategies

This session aims to demystify some of data's hardest problems: interoperability, standardization, and vendor lock-in. From pipelines to model serving, it discusses strategies for promoting open source technologies as teams implement their own DataOps and MLOps infrastructure.

Maintaining Diverse Maintainers: How to Keep Your Project Inclusive

Having maintained open source projects with diverse teams for more than five years, I've learned some key ways to keep an open source project inclusive. Whether through the platforms you use, your communication style, development flexibility, project promotion, or a low barrier to contribution, there are many small strategies you can use to increase representation and strengthen community connection.

The Convergence of Streaming and Data Lake Architectures for AI/ML

The exponential growth of data in recent years has accelerated the need for scalable, real-time data processing architectures to support AI and machine learning (ML) workloads. This talk explores the convergence of streaming and data lake architectures to address these challenges. Traditionally, streaming systems like Apache Kafka and data lakes built on Apache Hadoop have been used independently: streaming for real-time data ingestion and lakes for batch processing and long-term storage. However, integrating these paradigms presents an opportunity to create a unified data architecture capable of supporting the diverse requirements of AI/ML workflows, such as low-latency processing, high throughput, and large-scale storage.

This presentation will discuss how recent advancements in both technologies, such as the development of stream processing frameworks (e.g., Apache Flink) and modern data lakehouses (e.g., Delta Lake), are facilitating seamless data flow between real-time streams and batch processing layers. Key topics will include the benefits of this hybrid approach for AI/ML, architectural patterns, and implementation strategies. The session will also cover use cases where companies have successfully leveraged this convergence to accelerate model training, enhance data governance, and optimize decision-making processes. Attendees will leave with practical insights into designing data platforms that effectively blend the strengths of streaming and data lake architectures for AI and ML applications.

The Quick and Dirty Guide to Metadata

Metadata: what is it, and what are its use cases? In this quick and dirty guide, you'll learn how metadata from various sources can be leveraged to better orchestrate and inform data management practices, observability, and data governance, all essentials for any data-driven organization looking to scale. We will go over key examples of metadata, such as information about your data's form and structure, catalog records, and any other data about data, and how to use them.

To Mesh, or Not to Mesh? How to Know When a Fabric is Good Enough

As big data has taken the world by storm, serving and maintaining its infrastructure has grown increasingly complex. How do we know which architecture is right for us? As powerful as a data mesh is, it takes significant investment and work to implement. In this lightning talk, we go over some intermediary data architectures that help platformize your data serving without diving too far into the deep end.

Metadata Lakes for Next-Gen AI/ML

As data catalogs evolve to meet the new and growing demands of high-velocity, unstructured data, they are taking shape as an emergent and flexible way to activate metadata for multiple uses. This talk discusses modern uses of metadata at the infrastructure level to enable AI in retrieval-augmented generation (RAG) pipelines, in response to the ecosystem's new demands. We will also discuss Apache Gravitino (incubating) and its open-source-first approach to data cataloging across multicloud and geo-distributed architectures.

Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers.