Lisa N. Cao
Product Manager at Datastrato
Lisa is a data engineer turned product manager interested in observability, validation, and reliability in data systems. Through her work at Datastrato, she is developing new and improved use cases for metadata in AI stacks for DataOps and Data Fabric integrations. Her background spans start-ups, nonprofits, consulting firms, GovTech, and biotechnology. She is a Google Women TechMakers Ambassador, Linux Foundation LiFT recipient for Women in Open Source, founder and chair of the Vancouver Datajam, and lead maintainer of the BiocSwirl project.
Area of Expertise
Finding product-market fit as an open source company
Does being an open source company make it easier, harder or just different to find product-market fit? What is the relationship between product-market fit and project-market fit? In this session, we'll go over some of the basics of product for engineering-driven startups and considerations for striving for PMF in the open source space. This session will also include an open discussion and case studies.
Bridging the Gap between DevOps, SecOps, DataOps, and MLOps
As generative AI has quickly gained traction, the developer tooling we’ve seen sitting in the IDEs and on the benches of machine learning engineers has grown more and more niche. Even within the data infrastructure world, tooling and operational silos are making it harder than ever to facilitate collaboration between teams. In an era where everything is jumbled together, redundancy is at an all-time high, and everyone is insistent on platforms, where do we start? In this session we invite the Members’ Summit to discuss the emergence and dominance of machine learning and data-intensive applications, and what it means to effectively ‘jump on the wagon’.
Open Source DataOps and MLOps Strategies
Here we will try to demystify data's hardest problems: interoperability, standardization, and vendor lock-in. From pipelines to serving models, this session discusses strategies for promoting open source technologies as groups implement their own DataOps and MLOps infrastructures.
Maintaining Diverse Maintainers: How to Keep Your Project Inclusive
After maintaining open source projects with diverse teams for 5+ years, I've learned some key ways to keep your open source project inclusive. Whether it's the platforms you use, communication style, development flexibility, project promotion, or keeping the contribution barrier low, there are many small strategies that can be used to increase representation and community connection.
How Open Source is Shaping the Data Catalog Landscape & Future Directions
As data catalogs evolve to meet the growing and new demands of high-velocity, unstructured data, we see them taking new shape as an emergent and flexible way to activate metadata for multiple uses. This talk discusses modern uses of metadata at the infrastructure level for AI-enablement in RAG pipelines in response to the new demands of the ecosystem, and why open source is becoming a first-class citizen in a previously proprietary space. We will also discuss the various open source data catalogs that exist and how they compare to each other.
The Convergence of Streaming and Data Lake Architectures for AI/ML
The exponential growth of data in recent years has accelerated the need for scalable, real-time data processing architectures to support AI and machine learning (ML) workloads. This talk explores the convergence of streaming and data lake architectures to address these challenges. Traditionally, streaming systems like Apache Kafka and data lakes such as Apache Hadoop have been used independently—streaming for real-time data ingestion and lakes for batch processing and long-term storage. However, the integration of these paradigms presents an opportunity to create a unified data architecture capable of supporting the diverse requirements of AI/ML workflows, such as low-latency processing, high throughput, and large-scale storage.
This presentation will discuss how recent advancements in both technologies, such as the development of stream processing frameworks (e.g., Apache Flink) and modern data lakehouses (e.g., Delta Lake), are facilitating seamless data flow between real-time streams and batch processing layers. Key topics will include the benefits of this hybrid approach for AI/ML, architectural patterns, and implementation strategies. The session will also cover use cases where companies have successfully leveraged this convergence to accelerate model training, enhance data governance, and optimize decision-making processes. Attendees will leave with practical insights into designing data platforms that effectively blend the strengths of streaming and data lake architectures for AI and ML applications.
The Quick and Dirty Guide to Metadata
Metadata: what is it? What are its use cases? In this quick and dirty guide you'll learn how metadata from various sources can be leveraged to better orchestrate and inform data management practices, observability, and data governance, all essentials for any data-driven organization looking to scale. We will go over key examples of metadata, such as information about your data's form and structure, catalog records, and generally any data about data, and how to use it.
To Mesh, or Not to Mesh? How to Know When a Fabric is Good Enough
As big data has taken the world by storm, how we serve and maintain its infrastructure has grown increasingly complex as well. How do we know what architecture is right for us? As incredible as mesh is, it takes a lot of investment and work to implement. In this lightning talk, we go over some intermediary data architectures that will help platformize your data serving without having to go too far into the deep end.
Metadata Lakes for Next-Gen AI/ML
As data catalogs evolve to meet the growing and new demands of high-velocity, unstructured data, we see them taking new shape as an emergent and flexible way to activate metadata for multiple uses. This talk discusses modern uses of metadata at the infrastructure level for AI-enablement in RAG pipelines in response to the new demands of the ecosystem. We will also discuss Apache (incubating) Gravitino and its open source-first approach to data cataloging across multicloud and geo-distributed architectures.