Speaker

Lisa N. Cao

Product Manager at Datastrato

Lisa is a data engineer, product manager, and speaker in the open source data infrastructure and DataOps fields. Through her work at Datastrato, creators of Apache Gravitino, she is redefining the data cataloging space for generative AI use cases and end-to-end data integrations. She currently serves on the Linux Foundation's Outreach committee, leads the Open Platform for Enterprise AI's (OPEA) Developer Experience Working Group, and leads the Continuous Delivery Foundation's (CDF) DataOps Initiative.

Lisa is also a Google Women Techmakers Ambassador, founder and 3x chair of the Vancouver Datajam, and former lead maintainer of the BiocSwirl project. She is a Terry Fox Gold Medal award recipient (2021) and Linux Foundation LiFT recipient for Women in Open Source (2021). Meetups she has organized include SF's Data for AI meetup, Data Engineer Things Bay Area, and RLadies Vancouver.

Area of Expertise

  • Business & Management
  • Information & Communications Technology

Catalogs as Context: Using metadata to power and govern the next wave of AI development

Developing powerful AI tooling has been the theme of the year, with agents and foundational models picking up steam across the board. One question remains, though: how do we serve data so that these applications work effectively? What about at enterprise scale? What even is context? In this talk we discuss the current big data landscape, the challenges of data platforming for AI, and why data catalogs and metadata are the only viable path forward to effective, governed AI development. We use the open source framework Apache Gravitino as a key example of why such a solution needs vendor neutrality.

Why Open Source is Key to Future Data and AI Governance

As data and AI systems become increasingly central to enterprise and societal decision-making, governance challenges around transparency, compliance, and trust are more critical than ever. Open source plays a fundamental role in shaping the future of data and AI governance by fostering collaboration, auditability, and interoperability. This has resulted in various emerging open-source projects aiming to provide a unified metadata and governance layer for organizations to manage data assets across diverse platforms while ensuring compliance with evolving regulations. This talk explores how open-source solutions like Apache Gravitino empower enterprises to take control of their data and AI governance strategies, mitigate risks, and drive responsible AI adoption.

Finding product-market fit as an open source company

Does being an open source company make it easier, harder or just different to find product-market fit? What is the relationship between product-market fit and project-market fit? In this session, we'll go over some of the basics of product for engineering-driven startups and considerations for striving for PMF in the open source space. This session will also include an open discussion and case studies.

Open Source DataOps and MLOps Strategies

Here we will try to demystify data's hardest problems: interoperability, standardization, and vendor lock-in. From pipelines to serving models, this session discusses strategies for promoting open source technologies as groups implement their own DataOps and MLOps infrastructures.

Fundamentals of DataOps

While building pipeline after pipeline, we might wonder: what comes next? Automation and data quality, of course! Organizations today are facing complex challenges in the end-to-end deployment of data applications, from initial development to operational maintenance. This process requires seamless integration of CI/CD practices, containerization, data infrastructure, MLOps, and security measures. This session discusses strategies and a complete beginner's roadmap for groups trying to implement their own DataOps infrastructures from scratch, empowering developers, architects, and decision-makers to leverage open-source tools and frameworks for streamlined, secure, and scalable ML application deployments.

History and Future of Iceberg REST Catalogs

While Iceberg primarily concentrates on its role as an open data format for lakehouse implementation, it relies heavily on its catalog for tracking tables and allowing external tools to interface with the metadata. In Iceberg 0.14.0, the community introduced the REST OpenAPI specification, and there is a good deal of history behind why it was developed and why the Iceberg community decided not to provide its own service. In 2024 especially, we’ve seen many third-party catalog service providers pop up instead, each with its own unique flavour. Realistically, though, what outcome can we expect from this widespread adoption? Together, we’ll review not only the history of the REST Catalog Spec, but also the future of the many offshoot services it has sparked. Please note this talk is not a comparison of the catalog service providers, but rather an examination of the Iceberg community’s rationale for providing a spec and why everyone is betting on Iceberg as the next standard.
