Lisa Cao

Staff Developer Relations at Databricks


Lisa is an engineering, product, and advocacy expert in open source data infrastructure and DataOps. At Databricks, she oversees open source involvement and developer relations for projects including MLflow, Apache Spark, Delta Lake, Apache Iceberg, and Unity Catalog OSS. She also serves on the LF AI & Data Governing Board, formerly led the Open Platform for Enterprise AI's (OPEA) Developer Experience Working Group, and leads the Continuous Delivery Foundation's (CDF) DataOps Initiative.

Area of Expertise

Business & Management
Information & Communications Technology

Maintaining Diverse Maintainers: How to Keep Your Project Inclusive

After maintaining open source projects with diverse teams for 5+ years, I've learned some key ways to keep your open source project inclusive. Whether it's the platforms you use, communication style, development flexibility, project promotion, or keeping the contribution barrier low, there are lots of small strategies that can be used to increase representation and community connection.

Catalogs as Context: Using metadata to power and govern the next wave of AI development

Developing powerful AI tooling has been our theme of the year, with agents and foundational models picking up steam across the board. Therein still lies the question though: how do we serve data for these applications to work effectively? What about at enterprise scale? What even is context? In this talk we discuss the current big data landscape, challenges to data platforming for AI, and why data catalogues and metadata are the only viable path forward to effective, governed AI development. We use the open source framework Apache Gravitino as a key example of why such a solution needs vendor neutrality.

Designing Data Infrastructure in the Age of Generative AI

Developing powerful AI tooling has been our theme of the year, with agents and foundational models picking up steam across the board. Therein still lies the question though: how do we serve data for agents to work effectively? What sort of interfaces and service mesh infrastructures will be required? What about at enterprise scale? What even is context? In this talk we discuss the current big data landscape, challenges to data platforming for AI, and the shifting importance of open table formats, catalogs, and embedded systems as means for effective, governed AI development. We use open source technologies such as Apache Spark, Unity Catalog OSS, and Apache Iceberg as key components of such a reference architecture.

Fundamentals of DataOps

While building pipeline after pipeline, we might wonder: what comes next? Automation and data quality, of course! Organizations today are facing complex challenges in the end-to-end deployment of data applications, from initial development to operational maintenance. This process requires seamless integration of CI/CD practices, containerization, data infrastructure, MLOps, and security measures. This session presents strategies and a complete beginner's roadmap for groups implementing their own DataOps infrastructure from scratch, empowering developers, architects, and decision-makers to effectively leverage open-source tools and frameworks for streamlined, secure, and scalable ML application deployments.

History and Future of Iceberg REST Catalogs

While Iceberg primarily concentrates on its role as an open data format for lakehouse implementation, it needs to heavily leverage its catalog for tracking tables and allowing external tools to interface with the metadata. In Iceberg 0.14.0, the community introduced the REST Open API Specification, but there is a good deal of history behind why it was developed and why the Iceberg community decided not to provide its own service instead. In 2024 especially, we’ve seen many third-party catalog service providers pop up, each with its own unique flavour. But realistically, what outcome can we expect from this widespread adoption? Together, we’ll review not only the history of the REST Catalog Spec, but also the future of the many offshoot services it has sparked. Please note this talk is not a comparison of the catalog service providers, but rather an account of the Iceberg community's rationale for providing a spec and why everyone’s hedging their bets on Iceberg as the next standard.


