Azure Cosmos DB - Low Latency and High Availability at Planet Scale

Azure Cosmos DB is a fully managed, multi-tenant, distributed, shared-nothing, horizontally scalable database that provides planet-scale capabilities and multi-model APIs: Apache Cassandra, MongoDB, Gremlin, Table, and Core (SQL). It currently powers many mission-critical services both within Microsoft (such as Microsoft Teams and Active Directory) and at large Fortune 500 organizations (such as Walmart and Adobe).

This talk covers the internal architecture of Azure Cosmos DB and how it achieves high availability, low latency, and scalability. I will first cover the design of the storage engine, with particular emphasis on how partitioning and replication provide high availability and scalability. Next, I will zoom in on the request routing gateway to see how it has evolved to solve two well-known challenges of multi-tenant cloud infrastructure: containing noisy neighbors and limiting blast radius. Lastly, I will discuss performance as both a feature and a culture, covering what we measure and how we think about SLOs to achieve and maintain low latency.

Building planet-scale services requires solving complex scalability challenges and making numerous tradeoffs across the product's components. I look forward to sharing my experiences and lessons learned in building Azure Cosmos DB.

Kevin Pilch

@Pilchie

Seattle, Washington, United States
