Session
Building a Culture of Continuous Resiliency
Failures are inevitable, and well-architected distributed systems are no exception. Any outage or turbulence in production not only impacts revenue but also damages your company's brand and reputation. With the increasing complexity of microservice architectures, it is important to ensure that products and platforms are reliable by proactively validating them before a real incident occurs. A delightful, uninterrupted experience for end users is a must.
In this session, we will share how Intuit, with thousands of services across 200+ clusters, validates resiliency at scale by leveraging company-wide Game Day events as well as continuous integration pipelines. We will demonstrate our adoption of open source chaos engineering capabilities using LitmusChaos, a Cloud Native Computing Foundation (CNCF) project, integrated with Argo, observability, and Intuit's remediation tools. Come hear about our learnings and journey so you can apply the same principles and patterns within your organization to release reliable products with confidence.
Benefits to the Ecosystem:
At Intuit, we have several flagship products (TurboTax, QuickBooks, Mint, Credit Karma, Mailchimp) that serve millions of customers. Our mission is powering prosperity around the world, and we make sure every incident goes through a detailed root cause analysis. Along the way, we have learned that we must ensure our products are reliable and can withstand any turbulence.
In this talk, we will share insights on how Intuit paved its path to company-wide mandatory Game Days and how other teams can apply similar processes and automation to achieve continuous resilience at scale. We will demonstrate how teams can use our integration patterns, built on various CNCF projects (LitmusChaos, Argo Workflows, Argo CD, Argo CD ApplicationSets), to execute chaos either within a continuous pipeline or ad hoc during Game Days. We will share how we enabled thousands of Intuit developers to think about resiliency as part of design and implementation, and what we learned about running company-wide Game Days in a controlled manner to reduce outages in production.
Deepthi Panthula
Senior Staff Product Manager
San Jose, California, United States