Bug hunting in distributed systems: using robustness tests to test your code better

Traditional testing methods such as unit and integration tests are great for validating functionality in isolation, but are they enough for distributed systems? Distributed systems must cope with real-world failures such as network issues, hardware errors, and race conditions. One proven way to test these systems is to inject failures while the system runs and check whether it still behaves as expected. This is called robustness testing: you run the system the way it would be used in real life, while things go wrong around it.
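A toy example shows the idea. The sketch below is hypothetical and not taken from etcd. A counter service sits behind a simulated network that randomly drops requests and responses, and the client retries until each increment is acknowledged. The robustness check is a single invariant: every increment must be applied exactly once. The names (`counterServer`, `flakyNetwork`, `runWorkload`) and the request-ID deduplication scheme are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// errDropped simulates a message lost by the network.
var errDropped = errors.New("message dropped")

// counterServer is a hypothetical service that increments a counter.
// When dedup is true it remembers applied request IDs, so a retried
// request is applied at most once.
type counterServer struct {
	dedup   bool
	value   int
	applied map[int]bool
}

func (s *counterServer) increment(reqID int) {
	if s.dedup && s.applied[reqID] {
		return // duplicate delivery of a retried request: ignore it
	}
	s.applied[reqID] = true
	s.value++
}

// flakyNetwork injects failures: it may drop a request before it reaches
// the server, or drop the response after the server has already applied it.
type flakyNetwork struct {
	rng      *rand.Rand
	dropRate float64 // must be < 1, or retries never succeed
}

func (n *flakyNetwork) call(s *counterServer, reqID int) error {
	if n.rng.Float64() < n.dropRate {
		return errDropped // request lost: the server never sees it
	}
	s.increment(reqID)
	if n.rng.Float64() < n.dropRate {
		return errDropped // response lost: the server DID apply it
	}
	return nil
}

// runWorkload sends n increments through the flaky network, retrying each
// one with the same request ID until it is acknowledged, and returns the
// final counter value.
func runWorkload(n int, dropRate float64, seed int64, dedup bool) int {
	s := &counterServer{dedup: dedup, applied: map[int]bool{}}
	nw := &flakyNetwork{rng: rand.New(rand.NewSource(seed)), dropRate: dropRate}
	for id := 0; id < n; id++ {
		for nw.call(s, id) != nil {
			// retry until the client sees an acknowledgement
		}
	}
	return s.value
}

func main() {
	// Invariant: the final value must equal the number of increments sent.
	fmt.Println("with dedup:   ", runWorkload(1000, 0.3, 42, true))
	fmt.Println("without dedup:", runWorkload(1000, 0.3, 42, false))
}
```

With deduplication the first line prints 1000 for any seed (as long as the drop rate is below 1). Without it, every dropped response leads to a retried increment being applied twice, so the count exceeds 1000: a bug that a happy-path test with a perfect network would never reveal.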

Jepsen is one of the first frameworks to test distributed systems by simulating such real-world scenarios and validating the recorded history of operations. Inspired by Jepsen, etcd, the backbone of Kubernetes, built its own testing framework. This framework is written in Go for Go projects; it can inject even more failure types on the fly and uses Porcupine, a linearizability checker, to verify that the data stays consistent.
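To make "validating the history" concrete, here is a minimal brute-force linearizability check for a single integer register. It is not Porcupine's API; it is a hypothetical sketch of the underlying idea, assuming each recorded operation carries the logical time it was invoked and the time it returned. A history is linearizable if the operations can be placed in one sequential order that respects real time (an operation that returned before another was invoked must come first) and in which every read returns the most recent write.

```go
package main

import "fmt"

// Operation is one client call on a single register (initial value 0),
// with the logical times at which it was invoked and returned.
type Operation struct {
	IsWrite bool
	Value   int // value written, or value observed by a read
	Call    int64
	Return  int64
}

// linearizable searches, with backtracking, for a sequential order of ops
// that respects real time and matches the register's sequential behavior.
func linearizable(ops []Operation) bool {
	done := make([]bool, len(ops))
	var search func(state, remaining int) bool
	search = func(state, remaining int) bool {
		if remaining == 0 {
			return true
		}
		for i, op := range ops {
			if done[i] || !minimal(ops, done, i) {
				continue
			}
			next := state
			if op.IsWrite {
				next = op.Value
			} else if op.Value != state {
				continue // this read cannot be linearized at this point
			}
			done[i] = true
			if search(next, remaining-1) {
				return true
			}
			done[i] = false // backtrack
		}
		return false
	}
	return search(0, len(ops))
}

// minimal reports whether op i may be linearized next: no other pending
// operation returned before op i was invoked.
func minimal(ops []Operation, done []bool, i int) bool {
	for j, other := range ops {
		if j != i && !done[j] && other.Return < ops[i].Call {
			return false
		}
	}
	return true
}

func main() {
	ok := []Operation{
		{IsWrite: true, Value: 1, Call: 0, Return: 10},
		{IsWrite: false, Value: 1, Call: 5, Return: 15},
		{IsWrite: false, Value: 1, Call: 20, Return: 25},
	}
	// The read starts after the write returned, yet observes the old value 0.
	stale := []Operation{
		{IsWrite: true, Value: 1, Call: 0, Return: 10},
		{IsWrite: false, Value: 0, Call: 20, Return: 25},
	}
	fmt.Println(linearizable(ok))    // true
	fmt.Println(linearizable(stale)) // false
}
```

The search is exponential in the worst case, so it only suits tiny histories; dedicated checkers use far more efficient algorithms and support richer models than a single register.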

As etcd contributors, we will share the challenges of writing tests that force failures via gofail, a failpoint library, and our journey of developing, leveraging, and debugging the issues caught by this ever-evolving framework, so that you can apply the findings to your own projects with minimal tweaks.
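Failpoints let a test force a failure at one precise spot in the code instead of waiting for it to happen by chance. The sketch below uses a hypothetical, hand-rolled failpoint registry (`enable`, `disable` and `inject` are invented names, not gofail's API) to simulate a crash between persisting a write-ahead log (WAL) entry and applying it, and then checks that recovery replays the log. The `store` type is likewise an illustrative assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// A hypothetical, minimal failpoint registry. It is NOT gofail's API;
// it only illustrates named failures that can be toggled at runtime.
var (
	fpMu       sync.Mutex
	failpoints = map[string]error{}
)

// enable makes inject(name) return err until the failpoint is disabled.
func enable(name string, err error) {
	fpMu.Lock()
	defer fpMu.Unlock()
	failpoints[name] = err
}

func disable(name string) {
	fpMu.Lock()
	defer fpMu.Unlock()
	delete(failpoints, name)
}

// inject is placed at the code location under test; it returns nil
// unless the named failpoint is enabled.
func inject(name string) error {
	fpMu.Lock()
	defer fpMu.Unlock()
	return failpoints[name]
}

var errCrash = errors.New("simulated crash")

// store is a hypothetical node: wal survives a crash, data does not.
type store struct {
	wal  []string
	data map[string]bool
}

func (s *store) put(key string) error {
	s.wal = append(s.wal, key) // persist first
	if err := inject("afterWALWrite"); err != nil {
		return err // "crash" after persisting but before applying
	}
	s.data[key] = true
	return nil
}

// recoverFromWAL rebuilds in-memory state, as a restarted node would.
func (s *store) recoverFromWAL() {
	s.data = map[string]bool{}
	for _, k := range s.wal {
		s.data[k] = true
	}
}

func main() {
	s := &store{data: map[string]bool{}}

	enable("afterWALWrite", errCrash)
	err := s.put("a")
	fmt.Println(err, s.data["a"]) // simulated crash false
	disable("afterWALWrite")

	s.recoverFromWAL()
	fmt.Println(s.data["a"]) // true: recovery replayed the WAL
}
```

The value of the failpoint is determinism: the test hits exactly the window between persisting and applying on every run, rather than relying on a random crash to land there.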

Chun-Hung Tseng

Software Engineer at Google, etcd Contributor
