
Surviving Panics, Fatal Errors, and Crashes: Lessons from the Trenches

Ever found yourself in the middle of the chaos caused by panics or unexpected crashes? Join me as I share our team's experience debugging and resolving critical issues that threatened the stability of our system. In this presentation, we'll explore the challenges we faced, the lessons we learned, and the strategies that got us through.

With the rollout of a new feature, our pods began crashing with the Go runtime error "fatal error: concurrent map iteration and map write". What followed was a month-long investigation and root cause analysis. Along the way, we learned lessons that changed how we handle errors and build for resilience.

Our arsenal of survival tactics included:
* Logging, logging, and more logging: Adding more detailed logs to capture how the system behaved leading up to each crash.
* Strategic placement of panic recovery: Learning that a panic must be recovered within the goroutine where it occurs.
* The art of reading stack traces: Taking the time to dissect stack traces, even when they initially seem perplexing.
* Uncovering the hidden impact of non-critical functionality: Understanding how seemingly innocuous components can disrupt critical workflows.
* Distinguishing between fatal errors and panics: Understanding that a fatal error is raised by the Go runtime and, unlike a panic, cannot be intercepted with recover.
* Identifying unrecoverable errors in Go: Knowing the range of errors that cannot be recovered from in the Go programming language.

By sharing these hard-won lessons and practical insights, we aim to help fellow developers handle similar incidents with confidence. Whether you're a seasoned engineer or a newcomer to the field, this presentation offers practical guidance for building robust, reliable applications.

Join us as we dig into error handling in Go and the tools and knowledge needed to diagnose crashes like these. Let's turn setbacks into opportunities for growth.

Andrii Raikov

Principal Software Engineer at Delivery Hero SE

Berlin, Germany
