Session
Fix First, Investigate Later: When an eBPF Rollout Brought Down Our Network
When your production network suddenly starts dropping packets, the last thing you expect is that your cloud provider quietly deployed a new monitoring tool. This talk shares our journey from mysterious outage to desperate fix to surprising discovery.
It started with alerts: packet loss spiking and network throughput crashing from 800 MB/s to roughly 250 MB/s. There had been no recent changes on our end. Hours into the crisis, we discovered an unfamiliar DaemonSet running eBPF programs: Retina, silently rolled out across our clusters. But here's the catch: we couldn't remove it. Any change we made to the DaemonSet was instantly reconciled back to its original state.
With users impacted and no time for root cause analysis, we took a leap: we built a mutating admission webhook to intercept and neuter this mysterious DaemonSet. It worked instantly: networks recovered, crisis averted.
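The abstract does not say how the webhook neutered the DaemonSet, so the following is only a minimal sketch of one plausible approach: a handler that answers a Kubernetes AdmissionReview with a JSON Patch replacing the pod template's nodeSelector with a label no node carries, so the DaemonSet schedules zero pods even after the reconciler restores it. The namespace, the DaemonSet name, and the selector label are all hypothetical placeholders, not details from the talk.

```python
import base64
import json

# Hypothetical values for illustration; the talk does not name these.
TARGET_NAMESPACE = "kube-system"
TARGET_NAME = "retina-agent"
# A label that is assumed to exist on no node, so no pod can be scheduled.
UNSCHEDULABLE_SELECTOR = {"example.com/no-such-node": "true"}


def review(admission_review: dict) -> dict:
    """Build the AdmissionReview response for one admission request.

    Every request is allowed. Only the targeted DaemonSet is mutated.
    """
    request = admission_review["request"]
    response = {"uid": request["uid"], "allowed": True}

    obj = request.get("object") or {}
    meta = obj.get("metadata", {})
    is_target = (
        request.get("kind", {}).get("kind") == "DaemonSet"
        and meta.get("namespace") == TARGET_NAMESPACE
        and meta.get("name") == TARGET_NAME
    )

    if is_target:
        # JSON Patch "add" on an object member sets it, replacing any
        # existing nodeSelector on the pod template.
        patch = [
            {
                "op": "add",
                "path": "/spec/template/spec/nodeSelector",
                "value": UNSCHEDULABLE_SELECTOR,
            }
        ]
        response["patchType"] = "JSONPatch"
        # The API server expects the patch as base64-encoded JSON.
        response["patch"] = base64.b64encode(
            json.dumps(patch).encode("utf-8")
        ).decode("ascii")

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

In a real cluster this function would sit behind an HTTPS endpoint registered through a MutatingWebhookConfiguration scoped to DaemonSet create and update operations. Because the mutation is applied on every write, the provider's reconciler can keep restoring the DaemonSet without the agent pods ever coming back.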
Only then could we investigate: How did an eBPF observability tool cause such devastation? And why didn't we know it was being deployed?
Zain Malik
Principal Software Engineer @ Exostellar
Vienna, Austria