Hemanth Malla
Senior Software Engineer, Datadog
New York City, New York, United States
Hemanth Malla is a Senior Software Engineer working on Kubernetes and container networking at Datadog. He is also a Cilium CNCF maintainer. Previously, he worked on various distributed systems in industries such as e-commerce, fintech and high-frequency trading. Apart from computers, he enjoys all things photography, drones and dark chocolate.
Scaling Network Policy Enforcement Beyond the Cluster Boundary with Cilium
To keep up with infrastructure growth, companies around the world are managing an increasing number of Kubernetes clusters. Enforcing Kubernetes-native network policy at scale is hard enough within a single cluster. Extending it to multiple clusters is even more challenging.
Depending on the shape of your infrastructure, your cross-cluster policy requirements may be unique, and there’s no one-size-fits-all configuration. In this talk, we’ll dive deep into how different solutions work in Cilium to understand sources of potential bottlenecks. We’ll discuss ClusterMesh, KVStoreMesh, DNS-based FQDN policy and a custom variant of KVStoreMesh that Datadog uses when meshing at scale. Specifically, we’ll discuss how factors like the number of pods, the number of identities and pod churn affect scalability and time to policy enforcement. Join us if you’re curious about the latest in cross-cluster policy, and leave with actionable insights you can apply to your infrastructure.
Lessons From Building Scalable Network Policy Enforcement With eBPF
eBPF has unlocked new levels of performance and scalability for container networking. Cilium has leveraged eBPF to implement a plethora of network policy features. Kubernetes scalability has been improving with every new release, and clusters with 5k+ nodes are increasingly common. Cilium’s policy framework needs to scale to hundreds of thousands of pods, all while dealing with complex scenarios like high pod churn environments.
In this talk, Cilium maintainers will share some lessons learnt from years of programming Kubernetes abstractions directly into kernel space using eBPF. You’ll learn how Cilium efficiently intercepts traffic for enforcement at both L4 and L7, tricks Cilium uses to minimize CPU overhead on each node, and some design decisions that have been instrumental in squeezing high performance out of the kernel regardless of the number of pods. Finally, we’ll discuss strategies you can follow to improve the debuggability of eBPF-based networking datapaths.
Hot standby load balancing with SO_REUSEPORT and eBPF
SO_REUSEPORT is a powerful Linux kernel feature that lets more than one process listen on a given port and balances load between them. By default, the kernel picks a listening socket by hashing each incoming connection’s addresses and ports, but with the help of eBPF, we can take this feature one step further and implement other load-balancing strategies. In this lightning talk, you’ll learn to implement weighted and hot standby load balancing with nothing but eBPF and SO_REUSEPORT.
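For readers new to SO_REUSEPORT, the basic mechanism can be sketched in a few lines of Python. This is an illustration only, not code from the talk: it assumes Linux with loopback networking available, and `make_listener` and `demo` are hypothetical helper names. It shows two listeners sharing one port and the kernel handing a new connection to exactly one of them. It does not attach an eBPF program; on Linux that step goes through the SO_ATTACH_REUSEPORT_EBPF socket option, which is outside the scope of this sketch.

```python
import select
import socket


def make_listener(port: int) -> socket.socket:
    """Create a TCP listener on 127.0.0.1 with SO_REUSEPORT enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set on every socket in the group before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen(16)
    return sock


def demo() -> tuple[int, int, int]:
    """Bind two listeners to one port and open a single client connection.

    Returns (port of first listener, port of second listener,
    number of listeners that became readable).
    """
    first = make_listener(0)  # port 0: let the kernel pick a free port
    port = first.getsockname()[1]
    second = make_listener(port)  # same port, allowed by SO_REUSEPORT

    client = socket.create_connection(("127.0.0.1", port), timeout=2)
    # The kernel queues the connection on one listener of the group.
    ready, _, _ = select.select([first, second], [], [], 2)

    result = (port, second.getsockname()[1], len(ready))
    for sock in (client, first, second):
        sock.close()
    return result


if __name__ == "__main__":
    port_a, port_b, ready_count = demo()
    print(f"listeners share port: {port_a == port_b}, ready listeners: {ready_count}")
```

In a real deployment each listener would usually live in its own process or thread; a single process is used here only to keep the sketch self-contained.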
Everything, Everywhere, All at Once
On March 8, 2023, Datadog experienced a massive global outage that took almost 24 hours to mitigate and a further ~24 hours to backfill data after restoring full app availability. We’ll share the trigger for the incident and why it was such a massive effort to recover from. We’ll review the technical details of the incident: why and how we lost more than 60% of our Kubernetes nodes in less than an hour, and the challenges we faced in recovering the tens of thousands of impacted nodes across hundreds of clusters. This was a very tough day for us, and we will share the hard-won technical and community lessons.
eBPF: a new era in cloud infrastructure tools
eBPF has become something of a buzzword recently, but why is it being used in so many tools for observability, security and networking? What does it bring that other approaches don't offer? How can you leverage the power of eBPF in your organization?
Join this session to learn from the creators and maintainers of leading open source eBPF projects about how this kernel technology enables high-performance, scalable cloud infrastructure tools.
Day 2 with Cilium - What to expect running at scale
Cilium works great out of the box, but tweaking a few options will ensure you get the best performance as your clusters grow in size. While Cilium abstracts away a lot of complexity, it gives users several knobs to control the underlying systems where necessary. This talk will quickly brush up on how Cilium interacts with Linux, Kubernetes and cloud provider network stacks and then dive into some of the challenges you might encounter running Cilium in large clusters. For example, over time, you might need to reevaluate rate limiting across the board to avoid cascading failures. You will need to plan carefully for IP address management, tune the operating system or take advantage of new features from cloud providers. Since Cilium is a critical piece of your infrastructure, we’ll talk about which health metrics to keep a close eye on. We’ll also discuss some best practices for deploying and validating your rollouts with out-of-the-box connectivity tests.
CRD vs Dedicated Etcd as Storage Backend: Lessons from Taming High Churn Clusters
Configuring Cilium for large clusters introduces a spectrum of challenges. Critical decisions such as choosing a CRD backend, leveraging CiliumEndpointSlices, or opting for a dedicated kvstore for state propagation significantly affect scalability. Each choice comes with its own pros and cons and calls for a distinct monitoring strategy. Moreover, once a configuration is in place, the challenge lies in migrating between options seamlessly without incurring downtime. As your Kubernetes clusters grow, you may need to revisit some decisions over time to optimize for performance.
If you've ever wondered how to fine-tune Kubernetes and Cilium for large-scale clusters or if you're seeking insights on setting up robust monitoring to prevent outages, this session is tailored for you. Join us to explore strategies, best practices, and real-world lessons and leave with actionable insights that will allow you to confidently navigate scale with Kubernetes and Cilium.
Cilium: From Service Mesh to Kubernetes and Beyond with eBPF
Welcome to Cilium! In this session you'll get an update on how Cilium is expanding the frontiers of cloud native networking, observability, and security. You'll hear about the latest developments and future roadmap of the project and why it has become the CNI of choice in the wild.
We will cover topics such as how Cilium leverages eBPF to speed up container networking, mutual authentication of services with Cilium Service Mesh, and expanding cloud native principles beyond Kubernetes with Cilium Mesh. In this session you'll hear from Cilium contributors and users from Datadog, Isovalent, and SuperOrbital.
Open Source Summit Europe 2024
KubeCon + CloudNativeCon Europe 2024
CNCF-hosted Co-located Events Europe 2024
KubeCon + CloudNativeCon North America 2023
CNCF-hosted Co-located Events North America 2023
eBPF Summit 2023