Cuong Nguyen

Cloud Solution Engineer

Hanoi, Vietnam

Experienced Cloud and Infrastructure Engineer with 6 years of expertise in DevOps practices and storage operations. Skilled in IT networking and system performance optimization, with a proven track record of resolving complex infrastructure challenges in production environments.
Has hands-on experience in researching, designing, and operating distributed storage systems in Telco Cloud environments — from initial architecture decisions through day-2 operations and lifecycle management.

Area of Expertise

Information & Communications Technology
Manufacturing & Industrial Materials

Topics

Cloud & DevOps
Cloud Native
Cloud Architecture
Cloud Computing
Kubernetes
Cloud Native & Kubernetes

Sessions

SCTP, eBPF, and Multus: Fixing What Kubernetes Breaks for 5G Networking

You’ve probably heard that the telecommunications industry is moving its 5G Core into Kubernetes. But what happens when a fundamental telco requirement, SCTP multihoming, meets a platform that was never designed for it?

In 5G networks, the N2 interface between gNB and AMF depends on SCTP multihoming to deliver “five nines” availability through multipath failover. However, standard Kubernetes networking, especially kube-proxy and the Service abstraction, is fundamentally blind to SCTP's multi-IP semantics. The result: multihoming silently breaks, failover degrades, and operators resort to brittle workarounds such as exposing Pods directly to external networks.

This talk explains why Kubernetes fails SCTP and how to fix it. We show how LoxiLB uses eBPF to act as a stateful, SCTP-aware proxy, rewriting INIT and INIT ACK chunks in real time. Combined with Multus and BGP, this enables true end-to-end SCTP multihoming in cloud-native 5G.
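The abstract does not show LoxiLB's eBPF code. What follows is only a user-space Python sketch of the idea behind "rewriting INIT and INIT ACK". It assumes the proxy translates the IPv4 Address parameters that a multihomed endpoint advertises in its handshake, so the peer learns reachable addresses instead of Pod-internal ones.

- The chunk layout follows RFC 9260.
- The helper names (`build_init`, `ipv4_params`, `rewrite_addresses`), the sample addresses and the mapping are hypothetical.
- A real datapath would also translate the IP header and recompute the SCTP CRC32c checksum. This sketch leaves both out.

```python
import ipaddress
import struct

# Chunk types from RFC 9260
INIT = 1
INIT_ACK = 2
# Parameter type of an IPv4 Address parameter inside INIT / INIT ACK
IPV4_ADDRESS = 5
# Chunk header (4 bytes) + Initiate Tag, a_rwnd, stream counts, Initial TSN (16 bytes)
FIXED_PART = 20


def build_init(addrs):
    """Build a minimal INIT chunk that advertises the given IPv4 addresses."""
    params = b"".join(
        struct.pack("!HH4s", IPV4_ADDRESS, 8, ipaddress.IPv4Address(a).packed)
        for a in addrs
    )
    fixed = struct.pack("!IIHHI", 0x1234ABCD, 65535, 10, 10, 1)
    length = 4 + len(fixed) + len(params)
    return struct.pack("!BBH", INIT, 0, length) + fixed + params


def ipv4_params(chunk):
    """Yield (value_offset, address) for each IPv4 Address parameter in an INIT/INIT ACK."""
    ctype, _flags, length = struct.unpack_from("!BBH", chunk, 0)
    if ctype not in (INIT, INIT_ACK):
        return
    end = min(length, len(chunk))
    off = FIXED_PART
    while off + 4 <= end:
        ptype, plen = struct.unpack_from("!HH", chunk, off)
        if plen < 4:
            break  # malformed parameter length: stop rather than loop forever
        if ptype == IPV4_ADDRESS and plen == 8 and off + 8 <= end:
            yield off + 4, ipaddress.IPv4Address(bytes(chunk[off + 4:off + 8]))
        off += (plen + 3) & ~3  # parameters are padded to a 4-byte boundary


def rewrite_addresses(chunk, mapping):
    """Return a copy of the chunk with IPv4 Address parameters translated via mapping."""
    out = bytearray(chunk)
    for value_off, addr in ipv4_params(chunk):
        new = mapping.get(str(addr))
        if new is not None:
            out[value_off:value_off + 4] = ipaddress.IPv4Address(new).packed
    return bytes(out)
```

For example, `rewrite_addresses(build_init(["10.244.1.5", "10.245.1.5"]), {"10.244.1.5": "192.0.2.10", "10.245.1.5": "198.51.100.10"})` returns a chunk of the same length that now advertises both replacement addresses. The rewrite happens in place, so the chunk length never changes.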

Investigate First, Decide Second: The Missing Step in Kubernetes Alert Response

"We automated the investigation, not the fix."
Every DevOps engineer knows the drill. Alert fires. Open Alertmanager, switch to Kibana, check Grafana. Correlate manually across three tools. At 2 AM, this takes an hour before you know enough to act.
On Kubernetes 1.33, we integrated an AI agent that reads alerts from Alertmanager, queries logs from Elasticsearch, and pulls metrics from Prometheus, assembling a structured investigation before anyone is paged. The engineer receives a brief, not a fire alarm.
Three scenarios, same stack, different outcomes:
- CrashLoopBackOff: the agent identifies an OOMKill pattern across 3 restart cycles, and the engineer approves the fix in 5 minutes, not 1 hour
- ImagePullBackOff: Kibana is empty because the container never started
- OOMKilled: the Prometheus memory trend reveals a misconfigured resource limit, and a fix is proposed before the engineer opens a single tab
Attendees leave with a reusable investigation pipeline and approval gate design for AI-assisted alert response on Kubernetes.
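The split between investigating and fixing can be sketched in plain Python. This sketch assumes the Alertmanager, Elasticsearch and Prometheus signals have already been fetched into simple values. The `Evidence` and `Brief` types, their field names and the matching rules are hypothetical, and no real client library is called.

The point is the shape of the design:

- `investigate()` only reads evidence and produces a brief.
- `apply_fix()` is the approval gate. It refuses to act until a named engineer has approved.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Signals gathered before paging anyone (hypothetical field names)."""
    pod: str
    reason: str            # waiting reason from the alert, e.g. "CrashLoopBackOff"
    last_termination: str  # e.g. "OOMKilled", "Error", or "" if the container never ran
    restarts: int
    log_lines: int         # log lines found in Elasticsearch for this pod
    mem_peak_bytes: int    # peak memory observed in Prometheus
    mem_limit_bytes: int   # container memory limit; 0 means "no limit set" here


@dataclass
class Brief:
    pod: str
    hypothesis: str
    proposed_fix: str
    approved_by: str = ""


def investigate(ev: Evidence) -> Brief:
    """Turn raw evidence into a brief. Read-only: never changes the cluster."""
    if ev.reason == "ImagePullBackOff":
        # Empty logs are expected here, so they are not treated as a clue.
        return Brief(ev.pod,
                     "image could not be pulled; logs are empty because the container never ran",
                     "check the image name, tag, and pull secret")
    hit_limit = ev.mem_limit_bytes > 0 and ev.mem_peak_bytes >= ev.mem_limit_bytes
    if ev.last_termination == "OOMKilled" or hit_limit:
        return Brief(ev.pod,
                     f"OOMKill pattern across {ev.restarts} restarts",
                     f"raise the memory limit above the observed peak of {ev.mem_peak_bytes} bytes")
    return Brief(ev.pod, "no known pattern matched", "escalate to the on-call engineer")


def approve(brief: Brief, engineer: str) -> Brief:
    """Record which human approved the proposed fix."""
    brief.approved_by = engineer
    return brief


def apply_fix(brief: Brief) -> str:
    """Approval gate: refuse to act on a brief nobody has approved."""
    if not brief.approved_by:
        raise PermissionError(f"fix for {brief.pod} has not been approved")
    return f"applying '{brief.proposed_fix}' to {brief.pod} (approved by {brief.approved_by})"
```

Keeping `investigate()` free of side effects means a wrong hypothesis costs only a misleading brief, never an unreviewed change to the cluster.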

SCTP, eBPF, and Multus: Fixing What Kubernetes Breaks for 5G Networking

The telecom industry is shifting 5G Core workloads to Kubernetes. However, strict 3GPP rules demand SCTP multihoming for ultimate reliability on the AMF N2 interface. How do these rigid standards survive in a dynamic container ecosystem? Default tools like kube-proxy are blind to the multiple IP addresses that SCTP carries in its payload, and this flaw breaks multihomed connections. Operators are often forced into fragile workarounds, bypassing internal load balancers to expose Pods directly. This completely defeats the purpose of adopting Kubernetes in the first place.

How do we fix this without ruining network isolation? This session unpacks the friction between 5G demands and standard container networking. We will show you how to use eBPF, LoxiLB, and Multus CNI to rewrite the SCTP handshake on the fly. You will learn about the dual-pipeline datapath and eBPF map state synchronization for seamless high availability, and walk away with a proven blueprint for building a resilient telco networking architecture.

Changing the Engine Mid-Flight: Zero-Downtime Ceph Upgrades

"Upgrade Ceph in our telco cloud with zero downtime." This is a mandatory requirement every 1–2 years to keep receiving security patches, bug fixes, and continued support from vendors and the community. We run a Ceph 18.x.x (Reef) cluster with 10–50 nodes, deployed on bare metal and running 5G Core workloads (AMF, UPF, ...).
This talk covers a first-hand Reef → Squid upgrade with three real failure scenarios:
- Monitor quorum loss: root cause analysis, recovery sequence, and how to prevent quorum degradation during daemon rolling restarts
- OSD storms triggered by rebalancing that threatened cluster stability
- Incompatible client versions silently blocking the upgrade path

Beyond failure recovery, we'll share the upgrade sequencing strategy we developed, covering pre-flight checks and daemon upgrade ordering (MGR → MON → OSD → MDS/RGW).
Attendees leave with a reusable pre-upgrade checklist and sequencing framework for bare-metal Ceph in high-stakes environments.
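The ordering rule and the arithmetic behind monitor quorum loss fit in a few lines of Python. This is an illustration only. It assumes daemons are named `<type>.<id>` as in Ceph (for example `mon.a`, `osd.12`). The functions are hypothetical helpers, not part of any Ceph tooling.

Ceph monitors use Paxos and need a strict majority of the monmap to keep quorum. A rolling restart should therefore take a monitor down only when the remaining in-quorum monitors still form that majority.

```python
# Daemon types in the order the session describes; MDS and RGW share the last tier.
UPGRADE_RANK = {"mgr": 0, "mon": 1, "osd": 2, "mds": 3, "rgw": 3}


def upgrade_plan(daemons):
    """Order daemon names by upgrade tier, keeping input order within a tier.

    An unknown daemon type raises KeyError, forcing an explicit decision
    instead of silently placing it somewhere in the sequence.
    """
    return sorted(daemons, key=lambda name: UPGRADE_RANK[name.split(".", 1)[0]])


def quorum_size(n_mons):
    """Monitors need a strict majority of the monmap to form quorum."""
    return n_mons // 2 + 1


def safe_to_restart_mon(n_mons, in_quorum):
    """Restarting one monitor is safe only if the others still form a majority."""
    return in_quorum - 1 >= quorum_size(n_mons)
```

With 3 monitors all in quorum, `safe_to_restart_mon(3, 3)` is true. If one monitor is already out, `safe_to_restart_mon(3, 2)` is false: restarting another would drop the cluster below quorum.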

Rethinking Load Balancing in Kubernetes: LoxiLB, eBPF, and the End of kube-proxy?

Kubernetes networking has come a long way since the early days of kube-proxy and iptables. But as workloads become more latency-sensitive and network-intensive, particularly in edge, telco, and private cloud environments, it's clear that traditional solutions like MetalLB or kube-proxy are hitting their limits.

In this talk, we'll dive into LoxiLB, a high-performance, cloud-native L4 load balancer built with eBPF. We'll show how it can replace or complement kube-proxy and MetalLB, especially in scenarios where:

- Load balancing needs to scale across thousands of Services
- Network throughput and latency become critical
- Multipath routing with BGP and ECMP is required

Through a series of real-world benchmarks, we'll compare:

- MetalLB vs LoxiLB throughput under high connection load
- Latency distribution
- Scalability across nodes using ECMP

With LoxiLB and eBPF, we’re entering a new phase of Kubernetes networking: one where load balancing is programmable, observable, and built for scale.
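ECMP spreads flows, not packets. The upstream router hashes each flow's header fields and sends every packet of that flow to the same load-balancer node. This is what lets a stateful L4 load balancer scale out across nodes.

The sketch below shows modulo-N flow hashing, one of the schemes discussed in RFC 2992. The node names and addresses are hypothetical, and real routers use their own hardware hash functions rather than SHA-256.

```python
import hashlib


def ecmp_next_hop(flow, next_hops):
    """Map a flow's 5-tuple to one next hop using modulo-N hashing."""
    key = "|".join(str(part) for part in flow).encode()
    # Take 64 bits of a stable hash so the same flow always lands on the same hop.
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]
```

For example, `ecmp_next_hop(("203.0.113.7", 40000, "192.0.2.10", 80, "tcp"), ["node-a", "node-b", "node-c"])` returns the same node on every call. Many distinct flows spread roughly evenly across the three nodes. One trade-off of modulo-N hashing: adding or removing a next hop changes `len(next_hops)`, which remaps most existing flows.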

Can Kubernetes Networking Keep Up with Telco Cloud? Lessons Learned Networking CNFs

Telco workloads such as 5G Core and MEC pose unique networking challenges when deployed as Cloud-Native Network Functions (CNFs) on Kubernetes. Unlike typical apps, CNFs demand ultra-low latency, line-rate throughput, and scaling to millions of sessions — requirements that push Kubernetes networking to its limits.

In this session, we share hard-won lessons from a layered networking design: Calico, with kube-proxy in IPVS mode, for scalable control-plane routing and policy, and SR-IOV, OVS, and Multus for high-performance data-plane traffic. This separation of control-plane and user-plane traffic enabled us to meet carrier-grade SLAs, but also revealed challenges in automation, NUMA alignment, hugepage allocation, and observability when traffic bypassed the kernel.

Attendees will gain practical insights into designing and operating CNF networking: how to balance performance and flexibility, automate complex CNI setups, and maintain visibility in high-performance telco clusters.

Events

KubeCon + CloudNativeCon Japan 2026 (upcoming)

July 2026, Yokohama, Japan