Speaker

Prashant Ramhit

Mirantis Inc. Platform Engineer | Snr DevOps Advocate | OpenSource Dev

Dubai, United Arab Emirates

I am a seasoned technologist with over two decades of experience, beginning my career in the late 1990s as a Unix System Administrator. Over time, I expanded my expertise into roles such as DevOps Engineer, Site Reliability Engineer (SRE), and Golang Developer, gaining deep hands-on experience with cloud-native systems, infrastructure automation, and platform engineering.

As my journey progressed, I transitioned into technical leadership roles as a Product Manager and Project Manager, where I now combine engineering know-how with strategic execution—leading cross-functional teams, aligning delivery with business goals, and driving innovation across globally distributed systems.

I’ve had the privilege of contributing to impactful projects at organizations such as Zalora (Singapore), Proemion (Germany), GiantSwarm, Viadialog (France), BBC, and Netflix. My academic background includes an M.Sc./M.Phil. from the University of Portsmouth, UK, which grounds my practical experience in a strong theoretical foundation.

Now based in Mauritius, I continue to work at the intersection of technology and strategy, contributing to open-source communities while enjoying a peaceful island life—tending to my aquaponic farm and recharging through long walks along the beach or hikes in the mountains.

Area of Expertise

  • Information & Communications Technology

Topics

  • Kubernetes
  • Cloud Native & Kubernetes
  • Artificial Intelligence
  • Machine Learning/Artificial Intelligence
  • Platform Engineering
  • DevOps

Solving Real-World Edge Challenges With k0s, NATS, and Raspberry Pi Clusters

Monitoring sea algae proliferation and coral growth in real time may seem daunting, but with the right tools, it becomes an exciting edge computing project. Using k0s, the lightweight CNCF-certified Kubernetes distribution, and NATS, the connective technology for edge computing, this project solved the challenges of data collection and processing in a distributed Raspberry Pi cluster.

Leveraging k0s’s minimal resource footprint and automated scaling, paired with NATS’s efficient messaging capabilities, the project enabled real-time sensor data collection and transmission under resource-constrained conditions. Dynamically bootstrapped Raspberry Pi clusters processed data locally while integrating with a central control plane.

Learn about dynamically bootstrapping Raspberry Pi clusters with k0s, managing distributed edge clusters, deploying NATS for scalable messaging, and scaling workloads based on environmental changes. See how k0s and NATS efficiently tackle real-world challenges.
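
To make the data path concrete, here is a minimal sketch (not the project’s actual code) of the kind of sensor publisher each Raspberry Pi node might run, using the nats.go client. The broker URL, subject layout, and Reading fields are illustrative assumptions:

    // sensor_publisher.go - minimal sketch of a per-node sensor publisher.
    // Assumptions: the subject layout ("reef.<node>.readings") and the
    // Reading struct are illustrative; only the NATS client calls are real.
    package main

    import (
        "encoding/json"
        "log"
        "time"

        "github.com/nats-io/nats.go"
    )

    type Reading struct {
        NodeID    string    `json:"node_id"`
        Turbidity float64   `json:"turbidity"` // proxy for algae density
        WaterTemp float64   `json:"water_temp_c"`
        Timestamp time.Time `json:"timestamp"`
    }

    func main() {
        // Unlimited reconnects with a short backoff: links to nodes offshore
        // drop frequently, and the client should ride through outages.
        nc, err := nats.Connect("nats://control-plane.local:4222",
            nats.MaxReconnects(-1),
            nats.ReconnectWait(2*time.Second))
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Drain()

        for range time.Tick(10 * time.Second) {
            r := Reading{NodeID: "pi-042", Turbidity: readTurbidity(),
                WaterTemp: readTemp(), Timestamp: time.Now().UTC()}
            payload, _ := json.Marshal(r)
            if err := nc.Publish("reef.pi-042.readings", payload); err != nil {
                log.Printf("publish failed: %v", err)
            }
        }
    }

    // readTurbidity and readTemp stand in for real sensor drivers.
    func readTurbidity() float64 { return 0.42 }
    func readTemp() float64      { return 27.3 }

The reconnect options matter more than the happy path here: on a flaky over-water link, the publisher should buffer and retry rather than crash.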

Tiny Kubernetes, Big Impact: k0s for Edge Deployments

Monitoring sea algae proliferation and coral growth in real time may seem daunting, but with the right tools, it becomes an exciting edge computing project. Using k0s, the lightweight CNCF-certified Kubernetes distribution, and NATS, the connective technology for edge computing, this project solved the challenges of data collection and processing in a distributed Raspberry Pi cluster.

Leveraging k0s’s minimal resource footprint and automated scaling, paired with NATS’s efficient messaging capabilities, the project enabled real-time sensor data collection and transmission under resource-constrained conditions. Dynamically bootstrapped Raspberry Pi clusters processed data locally while integrating with a central control plane.

Learn about dynamically bootstrapping Raspberry Pi clusters with k0s, managing distributed edge clusters, deploying NATS for scalable messaging, and scaling workloads based on environmental changes. See how k0s and NATS efficiently tackle real-world challenges.

This is a real project built with CNCF applications, based in Mauritius, with over 1,200 Raspberry Pis deployed 1 to 2 kilometers off the coast in the ocean, running k0s and NATS.
I would be grateful if this project could be showcased to demonstrate the impact of OSS applications in large-scale deployments; the project is helping to monitor and rebuild the marine ecosystem.

Solving Distributed AI at the Edge with Deferred Inference Using k0s and Cloud GPU Acceleration

Performing sophisticated object detection on constrained edge devices may seem daunting, but with the right design, it becomes a powerful distributed AI solution. Using k0s, the lightweight CNCF-certified Kubernetes distribution, and a deferred inference pipeline powered by YOLOv8, this project tackles the challenges of capturing and processing video frames across heterogeneous environments.
Leveraging k0s’s minimal resource footprint and streamlined orchestration, combined with a cloud GPU inference service, our architecture offloads intensive workloads from edge devices. A Go-based frame capturer reliably transmits video frames over HTTPS to GPU instances for near real-time detection under bandwidth-constrained conditions. A web-based visualization layer then aggregates and displays inference results in real time.
Learn about implementing deferred inference pipelines with YOLOv8, orchestrating containerized workloads using k0s, optimizing GPU utilization for cost efficiency, and achieving low-latency edge processing. See how this architecture brings state-of-the-art computer vision to resource-limited scenarios, opening new possibilities for distributed AI deployments at scale.
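
As a rough illustration of the hand-off described above, this sketch shows the shape such a Go frame capturer could take, POSTing JPEG frames to a cloud GPU endpoint over HTTPS. The endpoint URL, auth scheme, and capture loop are assumptions for illustration, not the project’s published code:

    // frame_uploader.go - sketch of an edge-side frame uploader.
    // The inference URL and bearer token are illustrative assumptions.
    package main

    import (
        "bytes"
        "fmt"
        "net/http"
        "os"
        "time"
    )

    const inferURL = "https://gpu.example.com/v1/detect" // hypothetical endpoint

    func uploadFrame(client *http.Client, jpeg []byte) error {
        req, err := http.NewRequest(http.MethodPost, inferURL, bytes.NewReader(jpeg))
        if err != nil {
            return err
        }
        req.Header.Set("Content-Type", "image/jpeg")
        req.Header.Set("Authorization", "Bearer "+os.Getenv("INFER_TOKEN"))

        resp, err := client.Do(req)
        if err != nil {
            return err // caller can retry; edge bandwidth is unreliable
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            return fmt.Errorf("inference service returned %s", resp.Status)
        }
        return nil
    }

    func main() {
        // A short timeout keeps the capture loop from stalling behind one
        // slow upload.
        client := &http.Client{Timeout: 5 * time.Second}
        for range time.Tick(200 * time.Millisecond) { // ~5 fps on constrained links
            frame := captureJPEG() // stand-in for a camera driver
            if err := uploadFrame(client, frame); err != nil {
                // Deferred inference: a real capturer would queue the frame
                // locally and retry rather than drop it.
                continue
            }
        }
    }

    func captureJPEG() []byte { return nil } // placeholder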

Dynamic GPU Autoscaling: Leveraging KServe and NVIDIA DCGM for Cost-Efficient Scaling

Implementing dynamic GPU autoscaling for deferred inference may seem daunting, but with the right approach, it becomes a powerful way to boost performance while containing costs. By leveraging KServe or KEDA for serverless ML deployment and NVIDIA’s DCGM metrics, this system scales GPU resources in real time based on actual utilization rather than simple request counts. A custom metrics adapter feeds DCGM_FI_DEV_GPU_UTIL data into Kubernetes’ Horizontal Pod Autoscaler (HPA), ensuring GPU capacity matches computational needs.

Asynchronous prediction endpoints, coupled with scaling algorithms that factor in memory usage, compute load, and latency, deliver near-optimal resource allocation for complex workloads like object detection.

This talk explores the technical steps behind utilization-based autoscaling with KServe or KEDA, including monitoring, alerting, and performance tuning. Real-world benchmarks from production show up to 40% GPU cost savings without compromising inference speed or accuracy. Attendees will learn practical methods for bridging ML frameworks and infrastructure, making cloud GPU-accelerated ML more accessible and efficient in modern cloud-native environments.
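
To make the wiring concrete, here is a hedged sketch of the HPA object such a setup might create, using the standard client-go autoscaling/v2 types and assuming a metrics adapter already exposes DCGM_FI_DEV_GPU_UTIL as an external metric. The Deployment name, namespace, and 70% target are illustrative:

    // gpu_hpa.go - sketch of a utilization-based HPA for a GPU inference
    // Deployment. Assumes a metrics adapter exposes DCGM_FI_DEV_GPU_UTIL as
    // an external metric; names and thresholds are illustrative.
    package autoscale

    import (
        autoscalingv2 "k8s.io/api/autoscaling/v2"
        "k8s.io/apimachinery/pkg/api/resource"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    func gpuHPA() *autoscalingv2.HorizontalPodAutoscaler {
        minReplicas := int32(1)
        return &autoscalingv2.HorizontalPodAutoscaler{
            ObjectMeta: metav1.ObjectMeta{Name: "yolo-inference", Namespace: "ml"},
            Spec: autoscalingv2.HorizontalPodAutoscalerSpec{
                ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
                    APIVersion: "apps/v1", Kind: "Deployment", Name: "yolo-inference",
                },
                MinReplicas: &minReplicas,
                MaxReplicas: 8,
                Metrics: []autoscalingv2.MetricSpec{{
                    Type: autoscalingv2.ExternalMetricSourceType,
                    External: &autoscalingv2.ExternalMetricSource{
                        Metric: autoscalingv2.MetricIdentifier{Name: "DCGM_FI_DEV_GPU_UTIL"},
                        Target: autoscalingv2.MetricTarget{
                            // Scale out when average GPU utilization across
                            // pods exceeds ~70%.
                            Type:         autoscalingv2.AverageValueMetricType,
                            AverageValue: resource.NewQuantity(70, resource.DecimalSI),
                        },
                    },
                }},
            },
        }
    }

KEDA, for instance, generates a comparable HPA under the hood from a ScaledObject; the external-metric plumbing is the common denominator.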

Scaling K8s Everywhere: Introducing k0rdent, Your Open-Source Platform Engineering Super Controller

In modern platform engineering, managing fleets of Kubernetes clusters across clouds, on-premises datacenters, and edge devices presents operational sprawl, inconsistent tooling, and lock-in challenges. k0rdent is the first fully open-source Distributed Container Management Environment (DCME) that transforms this complexity into a declarative, Kubernetes-native control plane. Platform architects can design and operate developer and workload platforms anywhere, at scale—with zero lock-in and 100% open source.

In this session, you’ll learn how k0rdent leverages Kubernetes standards (ClusterAPI, CRDs, GitOps) to provide a single pane of glass for multi-cluster lifecycle management, service composition, and infrastructure automation. We’ll dive into k0rdent’s modular architecture, walk through a live demo of provisioning clusters and deploying services across heterogeneous environments, and explore how the community can contribute to its rapidly growing ecosystem.
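
As a hedged taste of the declarative model, the sketch below registers a child cluster with the management plane through the Kubernetes dynamic client. The group/version, kind, and spec fields follow k0rdent’s documented ClusterDeployment API as best I can tell, but treat every name here as an assumption and verify it against the current docs:

    // clusterdeployment.go - hedged sketch: declaring a child cluster to the
    // k0rdent management plane. The GVR and spec fields are assumptions.
    package main

    import (
        "context"
        "log"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
        "k8s.io/apimachinery/pkg/runtime/schema"
        "k8s.io/client-go/dynamic"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            log.Fatal(err)
        }
        client, err := dynamic.NewForConfig(cfg)
        if err != nil {
            log.Fatal(err)
        }

        // Assumed GVR; check your installed k0rdent version.
        gvr := schema.GroupVersionResource{
            Group: "k0rdent.mirantis.com", Version: "v1alpha1",
            Resource: "clusterdeployments",
        }
        cd := &unstructured.Unstructured{Object: map[string]interface{}{
            "apiVersion": "k0rdent.mirantis.com/v1alpha1",
            "kind":       "ClusterDeployment",
            "metadata":   map[string]interface{}{"name": "edge-eu-1", "namespace": "kcm-system"},
            "spec": map[string]interface{}{
                "template":   "aws-standalone-cp-0-0-1", // illustrative template name
                "credential": "aws-credential",          // illustrative credential ref
                "config":     map[string]interface{}{"workersNumber": 3},
            },
        }}

        if _, err := client.Resource(gvr).Namespace("kcm-system").
            Create(context.Background(), cd, metav1.CreateOptions{}); err != nil {
            log.Fatal(err)
        }
    }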

Edge-Native Agentic AI: Simplify and Scale Agent Deployments with k0rdent

Agentic AI is changing how developers and startups build intelligent systems—especially in edge environments where low-latency, real-world data is critical. However, deploying and scaling agent-based applications on distributed edge infrastructure remains a challenge for many early-stage teams.

This session introduces k0rdent, an open-source extension to Kubernetes that simplifies the deployment of agentic AI workloads at the edge. We'll walk through the basics of Kubernetes-based infrastructure for agent frameworks like LangChain, Autogen, and Semantic Kernel, and show how k0rdent bridges the gap between developer-friendly agent workflows and real-world edge deployments.

Attendees will see how to:

  • Set up an edge-native cluster with GPU support in minutes.
  • Run agent-based AI inference tasks (e.g., computer vision, chat interfaces) on these clusters.
  • Connect edge workloads to real-time event streams (like NATS or MQTT) for fast feedback loops (see the sketch after this list).
  • Automate lifecycle management—crucial for startups with small DevOps teams.
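
To ground the event-stream bullet, here is a minimal Go sketch of an edge agent worker consuming tasks from NATS; the subject and queue-group names are illustrative assumptions, and only the nats.go client calls are real:

    // agent_worker.go - sketch of an edge agent consuming an event stream.
    package main

    import (
        "log"

        "github.com/nats-io/nats.go"
    )

    func main() {
        nc, err := nats.Connect("nats://edge-cluster.local:4222")
        if err != nil {
            log.Fatal(err)
        }
        defer nc.Drain()

        // A queue group load-balances events across agent replicas, so
        // scaling the Deployment scales consumption with no coordination.
        _, err = nc.QueueSubscribe("agents.vision.frames", "vision-workers",
            func(msg *nats.Msg) {
                result := runAgent(msg.Data)
                if msg.Reply != "" {
                    _ = nc.Publish(msg.Reply, result) // tight feedback loop
                }
            })
        if err != nil {
            log.Fatal(err)
        }
        select {} // block while the subscription runs
    }

    // runAgent stands in for invoking LangChain/Autogen/Semantic Kernel logic.
    func runAgent(frame []byte) []byte { return []byte(`{"label":"person","conf":0.93}`) }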

We'll wrap up with tips and best practices for early-stage teams building their first agent-driven edge workflows, including safety and testing basics for AI deployments.

Whether you're prototyping your first agent or looking to scale to production, join us to explore how k0rdent can help you go from idea to edge-native deployment faster.

Outline Aligned with Topics of Interest

1. Getting Started with Agentic AI and the Edge

  • Define agentic AI for edge scenarios.
  • Challenges of real-world deployment (latency, data sovereignty, updates).

2. Intro to Agent Frameworks and k0rdent

  • Brief overview of frameworks (LangChain, Semantic Kernel, Autogen).
  • How k0rdent simplifies running these frameworks on edge clusters.
  • Basic cluster creation workflow (developer-friendly).

3. Real-World Applications

  • Show real examples (e.g., computer vision agents, autonomous retail agents), with a demo of a production application.
  • How edge GPU acceleration helps agentic tasks.

4. Human-Agent Interfaces & Experience Design

  • Practical UX considerations for edge agents (e.g., low-latency feedback).
  • How to connect with frontend/mobile/web experiences.

5. Responsible AI and Testing Basics

  • How to test, update, and monitor edge AI agents safely.
  • Basics of versioning, rollback, and resource management with k0rdent.

6. Developer Experience and Best Practices

  • Early learnings from deploying edge-native agents.
  • Tips for small teams: avoid over-engineering, automate updates.

7. Live Demo or Visual Walkthrough

  • Show the agent deployment workflow on k0rdent.
  • Highlight NATS data flows for real-time agent feedback.

Taming the Starship Kraken: Zero-Downtime Vertical Scaling with Kubernetes 1.33

Juggling CPU and memory allocations for stateful services used to feel like taming a zero-gravity kraken aboard a starship - one tentacle-slip and the whole mission derails. Kubernetes 1.33’s In-Place Pod Vertical Scaling transforms that chaos into a choreographed ballet: you patch your Pod’s resources and the kubelet morphs cgroups on the fly - no eviction, no restart. With the resize subresource and mutable resource fields, platform teams amp up CPU and RAM for heavy inference bursts or dial them back to reclaim idle capacity.

In our tests, we banished 100% of pod restarts for vertical changes, slashed over-provisioned memory by 30%, and achieved sub-500 ms resize operations under warp-speed load. One API call, zero disruptions - warm caches stay toasty like a dragon’s hoard, persistent connections hum like hyperspace drives, and SLOs glow green in the control panel. It’s resource orchestration without risk: a sleek starship crew, not a space circus.
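
For the curious, the “one API call” looks roughly like this with client-go; the pod name, namespace, and new CPU/memory values are illustrative, and the resize subresource assumes Kubernetes 1.33 with InPlacePodVerticalScaling enabled:

    // resize.go - sketch of an in-place vertical resize via the "resize"
    // subresource (Kubernetes 1.33, InPlacePodVerticalScaling). Pod name,
    // namespace, and the new resource values are illustrative.
    package main

    import (
        "context"
        "log"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/types"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            log.Fatal(err)
        }
        clientset, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            log.Fatal(err)
        }

        // Bump the container's resources without evicting the pod.
        patch := []byte(`{"spec":{"containers":[{"name":"inference",
          "resources":{"requests":{"cpu":"2","memory":"4Gi"},
                       "limits":{"cpu":"2","memory":"4Gi"}}}]}}`)

        _, err = clientset.CoreV1().Pods("ml").Patch(
            context.Background(), "inference-0",
            types.StrategicMergePatchType, patch,
            metav1.PatchOptions{}, "resize") // the 1.33 resize subresource
        if err != nil {
            log.Fatal(err)
        }
    }

The equivalent kubectl form is a patch with --subresource resize; either way the kubelet adjusts cgroups in place, which is why warm caches and connections survive.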

From Chaos to Control: Herding GPU-Hungry Dragons with Kubernetes DRA

Balancing AI training, real-time inference, and bursty batch jobs on a shared accelerator fleet used to feel like herding caffeinated dragons. Kubernetes 1.33’s Dynamic Resource Allocation (DRA) turns that chaos into choreography: Pods state exactly what accelerator slice they need and the scheduler guarantees where it will run, long before a container starts. With the new partitionable-device, prioritized-list, and device-taint gates, platform teams carve GPUs, FPGAs, or SmartNICs on demand—no nvidia-smi incantations, no “GPU not found” crash loops. On GCP we slashed idle GPU hours by 42%, shrinking datacenter spend while giving each tenant iron-clad isolation via namespace-scoped claims. Dev namespaces grab bite-size slices for rapid prototyping; prod jobs scale to full-fat allocations using the same YAML, zero redeploys. Observability hooks keep SLO dashboards glowing. One control plane, two QoS tiers, dragons tamed.
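
For a flavor of the API, here is a hedged sketch of a DRA ResourceClaim using the resource.k8s.io/v1beta1 Go types; the device class and names are assumptions, and DRA’s surface is still evolving between releases:

    // draclaim.go - sketch of a DRA ResourceClaim asking for one GPU device.
    // The device class and names are illustrative assumptions; field names
    // may differ in your release of the evolving DRA API.
    package dra

    import (
        resourcev1beta1 "k8s.io/api/resource/v1beta1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    func gpuSliceClaim() *resourcev1beta1.ResourceClaim {
        return &resourcev1beta1.ResourceClaim{
            ObjectMeta: metav1.ObjectMeta{Name: "vision-gpu-slice", Namespace: "dev"},
            Spec: resourcev1beta1.ResourceClaimSpec{
                Devices: resourcev1beta1.DeviceClaim{
                    Requests: []resourcev1beta1.DeviceRequest{{
                        Name: "gpu",
                        // Matches a DeviceClass published by the GPU driver;
                        // "gpu.nvidia.com" is an assumption.
                        DeviceClassName: "gpu.nvidia.com",
                    }},
                },
            },
        }
    }

A Pod then lists the claim under spec.resourceClaims and references it from a container’s resources.claims, and the scheduler places the Pod only where the claim can be satisfied.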

KubeCon + CloudNativeCon Europe 2025 Sessionize Event

April 2025 London, United Kingdom
