Speaker

Mofi Rahman

Developer Advocate

New York City, New York, United States

Mofizur Rahman (@moficodes) is a Developer Advocate at Google. His favorite programming language these days is Go. He is a strong believer in the power of open source and the importance of giving back to the community. He is a self-proclaimed sticker-collecting addict and has collected several boxes full of stickers, with no signs of stopping. He can talk about board games for days.

He creates short tech videos on https://youtube.com/@ContainerBytes, where he covers various cloud native technologies.

Area of Expertise

  • Information & Communications Technology

Topics

  • Kubernetes
  • Cloud Native
  • DevOps
  • cloud
  • Serverless
  • Google Cloud
  • Google Cloud Platform
  • ai
  • ml
  • LLMs
  • generative ai

State of Kubernetes Observability

Kubernetes has redefined how we think about deploying and managing distributed applications. But debugging and troubleshooting can be challenging, especially in a production environment. When something goes wrong, it can be hard to know where to start looking for the problem. That’s where Kubernetes observability comes in. With the right tools and practices, you can gain deep insights into how your applications are running on Kubernetes.
Observability is the cornerstone of any well-functioning Kubernetes environment. But one visit to the CNCF landscape can cause your eyes to glaze over with all of the options that exist in this space. It can be quite overwhelming to make sense of it all. In this talk, I will walk through some of the more common observability tools and frameworks that are available for Kubernetes today. I will also discuss how you can use them to troubleshoot your applications in production and debug issues in development.

Navigating the Processing Unit Landscape in Kubernetes for AI Use Cases

With the emergence of LLMs (Large Language Models) and other Machine Learning (ML) workloads running on Kubernetes, gone are the days when CPUs alone were enough. Machine Learning and Artificial Intelligence workloads are best served by specialized processing units. While CPUs are great at doing work sequentially, Artificial Intelligence and Machine Learning require a different approach to processing information: a highly parallel one. In Kubernetes, that means GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units). This talk gives you an introduction to what each type of processing unit is, what it is good at, and how to use it well in Kubernetes.

Defying GPU Scarcity: Strategies for Efficient Serving with Smaller GPUs

Let’s face it, we’re in a constant GPU shortage. Getting access to enough GPUs, especially the largest and latest, is a huge challenge. This talk explores strategies for maximizing the performance and efficiency of serving LLMs on multiple GPUs, particularly older and smaller ones that are more readily available.

Sharding techniques partition a workload across multiple smaller GPUs, so that work too large for any single one of them can still be completed. In addition, quantization techniques can reduce a workload's memory usage by trading off a small amount of precision, which is often acceptable in serving use cases.

Join us to delve into the practical aspects of implementing these techniques, with insights into the trade-offs involved and the potential price-to-performance gains achievable. By effectively utilizing multiple GPUs, organizations can overcome resource availability limitations to harness the power of LLMs.
