![Mofi Rahman](https://sessionize.com/image/8604-400o400o2-ZtrocnizqB4nyoNoQLQUv.png)
Mofi Rahman
Developer Relations Engineer, Google
New York City, New York, United States
Mofi Rahman (@moficodes) is a Developer Advocate at Google. His favorite programming language these days is Go. He is a strong believer in the power of open source and the importance of giving back to the community. He is a self-proclaimed sticker-collecting addict and has collected several boxes full of stickers, with no signs of stopping. He can talk about board games for days.
He creates short tech videos on https://youtube.com/@ContainerBytes where he covers various cloud native technologies.
Topics
State of Kubernetes Observability
Kubernetes has redefined how we think about deploying and managing distributed applications. But debugging and troubleshooting can be challenging, especially in a production environment. When something goes wrong, it can be hard to know where to start looking for the problem. That’s where Kubernetes observability comes in. With the right tools and practices, you can gain deep insights into how your applications are running on Kubernetes.
Observability is the cornerstone of any well-functioning Kubernetes environment. But one visit to the CNCF landscape can cause your eyes to glaze over with all of the options that exist in this space. It can be quite overwhelming to make sense of it all. In this talk, I will walk through some of the more common observability tools and frameworks that are available for Kubernetes today. I will also discuss how you can use them to troubleshoot your applications in production and debug issues in development.
Navigating the Processing Unit Landscape in Kubernetes for AI Use Cases
With the emergence of LLMs (Large Language Models) and other Machine Learning (ML) workloads running on Kubernetes, gone are the days when a CPU alone was enough. Machine Learning and Artificial Intelligence workloads are best served by specialized processing units. While CPUs are great at doing work sequentially, Artificial Intelligence and Machine Learning require a different approach to processing information - a highly parallel one. In Kubernetes, that means GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units). This talk gives you an introduction to what each type of processing unit is, what it is good at, and how to use it well in Kubernetes.
Defying GPU Scarcity: Strategies for Efficient Serving with Smaller GPUs
Let’s face it, we’re in a constant GPU shortage. Getting access to enough GPUs, especially the largest and latest, is a huge challenge. This talk explores strategies for maximizing the performance and efficiency of serving LLMs on multiple GPUs, particularly older and smaller ones that are more readily available.
Sharding techniques partition a workload across multiple smaller GPUs, so the work can be completed without a single large one. In addition, quantization techniques can reduce a workload's memory usage by trading off a small amount of precision, which is often acceptable in serving use cases.
Join us to delve into the practical aspects of implementing these techniques, with insights into the trade-offs involved and the potential price-to-performance gains achievable. By effectively utilizing multiple GPUs, organizations can overcome resource availability limitations to harness the power of LLMs.