Speaker

Nikhil Rana

Senior AI Consultant at Google Cloud

Nikhil is an applied data science and cloud professional with over a decade of experience developing and implementing machine learning, deep learning, and NLP-based solutions across industries such as finance and FMCG. He is a passionate advocate for using data science to solve real-world problems and is always looking for new ways to use data to make a positive impact on the world.

Area of Expertise

  • Business & Management
  • Information & Communications Technology

Beyond Similar: Building Diverse Search with MMR

In the era of vector search and semantic similarity, returning highly relevant results is only half the battle. When results are too similar to one another, users must wade through redundant information to find diverse perspectives. This talk introduces an implementation of Maximum Marginal Relevance (MMR) in OpenSearch, a technique that balances result relevance against result diversity.

Key takeaways will include:
- Implementing MMR reranking with OpenSearch's vector search
- Optimizing performance for large-scale deployments
- Measuring and tuning diversity metrics
- Real-world applications and success patterns
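The greedy selection at the heart of MMR can be sketched as follows. This is a minimal, framework-agnostic illustration, not OpenSearch's API: the function names, the cosine-similarity choice, and the `lam` default are assumptions for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_rerank(query_vec, doc_vecs, k, lam=0.7):
    """Greedily select k documents, trading off relevance to the query
    against redundancy with documents already selected.
    lam=1.0 -> pure relevance ranking; lam=0.0 -> pure diversity."""
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalty: similarity to the closest already-selected doc.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam` close to 1 the ranking degenerates to pure similarity ordering; lowering `lam` pushes near-duplicates of already-selected results down the list, which is the diversity effect the talk measures and tunes.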

Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes

In the dynamic landscape of AI/ML, deploying and orchestrating large open-source inference models on Kubernetes has become paramount. This talk delves into the intricacies of automating the deployment of heavyweight models like Falcon and Llama 2, leveraging Kubernetes Custom Resource Definitions (CRDs) to manage large model files seamlessly through container images. The deployment is streamlined with an HTTP server facilitating inference calls using the model library.

This session will explore eliminating manual tuning of deployment parameters to fit GPU hardware by providing preset configurations. Learn how to auto-provision GPU nodes based on specific model requirements, ensuring optimal resource utilization. We'll also discuss empowering users to deploy their own containerized models by supplying a pod template in the inference field of the workspace custom resource; the controller, in turn, dynamically creates deployment workloads that utilize all available GPU nodes.

LLMs Anywhere: Browser Deployment with Wasm & WebGPU

In today's interconnected world, efficiently deploying and accessing machine learning (ML) models poses significant challenges. Traditional methods rely on cloud GPU clusters and constant internet connectivity. WebAssembly (Wasm) and WebGPU are changing this landscape. This talk explores leveraging Wasm and WebGPU to deploy Small Language Models (SLMs) directly within web browsers, eliminating the need for extensive cloud GPU clusters and reducing reliance on constant internet access.

We showcase practical examples and discuss how Wasm enables efficient cross-platform ML model execution, while WebGPU accelerates parallel computation within the browser. Join us to discover how this combination gives developers and users unprecedented ease and efficiency in browser-based ML, while reducing dependence on centralized cloud infrastructure and internet connectivity constraints.

OpenSearchCon Europe 2025 Sessionize Event

April 2025 Amsterdam, The Netherlands

KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 Sessionize Event

August 2024 Hong Kong
