Speaker

Suvrakamal Das

Software Engineer @Mattoboard

San Francisco, California, United States

Suvrakamal Das is a Machine Learning Engineer and TensorFlow-certified ML developer who builds production-grade agentic systems and language technologies for low‑resource languages, with work spanning banks, startups, and global NGOs. He has published research at SciPy 2024 on efficient model architectures and presented work on trans-tokenization and GPU-accelerated multi-hop reasoning at conferences like SciPy, FOSS Kolkata, and the PyTorch Conference 2025. Outside of work he ships indie ML products, mentors developers, and regularly shares insights and research updates on Twitter/X.

Area of Expertise

  • Information & Communications Technology

Topics

  • Machine Learning
  • Machine Learning and Artificial Intelligence
  • Data
  • Data Science
  • Python

Getting Your LLM Eval Up and Right with Automated LLMOps

Large Language Models (LLMs) are powerful, but determining whether they give correct answers is a major challenge. One key problem is creating a reliable ground truth, an accurate reference answer to evaluate against. Doing this by hand is slow and expensive, and letting LLMs generate their own reference answers can lead to circular errors and unclear benchmarks.

In this session, we explore how open-source evaluation frameworks such as Ragas, DeepEval, and MLflow help solve this problem. We will explain in simple terms how these tools set up clear evaluation pipelines that do not rely solely on human annotations. Instead, they use automated methods to generate a reliable ground truth, making it easier to judge whether an LLM is performing well.

We will also show you how to integrate these tools into your LLMOps workflow to continuously test and improve your model. You will learn about key evaluation metrics like faithfulness, contextual precision, and answer relevancy.
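To make these metrics concrete, here is a deliberately simplified sketch of faithfulness and answer relevancy as word-overlap scores. The real Ragas and DeepEval metrics use LLM judges and semantic comparison, so this toy code only illustrates the shape of the computation:

```python
def _tokens(text):
    """Lowercased word set; a crude stand-in for semantic comparison."""
    return set(text.lower().split())

def faithfulness(answer, context):
    """Fraction of answer words that are grounded in the retrieved context."""
    a, c = _tokens(answer), _tokens(context)
    return len(a & c) / len(a) if a else 0.0

def answer_relevancy(answer, question):
    """Fraction of question words that the answer addresses."""
    q, a = _tokens(question), _tokens(answer)
    return len(q & a) / len(q) if q else 0.0

context = "the eiffel tower is in paris and was completed in 1889"
question = "where is the eiffel tower"
answer = "the eiffel tower is in paris"

print(faithfulness(answer, context))       # 1.0: every answer word is in the context
print(answer_relevancy(answer, question))  # 0.8: the answer covers most of the question
```

Even this crude version shows why the two metrics are distinct: an answer can be fully faithful to the retrieved context while still failing to address the question, and vice versa.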

By the end of this session, you will have a practical understanding of how to set up an evaluation framework that automatically generates ground truth and provides clear, actionable feedback.

Building and Deploying Autonomous AI Agents on Social Media Platforms Using an Agentic Framework

The integration of AI agents on social media platforms for real-time interaction faces significant hurdles, including the complexity of managing real-time data streams, ensuring agent autonomy, and maintaining consistent performance across various languages and contexts. Developers often struggle with creating agents that can react intelligently to the dynamic environment of social media without the need for constant human oversight.

ai16z's Eliza framework offers a solution: a robust, open-source platform designed specifically for building and deploying autonomous AI agents that operate on real-time data. Eliza simplifies the process with tools for consistent agent behavior, real-time data processing, decentralized inference, and scalability.

In this talk, we will explore how the framework handles the intricacies of real-time data, allowing agents to engage in meaningful, context-aware conversations autonomously. Attendees will learn about the architecture that supports this, including data management, agent memory systems, and the integration of multiple AI models for enhanced functionality.
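As a rough illustration of the kind of agent memory discussed here (a hypothetical toy, not Eliza's actual memory system), an agent might keep a bounded log of recent messages and recall the ones most relevant to the current query:

```python
class AgentMemory:
    """Fixed-size rolling memory of recent exchanges.

    Illustrative sketch only; a real agent framework would use embeddings
    and a persistent store rather than word overlap on an in-memory list.
    """

    def __init__(self, capacity=50):
        self.capacity = capacity
        self.messages = []

    def remember(self, author, text):
        """Append a message, discarding the oldest once over capacity."""
        self.messages.append((author, text))
        self.messages = self.messages[-self.capacity:]

    def recall(self, query, k=2):
        """Return the k stored messages sharing the most words with the query."""
        q = set(query.lower().split())
        ranked = sorted(
            self.messages,
            key=lambda m: len(q & set(m[1].lower().split())),
            reverse=True,
        )
        return ranked[:k]

memory = AgentMemory()
memory.remember("user", "I love hiking in the Alps")
memory.remember("user", "My dog is named Bruno")
top = memory.recall("tell me about hiking")
print(top[0][1])  # "I love hiking in the Alps"
```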

Crafting Production-Ready RAG/GenAI Recipes with OPEA

Enterprises face significant challenges in developing and deploying Python-based GenAI solutions, from model development, fine-tuning, and bias resolution to large-scale deployment. With few standardized tools available, OPEA, the Open Platform for Enterprise AI from the Linux Foundation, fills the gap. It offers a microservices framework for state-of-the-art GenAI systems, including LLMs, data stores, and prompt engines, to accelerate enterprise adoption. The platform provides blueprints for end-to-end workflows such as ChatQnA, CodeGen, and RAG systems. This talk explores practical steps for deploying cloud-native GenAI applications on Kubernetes across various hyperscalers, showcasing diverse data stores, open-source vector databases, and managed services to demonstrate RAG capabilities over more than 50K documents. Attendees will learn deployment strategies with OPEA and discover clear paths for contributing to the project.
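To give a flavor of the retrieval step at the heart of a RAG workflow, here is a minimal, self-contained sketch using bag-of-words cosine similarity; production OPEA deployments would use real embeddings and a vector database instead:

```python
from collections import Counter
import math

def embed(text):
    """Bag-of-words 'embedding'; a placeholder for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# a tiny illustrative corpus standing in for 50K documents
docs = [
    "OPEA provides blueprints for ChatQnA and CodeGen workflows",
    "Kubernetes schedules containers across a cluster",
    "Vector databases store embeddings for similarity search",
]

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

hits = retrieve("which workflows does OPEA have blueprints for")
prompt = f"Answer using this context:\n{hits[0]}\n\nQuestion: ..."
```

The retrieved document is then stitched into the prompt, which is exactly the step that makes grounded, context-aware answers possible at scale.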

From Monolithic to Mosaic: Collaborative SLM Ecosystems for Cost-Efficient, Edge-Ready Solutions

Large Language Models excel at filtering, summarization, and code generation, but their heavy compute needs drive up costs and limit scalability. In this talk, we propose a lightweight alternative that moves away from monolithic LLMs to a modular ecosystem of open-source Small Language Models (SLMs) managed by a central Master Agent.

The Master Agent dynamically routes requests to specialized Worker Agents, each running an SLM (such as Phi-3 or Orca-mini) fine-tuned for a specific function. Distributing tasks among smaller models lowers resource consumption and cost.
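The routing idea can be sketched in a few lines. The worker names and the keyword-based router below are illustrative placeholders; a real Master Agent would typically use a classifier or an LLM to pick the worker:

```python
# Hypothetical mapping from task keywords to specialist SLM workers.
WORKERS = {
    "summarize": "phi3-summarizer",
    "translate": "orca-mini-translator",
    "code":      "phi3-codegen",
}

def route(request: str) -> str:
    """Pick the specialist worker whose trigger keyword appears in the request."""
    text = request.lower()
    for keyword, worker in WORKERS.items():
        if keyword in text:
            return worker
    return "general-slm"  # fallback worker for unmatched requests

print(route("Please summarize this article"))   # phi3-summarizer
print(route("What is the capital of France?"))  # general-slm
```

The fallback branch is what gives the ecosystem its failover property: if no specialist matches (or one is unavailable), the request still gets served by a general-purpose model.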

This approach also meets the growing need for edge-compatible solutions. Compact SLMs can run on IoT devices and mobile apps, enabling low-latency, privacy-preserving, and even offline language processing.

Our implementation, written primarily in Python and built on open-source frameworks such as LangChain and Hugging Face, demonstrates how modular specialization optimizes resource use, simplifies maintenance, and ensures robust failover. Attendees will learn how to integrate this multi-agent framework into their own projects, gaining a flexible, affordable, and future-proof platform for advanced language processing.

Scaling Monolingual NLP Models on Kubernetes: Leveraging Trans-tokenization

Building and deploying monolingual NLP systems for low-resource languages presents challenges, especially in handling diverse scripts and optimizing for production-scale environments. This session explores trans-tokenization, a novel method for transforming tokens across languages, enhancing large language models for monolingual capabilities. Using parallel corpora like English-Hindi, we’ll demonstrate how tools such as Unsloth and Mistral enable fine-tuning to handle non-Latin scripts effectively. A major focus will be on leveraging Kubernetes to scale monolingual NLP systems. Attendees will learn how Kubernetes facilitates resource allocation, supports distributed training, and simplifies model deployment at scale. Topics include managing workloads for parallel corpora, optimizing GPU utilization, and ensuring high availability of NLP services in production environments.

Scaling Monolingual NLP with Trans-tokenization

Building and deploying monolingual NLP systems for low-resource languages presents unique challenges, particularly in handling diverse scripts and optimizing for production-scale environments.

This session delves into the use of trans-tokenization, a novel approach to transforming tokens across languages, to enhance large language models for monolingual capabilities.

Using parallel corpora like English-Hindi, we’ll demonstrate how tools such as Unsloth and Mistral enable fine-tuning of models to handle non-Latin scripts effectively.

A key focus will be leveraging Kubernetes to scale these monolingual NLP systems. Attendees will learn how Kubernetes facilitates efficient resource allocation, supports distributed training, and simplifies model deployment in production environments. Topics include managing workloads for parallel corpora processing, optimizing GPU utilization, and ensuring high availability of NLP services at scale.
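The core idea of trans-tokenization can be illustrated with a toy example: initialize embeddings for target-language tokens by copying the embeddings of aligned source-language tokens. The alignment table below is hypothetical; real pipelines mine alignments from parallel corpora such as English-Hindi:

```python
# Tiny stand-in for a pretrained model's embedding table (2-dim vectors).
source_embeddings = {
    "hello": [0.1, 0.9],
    "world": [0.4, 0.2],
}

# Token alignments, as would be mined from a parallel corpus (illustrative).
alignment = {
    "नमस्ते": "hello",
    "दुनिया": "world",
}

def trans_tokenize_init(alignment, source_embeddings, dim=2):
    """Build an embedding table for the new vocabulary by copying the
    embedding of each token's aligned source token; tokens without an
    alignment fall back to a zero vector."""
    table = {}
    for tgt, src in alignment.items():
        table[tgt] = list(source_embeddings.get(src, [0.0] * dim))
    return table

hindi_embeddings = trans_tokenize_init(alignment, source_embeddings)
```

Starting the target-language model from these transferred embeddings, instead of random initialization, is what lets fine-tuning with tools like Unsloth converge on far less monolingual data.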

From Campus to Cloud: Scaling a Custom Facial Recognition Solution With Cloud Native Tools

This session delves into scaling a facial recognition system from campus to cloud scale using Kubernetes. It focuses on containerized model training, deployment automation, and infrastructure optimization, utilizing GPU clusters, CI/CD for zero-downtime releases, and MLOps with KServe and Kubeflow. We explore a microservices architecture with declarative APIs linking Feast, Spark, and PostgreSQL for real-time predictions. Attendees will learn Kubernetes architecture and CLI best practices applicable to data transformation, collaboration, model retraining, and versioning. The session also discusses infrastructure sizing and the trade-offs among throughput, cost, and accuracy, and provides practical guidance for Kubernetes-based facial recognition with an emphasis on portability, fault tolerance, and availability.
