Speaker

Witthawin Sripheanpol

AI Researcher

Bangkok, Thailand

Data scientist and AI engineer with over 5 years of experience in the AI, data, and robotics fields. A developer behind several national-level AI and data-solution innovations, including generative AI technologies such as Large Language Models, multimodal models, and multi-agent systems.

Area of Expertise

  • Information & Communications Technology
  • Physical & Life Sciences

Topics

  • AI & Machine Learning
  • AI Agents
  • Large Multimodal Models
  • Large Language Models
  • Computer Vision
  • AI & Robotics

AI-Powered Online Game Automation and Its Security Implications with a Python-Based Stack

This talk presents a practical and security-aware exploration of AI-powered bot development for online games, using a Python-based stack centered on OpenCV and PyTorch. Motivated by the question of whether complex in-game behaviors—such as farming, leveling, and resource collection—could be automated through machine learning, we implemented an AI agent capable of perceiving game environments and making context-aware decisions under live gameplay conditions.

Our system integrates automatic image segmentation, object detection, and input emulation to construct a responsive, though latency-bound, bot loop. We detail the architecture, including frame capture, model inference, and action execution, and present a working demo of the system running in a 3D Japanese MMORPG (JMMORPG).
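The capture-inference-action loop can be sketched in miniature. The snippet below is a deliberately toy version: `capture_frame` synthesizes a frame instead of grabbing the screen (a real bot would use something like `mss` or OpenCV capture), and a brightest-pixel search stands in for a PyTorch detector; all function names are illustrative, and only the loop structure mirrors the system described above.

```python
import numpy as np

def capture_frame():
    # Placeholder for real screen capture; here we synthesize a grayscale
    # frame containing one bright "target" blob at roughly x=100, y=30.
    frame = np.zeros((120, 160), dtype=np.uint8)
    frame[30:40, 100:110] = 255
    return frame

def detect_target(frame):
    # Stand-in for model inference (e.g. a PyTorch object detector):
    # locate the brightest pixel and treat it as the target position.
    y, x = np.unravel_index(np.argmax(frame), frame.shape)
    return int(x), int(y)

def choose_action(target_xy, frame_shape):
    # Context-aware decision: steer toward the target's horizontal
    # position, or attack once it is roughly centered.
    x, _ = target_xy
    center = frame_shape[1] // 2
    if x < center - 10:
        return "move_left"
    if x > center + 10:
        return "move_right"
    return "attack"

def bot_step():
    # One iteration of the capture -> inference -> action loop. A real
    # bot would then emit the action via input emulation and repeat.
    frame = capture_frame()
    target = detect_target(frame)
    return choose_action(target, frame.shape)
```

In the real system each stage adds latency (capture, GPU inference, input emulation), which is why the loop is described as latency-bound.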

Beyond automation, the session highlights key security implications. We explore adversarial input manipulation techniques that could mislead vision-based models, and present a case study involving CVE-2025-32434, where an insecure AI-assisted pipeline is exploited to achieve remote code execution (RCE).

Key Contributions:
1. A functional Python-based AI bot framework for automating gameplay tasks
2. Integration of OpenCV (vision), PyTorch (decision-making), and input control
3. Analysis of adversarial input manipulation risks in AI-driven automation
4. A PoC video demonstrating CVE-2025-32434 exploitation in AI pipelines
5. Recommendations for securing AI-powered game agents and pipelines

Vector Search with Multimodal Embeddings

We will explore the use of embeddings generated from multiple modalities such as text, images, and audio for efficient and accurate search. The session will cover how to create and process multimodal embeddings, index them in a vector database, and perform similarity search operations across diverse data types. We will also discuss various use cases and advantages of using multimodal embeddings in vector search, including improved accuracy in information retrieval and cross-modal connections.
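The core idea, that different modalities become comparable once embedded in one vector space, can be shown with a tiny brute-force search. Everything here is invented for illustration: the 4-dimensional vectors and file names are toys, and in practice a multimodal model such as CLIP produces the embeddings while a vector database replaces the linear scan.

```python
import math

# Toy "shared embedding space": text, image, and audio items all live in
# the same (assumed 4-dim) space, so one query can rank all of them.
catalog = {
    "photo_of_cat.jpg": [0.9, 0.1, 0.0, 0.1],
    "photo_of_car.jpg": [0.0, 0.9, 0.1, 0.0],
    "audio_meow.wav":   [0.7, 0.0, 0.3, 0.1],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, k=2):
    # Brute-force similarity scan; a vector-database index (HNSW, IVF, ...)
    # replaces this loop at scale.
    ranked = sorted(catalog, key=lambda n: cosine(query_vec, catalog[n]),
                    reverse=True)
    return ranked[:k]

text_query = [0.85, 0.05, 0.1, 0.1]  # pretend embedding of the text "a cat"
```

Because the query is a text embedding but the top hit can be an image or audio item, this is the cross-modal connection the session refers to.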

Voice Assistants: How Can We Develop Streaming Generative AI?

In this session, we will explore the evolving landscape of voice assistance and its potential when integrated with streaming generative AI. As real-time voice interaction becomes increasingly popular, there is a growing demand for intelligent systems that not only understand but also generate content on the fly. This session will delve into the technical aspects of developing streaming generative AI that can power voice assistants, enabling them to create, adapt, and respond dynamically to user inputs in real time.

We’ll cover key concepts such as continuous learning, audio streaming, real-time processing, and how generative models like GPT can be integrated with voice synthesis systems to produce seamless, human-like interactions. We will also discuss challenges around latency, context retention, and accuracy in voice-enabled applications, while providing practical insights for developers looking to build or enhance streaming generative AI systems for voice assistance.
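One latency technique mentioned above, streaming generation into voice synthesis, can be sketched with plain generators. This is a hedged toy: `stream_tokens` fakes a token-by-token LLM response, and chunking at punctuation boundaries stands in for the buffering a real pipeline would do before handing text to a TTS engine.

```python
def stream_tokens():
    # Stand-in for a streaming LLM response arriving token by token.
    yield from ["Hello", ",", " how", " can", " I", " help", "?"]

def chunk_for_tts(tokens):
    # Buffer tokens until a natural boundary (punctuation) so the voice
    # synthesizer can start speaking before the full response is finished,
    # cutting perceived latency.
    buf = []
    for t in tokens:
        buf.append(t)
        if t.strip().endswith((",", ".", "?", "!")):
            yield "".join(buf)
            buf = []
    if buf:  # flush any trailing text with no closing punctuation
        yield "".join(buf)

chunks = list(chunk_for_tts(stream_tokens()))
```

The trade-off is exactly the one discussed in the session: smaller chunks lower latency but give the synthesizer less context for natural prosody.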

By the end of this session, you will have a deeper understanding of the opportunities and challenges in developing voice assistants powered by generative AI in streaming environments.

Video Session Summarization with ASR and Visual Highlights

This session presents a video summarization pipeline that converts long-form videos into readable articles. The system uses an open-source Automatic Speech Recognition (ASR) model to transcribe audio into timestamped, speaker-attributed text (including multi-speaker recordings), applies a large language model for summarization and topic segmentation, and uses key-phrase or speaker-based importance detection to extract visual highlights (frames) aligned with the narrative.

Applications
- Automated video blogging or podcast summarization
- Lecture and meeting note generation with visual context
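The highlight-selection step can be sketched as follows. This is an assumption-heavy toy: the transcript segments and key-phrases are invented, and simple keyword matching stands in for the LLM-based importance ranking the pipeline actually uses; in the real system a frame would then be extracted at each selected timestamp.

```python
# Fake ASR output: timestamped, speaker-attributed segments, as a real
# open-source ASR + diarization stage would produce.
segments = [
    {"start": 0.0,  "speaker": "A", "text": "welcome to the talk"},
    {"start": 12.5, "speaker": "A", "text": "our main result improves accuracy"},
    {"start": 40.0, "speaker": "B", "text": "questions about the result"},
]

# Assumption: key-phrases would come from an LLM summarization pass.
KEYWORDS = {"result", "accuracy"}

def highlight_timestamps(segs):
    # Keep timestamps whose text mentions a key-phrase; frames grabbed at
    # these times become the visual highlights aligned with the summary.
    return [s["start"] for s in segs if KEYWORDS & set(s["text"].split())]
```

Speaker-based importance (e.g. favoring the main presenter's segments) would simply add a second filter on the `speaker` field.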

SOTA vs. Frontier Models: When an Agent Is Not Your Answer

In the current landscape of software engineering, the rush to integrate AI often leads developers straight toward the giants: Frontier Models like GPT, Claude, etc. While these models are undeniably powerful, many industry projects fall into a "marketing trap," assuming that a massive, generalized LLM-based Agent is the only way to solve business problems.

This session peels back the curtain on the Localization vs. Generalization debate. We will explore why chasing a "specialist" model by forcing a Frontier Model into a narrow box is often inefficient and costly. Instead, we shift the focus back to SOTA (State-of-the-Art) task-specific models. For many production environments, the answer isn't a conversational agent; it's a precision tool designed for tasks such as object detection, ASR, or VQA.

Should We Adapt AI on Project or Stick to Traditional Methods?

Is AI a "must-have" or a "nice-to-have"? Learn how to evaluate whether integrating AI will truly add value to your specific project and for your users.

Your project is working well, and your customers know and trust your solution. But with all the buzz around AI, you're asking the big question: "Is it time for a change?" You see the potential of AI to make your project smarter and more efficient, but you're worried about leaving your loyal customers behind.

This session is designed specifically for you. We'll cut through the hype and focus on the real-world challenges of introducing AI into an established project. We'll explore how to innovate responsibly without disrupting the experience for the people who already rely on you.

Multimodal RAG for Images and Text with MongoDB

Explore how to build a Retrieval Augmented Generation (RAG) system that combines both image and text data. We will use MongoDB as a scalable database to store and retrieve multimodal information, leveraging its capabilities for handling large datasets. The session will involve integrating text and image embeddings, using RAG to generate responses based on retrieved content, and optimizing performance for real-time applications. Attendees will gain insights into using MongoDB as a backend for multimodal retrieval systems.
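The retrieval side of such a system typically reduces to an aggregation pipeline against a vector index. The sketch below builds a MongoDB Atlas `$vectorSearch` stage; the operator and its fields are real Atlas Vector Search syntax, but the index name, field names, and dimensionality are assumptions for illustration, and actually running it requires an Atlas cluster with a configured vector index.

```python
def build_vector_search(query_embedding, k=5):
    # Aggregation pipeline for multimodal retrieval: one query vector
    # (from a text or image encoder) searches documents whose "embedding"
    # field was produced by the same multimodal model.
    return [
        {
            "$vectorSearch": {
                "index": "multimodal_index",   # hypothetical index name
                "path": "embedding",           # hypothetical vector field
                "queryVector": query_embedding,
                "numCandidates": k * 20,       # wider candidate pool than k
                "limit": k,
            }
        },
        # Keep only what the RAG generator needs, plus the similarity score.
        {"$project": {"text": 1, "image_url": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]

pipeline = build_vector_search([0.1, 0.2, 0.3, 0.4])
```

With `pymongo`, the pipeline would be passed to `collection.aggregate(...)`, and the projected documents (text chunks plus image references) would be handed to the generator as retrieved context.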

Best Practices for Building Graph-based RAG from Multiple Documents with Python

This session explores the best practices for building graph-based Retrieval-Augmented Generation (RAG) systems from multiple documents using Python. Participants will learn how to construct document similarity graphs, integrate graph-based retrieval into RAG models, and fine-tune these systems for efficient document search and generation. With hands-on examples using libraries like NetworkX, FAISS, and HuggingFace, attendees will gain practical experience in building scalable and optimized graph-based retrieval systems. By the end of the session, the Python community will acquire new skills in leveraging graphs for more effective information retrieval and generating relevant outputs, providing valuable insights into cutting-edge AI workflows.
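The graph-construction idea can be shown without the full stack. The toy below uses plain Python dictionaries to mirror the structure a NetworkX graph would hold; the 3-dimensional embeddings and document names are invented (FAISS and a HuggingFace encoder would supply real ones), and the threshold is an arbitrary assumption.

```python
import math

# Invented document embeddings; in practice these come from a
# HuggingFace sentence encoder and live in a FAISS index.
docs = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.9, 0.1, 0.0],
    "d3": [0.0, 1.0, 0.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def build_similarity_graph(embs, threshold=0.8):
    # Connect documents whose embeddings are similar enough; this adjacency
    # dict mirrors what networkx.Graph.add_edge would record.
    graph = {name: set() for name in embs}
    names = list(embs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(embs[a], embs[b]) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def expand_retrieval(graph, seed):
    # Graph-based retrieval: return the seed hit plus its neighbours so
    # related documents reach the generator together.
    return {seed} | graph[seed]
```

This is the step that distinguishes graph-based RAG from flat retrieval: a single nearest-neighbour hit pulls in its connected documents, giving the generator cross-document context.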

From Agentic AI to Physical AI

This session explores the progression from foundational machine learning concepts to the future of embodied artificial intelligence. We begin with the core of machine learning, which learns to predict answers, and move to the role of Large Language Models (LLMs) in understanding questions and responding to diverse instructions.

Building upon this, we examine the AI agent, which uses an LLM's understanding of context and history to plan and execute actions, either independently or with external tools. The discussion then advances to the next frontier: building Agentic AI that can comprehend, plan, simulate, and act within the real-world environment, a concept we define as "Physical AI." This emerging field opens possibilities for sophisticated simulation and planning to recommend actions or guide a human in the loop.

Accelerating Scientific Discovery with HPC, AI, and Quantum Computing

In this session, discover how the integration of High-Performance Computing (HPC), Artificial Intelligence (AI), and ultimately Quantum Computing can revolutionize research and development in many industries (like chemicals, manufacturing, life sciences and more). Learn how these cutting-edge technologies are going to scale and accelerate R&D efforts that could address some of our world’s toughest challenges.
