Speaker

Henry Ruiz

Henry Ruiz

Research Scientist at Texas A&M AgriLife Research, GDE in AI and Cloud

College Station, Texas, United States

Actions

Dr. Henry Ruiz is a Research Scientist at Texas A&M AgriLife Research, specializing in Artificial Intelligence (AI) and Remote Sensing. His work focuses on the development of advanced software systems and computational algorithms for analyzing multi-source remote sensing data, including satellite imagery, UAVs (Unmanned Aerial Vehicles), LiDAR (Light Detection and Ranging), and Ground Penetrating Radar (GPR).
As a Google Developer Expert in AI and Google Cloud, Dr. Ruiz actively contributes to the AI and open-source communities through research, software development, public speaking, and mentorship. With more than a decade of experience as a researcher, full-stack software developer, and data scientist, he drives innovation at the intersection of AI, geospatial analytics, and scientific computing.
His current research focuses on the emerging capabilities of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs), particularly their abilities in reasoning, planning, and agent-based collaboration, to enhance analytical pipelines for remote sensing and geospatial data analysis. By integrating multi-agent systems, Retrieval-Augmented Generation (RAG), and multimodal AI workflows, he is developing intelligent, scalable frameworks to address complex scientific, agricultural, and environmental challenges.

Area of Expertise

  • Media & Information

Topics

  • Machine Learning & AI
  • Computer Vision
  • Deep Learning
  • Machine Learning
  • Data Science
  • Remote Sensing

UISurf: Toward Universal UI Automation with Cross-Environment Agents

In this talk, I will introduce UISurf, an open-source multimodal agentic UI automation platform that enables AI agents to perceive, reason, and collaborate across browser and desktop environments to complete end-to-end tasks involving multiple user interfaces.

UISurf was designed to provide a secure and extensible playground for developing, evaluating, and testing UI automation agents within isolated environments. The platform enables researchers and developers to study how multimodal agents can interpret user interfaces, coordinate actions, and execute complex workflows safely across both web and desktop applications.

UISurf consists of three primary components: uisurf-agent, the runtime responsible for UI automation agents; uisurf-admin, the session orchestration and management service; and uisurf-app, the full-stack user application. Its multi-agent architecture includes a planning_agent that transforms natural-language requests into structured execution plans; specialized Browser and Desktop Agents for environment-specific interactions; an automation_agent that coordinates execution and inter-agent handoffs through Agent-to-Agent (A2A) communication; and a summarization_agent that generates the final task summary for the user.

UISurf supports both fully autonomous execution and human-in-the-loop supervision, providing a practical and extensible framework for studying, benchmarking, and deploying cross-environment UI automation systems powered by multimodal AI agents.

Tensorflow Everywhere ( Workshop )

Model deployment is perhaps the most important step in the ML cycle. We have spent a lot of time and effort playing around with different algorithms, training, and tuning our model parameters, so after evaluating its performance and obtaining that long-awaited score, it is time to release it and show our model to the world. It sounds like graduation time!!, right?. However, the statistics show that 60% of the models never make it out into production, mainly because moving the model into a production environment is not simple and requires extra skills.

In this workshop, we will explore different deployment scenarios to release our ML models and learn how easy it is to move them to production using GCP(Google Cloud Platform).

End-to-End computer vision projects

In this talk, I'll cover how was the development process of Tumaini, a mobile application that uses artificial intelligence (AI) to detect pests and diseases affecting banana. I'll be discussing the architecture of the app, how the dataset was created, and the model deployment on the device. https://doi.org/10.1186/s13007-019-0475-z

Modern Deep Learning Workshop: From transformers to LLMs

This beginner-friendly workshop provides an introduction to the fundamentals of generative AI, including topics such as Transformers, GANs (Generative Adversarial Networks), Diffusion Models, Reinforcement Learning from Human Feedback, and Large Language Model (for short LLMs).
Hands-on demos and guided tutorials will allow you to get on top of these #hottopics in tech, allowing you to leverage your (tech) career. Familiarity with Python knowledge is recommended but not mandatory.

Let's embark on the Deep Learning journey together!

Automating your ML pipelines using Kubeflow and Vertex AI

This workshop will delve into the world of automating machine learning workflows using Kubeflow and Vertex AI. Using these powerful tools, participants will learn how to streamline their ML pipelines, from data preparation to model deployment. By the end of the session, attendees will have a solid understanding of how to leverage Kubeflow and Vertex AI to enhance their ML development process and increase productivity. Kubeflow is an open-source platform that simplifies the deployment of machine learning workflows on Kubernetes, while Vertex AI is Google Cloud's unified ML platform. Together, these technologies enable data scientists and ML engineers to build, deploy, and manage ML models at scale with greater efficiency and reproducibility.

GemmaEarth: A TPU-Native Framework for Adapting Gemma for Earth Observation Tasks

Earth observation has no shortage of data. What it still lacks are practical, scalable workflows for adapting modern multimodal foundation models to geospatial tasks without turning experimentation and training into a months-long infrastructure effort. In this talk, we introduce GemmaEarth, an open-source post-training and benchmarking framework designed to adapt Google’s Gemma 3 4B IT model for Earth Observation (EO) understanding using a TPU-native JAX ecosystem.

GemmaEarth leverages tools such as Tunix, Grain, Optax, Orbax, and Qwix to provide a scalable and reproducible workflow for parameter-efficient fine-tuning on Google Cloud TPU v5litepod-8. Using the EarthDial dataset as an initial benchmark, the framework focuses on multi-label satellite scene classification while establishing a flexible foundation for broader EO applications, including multimodal reasoning, scene understanding, and geospatial analysis.

The session will present the end-to-end workflow behind GemmaEarth, covering dataset preparation, LoRA-based post-training, distributed TPU experimentation, evaluation, benchmarking, and deployment considerations. Attendees will gain practical insights into adapting open multimodal models for remote sensing tasks with the JAX ecosystem and Google Cloud TPUs, as well as lessons learned from building reproducible and scalable EO-focused AI pipelines for both research and production environments.

Multimodality with Gemini: Unleashing the Power of Text, Videos, Images and more

Gemini is the most capable and general model Google has ever built. It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across, and combine different types of information, including text, code, images, and video. This talk dives into the exciting world of Gemini, a cutting-edge foundation model developed by Google. Discover how Gemini seamlessly integrates text and image processing, enabling you to:

- Analyze and understand the content of images, videos, and audio files
- Perform cross-modal tasks like image captioning and visual question-answering
- Explore the potential of multimodality for various applications, from creative content generation to advanced information retrieval.

Additionally, we'll delve into the core techniques that make LLMs multimodal, including contrastive learning and LIMoE—Learning Multiple Modalities with One Sparse Mixture-of-Experts Model. Learn more here: https://research.google/blog/limoe-learning-multiple-modalities-with-one-sparse-mixture-of-experts-model/

Join us to unlock the power of Gemini and push the boundaries of AI!

Unleash Generative AI Power in our Hands-on Workshop!

This beginner-friendly workshop will introduce the fundamentals of generative AI and cover some advanced topics, including Transformers, GANs (Generative Adversarial Networks), Diffusion Models, Reinforcement Learning from Human Feedback, and large language models (for short LLMs). Hands-on demos on Gemini and langChain APIs will be shared to help attendees better understand and stay on top of these hot topics in ML.

1. Generative AI foundations: from transformers to LLMs
Google generative AI APIs
2. Introduction to Gemini API
3. Multimodality with Gemini: Unleashing the Power of Text, Audio, Videos, Images, and More
4. Multi-agents applications using Vertex AI reasoning engine and agents builder

LLM Applications components and design patterns

This workshop will focus on the design patterns and essential components necessary for developing applications using large language models (LLMs). It will also cover best practices for integrating LLMs into our applications, highlighting the importance of the context window, modular design, scalability, and maintenance. Participants will acquire practical knowledge on developing LLM applications, including chat applications, retrieval-augmented generation (RAG) systems, and agent-based tools.

Workshop: Developing a Multimodal Chat that can generate images using Gemini and Imagen

This workshop will explore the exciting intersection of multimodal AI and image generation, focusing on two powerful models: Google's Gemini and Imagen. Participants will learn how to leverage these cutting-edge technologies to create a chat interface capable of understanding and generating text and images. By the end of the session, attendees will have hands-on experience integrating these models into a functional multimodal chat application.

Building Multi-Agent applications with Gemini & ADK (Agent Development kit)

This talk explores the transformative capabilities of Large Language Models (LLMs) like Gemini in powering intelligent agents, beginning with an overview of their advanced reasoning abilities shaped by post-training techniques. We delve into how context, memory, and augmentation strategies such as Retrieval Augmented Generation (RAG) and function calling expand LLM functionalities, enabling them to become effective agents that can reason and act. The session will further discuss the evolution from single to multi-agent systems for tackling complex problems, differentiate between predefined workflows and dynamic agent behaviors, and introduce the Agent Development Kit (ADK) as a practical framework for designing and implementing these sophisticated AI systems, culminating in a demonstration of ADK in action.

Henry Ruiz

Research Scientist at Texas A&M AgriLife Research, GDE in AI and Cloud

College Station, Texas, United States

Actions

Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers. If you suspect this profile to be fake or spam, please let us know.

Jump to top