

Naman Goyal
Google DeepMind; previously NVIDIA, Apple
Mountain View, California, United States
Naman Goyal is a distinguished Machine Learning Engineer and Researcher specializing in Large Language Models (LLMs), Computer Vision, Deep Learning, and Multimodal Learning. With a proven track record at leading technology companies including Google DeepMind, NVIDIA, Apple, and innovative startups, Naman consistently drives advancements in artificial intelligence applications.
At Google DeepMind, Naman plays an instrumental role in developing Deep Research, an AI-powered research assistant integrated within Google Gemini. His contributions focus on enhancing Gemini's reasoning capabilities and optimizing machine learning workflows that serve millions of users globally.
Previously at NVIDIA, Naman optimized machine learning processes for autonomous vehicle development, addressing complex challenges in deploying deep learning models in resource-constrained environments. His work accelerated training cycles and improved the accuracy of self-driving car technology.
During his tenure at Vimaan Robotics, Naman designed computer vision algorithms for AI-powered inventory management solutions, achieving 99.8% inventory accuracy for large-scale warehouses. At HyperVerge Technologies, he developed identity verification systems utilizing facial recognition to protect billions of identities while dramatically reducing processing time from hours to minutes.
Naman holds an M.S. in Computer Science from Columbia University, where his thesis focused on Multi-Modal Learning and NLP. He graduated at the top of his class with a B.Tech. in Computer Science from the Indian Institute of Technology (IIT).
A recipient of prestigious recognitions including the National Talent Search Examination Scholarship and Kishore Vaigyanik Protsahan Yojana Fellowship, Naman has published research papers on on-device NLP applications, graph neural networks, and self-supervised learning approaches for multimodal representation learning.
Naman combines technical excellence with an innovative mindset, consistently delivering high-impact solutions across diverse domains including AI agents, autonomous vehicles, inventory management, identity verification, and AI-powered research assistance. His expertise in developing explainable and responsible AI models emphasizes his commitment to ethical advancement in artificial intelligence.
Area of Expertise
Topics
The Dual Edge of Multimodal AI: Advancing Accessibility While Navigating Bias
Multimodal AI systems—capable of processing text, images, audio, and video simultaneously—present transformative opportunities for accessibility while introducing complex challenges related to bias and fairness. This presentation explores this duality through evidence-based analysis of current implementations and future directions.
For individuals with disabilities, multimodal AI creates unprecedented opportunities: visual recognition systems achieve high accuracy for common objects, real-time speech-to-text transcription operates with minimal error rates, and adaptive learning technologies significantly improve information retention for neurodivergent learners. However, these same systems exhibit concerning bias patterns: recruitment algorithms show substantial ranking disparities across demographics, speech recognition error rates vary considerably across accents, and many images containing gender bias can be traced to problematic relationships between visual elements and text annotations.
The presentation outlines a comprehensive framework for responsible development including: inclusive design principles (with evidence that disability consultants identify many more potential accessibility barriers), representative dataset curation (addressing the reality that images in computer vision datasets rarely include people with visible disabilities), rigorous testing methodologies (conventional sampling typically captures very few users with disabilities), and ethical governance considerations (most AI practitioners want clearer accessibility standards).
Through case studies including image description technologies (showing notable accuracy disparities between Western and non-Western cultural contexts), diverse speech recognition (where community-driven data collection reduced error rates for underrepresented accent groups), and emotion recognition systems (with higher error rates for non-Western expressions), the presentation provides practical insights for developing multimodal AI that enhances accessibility without reinforcing existing inequities.
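One concrete way to operationalize the disaggregated testing the abstract calls for is to report error rates separately for each accent or demographic group rather than a single aggregate figure. The sketch below is a minimal illustration in Python; the `transcribe` callable, the group labels, the dataset layout, and the use of the `jiwer` word-error-rate library are assumptions made for the example, not material from the presentation itself.

```python
# Hypothetical sketch: disaggregated evaluation of a speech-recognition model.
# The group labels, `transcribe` callable, and dataset layout are assumptions
# for illustration only.
from collections import defaultdict

import jiwer  # word-error-rate utility; any WER implementation would do


def per_group_wer(samples, transcribe):
    """Compute word error rate separately for each accent/demographic group.

    samples: iterable of dicts with keys 'audio', 'reference', 'group'.
    transcribe: callable mapping audio to a hypothesis transcript.
    """
    refs, hyps = defaultdict(list), defaultdict(list)
    for s in samples:
        refs[s["group"]].append(s["reference"])
        hyps[s["group"]].append(transcribe(s["audio"]))

    # WER per group, plus the absolute gap between best- and worst-served groups
    wer_by_group = {g: jiwer.wer(refs[g], hyps[g]) for g in refs}
    disparity = max(wer_by_group.values()) - min(wer_by_group.values())
    return wer_by_group, disparity
```

Reporting the per-group table and the disparity gap together makes bias regressions visible in routine evaluation, rather than hidden inside an averaged metric.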
The Ascendancy and Challenges of Agentic Large Language Models
Large Language Models (LLMs) have shifted from passive text generators to proactive, goal-oriented "agentic LLMs" capable of planning, using tools, interacting with environments, and maintaining memory. This talk provides a critical review of this rapidly evolving field, focusing in particular on innovations from late 2023 through 2025. We will explore the core architectural pillars enabling this transition, including hierarchical planning, advanced long-term memory solutions such as Mem0, and sophisticated tool integration. Prominent operational frameworks such as ReAct and Plan-and-Execute will be examined alongside emerging multi-agent systems (MAS). The talk will critically analyze fundamental limitations such as "planning hallucination," the "tyranny of the prior" (where pre-training biases override contextual information), and difficulties in robust generalization and adaptation. We will also discuss the evolving landscape of evaluation methodologies, moving beyond traditional metrics to capability-based assessments and benchmarks such as BFCL v3 for tool use and LoCoMo for long-term memory.
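As a rough illustration of the ReAct pattern mentioned above, the sketch below interleaves model-generated "Thought"/"Action" steps with tool "Observations" until a final answer appears. The `llm` and `tools` callables and the text format are assumptions for the example; production frameworks wrap this loop with far more structure.

```python
# Minimal ReAct-style agent loop, sketched for illustration only.
# `llm` is assumed to map a prompt string to a continuation string;
# `tools` is assumed to be a dict of name -> callable(str) -> str.
import re


def react_agent(question, llm, tools, max_steps=8):
    """Alternate reasoning ('Thought'), tool calls ('Action'), and tool
    results ('Observation') until the model emits a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript + "Thought:")           # model continues the trace
        transcript += "Thought:" + step + "\n"
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if action is None:                            # no tool call -> final answer
            final = re.search(r"Final Answer:\s*(.*)", step)
            return final.group(1) if final else step.strip()
        name, arg = action.groups()
        result = tools[name](arg)                     # execute the requested tool
        transcript += f"Observation: {result}\n"      # feed the result back in
    return "Stopped: step budget exhausted"
```

Plan-and-Execute variants differ mainly in producing the whole plan up front and dispatching each step to an executor, rather than deciding one action at a time as above.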
Furthermore, the presentation will address the critical ethical imperatives and safety protocols necessitated by increasingly autonomous agents, including risks such as alignment faking and multi-agent security threats, and the need for frameworks such as the Relative Danger Coefficient (RDC).
Finally, we will explore pioneering frontiers, including advanced multi-agent systems, embodied agency for physical world interaction, and the pursuit of continual and meta-learning for adaptive agents. The talk will conclude by synthesizing the current state, emphasizing that overcoming core limitations in reasoning, contextual grounding, and evaluation is crucial for realizing robust, adaptable, and aligned agentic intelligence.
AI DevSummit 2025