Speaker

Raghavan Muthuregunathan

Senior Engineering Manager, LinkedIn Search AI

San Francisco, California, United States

Raghavan Muthuregunathan is a senior engineering manager leading the LinkedIn Search AI team. Earlier, he worked at Microsoft Bing.
He is also a volunteer and contributor to:
1. LF AI + Data Generative AI Commons
2. Apache Solr
3. United Nations ITU Disaster Management workstream

Area of Expertise

  • Information & Communications Technology

Topics

  • Artificial Intelligence
  • Information Retrieval
  • Search engines
  • Large Language Models
  • Search relevance
  • Apache Solr
  • Machine Learning
  • Natural language processing

Translation Augmented Generation - Breaking language barriers in RAG and LLM ecosystems

Large Language Models (LLMs), whether open-source or proprietary, are trained predominantly on English datasets, so they perform worse on underrepresented and non-English languages. The symptoms include failing to follow instructions, generating fabricated content, and ending responses abruptly. The open-source research community is exploring language-specific fine-tuning and dedicated LLMs for individual languages, but these efforts are prohibitively expensive, often requiring investments in the millions of dollars.

Retrieval Augmented Generation (RAG) systems face a similar challenge. Their retrieval phase relies heavily on popular open-source embedding models, which are biased towards English. To make LLM and RAG deployments in non-English contexts cost-effective, we propose adding a translation layer on top of the existing RAG framework. This layer acts as a conduit that adapts the LLM ecosystem and RAG solutions to localized, non-English applications, lowering the language barrier in a financially sustainable way.
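
One way to picture the proposed translation layer is as a thin wrapper around an English-centric RAG pipeline: translate the incoming query to English, retrieve and generate in English, then translate the answer back to the user's language. The sketch below is only an illustration under that assumption and is not taken from the talk. Every name in it (TranslationAugmentedRAG, toy_translate, toy_retrieve, toy_generate) is hypothetical, and the toy functions are stand-ins for a real machine translation service, embedding-based retriever, and LLM.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces, introduced only for this sketch.
Translate = Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text
Retrieve = Callable[[str], List[str]]       # pivot-language query -> passages
Generate = Callable[[str, List[str]], str]  # (query, passages) -> answer


@dataclass
class TranslationAugmentedRAG:
    """Wraps an English-centric RAG pipeline with a translation layer."""
    translate: Translate
    retrieve: Retrieve
    generate: Generate
    pivot_lang: str = "en"

    def answer(self, query: str, user_lang: str) -> str:
        needs_translation = user_lang != self.pivot_lang
        # 1. Translate the user's query into the pivot language.
        pivot_query = (
            self.translate(query, user_lang, self.pivot_lang)
            if needs_translation else query
        )
        # 2. Run the unchanged RAG pipeline in the pivot language.
        passages = self.retrieve(pivot_query)
        pivot_answer = self.generate(pivot_query, passages)
        # 3. Translate the answer back into the user's language.
        if needs_translation:
            return self.translate(pivot_answer, self.pivot_lang, user_lang)
        return pivot_answer


# Toy stand-ins so the sketch runs without any external service.
_TOY_DICTIONARY = {
    ("es", "en"): {"¿Qué es Solr?": "What is Solr?"},
    ("en", "es"): {"Solr is a search platform.": "Solr es una plataforma de búsqueda."},
}


def toy_translate(text: str, source_lang: str, target_lang: str) -> str:
    # Dictionary lookup; unknown text is passed through unchanged.
    return _TOY_DICTIONARY.get((source_lang, target_lang), {}).get(text, text)


def _keywords(text: str) -> set:
    # Lowercased words longer than two characters, punctuation stripped.
    words = (w.strip("?.,").lower() for w in text.split())
    return {w for w in words if len(w) > 2}


def toy_retrieve(query: str) -> List[str]:
    # Keyword overlap stands in for embedding-based retrieval.
    corpus = ["Solr is a search platform.", "Bing is a web search engine."]
    return [doc for doc in corpus if _keywords(query) & _keywords(doc)]


def toy_generate(query: str, passages: List[str]) -> str:
    # Echoing the top passage stands in for an LLM call.
    return passages[0] if passages else "I don't know."
```

With these stand-ins, a Spanish query passes through all three steps, while an English query skips both translation calls. In a real deployment the quality of the translation component would bound the quality of the whole pipeline, so it would need to be evaluated per language.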

The benefits to the open-source ecosystem from this talk include:

1. Enhanced Language Inclusivity: The integration of a translation layer with Retrieval Augmented Generation (RAG) systems will enable LLMs to be more effective in multiple languages, not just English.

2. Cost-Effectiveness: The proposed approach offers a financially sustainable solution to the language barrier issue, making it accessible for a wider range of developers and researchers.

3. Broader Accessibility: This strategy allows for the application of LLMs in non-English contexts, opening up the technology to a more diverse user base globally.

Overall, this approach stands to significantly expand the reach and utility of LLMs across different languages, providing a more inclusive and resource-efficient model for language model development.
