Applied GenAI in Action: A Shared Framework for Financial AI Evals and Benchmarking

"Can this system be trusted?" is the hardest question to answer for GenAI in finance. General-purpose benchmarks such as MMLU, which often evaluate only the model, fail to capture the nuance of regulated tasks, forcing institutions to build proprietary evaluation silos. This session presents the current work of the FINOS AI Evaluation & Benchmarking workstream, part of our Applied GenAI initiative. We will explore how the community is mapping real-world financial use cases directly to rigorous evaluation metrics.

In particular, the presentation will feature a technical overview of current state-of-the-art evaluation techniques for multi-agent systems, using the FinSight Agent, an open-source reference implementation for analyzing corporate earnings calls, as the worked example. This use case allows us to demonstrate how to execute consistent, reproducible tests using open datasets and synthetic data pipelines.

Discover how your firm can stop guessing and start measuring AI performance against an industry-standard baseline.

Vincent Caldeira

Leading Open Source Technology Innovation for a Sustainable Future

Singapore
