Paco Nathan
Principal DevRel Engineer @ Senzing.com
Sebastopol, California, United States
Paco Nathan leads DevRel for the Entity Resolved Knowledge Graph practice area at Senzing.com and is a computer scientist with 40+ years of tech industry experience and core expertise in data science, natural language processing, graph technologies, and cloud computing. He's the author of numerous books, videos, and tutorials on these topics. He also hosts the monthly "Graph Power Hour!" webinar, and joins Ben Lorica for a monthly AI recap on "The Data Exchange" podcast.
Paco advises Kurve.ai, EmergentMethods.ai, KungFu.ai, DataSpartan, and Argilla.io (acq. Hugging Face), and is lead committer for the `pytextrank` and `kglab` open source projects. Formerly: Director of the Learning Group at O'Reilly Media; and Director of Community Evangelism at Databricks.
Topics
Catching Bad Guys using open data and open models for graphs to power AI apps
GraphRAG is a popular way to use knowledge graphs to ground AI apps in facts. Most GraphRAG tutorials use LLMs to build graphs automatically from unstructured data. However, what if you're working on use cases such as investigations and sanctions compliance -- "catching bad guys" -- where transparency and evidence for decisions are required?
This talk introduces how investigative practices leverage open data for AI apps, using _entity resolution_ to build graphs which are accountable. We'll look at resources such as _Open Sanctions_ and _Open Ownership_, plus data models used to explore less-than-legit behaviors at scale, such as money laundering through anonymized offshore corporations. We'll show SOTA open models used for components of this work, such as _named entity recognition_, _relation extraction_, _textgraphs_, and _entity linking_, and link to extended tutorials based on open source.
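As a rough sketch of the _named entity recognition_ step in this kind of pipeline (the model choice and sample text here are illustrative assumptions, not the talk's actual components -- the linked tutorials cover the SOTA open models):

```python
import spacy

# load a pretrained pipeline for named entity recognition
# (en_core_web_sm is an illustrative choice, not the model used in the talk)
nlp = spacy.load("en_core_web_sm")

text = "Acme Holdings Ltd. was registered in the British Virgin Islands by Jane Doe."
doc = nlp(text)

# extract candidate entities, which downstream steps would link
# against open data sources such as Open Sanctions and Open Ownership
for ent in doc.ents:
    print(ent.text, ent.label_)
```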
Entity-Resolved Knowledge Graphs
Entity resolution (ER) is a complex process focused on connecting structured and semi-structured data sources used for knowledge graph construction and updates. ER focuses on data quality in ways that support tracking provenance, conducting audits, and providing evidence for key decisions (e.g., in law enforcement, voter registration, etc.), with crucial impact on the quality and trust of downstream graph analytics and AI apps.
This talk shows how to use ER with open data to construct a KG in KùzuDB. We'll focus on linking multiple datasets (beneficial ownership, sanctions, GLEIF, etc.) regarding corporations in the London metro area, then explore hidden relations through graph visualization, chat interaction, and use in graph-enhanced RAG for LLM-based chatbots.
This example illustrates KG work used in production to investigate _ultimate beneficial owner_ (UBO) and sanctions compliance.
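As a minimal sketch of loading entity-resolved records into KùzuDB and querying ownership relations (the schema, property names, and sample records here are illustrative assumptions, not the talk's actual datasets):

```python
import kuzu

# open an embedded Kùzu database and define a tiny ownership schema
db = kuzu.Database("./ubo_demo")
conn = kuzu.Connection(db)
conn.execute("CREATE NODE TABLE Person(name STRING, PRIMARY KEY (name))")
conn.execute("CREATE NODE TABLE Company(name STRING, lei STRING, PRIMARY KEY (name))")
conn.execute("CREATE REL TABLE OWNS(FROM Person TO Company, share DOUBLE)")

# insert resolved entities and an ownership edge (hypothetical records)
conn.execute("CREATE (:Person {name: 'Jane Doe'})")
conn.execute("CREATE (:Company {name: 'Acme Holdings Ltd', lei: '5493001KJTIIGC8Y1R12'})")
conn.execute("""
    MATCH (p:Person {name: 'Jane Doe'}), (c:Company {name: 'Acme Holdings Ltd'})
    CREATE (p)-[:OWNS {share: 0.4}]->(c)
""")

# query the graph for beneficial ownership relations
result = conn.execute(
    "MATCH (p:Person)-[o:OWNS]->(c:Company) RETURN p.name, c.name, o.share"
)
while result.has_next():
    print(result.get_next())
```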
Knowledge Graph Construction from Unstructured Data Sources
What's a good way to convert text documents into a knowledge graph? According to the current open source libraries for GraphRAG, a dominant notion is: "Just use an LLM to generate a graph automatically, which should be good enough to use." For those of us who work with graphs in regulated environments or mission-critical apps, obviously this isn't appropriate. In some contexts it may even represent unlawful practices, e.g., given US laws regarding data management in some federal agencies.
Let's step back to review the broader practices in knowledge graph construction. For downstream use cases, such as where KGs are grounding AI apps, there's a larger question to ask. How can we build KGs from both structured and unstructured data sources, and keep human expert reviews in the loop, while taking advantage of LLMs and other deep learning models?
This talk provides a step-by-step guide to working with unstructured data sources for constructing and updating knowledge graphs. We'll assume you have some experience coding in Python and working with popular open source tools.
The general sketch is to parse the text (e.g., based on `spaCy` pipelines) then use _textgraph_ methods to build a _lexical graph_. We generate a _semantic layer_ atop this, making use of _named entity recognition_ and _entity extraction_, and leveraging previous _entity resolution_ work with structured data sources to perform _entity linking_. These steps enrich the semantics for nodes in the graph. Then, making use of _relation extraction_ to connect pairs of nodes, we enrich the semantics for edges in the graph. In each step, we use LLMs and other deep learning models to augment narrowly-defined tasks within the overall workflow. Using domain-specific resources such as a thesaurus, we'll show how to perform _semantic random walks_ to expand the graph. Finally, we'll show graph analytics to make use of the graph -- tying into what's needed for use cases such as GraphRAG.
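As a minimal sketch of the lexical-graph step, using `spaCy` with `pytextrank` to rank phrases as candidate nodes (the model choice and sample text are illustrative assumptions; the talk's step-by-step guide covers the full workflow):

```python
import spacy
import pytextrank  # registers the "textrank" pipeline component with spaCy

# build a pipeline that parses text and constructs a lexical graph via TextRank
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textrank")

text = (
    "The shell company transferred funds through an anonymized offshore "
    "corporation, obscuring the ultimate beneficial owner."
)
doc = nlp(text)

# ranked noun phrases from the lexical graph: candidates for KG nodes,
# to be enriched later through entity linking and relation extraction
for phrase in doc._.phrases[:10]:
    print(f"{phrase.rank:.4f}  {phrase.count}  {phrase.text}")
```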
Estimated at 45 minutes, not including time for Q&A.