Speaker

David Colls

Head of Data, Product & Platforms at MYOB

David is a transformative technology leader with over 25 years’ experience leading the strategic and technical delivery of data and AI, digital strategy, and change solutions. He combines diverse experience delivering complex technology solutions with a passion for customer outcomes to develop high-performing teams capable of solving complex problems.

When Pipelines Become Sewers - 7 Wastes of Data Production

Even with the most modern tooling, it’s likely that you’re generating waste in the production of data in your organisation. Waste manifests as business misalignment, slow response to opportunities, poor quality of outputs, and employee disengagement. Waste is any activity that doesn’t deliver value, and its cause is often hidden.

We’ll review the manufacturing roots of the 7 forms of waste known as Muda in Lean, and how they have been reinterpreted for knowledge work like software engineering. Identifying and managing these wastes is core to modern software delivery and all spheres of business operations. We’ll then consider the data organisation as a factory that produces data, a factory that is constantly reconfigured by engineering as business needs change. This will allow us to identify and characterise the impact of the wastes of data production that emerge in building and running a data organisation and data platform. Initiatives like the DataOps Manifesto and Cookbook also embed this Lean philosophy.

With wastes understood, we’ll identify potential interventions to improve alignment, responsiveness, quality and engagement in data engineering. We will also introduce the Improvement Kata, a framework that any team can use for continuous improvement. You’ll leave with a good understanding of how to reduce waste in data production, in order to restore pristine pipelines.

Semantic hide and seek - a gentle introduction to embeddings with a vector search game

GenAI has thrust embeddings and vector search to prominence, with many new technologies in this space. Embeddings capture some useful semantic - or meaningful - relationships between items. But what does a semantic vector with hundreds of dimensions actually _mean_?

Whether you're building a RAG system or just curious, this talk will help build your intuition for how embeddings work, and when they fail.

We’ll explore embeddings for various applications, different types of data, and network architectures. But we'll focus in particular on a game of semantic hide and seek. Fans of the Wordle gamification universe may be familiar with Semantle. In this game, players seek a hidden word. A little like the game of Marco Polo, players “shout” out guesses, and the game “responds” with a semantic similarity score, from (roughly) 0 to 100. Behind the scenes, embeddings are doing all the work.
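To make the scoring idea concrete, here is a minimal sketch of how a Semantle-style game might score guesses. It assumes the score is cosine similarity scaled by 100, and it uses hypothetical hand-made 3-dimensional vectors in place of a real embedding model. The names `TOY_EMBEDDINGS`, `score_guess` and `rank_guesses` are illustrative only and are not taken from the game itself.

```python
import math

# Hypothetical toy embeddings, invented for illustration.
# Real embedding models produce vectors with hundreds of dimensions.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.3, 0.1],
    "car": [0.1, 0.9, 0.2],
    "banana": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_guess(guess, hidden, embeddings=TOY_EMBEDDINGS):
    """Score a guess against the hidden word.

    Assumption: the score is cosine similarity times 100, so it tops out
    at 100 for an exact match and can dip below 0 for opposed vectors.
    """
    return 100.0 * cosine_similarity(embeddings[guess], embeddings[hidden])


def rank_guesses(guesses, hidden, embeddings=TOY_EMBEDDINGS):
    """Order guesses from most to least similar to the hidden word."""
    return sorted(
        guesses,
        key=lambda g: score_guess(g, hidden, embeddings),
        reverse=True,
    )


if __name__ == "__main__":
    hidden_word = "cat"
    for word in rank_guesses(["banana", "car", "kitten"], hidden_word):
        print(f"{word}: {score_guess(word, hidden_word):.1f}")
```

With these toy vectors, "kitten" ranks closest to "cat" and "banana" furthest, which is the kind of "warmer or colder" feedback a player uses to home in on the hidden word.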

We’ll see when Semantle does and doesn’t match our intuition for similar meaning. We’ll also explore how biases may be encoded in embeddings. Because we love a puzzle, we’ll compare human and multiple machine-driven strategies for solving Semantle, and their robustness to divergent semantics (with a live demo).

To wrap up, we’ll discuss what all this means for building solutions with the current generation of LLMs.

A version of this talk was first delivered at the inaugural GenAI Network Melbourne meetup.
