Simon Whiteley

Data Platform MVP. Databricks Beacon. Cloud Architect, Nerd

London, United Kingdom

CTO for Advancing Analytics Ltd, Microsoft Data Platform MVP and Databricks MVP. Simon is a seasoned solution architect & technical lead with well over a decade of Microsoft Analytics experience, who spends an inordinate amount of time running the Advancing Spark YouTube series. A deep techie with a focus on emerging cloud technologies and applying "big data" thinking to traditional analytics problems, Simon also has a passion for bringing it back to the high level and making sense of the bigger picture. When not tinkering with tech, Simon is a death-dodging London cyclist, a sampler of craft beers, an avid chef and a generally nerdy person.

Area of Expertise

Information & Communications Technology

Topics

Databricks
Data Engineering
Data Analytics
Big Data
Spark

What's wrong with the Medallion Architecture?

In recent years, companies have seen an explosion in lakehouse adoption - with every analytics developer suddenly rebranding themselves as a Lakehouse Expert... but time and time again, we hear from organisations that they regret the layering of their lake, and once it's in, it's difficult to change!

Maybe the zones don't quite fit what they were trying to achieve, or no one in the company understands what "silver" vs. "gold" actually means. Maybe they had to go back and tack on new layers, expanding into "Diamond", "Platinum" and... "Tin"? We need to tackle a key question: is the Medallion Architecture right for most businesses - and how should you interpret the advice?

In this session, we'll break down the different stages of data curation and talk about how it works in reality, calling on practical examples from many, many real-world implementations. We'll talk about schema evolution, data cleansing, record validation, and traditional data modelling techniques, layering them on top of our lakehouse zones so we truly understand what happens where.

This session is ideal for data architects, engineers, and analysts looking to design the best platform possible, backed by nearly a decade of Lakehouse development, not a few months on a public preview.

Adapting to an AI-Obsessed World as a Data Engineer

The demand for data engineering keeps growing, but data teams are bored by repetitive tasks, stumped by growing complexity, and endlessly harassed by an unrelenting need for speed. To add to that, there's the ever-looming threat of AI swooping in to take people's jobs; but how much of that is just plain hype, and how much could we actually harness AI for real, good engineering?

In this session, we’ll explore how AI is revolutionizing data engineering, turning pain points into innovation. We'll talk about the risks it poses and how we can get ahead of them before they do the damage. We'll run through what you can be doing as a data engineer to stay relevant even as the world goes mad around you. Whether you’re grappling with manual schema generation or struggling to ensure data quality, this session offers practical solutions to help you work smarter, not harder.

Join Simon Whiteley, a deep data engineering expert with a proven track record in avoiding real work through automation, as he takes the next step into automating his entire workload. You’ll walk away with a good idea of where AI is going to disrupt the Data Engineering workload, some good tips on how to accelerate your own workflows and an impending sense of doom around the future of the industry!

Beyond the POC: Building Production Apps on Databricks

Every Databricks conference has had a "look, we built a Databricks App!" demo. Twenty minutes, one screen, three buttons, cue polite applause. None of them tell you what happens when you have to handle concurrent users, real volumes of inserts and updates and the operational realities of an app people actually depend on. So how do you actually productionise a Databricks App? How do you handle security? How do you do state management, at scale, without building out a whole Azure app service?

This session walks through what changes when you build a real, stateful, multi-user app on Databricks, and how Lakebase rewrites the architecture you'd otherwise reach for. We start with what Lakebase actually is (Postgres-based OLTP that lives inside your workspace), what it changes about app design, and where it sits in relation to Unity Catalog, SQL Warehouse, and Model Serving. Then we walk a real production app: an MDM platform we built that lets data stewards concurrently manage, match, and merge master data records at scale using Lakebase for the OLTP edit layer and session state, SQL Warehouse for the analytical views, and Model Serving as the inference backbone. We cover the architectural decisions, the security and sync model between Lakebase and UC, the gotchas we hit, and what we'd do differently knowing what we now know.

For data engineering leads, architects, and senior practitioners building (or being asked to build) applications on Databricks who suspect the demo pattern stops short of production reality. You'll leave with a clear-eyed view of what Lakebase actually changes and an architectural blueprint for stateful, multi-user apps on the lakehouse.

Making Metrics Matter - Semantic Models in Databricks

For years, semantic models have been simplifying analytics in our quest for self-serve analytical nirvana, but they often felt locked away in old BI tools. Unity Catalog Metric Views bring that promise directly into the Lakehouse, letting teams define consistent metrics once and use them across a range of downstream consumers.

In this session we will trace the evolution of semantic models, then get hands-on with building, extending, and exposing Metric Views. You will see how they connect seamlessly to dashboards, power conversational queries with Genie, and integrate with advanced tools for richer analysis. We'll then loop back and see what we can do with more advanced features and finally, how we can use Metric Views as a template to push to other tools.

Join us to learn how Metric Views make trusted, reusable business logic a first-class part of your data platform, and put Databricks AI/BI on the Analytics Map.

Finding Context in a Sprawling Unity Catalog

Unity Catalog was supposed to solve governance. Two years in, you have governance over an estate nobody understands. You have lineage you can't navigate. Ownership nobody can answer for. Dashboards whose data sources nobody remembers. Then agents arrived, and your estate had to start explaining itself to machines that can't ask follow-up questions. Your agent needs to understand data in context, but you can't even do that yourself. Your Agentic MCP needs an ontology. Your ontology needs you to actually know what's in your estate. So where do you even start?

This session works through the chain from sprawl to agent-ready estate. We start with what UC sprawl actually looks like and why standard discovery tooling plateaus there. We'll discuss how you put semantic meaning over a sprawling estate, what an ontology over a lakehouse actually needs to contain, and where the design decisions bite. Then we'll look at what's in UC to enable agents, how an agent talks to your data estate, what it needs to know in order to succeed, and what breaks when the ontology underneath is incomplete. We'll run through the tools we build to actually enable this at scale.

For data platform leads, architects, and governance owners running Databricks estates that have to serve agents but can't yet serve their own humans. You'll leave with a vocabulary for the sprawl problem, a clear-eyed view of what an ontology over a lakehouse needs to do, and patterns for building the MCP layer your agents will actually need.

Four Tiers of Agentic Engineering on Databricks

You're being told two stories about agentic coding on Databricks, and they both miss the point. Lakeflow Designer says anyone can build pipelines now with just some simple prompts. Genie Code promises to automate building anything and everything. The reality is messier and more interesting: there's a four-tier maturity curve, most teams are stuck at tier two, and the painful gap between "we use Genie Code" and "we've actually built engineering opinion into Genie Code" is where you're going wrong. So where are you on the curve, and where do you think you are?

This session walks the four tiers as a maturity curve: from prompt-by-prompt low-code (Lakeflow Designer), through unharnessed Genie Code, through Genie Code shaped by custom skills and engineering opinion, to full bespoke agentic development frameworks. We'll run through each tier, see what it's good for and where it falls over. We'll spend a lot of the session looking at how you build out an opinionated set of skills, instructions and best practices inside Genie Code, because that's the tier most teams think they're at and almost nobody actually is. We cover what a real skill abstraction looks like, the layered architecture beneath it (Databricks mechanics, opinionated patterns, consulting craft), and what changes when you operationalise this across an engineering team rather than running it solo.

If you're a data engineering lead, architect, or senior practitioner who suspects the agentic tooling story is more complicated than the vendors are letting on, this session is for you. You'll leave with an understanding of what good looks like and concrete patterns for moving up the curve without setting your engineering practice on fire.

Your Wish is AI Command - Get to Grips with Genie

Picture the scene - you're exploring a deep, dark cave looking for insights to unearth when, in a burst of smoke, Genie appears and offers you not three but unlimited data wishes. This isn't a folk tale, it's the growing wave of Conversational BI that is going to be a part of analytics platforms.

Databricks Genie is a tool powered by SQL-writing GenAI that redefines how we interact with data. We'll look at the basics of creating a new Genie room, scoping its data tables and asking questions. We'll help it out with some complex pre-defined questions and ensure it has the best chance of success. We'll give the tool a personality, set some behavioural guidelines and prepare some hidden easter eggs for our users to discover. We'll tune it up with a values dictionary, defined metrics and hook it up to some business semantics. Finally, we'll stitch several Genie Spaces together using just a few clicks in AgentBricks.

Generative BI isn't going to replace dashboards. But it is going to be a fundamental part of the analytics toolset used across the business. If you're using Databricks, you should be aware of Genie; if you're not, you should be planning your Generative BI Roadmap. Either way, this session will answer your wishes.
