Co-Founder & CTO of Cloud Formations | Microsoft MVP
Derby, United Kingdom
Paul (AKA @mrpaulandrew) is the Co-Founder & CTO of Cloud Formations, a specialist data consultancy based in the UK. With nearly 20 years' experience designing and delivering Microsoft data architectures, Paul leads a passionate team of engineers, supporting businesses small and large with scalable cloud platforms that deliver business value through data insights. Over the years, Paul has covered the breadth and depth of design patterns and industry-leading concepts, including Lambda, Kappa, Delta Lake, Data Mesh and Data Fabric.
Paul is also a Microsoft Data Platform MVP, director of the Data Relay community conference, East Midlands user group leader, book author and mentor. In addition to the day job(s), Paul is a father of three, husband, foodie, runner, blood donor, geek, and Lego and Star Wars fan! Lastly, Paul confesses to enjoying a Rammstein playlist when given half a chance to do some coding for a customer project.
Area of Expertise
All this talk about Data-Ware-Lake-Delta-Beach-House-Lakes (or some combination of that) and Data, Yarn, Fabric integration: everything has got a bit… Meshy! Yes, my friends, the beat of the technology drum is certainly relentless, with no-limits cloud scale and huge innovations from the biggest brains. Two years, it seems, has become the benchmark for tools to live and die by; reach three years and you almost have a mature product. Microsoft Fabric, the latest offering from the global software giant, is no exception. But what does this mean for the real world? For the data analysts, engineers and scientists who need to keep answering everyday problems to inform business decisions? In this session we will firmly ignore the hype and focus on the reality, with the pragmatic view of an experienced architect. The problem of gaining insights from our data hasn't changed, so what does this mean if implemented using Microsoft Fabric? What, why and how is the tooling going to change our daily deliverables in the short, medium and long term? Join me for these answers and more as we explore the impact of Microsoft Fabric-Server, erm, Power. Resource. Thing!
The principles of a data mesh architecture have been around for a while now, but we still don't have a clear way to deliver such a platform in Azure. Are the concepts so abstract that it's hard to translate the principles into real-world requirements, and maybe even harder to think about what technology you might need to deploy within your Azure tenant?
In this session, we'll explore options for building scalable data products in Azure, following Data Mesh architecture principles and turning the theory into practice. What data storage technology should be used? Does it matter? What endpoints should be exposed for the products across the overall mesh? And what resource(s) should sit at the centre of the Data Mesh? Answers to all these questions and more, including how to dissect the planes of the Data Mesh using Azure concepts.
In this full-day training session, we'll quickly cover the fundamentals of data integration pipelines before going much deeper into our Azure resources (Data Factory & Synapse Pipelines). Within a typical Azure data platform solution for any enterprise-grade data analytics or data science workload, an umbrella resource is needed to trigger, monitor and handle the control flow for various workloads, with the goal being actionable data insight. Those requirements are met by deploying Azure data integration pipelines, delivered using Azure Synapse Analytics or Azure Data Factory. In this session, we will explore how to create rich, dynamic, metadata-driven pipelines and apply these orchestration resources in production, using scaled-out architecture design patterns, best practice, data mesh principles and the latest open-source frameworks. We will take a deep dive into the resources, considering how to build custom activities and complex pipelines, and think about hierarchical design patterns for enterprise-grade deployments. All this and more in a complete set of learning modules with hands-on labs; we will take you through how to implement data integration pipelines in production and deliver advanced orchestration patterns (based on real-world experience).
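To give a flavour of the metadata-driven idea before the session, here is a minimal sketch in plain Python. All names (the metadata rows, `execute_worker`, the stage/worker terminology) are illustrative assumptions, not Data Factory or Synapse APIs: the point is simply that the orchestrator reads a table of definitions and executes them in order, rather than hard-coding each workload.

```python
# Toy illustration of a metadata-driven pipeline framework: the parent
# orchestrator iterates a metadata table of stages and workers instead of
# hard-coding each workload. Names here are hypothetical for illustration.

# Metadata table: each row describes one worker pipeline.
PIPELINE_METADATA = [
    {"stage": 1, "name": "ingest_sales",    "enabled": True},
    {"stage": 1, "name": "ingest_stock",    "enabled": True},
    {"stage": 2, "name": "transform_sales", "enabled": True},
    {"stage": 3, "name": "serve_reports",   "enabled": False},
]

def execute_worker(name: str) -> str:
    """Stand-in for invoking a real child pipeline (e.g. an Execute Pipeline activity)."""
    return f"completed {name}"

def run_stages(metadata: list[dict]) -> list[str]:
    """Run enabled workers stage by stage, preserving stage order."""
    results = []
    for stage in sorted({row["stage"] for row in metadata}):
        for row in metadata:
            if row["stage"] == stage and row["enabled"]:
                results.append(execute_worker(row["name"]))
    return results

print(run_stages(PIPELINE_METADATA))
```

Adding a new workload then means adding a metadata row, not editing pipeline definitions, which is what makes this pattern scale across many sources.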
Maintaining a functional set of knowledge on the breadth and depth of Azure Data Platform resources is hard. There are now so many different ways to execute many different data processing workloads on many different flavours of compute and storage. What should we use, and when? "It depends" is the common answer! However, in this full day of training, help is at hand. We will cover the A-Z of (data engineering focused) Azure data resources. Yes, it depends, but we'll go deeper and learn what it depends on. From the perspective of an experienced solution architect and based on real-world implementations, we'll address what to use, when to use it, why and how, including tips and tricks for deploying resources into production along the way. To support this understanding, we'll cover a set of use-case-driven scenarios and the various resources/architecture patterns used to implement them.
How have advancements in highly scalable cloud technology influenced the design principles we apply when building data platform solutions? Are we designing for just speed and batch layers, or do we want more from our platforms, and who says these patterns must be delivered exclusively? Let's disrupt the theory and consider the practical application of all things. Can we now utilise Azure technology to build architectures that cater for Lambda, Kappa and Delta concepts in a complete stack of services? And should we be considering a solution that offers all these principles in a nirvana of data insight perfection? In this session we'll explore the answers to all these questions and more in a thought-provoking, argument-generating look at the challenges every data platform architect faces.
Azure Data Factory and Synapse Integration Pipelines are the undisputed PaaS resources within the Microsoft cloud for orchestrating data workloads. With 100+ Linked Service connections and a flexible array of both control flow and data flow activities, there isn't much these pipelines can't do as a wrapper over our data platform solutions. That said, the service may still require the support of other Azure resources for the purposes of logging, monitoring, compute and storage. In this session we'll focus on exactly that point and explore the problem faced when structuring many integration pipelines in a highly scaled architecture.
Once the service is coupled with other resources, we'll look at one possible solution to this problem of pipeline organisation: a dynamic, flexible, metadata-driven processing framework that complements our existing solution pipelines. Furthermore, we will explore how to bootstrap multiple orchestrators (across tenants if needed), design for cost with nearly free Consumption Plans and deliver an operational abstraction over all our processing pipelines.
Finally, we'll explore delivering this framework within an enterprise and consider an architect’s perspective on a wider platform of ingestion/transformation workloads with multiple batches and execution stages.
The resources on offer in Azure are constantly changing, which means as data professionals we need to constantly change too, updating knowledge and learning new skills. No longer can we rely on products matured over a decade to deliver all our solution requirements. Today, data platform architectures designed in Azure with the best intentions and known design patterns can go out of date within months. That said, is there now a set of core components we can utilise in the Microsoft cloud to ingest, curate and deliver insights from our data? When does ETL become ELT? When is IaaS better than PaaS? Do we need to consider scaling up or scaling out? And should we start making cost the primary factor for choosing certain technologies? In this session we'll explore the answers to all these questions and more from an architect's viewpoint. Based on real-world experience, let's think about just how far the breadth of our knowledge now needs to reach when starting from nothing and building a complete Microsoft Azure data analytics solution.
The principles of a data mesh architecture have been around for a while now, but we still don't have a clear way to deliver such a solution in Azure. Are the concepts so abstract that it's hard to translate the principles into real-world requirements, and maybe even harder to think about what technology you might need to deploy in your Azure resource groups? In this session, we'll explore options for building an Azure data platform following Data Mesh principles. What data storage technology should be used? What endpoints should be exposed for mesh interfacing, and what resource(s) should sit at the centre of the Data Mesh? Answers to all these questions and more as we turn the theory of a Data Mesh architecture into practice.
For those who have been using Azure data platform resources for a while, the unified Synapse Analytics Workspace experience makes a lot of sense. However, for those who are new to Azure, translating the technology requirements to a given use case can be hard. In this short, sharp session, we'll look at what each of the Synapse Analytics tools can do for our data workloads. We'll decrypt the workspace experience into simple compute and storage components, regardless of how you choose to 'develop' or 'integrate' your data. Let's remove the pretty UI abstraction and ask: what is the technology I'm working with underneath?
Once upon a time, there was a data warehouse and it lived happily as a set of tables within our relational database management system (RDBMS) called Microsoft SQL Server. The data warehouse had three children known as extract, transform, and load. One day a blue/azure coloured cloud appeared overhead, and it started to rain. The data warehouse got wet and was never the same again! Or was it? Spoiler alert: the data warehouse is the same, still happy and well, it just evolved and moved from its RDBMS home to a new home in the cloud. The end!
In this session, we'll look at the evolution of the data warehouse and understand how we can now deliver the same data engineering concepts for our solutions on the Microsoft Azure cloud platform using the open-source Delta.io standard. We'll introduce the standard (originally developed by Databricks) and then explore the implications it has for our next-generation cloud data warehouse.
The original data warehouse set of tables remain, but now they are delivered using the cloud-native Delta Lake technology with distributed storage/compute as standard. Delta.io gives us those much-needed ACID properties over our data lakes meaning our data warehouse understanding can move to the cloud and is made easier within Azure. The data warehouse just grew up and became a Delta Lake-House.
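As a taste of the idea behind those ACID guarantees, here is a toy sketch in plain Python of a transaction log: a write only becomes visible once its commit file lands in the log, so readers never observe a half-finished write. This is an illustration of the concept only, not the real Delta protocol (which records ordered JSON commits under `_delta_log`); all file and function names are hypothetical.

```python
import json
import os
import tempfile

# Toy sketch of a Delta-style transaction log: each table version is
# published by one atomically renamed commit file, and readers replay
# committed versions in order. Illustrative only, not the Delta spec.

def commit(log_dir: str, version: int, files: list[str]) -> None:
    """Atomically publish a new table version by writing one commit file."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"add": files}, f)
    os.rename(tmp, path)  # the atomic step: the version appears all at once

def current_files(log_dir: str) -> list[str]:
    """Readers replay committed versions in order to find the live file set."""
    files = []
    for name in sorted(os.listdir(log_dir)):
        if name.endswith(".json"):
            with open(os.path.join(log_dir, name)) as f:
                files.extend(json.load(f)["add"])
    return files

log = tempfile.mkdtemp()
commit(log, 0, ["part-000.parquet"])
commit(log, 1, ["part-001.parquet"])
print(current_files(log))
```

Because visibility hinges on a single atomic rename, a crashed or in-flight write leaves no partial state for readers, which is the essence of the atomicity Delta brings to plain data lake storage.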
Azure Data Factory, along with other integration pipeline technologies, is now a core resource for any data platform solution, offering critical control flow and data flow capabilities. In this session we'll take an end-to-end look at our Azure-based data pipeline tools when orchestrating highly scalable cloud-native services. In this complete introduction session, we will cover the basics of Azure Data Factory and Azure Synapse Analytics Pipelines. What do we need to build cloud ETL/ELT workloads? What's the integration runtime? Do we have an SSIS-equivalent cloud data flow engine? Can we easily lift and shift existing SSIS packages into the cloud? The answers to all these questions and more. Come to this session knowing nothing about Azure data integration pipelines and leave with enough knowledge to start building pipelines tomorrow.
The Microsoft abstraction machine is at it again with this latest veneer over what we had come to understand as the 'modern data warehouse'. Or is it?! When creating an Azure PaaS data platform/analytics solution we would typically use a set of core Azure services: Data Factory, Data Lake, Databricks and maybe SQL Data Warehouse. Now, with the latest round of enhancements from the MPP team and others, it seems in the third generation of the Azure SQLDW offering we can access all these core services as a bundle. We might even call it a Kappa architecture! Ok, so what? Well, this is a reasonable starting point in our understanding of what Azure Synapse Analytics is, but it is also far from the whole story. In this session we will go deeper into the evolution of our SQLDW to complete our knowledge on why Synapse Analytics is a game changer for various data platform architectures. We'll discover what Synapse has to offer with its data virtualisation layer, flexible storage, and variety of compute engines. A simple veneer of things, this new resource is not. In this introduction to Synapse we will cover the what, the why and, importantly, the how for this emerging bundle of exciting technology. Finally, we'll touch on Microsoft's latest thinking for an HTAP environment with direct links into our transactional data stores.
Within a typical Azure data platform solution for any enterprise-grade data analytics or data science workload, an umbrella resource is needed to trigger, monitor and handle the control flow for transforming datasets. Those requirements are met by deploying Azure data integration pipelines, delivered using Synapse Analytics or Data Factory. In this session I'll show you how to create rich, dynamic data pipelines and apply these orchestration resources in production, using scaled architecture design patterns, best practice and the latest metadata-driven frameworks. We will take a deeper dive into the service, considering how to build custom activities and dynamic pipelines, and think about hierarchical design patterns for enterprise-grade deployments. All this and more in a series of short stories (based on real-world experience) as I take you through how to implement data integration pipelines in production.
In this full day of training, we'll start with the very basics, learning how to build and orchestrate common pipeline activities. You will learn how to build out Azure control flow and data flow components as dynamic processing pipelines using Azure Data Factory and Azure Synapse Analytics. We'll start by covering the fundamentals within the resources and together build a set of pipelines that ingest data from local source systems, then transform and serve it to potential consumers. Through a set of 12 carefully constructed learning modules, we will take an end-to-end look at our Azure integration pipeline tools as part of highly scalable cloud-native architectures, dealing with triggering, monitoring and dynamic pipeline content as well as CI/CD practices. Start the day knowing nothing about Azure data integration pipelines and leave with the knowledge, slides, labs, demos and code to apply these resources in your role as a data professional. Everything delivered will be use-case oriented and grounded in real-world experience.
Data Relay 2022
Global Azure Bootcamp 2019