Anna-Maria Wykes
Microsoft AI MVP | Data & AI RSA (Resident Solution Architect) and Consultant
Bristol, United Kingdom
Anna is a veteran software and data engineer as well as a Microsoft Data Platform and AI MVP, with over 18 years of experience. She has undertaken various projects, including real-time analytics with Scala and Kafka, constructing Data Lakes with Spark, and applying engineering to Data Science. Anna currently serves as an RSA (Resident Solution Architect) and consultant, contributing to the development of Databricks and Neueda's data engineering and AI practice. With a genuine passion for data, she endeavors to bridge the gap between Software Development and Data Science. Anna's other areas of interest include DevOps (DataOps, MLOps, LLMOps), Agile methodologies, and organizing or participating in local Code Clubs.
User Group Leaders Panel
Do you want to know what's involved in setting up and running a User Group? Come and meet some individuals who have done just that.
OpenAI vs Azure OpenAI
Have you been asked by your employer to start exploring AI, especially LLMs (Large Language Models)? How could it help your business? How do you use ChatGPT? Do you want to better understand AI tooling that will allow you to generate and interpret images, such as DALL·E and CLIP? Whatever your journey, more and more of us want to better understand OpenAI and what it can offer us.
Whether your thirst for understanding of OpenAI is a result of your day job, or simply a growing interest, the suite of tools available within OpenAI is invaluable. However, how suitable is it to start using these tools professionally? If you are asking yourself this question you are not alone. One of the most common questions emerging from the industry is “What is the difference between OpenAI and Azure OpenAI?”
In this session, we will look at the differences between OpenAI (including ChatGPT Free, Plus and Enterprise) and Azure OpenAI. We’ll look at when best to use each offering, with direct comparisons including security, reliability, training, and support.
By the end of this session, you will have a better understanding of the different serving offerings for OpenAI and have more confidence in when to use each one.
It Takes More Than Just Fancy Flying - Take off to Modern Data Careers
Transitioning into the dynamic realm of modern data professions, encompassing roles like Data Engineering, Analytics Engineering, and AI, presents a wealth of opportunities for professionals from many diverse backgrounds. Whether you've been a Software Engineer, a BI Developer, a Database Administrator (DBA), or just have a passion for data, this panel-style session is ready to help you take off on your next journey.
Join us for an interactive panel discussion that sheds light on the exciting possibilities awaiting those eager to embark on a career in data-driven roles. Our expert panellists will not only explore the role of a Data, Analytics and AI Engineer but also the broader landscape of data professions, offering valuable insights and guidance on the skills and knowledge professionals can leverage when transitioning into these fields.
Throughout the session, we'll delve into various aspects, from data modelling and ETL processes to data pipeline architecture and cloud technologies. We'll unravel the core competencies that are essential in this ever-evolving domain. Drawing from their personal experiences, our panellists will share their own successful journeys, showcasing how individuals from various backgrounds can make the leap into these data professions. It's a career shift where past expertise becomes an asset.
Key Takeaways:
• Gain insights into the diverse roles within the data profession, from Data Engineering to Analytics Engineering and AI.
• Discover how your skills and experience as a Software Engineer, BI Developer, or DBA can be harnessed to thrive in this dynamic field.
• Equip yourself with the knowledge and confidence to step into the world of data professions, unlocking a treasure trove of opportunities.
Whether you're a Software Engineer looking to enhance your data processing skills, a BI Developer seeking to amplify your analytics expertise, or a DBA ready to broaden your data management horizons, this session is your launchpad into a world of possibilities in modern data professions. It's time to embrace the data-driven future!
How to Run Code Clubs for Neurodiverse Children
Code Clubs offer an amazing opportunity to introduce our next generation to coding. With simple, brightly colored drag-and-drop tooling to get them started, we are successfully inspiring many to join the tech industry.
However, what about those children who do not feel comfortable walking into an unfamiliar setting, surrounded by strangers, in a very noisy environment? What about those who wouldn’t even think a career in tech would be suited to them? I am describing neurodiverse children, such as those with Autism, ADHD, Dyslexia or Dyscalculia. To them a typical Code Club environment can be very intimidating, and sadly, this effectively excludes a large group of talented young people, many of whom have great potential to work in tech.
Code Clubs are invaluable, and are definitely doing a great job to attract children from different minorities into the industry, but unfortunately, not necessarily those who are neurodiverse. When this became apparent to me, I joined forces with a local Code Club, and a charity for children with special needs, and together we have set up a monthly Code Club that offers a safe, comfortable environment for neurodiverse children.
In this session I want to talk you through my journey setting up a Code Club for neurodiverse children, what I found worked, and what didn’t. I hope that from this session you will be inspired to follow the same path I have, using your amazing tech experience to empower some of the most vulnerable children, enabling them to become inspired not just by coding, but by the tech industry itself.
Getting started with MLOps in Azure
We are being asked more and more to work with various aspects of data, regardless of our core skill set. This is particularly the case when productionising Machine Learning Models.
In this session we will talk about various Azure technologies we can use in our day jobs to achieve this, including ML Studio, Databricks & AKS.
We will look at the various components needed and different architectures that can be implemented, including how to manage Feature Stores and monitor Model Life Cycles.
When working with data it is vital that different Data Professionals, Software Developers/Engineers & others in tech work in harmony together for successful outcomes. Consequently, we will also cover how we can try to best achieve this, and how this has been done in the real world.
DevOps for Databricks using the Databricks SDK
Databricks has recently released an SDK for performing tasks such as creating workspaces and authoring clusters, a long-awaited piece of tooling to make DevOps for Databricks easier. Prior to this, when performing CI/CD DevOps tasks, we have been restricted to using the Databricks CLI or REST API. Both have their merits but there has been a longing for an SDK that will allow for those CI/CD DevOps tasks to be easily written in Python, the most commonly used programming language by Databricks users.
In this session, we will explore the SDK and how it compares to the traditional CLI and REST API. We will look at what we can and can’t do, where we may need to drop into using the traditional methods, and how easy that is to do.
By the end of the session, you will have a better understanding of the SDK, including where its strengths are and what features we still eagerly await.
Delta and Databricks vs SQL Server
Once upon a time, well 1989, we had the Data Warehouse in SQL Server and life was good in the land. It did have its challenges, particularly around loading/storing complex data types as well as the Budget! As data grew larger and more varied, the warehouse became too rigid and opinionated.
In 2012 analytics use cases were growing and Microsoft launched the Columnstore Index. It was very limited at first, but there was already talk of a new land with new ideas. In 2013 Databricks started a venture which brought a new approach to data warehouses, with the separation of storage and compute. As we thought we no longer needed the controls of a transactional database, these were lost in the change, along with many features like ACID.
Data lakes grew with cheap storage from cloud providers, and Databricks became our compute. Things were great in this new land but the same protections were not in place, as ACID had been left behind in the old land. This was the frontier, where life was harsh without consistency or durability and mistakes could cost you.
Times were changing in the old lands of SQL Server, as clustered columnstore indexes became usable and the query engine became better at adapting to diverse types of queries. In 2016 many of the Enterprise features were also opened up to the other editions.
Things were much more liberal in the new lands, as in 2022 Databricks open-sourced the entire Delta Lake code-base, including lots of advanced features that were previously Databricks-only. SQL Server also found its new offering in the cloud, which changed its place in our data platform.
This is all great, but how do those who have been using traditional Warehousing tools, in particular SQL Server, make the leap to Delta and Databricks? In this session we will explore this question. We will do direct comparisons between features/functionality, illustrating how these different tools are ultimately the same and very different at the same time. Finally, we will talk about how these two technologies can be part of the same data platform solution! Whether you are a DBA, BI Developer, Data Engineer or any other type of Data Professional, this session will compare the differences and help you understand them.
The key areas we will explore are:
• Optimization
• Storage
• Compute
• Security
Delta & Databricks for the DBA
Once upon a time we had the Data Warehouse. Life was good, but it had its limitations, particularly around loading/storing complex data types. As data grew larger and more varied, the warehouse became too rigid and opinionated.
So we started using Data Lakes to store data, and tools such as Databricks to do our compute. Things were good, but we missed some of the good times that the Data Warehouse had given us. The lake had become too flexible; we needed stability in our life. In particular, we needed A.C.I.D (Atomicity, Consistency, Isolation, and Durability) Transactions.
Delta Lake, hosted by the Linux Foundation, is an open-source file layout protocol for giving us back those good times, whilst retaining all of the flexibility of the lake. Delta has gone from strength to strength, and in 2022 Databricks finally open-sourced the entire code-base, including lots of advanced features that were previously Databricks-only.
Databricks was founded in 2013. Matei Zaharia, along with colleagues at UC Berkeley, created a distributed data compute tool called Spark, open-sourced in 2010. After Spark was donated to the Apache Software Foundation, he co-founded Databricks, a more commercial (paid for) offering of Spark. Both Spark and Databricks have proved invaluable for data processing, and ultimately incredibly popular.
This is all great, but how do those who have been using traditional Warehousing tools, in particular SQL Server, make the leap to Delta and Databricks? In this session we will explore this question. We will do direct comparisons between features/functionality, illustrating how these different tools are ultimately the same and very different at the same time.
Introduction to the wonders of Azure DevOps
Azure DevOps is the leading deployment tool for build and release solutions end to end. It helps you plan your Agile project, manages Git code, and deploys solutions using Continuous Integration (CI) and Continuous Deployment (CD) pipelines.
In this session we will cover some of the core components of Azure DevOps and show you how to implement a secure deployment pipeline, using unit tests and gating with your CI builds and CD releases.
By the end of this session, you will have everything you need to start using Azure DevOps and start building secure deployment pipelines.
Mastering Delta Lakes in Azure
Once upon a time we had the Data Warehouse. Life was good, but it had its limitations, particularly around loading/storing complex data types. As data grew larger and more varied, the warehouse became too rigid and opinionated.
So we dove headfirst into Data Lakes to store our data. Again, things were good, but we missed some of the good times that the Data Warehouse had given us. The lake had become too flexible; we needed stability in our life. In particular, we needed A.C.I.D (Atomicity, Consistency, Isolation, and Durability) Transactions.
Delta Lake, hosted by the Linux Foundation, is an open-source file layout protocol for giving us back those good times, whilst retaining all of the flexibility of the lake. Delta has gone from strength to strength, and in 2022 Databricks finally open-sourced the entire code-base, including lots of advanced features that were previously Databricks-only. This workshop takes you from the absolute basics of using Delta within a Lake, through to some of those advanced engineering features, letting you really master your Delta Lake.
In this workshop we will go from Zero to Hero with Delta, including:
• Handling Schema Drift
• Applying Constraints and Database Designs
• Time-Travel & Management
• Optimize & Performance Tuning
• Streaming
We will also show you how to work with Delta inside and outside of its original home of Databricks.
This training has been designed from our hands-on experience working with Delta and implementing Delta solutions for our clients across the globe. The course is aimed at beginners, and you will leave this course with all the skills you need to get started on your Delta journey.
The course will be delivered by a Microsoft MVP and a Databricks Champion, working together to bring you the best.
Custom Logging for your Data Solutions in Azure
Within Azure you can connect your data solutions directly to Log Analytics via Azure Diagnostic Settings, but the results are not always as granular as we need.
In this session we will explore how Custom Logging can be applied within your data solutions and written to Log Analytics, accompanying what Azure already gives us out of the box. We will also look at core logging concepts including Structured Logging, Alerting, and how they can enrich your data processes, providing better insights and visualisation.
We will write our code in Python, log to Log Analytics, and explore and act upon our logs using Azure Monitor and Azure Data Studio.
So you want to be a Data Engineer?
Being Data Engineers, we think it’s a cool profession to work in. It's also becoming one of the most in-demand skill-sets across industries and sectors.
In this session, Anna, Mikey and Ust will introduce you to the role of a Data Engineer and some of the technologies and tools used within the discipline, before guiding you to resources that will help you learn the basics and further develop your expertise.
We’ll share parts of our own journeys to becoming Data Engineers, and how you can use your existing experience to make the transition, whether you are a DBA, a Software Engineer, or just have a passion for data and problem solving.
DevOps for Databricks
Applying DevOps to Databricks can be a daunting task. In this talk it will be broken down into bite-size chunks. Common DevOps subject areas will be covered, including CI/CD (Continuous Integration/Continuous Deployment), IaC (Infrastructure as Code) and Build Agents.
We will explore how to apply DevOps to Databricks, primarily using Azure DevOps tooling, but we will also explore the alternatives, including GitHub Actions. As a lot of Spark/Databricks users are Python users, we will look at the Databricks REST API (using Python) to perform CI/CD tasks. For IaC we will primarily look at Terraform, but also explore other options, including ARM templates, Azure Bicep and Pulumi.
As data professionals come with a variety of different backgrounds and skill sets, this talk will focus on providing options, and live demos, that demonstrate ways of achieving a DevOps solution that can be understood and maintained by everyone.
Automate the deployment of Databricks components using Terraform
Databricks is a great data analytics tool for data science and data engineering, but provisioning Databricks resources (workspace, clusters, secrets, mount storage etc.) can be complex and time consuming.
Automating deployment of Databricks resources using Terraform, an Infrastructure as Code tool, has been tricky in the past. It has required using a mix of Terraform Azure providers and/or ARM, PowerShell, Databricks CLI or REST APIs. This made deployments harder to repeat and caused inconsistent environments.
Databricks introduced its own Terraform provider to assist with deploying and managing Databricks resources on the Azure, Google (GCP) and Amazon Web Services (AWS) cloud platforms. This gives us the ability to automate deployment of Databricks resources at the time of provisioning the infrastructure, making it easier to manage and maintain.
This session will introduce you to Terraform and the Databricks provider, and take you through the steps required to build an automated solution that provisions a Databricks workspace and resources on the Azure cloud platform using Terraform.
By the end of this session, you will have everything you need to automate your Databricks environment deployments and ensure consistency.
Achieving DevOps Nirvana: Automating Azure Data Platform Deployments with Terraform
Adopting full Infrastructure as Code (IaC) can be a daunting task, not always accessible to every data developer, given the variety in experience and skill-set. It is important we work towards the DevOps dream of us all being part of the process, and all being responsible for and understanding our solution's infrastructure. But how do we achieve this dream?
Terraform is a highly popular and easy-to-learn IaC solution that simplifies the deployment process. Terraform can be used with all the major cloud providers: Azure, AWS & GCP. Also, specialist analytics tools such as Databricks have introduced their own Terraform providers to assist with deploying and managing resources into all major cloud providers.
In this workshop you will be introduced to Terraform, and learn its core concepts and components. We will then focus on designing and deploying an Azure Data Platform solution, including a Resource Group, Key Vault, ADLS (Azure Data Lake Store), Synapse and Databricks.
Once we have our solution, we will run our Terraform via a DevOps CI/CD (Continuous Integration/Continuous Deployment) pipeline. Finally, we will cover some of the most common security and networking challenges. We then finish with best practice guidelines and comparisons with other popular IaC solutions.
Join Anna Wykes and develop the core knowledge you need to work with Terraform for your Azure Data Platform solution(s), along with transferable Terraform skills that can be used with other Cloud Providers.
Scala for Big Data: The Big Picture
What is the big deal with Scala? What are the benefits of using it? You may have heard Scala is a favoured language for working with big data, but at the same time is avoided due to its complexity and steep learning curve. Other languages, such as Python, provide more “out of the box” libraries to achieve common data tasks, especially in popular areas such as Machine Learning.
In this talk you will be given a guide to Scala via Azure Databricks, with demonstrations of its strengths, weaknesses and real-world scenarios in which it can kick ass. This session will provide you with the building blocks to start working with Scala, and help you understand the scenarios in which it can prove to be the better tool for the job.
Mastering MLOps in Azure
Software Developers/Engineers are being asked more and more to work with various aspects of data, in particular the productionisation of Machine Learning Models. In this session we will talk about various Azure technologies we can use in our day jobs to achieve this, including how to manage Feature Stores and monitor model life cycles. We will also discuss the different architectures that can be implemented.
"85% of big data projects fail" (Gartner, 2017). "87% of data science projects never make it to production" (VentureBeat, 2019). "Through 2022, only 20% of analytic insights will deliver business outcomes" (Gartner, 2019). When working with data it is vital that different Data Professionals and Software Developers/Engineers work in harmony together for successful outcomes. Consequently, we will also cover how we can try to best achieve this and how this has been done when working with clients in our day job.
The session will be delivered by myself, a Data Engineering Consultant with a background in Software Engineering, and my colleague Luke Menzies, who is a Data Science Consultant.
Data Relay 2023 Sessionize Event
SQLBits 2023 - General Sessions Sessionize Event
SQLBits 2023 - Full day training sessions Sessionize Event
Global AI Bootcamp - London 2023 Sessionize Event
Southampton Data Platform and Cloud user group - in-person meetup User group Sessionize Event
Dativerse #2 Sessionize Event
Data Relay 2022 Sessionize Event
DataGrillen 2022 Sessionize Event
SQLBits 2022 Sessionize Event
Data Relay 2019 Sessionize Event