B V S N Anjaneyulu Reddy G
Consultant & Solution Architect, Azure Data Platform
Consultant and architect specialising in big data solutions on the Microsoft Azure cloud platform.
Data engineering competencies include Azure Data Factory, Data Lake, Databricks, Stream Analytics, Event Hub, IoT Hub, Functions, Automation, Logic Apps and, of course, the complete SQL Server business intelligence stack.
Many years' experience working within the banking and finance, retail and gaming verticals, delivering analytics using industry-leading methods and technical design patterns.
Data and AI BlueCaps ambassador and very active member of the data platform community, delivering training and technical sessions at conferences both nationally and internationally.
Area of Expertise
The Microsoft abstraction machine is at it again with this latest veneer over what we had come to understand as the 'modern data warehouse'. Or is it?! When creating an Azure PaaS data platform/analytics solution we would typically use a set of core Azure services: Data Factory, Data Lake, Databricks and SQL Data Warehouse. Now, with the latest round of enhancements from the MPP team, it seems the third generation of the Azure SQLDW offering lets us access all our core services as a bundle. Ok, so what? Well, this is a reasonable starting point in our understanding of Azure Synapse, but it is also far from the whole story. In this session we'll go deeper into the evolution of SQLDW to complete our picture of why Synapse Analytics is a game changer for various data warehouse architectures. We'll discover what Synapse has to offer with its data virtualisation layer, flexible storage and multi-model compute engines. A simple veneer of things, this new resource is not. In this introduction to Synapse we'll cover the what, the why and, importantly, the how for this emerging bundle of exciting services.
The resources on offer in Azure are constantly changing, which means as data professionals we need to constantly change too, updating our knowledge and learning new skills. No longer can we rely on products matured over a decade to deliver all our solution requirements. Today, data platform architectures designed in Azure with the best intentions and known good practices can go out of date within months. That said, is there now a set of core components we can utilise in the Microsoft cloud to ingest and deliver insights from our data? When does ETL become ELT? When is IaaS better than PaaS? Do we need to consider scaling up or scaling out? And should we start making cost the primary factor when choosing certain technologies? In this session we'll explore the answers to all these questions and more from an architect's viewpoint. Based on real-world experience, let's think about just how far the breadth of our knowledge now needs to reach when starting from nothing and building a complete Microsoft Azure Data Platform solution.
If you have already mastered the basics of Azure Data Factory (ADF) and you are now looking to advance your knowledge of the tool, this is the session for you. Yes, Data Factory can handle the orchestration of our ETL pipelines. But what about our wider Azure environment? In this session we will take a deeper dive into the service, considering how to build custom activities, create metadata-driven dynamic pipelines and think about hierarchical design patterns. Plus, we'll explore ways of optimising our Azure compute costs by controlling the scaling of other resources as part of our normal data processing pipelines. How? Well, once we can hit a REST API from an ADF Web activity anything is possible, extending our Data Factory and orchestrating everything in any data platform solution. In a series of short lessons based on real-world experience, I will take you through how to use Azure Data Factory in production. Finally, we will look at how Data Factory can be deployed using Azure DevOps.
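The metadata-driven pattern mentioned above can be sketched in plain Python. This is a conceptual model only, not the ADF API; the control-table columns and parameter names here are hypothetical illustrations:

```python
# Plain-Python sketch of a metadata-driven pipeline: a control table
# lists sources, and the orchestrator builds one parameterised copy
# operation per enabled row (as an ADF ForEach activity might receive
# them). All names here are illustrative, not an ADF API.

from dataclasses import dataclass

@dataclass
class SourceMetadata:
    schema: str
    table: str
    watermark_column: str
    enabled: bool

CONTROL_TABLE = [
    SourceMetadata("sales", "orders", "ModifiedDate", True),
    SourceMetadata("sales", "customers", "ModifiedDate", True),
    SourceMetadata("hr", "payroll", "LastUpdated", False),  # disabled
]

def build_pipeline_parameters(control_rows):
    """Turn enabled control rows into per-run parameter sets."""
    return [
        {
            "sourceQuery": (
                f"SELECT * FROM {row.schema}.{row.table} "
                f"WHERE {row.watermark_column} > @lastWatermark"
            ),
            "sinkPath": f"raw/{row.schema}/{row.table}/",
        }
        for row in control_rows
        if row.enabled
    ]

params = build_pipeline_parameters(CONTROL_TABLE)
for p in params:
    print(p["sinkPath"])
```

Adding a new source then becomes a row in the control table rather than a new pipeline, which is the essence of the dynamic-pipeline approach.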
What happens when you combine a cloud orchestration service with a Spark cluster?! The answer is a feature-rich, graphical, scalable data flow environment to rival any ETL tech we’ve previously had available in Azure. In this session we will look at Azure Data Factory and how it integrates with Azure Databricks to produce a powerful abstraction over the Apache Spark analytics ecosystem in the form of Mapping and Wrangling Data Flows. If you have ever transformed datasets using SQL Server Integration Services (SSIS) packages or via Power BI’s Power Query tool, this is the session for you. Now we can transform data in Azure using our favourite interfaces, but with Azure Databricks doing the heavy lifting. You will get a quick introduction to Azure Data Factory before we go deeper into the service's new Mapping and Wrangling Data Flows features. Start using cloud-native technology and scale-out compute within Data Factory's convenient, easy-to-use graphical interface.
DevOps as a process is here to stay and is typically a must-have requirement for any data platform solution. But how does the concept translate to the technology when implemented? Sadly, the answer isn’t always straightforward. In this short session I’ll introduce how we can continuously integrate and deliver our cloud orchestration resource, Azure Data Factory (ADF). We’ll discuss three options for getting our service JSON deployed to production using the popular Azure DevOps environment, previously known as VSTS, and think about the suitability of the Microsoft-provided ARM templates for our highly dynamic orchestrator. Does the ADF portal UI really support the DevOps methodology? Can we confidently publish our pipelines? How should we handle the branching of our source code for ADF developers? The answers to all these questions and more in this lightning session. All based on real-world experience with the products.
It's the buzzword of the year - the "Data Lakehouse", that novel dream of having a modern data platform that gives all the functionality of a data warehouse, but with all of the benefits of a data lake, all in one box.
This action-packed session uses Azure Databricks as the core data transformation and analytics engine, augmenting it with Data Factory scheduling and Azure Synapse On-Demand as a serving layer, before presenting our data in Power BI.
It is VERY possible to build a lightweight, scalable analytics platform in a very short amount of time, and I'm going to show you how.
Data Lakes and Parquet are a match made in heaven, but they’re cranked up to overdrive with the new features of Delta Lake, available as the open-source Delta Lake or the premium Databricks Delta. This session will take a deeper look at why Parquet is so good for analytics, but also highlight some of the problems you’ll face when using immutable columnstore files.
We’ll then switch over to Databricks Delta, which takes parquet to the next level with a whole host of features – we’ll be looking at incremental merges, transactional consistency, temporal rollbacks, file optimisation and some deep and dirty performance tuning with partitioning and Z-ordering.
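The incremental merge behaviour mentioned above can be illustrated with a stdlib-only Python sketch of upsert semantics. This is a simplified conceptual model of what Delta's MERGE does, not the Delta Lake API itself:

```python
# Plain-Python illustration of MERGE (upsert) semantics as used in
# Delta Lake: matched keys are updated, unmatched keys are inserted.
# A conceptual model only, not the Delta API.

def merge(target, updates, key="id"):
    """Upsert `updates` into `target`; both are lists of dicts."""
    by_key = {row[key]: dict(row) for row in target}
    for row in updates:
        if row[key] in by_key:
            by_key[row[key]].update(row)   # WHEN MATCHED THEN UPDATE
        else:
            by_key[row[key]] = dict(row)   # WHEN NOT MATCHED THEN INSERT
    return sorted(by_key.values(), key=lambda r: r[key])

target = [{"id": 1, "qty": 10}, {"id": 2, "qty": 5}]
updates = [{"id": 2, "qty": 7}, {"id": 3, "qty": 1}]
result = merge(target, updates)
print(result)  # id 2 updated in place, id 3 newly inserted
```

With immutable Parquet files alone this rewrite-in-place is impossible, which is precisely the gap Delta's transaction log fills.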
If you’re planning, currently building, or looking after a Data Lake with Spark currently and want to get to the next level of performance and functionality, this session is for you. Never heard of parquet or delta? You’re about to learn a whole lot more!
Azure SQL Data Warehouse, running in the cloud, provides a blazing fast, petabyte-scale system that can handle your most demanding analytical workloads with ease. SQL Data Warehouse takes advantage of the latest Azure hardware technology, delivering up to 100x performance boosts on customer workloads. In short, Azure SQL Data Warehouse is a SQL analytics beast!
But how do you make the most of all that power? What do you need to do differently to a traditional server? What's this "Compute Optimised" tier that Microsoft keep talking about?
This session will take you through some of the internal workings of the SQLDW architecture options, share some common design patterns for data warehousing and look at some performance optimisation problems, all based on experience drawn from several large-scale, real-world SQLDW projects for some of WA's and India's largest Azure users.
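One of the internal workings referenced above is hash distribution: SQLDW spreads a hash-distributed table's rows across 60 distributions based on the distribution key. A minimal sketch of the idea, using an illustrative hash (the engine's internal hash function differs):

```python
# Conceptual sketch of hash distribution in an MPP engine such as
# SQLDW: each row lands on one of 60 distributions according to a
# hash of its distribution-key value. crc32 stands in for the
# engine's internal (different) hash function.

import zlib

DISTRIBUTION_COUNT = 60  # fixed in Azure SQL Data Warehouse

def distribution_for(key):
    """Map a distribution-key value to a distribution number."""
    return zlib.crc32(str(key).encode()) % DISTRIBUTION_COUNT

# Rows sharing a key always land on the same distribution, so joins
# and aggregations on the distribution key avoid data movement
# between compute nodes.
rows = [("cust-1", 100), ("cust-2", 250), ("cust-1", 75)]
placement = {k: distribution_for(k) for k, _ in rows}
print(placement)
```

Choosing a distribution key with enough distinct, evenly spread values is what keeps those 60 distributions balanced, and is a recurring theme in the design patterns covered in the session.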
Azure Databricks has been around for a while now, and Apache Spark for even longer. You've watched a couple of demos, got a brief overview of what Databricks does and you've got a rough idea of where it fits in… but where do you go from there?
This session is that next stop. We'll start by taking a deeper look inside the spark engine, understanding what makes it tick and how it talks to data. We'll then break down some of the key features that come together to build the kind of data processing task that's changing how we think about ETL.
We'll be looking at:
• Schema Inference
• Metadata Management
• Parameterisation using Widgets
• Integration with ADF
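Schema inference, the first item above, can be sketched in stdlib Python. This is a simplified model of what Spark does when it samples a file with schema inference enabled, not Spark's actual implementation:

```python
# Simplified stdlib model of schema inference: sample some rows and
# pick the narrowest type that fits every value in each column,
# roughly what Spark does when reading CSV with inferSchema.

def infer_type(values):
    """Return 'int', 'double' or 'string' for a column's samples."""
    def fits(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False
    if fits(int):
        return "int"
    if fits(float):
        return "double"
    return "string"

sample = [
    {"id": "1", "amount": "9.99", "name": "widget"},
    {"id": "2", "amount": "12", "name": "gadget"},
]
schema = {col: infer_type([row[col] for row in sample])
          for col in sample[0]}
print(schema)  # {'id': 'int', 'amount': 'double', 'name': 'string'}
```

Note how `amount` widens to double because a single non-integer sample is enough to rule out int; Spark's sampling-based inference makes the same kind of narrowest-fit decision.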
If this is your first foray into Spark or Databricks, it'll be a bumpy ride!
The evolution of Spark over the last decade has been incredible, and it's showing no signs of slowing down. The latest generation of the Databricks workspace has a whole raft of features that add quality of life for the Spark user; whether you're a data engineer, scientist or analyst, there are goodies just for you.
This session will dive into the nuts & bolts of those features, looking at the new Delta Engine, which provides lightning-fast data processing, coupled with the optimisations that Spark 3.0 brings to the table. We'll talk MLflow and the machine learning model hosting options that now come bundled inside Databricks, as well as the Next Generation Data Science Project experience. Finally, we'll look at the SQL surface and how analysts can interact directly with the lake and build rich visuals, directly from the browser, or integrate with their favourite BI tools.
If we want to achieve any data processing in Azure we need an umbrella service to manage, monitor and schedule our solution. For a long time when working on premises, the SQL Agent has been our go-to tool, combined with T-SQL and SSIS packages. It’s now time to upgrade our skills and start using cloud native services to achieve the same thing on the Microsoft Cloud Platform. Within a PaaS only Modern Data Warehouse, the primary component for delivering that orchestration is Azure Data Factory, combined with Azure Databricks.
In this full day of training we’ll start with the basics and then get hands on with the tools. We’ll build our own Azure ETL/ELT pipelines using all Data Factory has to offer. Plus, consider hybrid architectures, think about lifting and shifting legacy packages, and explore complex bootstrapping to orchestrate everything in Azure with Data Factory.
Specifically, we’ll be covering the following topics:
- An introduction to Azure Data Factory. What is it and why use it?
- How to extend our orchestration processes with Custom Activities, Azure Functions and Web Hooks.
- Using SSIS packages in Azure.
- Data Factory Data Flows (Mapping & Wrangling) with support from Azure Databricks.
- Dynamic metadata driven pipelines.
- Data Factory alerting and monitoring.
- Data Factory DevOps.
- Lessons learnt from using Azure Data Factory in production.
Azure Data Factory has been around since 2015 and has matured massively to form a complete, feature-rich cloud offering. Add this tech to your development toolbox today! Join me for this complete lesson in everything you need to deliver Azure Data Factory within your data platform solution.
Have your laptops to hand and come armed with your Azure subscription, including credit and rights to deploy resources.
Please check with bill payer.
Infrastructure not included.
Data BlueCaps (WA & Hyd) AI and Data
Global Azure Bootcamp 2019