John Miner

Insight Digital Innovations

Providence, Rhode Island, United States

John Miner is a Senior Data Architect at Insight Digital Innovations, where he helps corporations solve their business needs with various data platform solutions.

He has over thirty years of data processing experience, and his architecture expertise encompasses all phases of the software project life cycle, including design, development, implementation, and maintenance of systems.

His credentials include undergraduate and graduate degrees in Computer Science from the University of Rhode Island. He has also earned Microsoft certifications in Database Administration (MCDBA), System Administration (MCSA), Data Management & Analytics (MCSE) and Data Science (MPP).

John has been recognized with the Microsoft MVP award seven times for his outstanding contributions to the Data Platform community.

When he is not busy talking to local user groups or writing blog entries on new technology, he spends time with his wife and daughter enjoying outdoor activities. Some of John’s hobbies include woodworking projects, crafting a good beer and playing a game of chess.

Awards

  • Most Active Speaker 2024
  • Most Active Speaker 2023
  • Most Active Speaker 2022

Area of Expertise

  • Energy & Basic Resources
  • Information & Communications Technology

Topics

  • Azure SQL Database
  • Azure SQL Synapse
  • SQL Server Developer
  • Azure Data Factory
  • Azure Data Platform
  • Azure Data Lake
  • Data Platform
  • Azure Databricks
  • Microsoft Data Platform
  • Data Warehousing
  • Data lake architecture
  • Fabric Data Warehouse
  • Fabric Lakehouse
  • Power BI Reporting
  • Power BI / Fabric

Control your data lake with Databricks SQL Warehouse

Many companies are migrating their data to the cloud. Files are typically classified as containing unstructured, semi-structured, or structured data. A data warehouse is best suited to working with structured data, while a data lake can store all three. What if you could use one product to work with all three types of data?

Azure Databricks SQL Warehouse is an interface that makes the Spark cluster look like a SQL database. In general, the syntax is compliant with the core ANSI SQL language. However, additional features are available for working with remote storage, loading files into Delta tables, and managing the Hive catalog. Last but not least, SQL queries can be visualized and combined into dashboards.

At the end of this talk, you will have a good understanding of the Azure Databricks SQL Warehouse ecosystem.

Data Engineering with Microsoft Fabric

Microsoft Fabric reached general availability in November 2023. The OneLake concept focuses on using Delta tables to create medallion zones for your data. How do you extract, load, and transform data in Fabric?

The good news is that Fabric is just a re-organization of tools that you are already familiar with. To get data into OneLake, you can use Azure Data Factory or Shortcuts. To transform data, you can use Dataflows or Spark notebooks. Finally, the SQL analytics endpoint can be used to present the data to end users.

Since this is a data engineering talk, there is no time to talk about reporting and governance. However, Power BI and Purview are available for your use. We will be working with several known datasets: Sales LT, S&P 500 Stocks and Pubs.

In short, this presentation is a must for data engineers who are tasked with Fabric development.

Building a reporting warehouse using Fabric

Many companies have legacy databases that are key to their day-to-day transaction processing. How can we consolidate data from various relational database management systems and/or sources into a single pane of glass for reporting?

Metadata-driven data pipelines can be used to bring the data into a raw schema. For small to medium tables (files), a full load pattern can be used. For larger tables (files), an incremental load pattern reduces the volume of data that is transferred.
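The choice between the two load patterns can be sketched in Python. This is only an illustration, not the pipeline from the talk: the `TableMeta` fields, the `FULL_LOAD_MAX_ROWS` threshold, and the table and column names are all hypothetical, and the metadata is assumed to come from a trusted control table.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-off: tables above this size are loaded incrementally.
FULL_LOAD_MAX_ROWS = 1_000_000


@dataclass
class TableMeta:
    """One row of a (hypothetical) metadata table that drives the pipeline."""
    source_table: str
    row_count: int
    watermark_column: Optional[str] = None  # e.g. a modified-date column
    last_watermark: Optional[str] = None    # highest value already loaded


def build_extract_query(meta: TableMeta) -> str:
    """Return the source query for a full or an incremental load."""
    incremental = (
        meta.row_count > FULL_LOAD_MAX_ROWS
        and meta.watermark_column is not None
        and meta.last_watermark is not None
    )
    if incremental:
        # Incremental pattern: only rows changed since the last run.
        return (
            f"SELECT * FROM {meta.source_table} "
            f"WHERE {meta.watermark_column} > '{meta.last_watermark}'"
        )
    # Full pattern: small tables, or no usable watermark, are copied whole.
    return f"SELECT * FROM {meta.source_table}"
```

A large table without a usable watermark falls back to a full load here, which is one possible design choice among several.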

Segregation of the medallion data quality zones is key to security and management. The bronze layer uses the raw schema; the silver layer composes views over two staging schemas (active and inactive); and the gold layer is a typical flattening of the silver views for reporting.

Loading data into the inactive schema allows the developer to rebuild the silver tables without impacting the production reports. Once all tables in the inactive schema are refreshed, a stored procedure can repoint the views so that the new data appears all at once. Of course, a control table is needed to keep track of which schema is currently active.

The Fabric Warehouse has been re-written using the OneLake Delta file format. While mirroring can be used for modern data sources, older databases that are considered technical debt are not supported. This design pattern will support any data source that is supported by the copy activity.

What's new in Azure Databricks?

The Azure Databricks ecosystem has been around for about five years. During that time, the vendor has kept improving both the interface and the design patterns. There are three new things that you should be using in your data engineering projects.

First, scheduling jobs has always been a part of the product. However, with Workflows one can control the order in which notebooks are executed.

Second, Delta Live Tables allow the developer to specify the source, the transformation and the destination of data using Python. This complete workflow can be scheduled as either a batch or a real-time streaming job.

Third, cloning technology has come to Delta tables. One can use shallow clones to keep different environments in sync with the master. On the other hand, deep clones allow the developer to stamp out a table at a given point in time.
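As a small illustration of the two clone types, the Python helper below builds the corresponding Databricks SQL statements as strings. The helper itself and the table names in the usage note are hypothetical; the statements would still need to be run in a notebook or SQL Warehouse, which this sketch does not do.

```python
from typing import Optional


def clone_statement(
    target: str,
    source: str,
    deep: bool = False,
    version: Optional[int] = None,
) -> str:
    """Build a Delta CLONE statement for a shallow or deep clone.

    A shallow clone copies only metadata and references the source files;
    a deep clone also copies the data files.
    """
    kind = "DEEP" if deep else "SHALLOW"
    stmt = f"CREATE OR REPLACE TABLE {target} {kind} CLONE {source}"
    if version is not None:
        # Time travel: clone the source as it was at a given table version.
        stmt += f" VERSION AS OF {version}"
    return stmt
```

For example, `clone_statement("dev.sales", "prod.sales")` yields a shallow clone for a development environment, while `clone_statement("archive.sales_v12", "prod.sales", deep=True, version=12)` stamps out a full copy of version 12.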

To recap, come to this presentation to learn about these new design patterns and see a demonstration of how to apply them in your data engineering projects.
