James Serra

Big Data/Data Warehouse Evangelist at Microsoft

Ponte Vedra Beach, Florida, United States

James works at Microsoft as a big data and data warehousing solution architect, where he has been for most of the last nine years. He is a thought leader in the use and application of Big Data and advanced analytics, including data architectures such as the modern data warehouse, data lakehouse, data fabric, and data mesh. Previously, he was an independent consultant working as a Data Warehouse/Business Intelligence architect and developer. He is a former SQL Server MVP with over 35 years of IT experience. He is a popular blogger (JamesSerra.com) and speaker who has presented at dozens of major events, including SQLBits, PASS Summit, Data Summit, and the Enterprise Data World conference. He is the author of the book “Deciphering Data Architectures: Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh”.

Awards

  • Most Active Speaker 2023

Area of Expertise

  • Information & Communications Technology

Topics

  • Data Warehousing
  • Modern Data Warehouse
  • Big Data
  • Analytics and Big Data
  • Data Mesh
  • Data Fabric
  • Data Lakehouse

Azure Synapse Database Templates

I'll discuss this new feature in Azure Synapse Analytics, which lets you quickly and easily use industry data models to shortcut the process of ingesting data from multiple sources into a common data model. I'll not only discuss and demo the technology but also cover the roles and responsibilities companies need in order to model their data effectively.

Azure Synapse Analytics Overview: A Data Lakehouse

Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, at scale, using either serverless on-demand or provisioned resources. Azure Synapse unifies these two worlds with a single experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. In this presentation, I'll talk about the new products and features that make up Azure Synapse Analytics and how it fits into a modern data warehouse, and I'll provide demonstrations.

Relational databases vs Non-relational databases

There is a lot of confusion about the place and purpose of the many recent non-relational database solutions (“NoSQL databases”) compared to the relational database solutions that have been around for so many years. In this presentation, I will first clarify what exactly these database solutions are, then explain how they compare to Hadoop and discuss the best use cases for each. I’ll cover topics including ACID vs BASE, scaling, data warehousing, polyglot persistence, the CAP theorem, and SQL Server 2016 and PolyBase. We will even touch on a new type of database solution called NewSQL. If you are building a new solution, it is important to understand all your options so you take the right path to success.

Power BI for Big Data and the New Look of Big Data Solutions

New features in Power BI give it enterprise tools, but that does not mean it automatically creates an enterprise solution. In this talk, we will cover these new features (composite models, aggregation tables, dataflows) as well as Azure Data Lake Storage Gen2, and describe the use cases and products for individual, departmental, and enterprise big data solutions. We will also talk about why a data warehouse and cubes should still be part of an enterprise solution, and how a data lake should be organized.

Differentiate Big Data vs Data Warehouse use cases for a cloud solution

It can be quite challenging to keep up with the frequent updates to Microsoft products, understand all their use cases, and see how the products fit together. In this session, we will differentiate the use cases for each of the Microsoft services, explaining and demonstrating what works well and what doesn’t, so that you can position, design, and deliver the proper use cases for each with your customers. We will cover a wide range of products such as Databricks, SQL Data Warehouse, HDInsight, Azure Data Lake Analytics, Azure Data Lake Store, Blob storage, and Azure Analysis Services (AAS), as well as high-level concepts such as when to use a data lake. We will also review the most common reference architectures (“patterns”) seen in customer adoptions.

Choosing technologies for a big data solution in the cloud

Has your company been building data warehouses for years using SQL Server? And are you now tasked with creating or moving your data warehouse to the cloud and modernizing it to support “Big Data”? What technologies and tools should you use? That is what this presentation will help you answer. First, we will level-set on what big data is, along with other definitions, cover questions to ask when deciding which technologies to use, go over the new technologies to choose from, and then compare their pros and cons. Finally, we will show you common big data architecture solutions and help you answer questions such as: Where do I store the data? Should I use a data lake? Do I still need a cube? What about Hadoop/NoSQL? Do I need the power of MPP? Should I build a “logical data warehouse”? What is this lambda architecture? We’ll close by showing some architectures of real-world customer big data solutions. Come to this session to get started down the path to making the proper technology choices when moving to the cloud.

Building an Effective Data Warehouse Architecture

You’re a DBA, and your boss asks you to determine whether a data warehouse would help the company. So many questions pop into your head: Why use a data warehouse? What’s the best methodology to use when creating a data warehouse? Should I use a normalized or dimensional approach? What’s the difference between the Kimball and Inmon methodologies? What’s the difference between a data warehouse and a data mart? Is there any hardware I can purchase that is optimized for a data warehouse? What if I have a ton of data? Join this session for the answers to all these questions. You’ll leave with information that will amaze your boss and lead to a big raise, or at least lead you down the correct path to adding business value to your organization!

Building a modern data warehouse

Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases of many products overlap. In this talk, I will cover the use cases of many of the Microsoft products you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions on when to use which products and the pros and cons of each.

Is the traditional data warehouse dead?

With new technologies such as Hive LLAP or Spark SQL, do I still need a data warehouse, or can I just put everything in a data lake and report from that? No! In this presentation, I’ll discuss why you still need a relational data warehouse and how to use a data lake and an RDBMS data warehouse together to get the best of both worlds. I will go into detail on the characteristics and benefits of a data lake, and why you still need data governance tasks in a data lake. I’ll also discuss using Hadoop as the data lake, data virtualization, and the need for OLAP in a big data solution. And I’ll put it all together by showing common big data architectures.

Introduction to Azure Databricks

Databricks is a Software-as-a-Service-like experience (or Spark-as-a-service) for curating and processing massive amounts of data, for developing, training, and deploying models on that data, and for managing the whole workflow throughout the project. It is aimed at those who are comfortable with Apache Spark, as it is 100% based on Spark and is extensible, with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming, and the Machine Learning Library (MLlib). It has built-in integration with many data sources, includes a workflow scheduler, allows real-time workspace collaboration, and offers performance improvements over traditional Apache Spark.