Falek Miah

Principal Consultant at Advancing Analytics

London, United Kingdom

Microsoft, Databricks (Spark) and Terraform (HashiCorp) certified consultant with over 15 years' technical experience.

Specialising in Business Intelligence, Azure Cloud and Power Platform, with extensive experience in delivering end-to-end cloud solutions across a wide range of industries.

I am a data, cloud & DevOps enthusiast with a passion for automation, and I enjoy sharing knowledge with the data community.

I enjoy running, swimming, football and learning new technologies.

Awards

  • Most Active Speaker 2023

Area of Expertise

  • Information & Communications Technology

Quest to Delta Optimisation

Delta has become a tool widely used by data professionals to build effective and reliable Lakehouses. Yet questions arise regarding its performance with large datasets, its ability to handle skewed data, and its concurrent write management. In this session, we will dive deep into the optimization options and methods that will improve your Lakehouse performance.

Delta files are not ordinary data files; they are key to making a Lakehouse efficient, optimal, and scalable. However, optimizing Delta files and tables in Databricks can be a challenging, even daunting, task. Techniques like partitioning and z-ordering can be limited, inflexible, and challenging to implement, especially when your data is constantly changing or growing.
This session will introduce you to the new liquid clustering technique, a cutting-edge approach that is more flexible and adaptable to data layout changes. This will not only enhance your query performance but also simplify your optimization process.
Furthermore, we will explore various other Delta file optimization techniques, such as data skipping, z-ordering, and vacuuming in Databricks. These techniques will help you maximize the value of your Delta files while minimizing resource utilization and costs.
By the end of this session, you'll have the necessary knowledge and tools to optimize Delta files and tables for your own Lakehouse.
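
For a flavour of what these techniques look like in practice, here is a minimal sketch as it might be run from a Databricks notebook, where the "spark" session is provided by the runtime. The table and column names ("sales", "customer_id", "order_date") are purely illustrative, and liquid clustering assumes a recent Databricks Runtime.

    # Compact small files and co-locate data on frequently filtered columns (z-ordering).
    spark.sql("OPTIMIZE sales ZORDER BY (customer_id, order_date)")

    # Remove files no longer referenced by the table, keeping 7 days of history
    # so time travel and concurrent readers keep working.
    spark.sql("VACUUM sales RETAIN 168 HOURS")

    # Liquid clustering is an alternative to partitioning/z-ordering: declare the
    # clustering keys on the table and let OPTIMIZE maintain the data layout.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_clustered (
            customer_id BIGINT, order_date DATE, amount DOUBLE
        ) CLUSTER BY (customer_id)
    """)
    spark.sql("OPTIMIZE sales_clustered")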

Building a Lakehouse with Databricks Unity Catalog

The Data Lakehouse is an emerging architecture reshaping how we handle data. At the heart of this evolution lies Databricks Unity Catalog, transforming the way we manage data within the Lakehouse.

Delta is not just a file format; it's the engine driving the Lakehouse concept, solving data management challenges effortlessly. However, gaps remained in data discovery and governance within these Lakehouse data platforms. We'll uncover how Unity Catalog bridges these gaps, eliminating the need for external tools.

In this session, we'll explore the purpose behind Unity Catalog, highlight its key features for controlling data assets, and cover the core components for exploring data effectively.
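
As a rough illustration of those features, the sketch below shows the three-level namespace and SQL-based grants from a Databricks notebook, where the "spark" session is provided by the runtime. The catalog, schema and group names are hypothetical, and creating catalogs assumes you hold the required privileges.

    # Unity Catalog organises data in a three-level namespace: catalog.schema.table.
    spark.sql("CREATE CATALOG IF NOT EXISTS lakehouse_dev")
    spark.sql("CREATE SCHEMA IF NOT EXISTS lakehouse_dev.sales")

    # Governance is expressed as standard SQL grants, with no external tooling.
    spark.sql("GRANT USE CATALOG ON CATALOG lakehouse_dev TO `data_analysts`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA lakehouse_dev.sales TO `data_analysts`")
    spark.sql("GRANT SELECT ON SCHEMA lakehouse_dev.sales TO `data_analysts`")

    # Data discovery: browse what is available to you.
    spark.sql("SHOW TABLES IN lakehouse_dev.sales").show()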

By the end of this session, you will have a clear understanding of Unity Catalog's capabilities and how to leverage them to build a robust and efficient Lakehouse with Databricks.

Value of DevOps Release Process in Data Teams

Have you ever wondered why release plans, approaches, and environments are important in the world of data operations? Many data professionals come from various backgrounds without prior software development experience, leading to questions about the necessity of these concepts.

In this session, we will discuss the significance of DevOps Release Processes for data teams. We will explore how insufficient processes can lead to delays in deployment, introduce breaking changes, hinder team collaboration and result in multiple releases.

This session will explore why DevOps, release processes, plans, and development environments (dev, test, and prod) are essential for growing data teams. We will examine different branching strategies like GitFlow and GitLab Flow, weighing the pros and cons of each.

By the end of the session, you'll not only understand the importance of these practices but also see how they are applied in data teams. You'll discover how they help streamline processes, improve collaboration, and reduce risks in data projects.

Mastering Delta Lakes in Azure

Once upon a time we had the Data Warehouse. Life was good, but it had its limitations, particularly around loading and storing complex data types. As data grew larger and more varied, the warehouse became too rigid and opinionated.

So we dove headfirst into Data Lakes to store our data. Again, things were good, but we missed some of the good times that the Data Warehouse had given us. The lake had become too flexible; we needed stability in our lives. In particular, we needed ACID (Atomicity, Consistency, Isolation, and Durability) transactions.

Delta Lake, hosted by the Linux Foundation, is an open-source file layout protocol that gives us back those good times, whilst retaining all of the flexibility of the lake. Delta has gone from strength to strength, and in 2022 Databricks finally open-sourced the entire code-base, including lots of advanced features that were previously Databricks-only. This workshop takes you from the absolute basics of using Delta within a Lake, through to some of those advanced engineering features, letting you really master your Delta Lake.

In this workshop we will go from Zero to Hero with Delta, including:
• Handling Schema Drift
• Applying Constraints and Database Designs
• Time-Travel & Management
• Optimize & Performance Tuning
• Streaming

We will also show you how to work with Delta inside and outside of its original home of Databricks.
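
To give a flavour of the hands-on content, here is a minimal sketch using the open-source delta-spark package outside Databricks. The path and column names are illustrative.

    from pyspark.sql import SparkSession
    from delta import configure_spark_with_delta_pip

    # Start a local Spark session with the open-source Delta Lake extensions.
    builder = (
        SparkSession.builder.appName("delta-workshop-demo")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    )
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    # Write an initial Delta table.
    df = spark.range(5).withColumnRenamed("id", "order_id")
    df.write.format("delta").mode("overwrite").save("/tmp/orders")

    # Schema drift: merge a new column into the table schema on append.
    df2 = df.withColumn("amount", df.order_id * 10)
    (df2.write.format("delta").mode("append")
        .option("mergeSchema", "true").save("/tmp/orders"))

    # Time travel: read the table as it looked at an earlier version.
    spark.read.format("delta").option("versionAsOf", 0).load("/tmp/orders").show()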

This training has been designed from our hands-on experience working with Delta and implementing Delta solutions for clients across the globe. The course is aimed at beginners, and you will leave with all the skills you need to get started on your Delta journey.

The course will be delivered by a Microsoft MVP and a Databricks Champion working together to bring you the best.

Spark Execution Plans for Databricks

Databricks is a powerful data analytics tool for data science and data engineering, but understanding how code is executed on a cluster can be daunting.

Using Spark execution plans allows you to understand the execution process and flow, which is great for optimizing queries and identifying bottlenecks.

This session will introduce you to Spark execution plans, the execution flows and how to interrogate the different plans.
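
As a minimal sketch of what interrogating a plan looks like in a Databricks notebook (the "spark" session is provided by the runtime, and the "sales" table and its columns are illustrative):

    # Build a simple aggregation query.
    df = (
        spark.table("sales")
             .filter("order_date >= '2024-01-01'")
             .groupBy("customer_id")
             .count()
    )

    # Print a formatted view of the physical plan, showing which filters are
    # pushed down to the scan and where exchanges (shuffles) occur.
    df.explain(mode="formatted")

    # The same information is available via SQL's EXPLAIN, and visually in the
    # Spark UI once the query has actually run.
    spark.sql(
        "EXPLAIN FORMATTED "
        "SELECT customer_id, COUNT(*) FROM sales "
        "WHERE order_date >= '2024-01-01' GROUP BY customer_id"
    ).show(truncate=False)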

By the end of this session, you will have everything you need to start optimizing your queries.

An introduction to Spark execution plans for Databricks, for optimizing code and execution.

Introduction to the wonders of Azure DevOps

Azure DevOps is a leading tool for building and releasing solutions end to end. It helps you plan your Agile project, manage Git code, and deploy solutions using Continuous Integration (CI) and Continuous Deployment (CD) pipelines.

In this session we will cover some of the core components of Azure DevOps and show you how to implement a secure deployment pipeline, using unit tests and gating with your CI builds and CD releases.

By the end of this session, you will have everything you need to start using Azure DevOps and start building secure deployment pipelines.

Deploy Synapse using Terraform & DevOps

Provisioning infrastructure as code (IaC) is a great approach to deploying resources in a reliable and consistent way.

Terraform is a highly popular, easy-to-learn IaC solution that simplifies the deployment process. It can be used with all the major cloud providers: Azure, AWS and GCP.

Terraform with Azure DevOps can be used to automate the provisioning of Synapse in an effective and efficient way.

This session will introduce you to Terraform, Azure DevOps and the Synapse providers, so you can provision a Synapse workspace and its components into the Azure cloud platform using Terraform.

Achieving DevOps Nirvana: Automating Azure Data Platform Deployments with Terraform

Adopting full Infrastructure as Code (IaC) can be a daunting task, and it is not always accessible to every data developer, given the variety in experience and skill sets. It is important we work towards the DevOps dream of everyone being part of the process, and everyone being responsible for and understanding our solution's infrastructure. But how do we achieve this dream?

Terraform is a highly popular, easy-to-learn IaC solution that simplifies the deployment process. It can be used with all the major cloud providers: Azure, AWS and GCP. Specialist analytics tools such as Databricks have also introduced their own Terraform providers to assist with deploying and managing resources across all the major clouds.

In this workshop you will be introduced to Terraform, and learn its core concepts and components. We will then focus on designing and deploying an Azure Data Platform solution, including a Resource Group, Key Vault, ADLS (Azure Data Lake Store), Synapse and Databricks.

Once we have our solution, we will run our Terraform via a DevOps CI/CD (Continuous Integration/Continuous Deployment) pipeline. Finally, we will cover some of the most common security and networking challenges, and finish with best practice guidelines and comparisons with other popular IaC solutions.

Join us and develop the core knowledge you need to work with Terraform for your Azure Data Platform solution(s), along with transferable Terraform skills that can be used with other cloud providers.

Deploy Databricks components using Terraform

Databricks is a great data analytics tool for data science and data engineering, but provisioning Databricks resources (workspace, clusters, secrets, mount storage, etc.) can be complex and time-consuming.

In the past, automating the deployment of Databricks resources with Terraform, an Infrastructure as Code tool, was tricky. It required a mix of Terraform Azure providers and/or ARM templates, PowerShell, the Databricks CLI or REST APIs, which made deployments harder to repeat and led to inconsistent environments.

Databricks introduced its own Terraform provider to assist with deploying and managing Databricks resources on the Azure, Google Cloud (GCP) and Amazon Web Services (AWS) platforms. This gives you the ability to automate the deployment of Databricks resources at the same time as provisioning the infrastructure, making environments easier to manage and maintain.

This session will introduce you to Terraform and the Databricks provider, and take you through the steps required to build an automated solution that provisions a Databricks workspace and resources into the Azure cloud platform using Terraform.

By the end of this session, you will have everything you need to automate your Databricks environment deployments and ensure consistency.
