Falek Miah

Principal Consultant at Advancing Analytics

London, United Kingdom

Microsoft, Databricks (Spark) and Terraform (HashiCorp) certified consultant with over 15 years' technical experience.

Specialising in Business Intelligence, Azure Cloud and Power Platform, with extensive experience in delivering end-to-end cloud solutions across a wide range of industries.

I am a data, cloud & DevOps enthusiast with a passion for automation, and I enjoy sharing knowledge with the data community.

I enjoy running, swimming, football and learning new technologies.

Awards

  • Most Active Speaker 2023

Area of Expertise

  • Information & Communications Technology

Quest to Delta Optimisation

Delta has become a tool widely used by data professionals to build effective and reliable Lakehouses. Yet questions arise regarding its performance with large datasets, its ability to handle skewed data, and its concurrent write management. In this session, we will dive deep into the optimization options and methods that will improve your Lakehouse performance.

Delta files are not ordinary data files; they are key to making a Lakehouse efficient, optimal, and scalable. However, optimizing Delta files and tables in Databricks can be a challenging, even daunting, task. Techniques like partitioning and z-ordering can be limited, inflexible, and challenging to implement, especially when your data is constantly changing or growing.
This session will introduce you to the new liquid clustering technique, a cutting-edge approach that is more flexible and adaptable to data layout changes. This will not only enhance your query performance but also simplify your optimization process.
Furthermore, we will explore various other Delta file optimization techniques, such as data skipping, z-ordering, and vacuuming in Databricks. These techniques will help you maximize the value of your Delta files while minimizing resource utilization and costs.
By the end of this session, you'll have the necessary knowledge and tools to optimize Delta files and tables for your own Lakehouse.
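
For a flavour of what these techniques look like in practice, here is a minimal sketch as it might be run from a Databricks notebook, where the "spark" session is provided by the runtime. The table and column names ("sales", "customer_id", "order_date") are purely illustrative, and liquid clustering assumes a recent Databricks Runtime.

    # Compact small files and co-locate data on frequently filtered columns (z-ordering).
    spark.sql("OPTIMIZE sales ZORDER BY (customer_id, order_date)")

    # Remove files no longer referenced by the table, keeping 7 days of history
    # so time travel and concurrent readers keep working.
    spark.sql("VACUUM sales RETAIN 168 HOURS")

    # Liquid clustering is an alternative to partitioning/z-ordering: declare the
    # clustering keys on the table and let OPTIMIZE maintain the data layout.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_clustered (
            customer_id BIGINT, order_date DATE, amount DOUBLE
        ) CLUSTER BY (customer_id)
    """)
    spark.sql("OPTIMIZE sales_clustered")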

Building a Lakehouse with Databricks Unity Catalog

The Data Lakehouse is an emerging architecture reshaping how we handle data. At the heart of this evolution lies Databricks Unity Catalog, transforming the way we manage data within the Lakehouse.

Delta is not just a file format; it's the engine driving the Lakehouse concept, solving data management challenges effortlessly. However, gaps remained in data discovery and governance within these Lakehouse data platforms. We'll uncover how Unity Catalog bridges these gaps, eliminating the need for external tools.

In this session, we'll explore the purpose behind Unity Catalog, highlight its key features for controlling data assets, and cover the core components for exploring data effectively.
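
As a rough illustration of those features, the sketch below shows the three-level namespace and SQL-based grants from a Databricks notebook, where the "spark" session is provided by the runtime. The catalog, schema and group names are hypothetical, and creating catalogs assumes you hold the required privileges.

    # Unity Catalog organises data in a three-level namespace: catalog.schema.table.
    spark.sql("CREATE CATALOG IF NOT EXISTS lakehouse_dev")
    spark.sql("CREATE SCHEMA IF NOT EXISTS lakehouse_dev.sales")

    # Governance is expressed as standard SQL grants, with no external tooling.
    spark.sql("GRANT USE CATALOG ON CATALOG lakehouse_dev TO `data_analysts`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA lakehouse_dev.sales TO `data_analysts`")
    spark.sql("GRANT SELECT ON SCHEMA lakehouse_dev.sales TO `data_analysts`")

    # Data discovery: browse what is available to you.
    spark.sql("SHOW TABLES IN lakehouse_dev.sales").show()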

By the end of this session, you will have a clear understanding of Unity Catalog's capabilities and how to leverage them to build a robust and efficient Lakehouse with Databricks.

Value of DevOps Release Process in Data Teams

Have you ever wondered why release plans, approaches, and environments are important in the world of data operations? Many data professionals come from various backgrounds without prior software development experience, leading to questions about the necessity of these concepts.

In this session, we will discuss the significance of DevOps Release Processes for data teams. We will explore how insufficient processes can lead to delays in deployment, introduce breaking changes, hinder team collaboration and result in multiple releases.

This session will explore why DevOps, release processes, plans, and development environments (dev, test, and prod) are essential for growing data teams. We will examine different branching strategies like GitFlow and GitLab Flow, weighing the pros and cons of each.

By the end of the session, you'll not only understand the importance of these practices but also see how they are applied in data teams. You'll discover how they help streamline processes, improve collaboration, and reduce risks in data projects.

Mastering Delta Lakes in Azure

Once upon a time we had the Data Warehouse. Life was good, but it had its limitations, particularly around loading and storing complex data types. As data grew larger and more varied, the warehouse became too rigid and opinionated.

So we dove headfirst into Data Lakes to store our data. Again, things were good, but we missed some of the good times that the Data Warehouse had given us. The lake had become too flexible; we needed stability in our lives. In particular, we needed ACID (Atomicity, Consistency, Isolation, and Durability) transactions.

Delta Lake, hosted by the Linux Foundation, is an open-source file layout protocol that gives us back those good times, whilst retaining all of the flexibility of the lake. Delta has gone from strength to strength, and in 2022 Databricks finally open-sourced the entire code-base, including lots of advanced features that were previously Databricks-only. This workshop takes you from the absolute basics of using Delta within a Lake, through to some of those advanced engineering features, letting you really master your Delta Lake.

In this workshop we will go from Zero to Hero with Delta, including:
• Handling Schema Drift
• Applying Constraints and Database Designs
• Time-Travel & Management
• Optimize & Performance Tuning
• Streaming

We will also show you how to work with Delta inside and outside of its original home of Databricks.
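
To give a flavour of the hands-on content, here is a minimal sketch using the open-source delta-spark package outside Databricks. The path and column names are illustrative.

    from pyspark.sql import SparkSession
    from delta import configure_spark_with_delta_pip

    # Start a local Spark session with the open-source Delta Lake extensions.
    builder = (
        SparkSession.builder.appName("delta-workshop-demo")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    )
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    # Write an initial Delta table.
    df = spark.range(5).withColumnRenamed("id", "order_id")
    df.write.format("delta").mode("overwrite").save("/tmp/orders")

    # Schema drift: merge a new column into the table schema on append.
    df2 = df.withColumn("amount", df.order_id * 10)
    (df2.write.format("delta").mode("append")
        .option("mergeSchema", "true").save("/tmp/orders"))

    # Time travel: read the table as it looked at an earlier version.
    spark.read.format("delta").option("versionAsOf", 0).load("/tmp/orders").show()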

This training has been designed from our hands-on experience working with Delta and implementing Delta solutions for clients across the globe. The course is aimed at beginners, and you will leave with all the skills you need to get started on your Delta journey.

The course will be delivered by a Microsoft MVP and a Databricks Champion working together to bring you the best.

Spark Execution Plans for Databricks

Databricks is a powerful data analytics tool for data science and data engineering, but understanding how code is executed on a cluster can be daunting.

Using Spark execution plans allows you to understand the execution process and flow, which is great for optimizing queries and identifying bottlenecks.

This session will introduce you to Spark execution plans, the execution flows and how to interrogate the different plans.
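
As a minimal sketch of what interrogating a plan looks like in a Databricks notebook (the "spark" session is provided by the runtime, and the "sales" table and its columns are illustrative):

    # Build a simple aggregation query.
    df = (
        spark.table("sales")
             .filter("order_date >= '2024-01-01'")
             .groupBy("customer_id")
             .count()
    )

    # Print a formatted view of the physical plan, showing which filters are
    # pushed down to the scan and where exchanges (shuffles) occur.
    df.explain(mode="formatted")

    # The same information is available via SQL's EXPLAIN, and visually in the
    # Spark UI once the query has actually run.
    spark.sql(
        "EXPLAIN FORMATTED "
        "SELECT customer_id, COUNT(*) FROM sales "
        "WHERE order_date >= '2024-01-01' GROUP BY customer_id"
    ).show(truncate=False)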

By the end of this session, you will have everything you need to start optimizing your queries.

An introduction to Spark execution plans for Databricks, for optimizing code and execution.

Introduction to the wonders of Azure DevOps

Azure DevOps is a leading tool for building and releasing solutions end to end. It helps you plan your Agile project, manage Git code, and deploy solutions using Continuous Integration (CI) and Continuous Deployment (CD) pipelines.

In this session we will cover some of the core components of Azure DevOps and show you how to implement a secure deployment pipeline, using unit tests and gating with your CI builds and CD releases.

By the end of this session, you will have everything you need to start using Azure DevOps and start building secure deployment pipelines.

Deploy Synapse using Terraform & DevOps

Provisioning infrastructure as code (IaC) is a great approach to deploying resources in a reliable and consistent way.

Terraform is a highly popular, easy-to-learn IaC solution that simplifies the deployment process. It can be used with all the major cloud providers: Azure, AWS and GCP.

Terraform with Azure DevOps can be used to automate the provisioning of Synapse in an effective and efficient way.

This session will introduce you to Terraform, Azure DevOps and the Synapse providers, so you can provision a Synapse workspace and its components into the Azure cloud platform using Terraform.

Achieving DevOps Nirvana: Automating Azure Data Platform Deployments with Terraform

Adopting full Infrastructure as Code (IaC) can be a daunting task, and it is not always accessible to every data developer, given the variety in experience and skill sets. It is important we work towards the DevOps dream of everyone being part of the process, and everyone being responsible for and understanding our solution's infrastructure. But how do we achieve this dream?

Terraform is a highly popular, easy-to-learn IaC solution that simplifies the deployment process. It can be used with all the major cloud providers: Azure, AWS and GCP. Specialist analytics tools such as Databricks have also introduced their own Terraform providers to assist with deploying and managing resources across all the major clouds.

In this workshop you will be introduced to Terraform, and learn its core concepts and components. We will then focus on designing and deploying an Azure Data Platform solution, including a Resource Group, Key Vault, ADLS (Azure Data Lake Store), Synapse and Databricks.

Once we have our solution, we will run our Terraform via a DevOps CI/CD (Continuous Integration/Continuous Deployment) pipeline. Finally, we will cover some of the most common security and networking challenges, and finish with best practice guidelines and comparisons with other popular IaC solutions.

Join us and develop the core knowledge you need to work with Terraform for your Azure Data Platform solution(s), along with transferable Terraform skills that can be used with other cloud providers.

Deploy Databricks components using Terraform

Databricks is a great data analytics tool for data science and data engineering, but provisioning Databricks resources (workspace, clusters, secrets, mount storage, etc.) can be complex and time-consuming.

In the past, automating the deployment of Databricks resources with Terraform, an Infrastructure as Code tool, was tricky. It required a mix of Terraform Azure providers and/or ARM templates, PowerShell, the Databricks CLI or REST APIs, which made deployments harder to repeat and led to inconsistent environments.

Databricks introduced its own Terraform provider to assist with deploying and managing Databricks resources on the Azure, Google Cloud (GCP) and Amazon Web Services (AWS) platforms. This gives you the ability to automate the deployment of Databricks resources at the same time as provisioning the infrastructure, making environments easier to manage and maintain.

This session will introduce you to Terraform and the Databricks provider, and take you through the steps required to build an automated solution that provisions a Databricks workspace and resources into the Azure cloud platform using Terraform.

By the end of this session, you will have everything you need to automate your Databricks environment deployments and ensure consistency.
