Why Your Fabric Lakehouse Gets Slower Over Time (and How to Fix It)
This full-day advanced training focuses on keeping Microsoft Fabric Lakehouses performant, cost-efficient, and predictable over time. The session is aimed at engineers and architects who already use Fabric and want to understand what actually happens under the hood, why performance degrades, and how to fix it systematically at scale.
The day starts with a detailed, practical explanation of how the Delta format works in Fabric. This includes how data is written, how the transaction log evolves, how small files and fragmentation are created, and why these mechanics directly impact query latency and capacity consumption. Rather than abstract theory, the focus is on understanding the concrete consequences of common ingestion and update patterns.
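To make the log mechanics described above concrete, here is a minimal sketch in plain Python of how a reader could replay Delta commit files to find the set of live data files. It relies only on the documented `add` and `remove` actions of the Delta transaction log; the file names, sizes, and the `replay_delta_log` helper are hypothetical, and a real log carries more fields (partition values, timestamps, statistics) and checkpoints that are ignored here.

```python
import json


def replay_delta_log(commits):
    """Replay commits in order and return {path: size} for files still live.

    Each commit is newline-delimited JSON, one action per line. Only the
    `add` and `remove` actions are interpreted; everything else is skipped.
    """
    live = {}
    for commit in commits:
        for line in commit.splitlines():
            line = line.strip()
            if not line:
                continue
            action = json.loads(line)
            if "add" in action:
                live[action["add"]["path"]] = action["add"]["size"]
            elif "remove" in action:
                live.pop(action["remove"]["path"], None)
    return live


# Illustrative data: three small appends, then one compaction commit that
# logically removes the small files and adds a single larger one.
commits = [
    '{"add": {"path": "part-0001.parquet", "size": 1000}}',
    '{"add": {"path": "part-0002.parquet", "size": 1200}}',
    '{"add": {"path": "part-0003.parquet", "size": 900}}',
    "\n".join([
        '{"remove": {"path": "part-0001.parquet"}}',
        '{"remove": {"path": "part-0002.parquet"}}',
        '{"remove": {"path": "part-0003.parquet"}}',
        '{"add": {"path": "part-0004.parquet", "size": 3000}}',
    ]),
]

before = replay_delta_log(commits[:3])  # state after the appends only
after = replay_delta_log(commits)       # state after compaction
print(len(before), "live files before compaction,", len(after), "after")
```

Every append adds at least one new file, so the live file count grows with write frequency until a compaction commit rewrites the small files into fewer, larger ones. The removed files still exist in storage after that commit; they are only dropped from the live set.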
Building on this foundation, the training explains why OPTIMIZE and VACUUM are required, what problems they solve, and how to use them safely. Attendees learn what OPTIMIZE actually does to files, what VACUUM removes, how retention works, and how poor maintenance strategies can easily increase cost instead of reducing it.
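The retention rule mentioned above can be sketched in a few lines of plain Python. This is a simplified model, not the engine's implementation: it assumes VACUUM deletes only files that were already logically removed from the table and whose removal is older than the retention window (7 days by default in Delta). The `vacuum_candidates` helper and the file names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Delta's default retention for removed files is 7 days.
DEFAULT_RETENTION = timedelta(days=7)


def vacuum_candidates(removed_files, now, retention=DEFAULT_RETENTION):
    """Return paths of removed files that fall outside the retention window.

    removed_files maps a file path to the time it was logically removed.
    Files still referenced by the current table version never appear here.
    """
    cutoff = now - retention
    return sorted(
        path for path, removed_at in removed_files.items()
        if removed_at < cutoff
    )


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
removed = {
    "old.parquet": now - timedelta(days=10),    # outside the 7-day window
    "recent.parquet": now - timedelta(days=2),  # still inside the window
}

default_result = vacuum_candidates(removed, now)
short_result = vacuum_candidates(removed, now, retention=timedelta(days=1))
print(default_result, short_result)
```

Shortening the window frees storage sooner, but older table versions that depend on the deleted files can no longer be read, which is why the retention setting deserves a deliberate choice.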
The scope then expands from a single lakehouse to an enterprise perspective. Participants learn how to design and implement a centralized maintenance approach that works across multiple workspaces and lakehouses, avoiding ad-hoc notebooks and inconsistent practices. The session covers parameterized maintenance jobs, scheduling strategies that do not interfere with interactive workloads, and governance patterns to ensure maintenance is applied consistently across the organization.
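One possible shape for the parameterized maintenance job described above is a small config-driven generator of maintenance statements. The sketch below is an assumption, not the session's actual solution: the `TableTarget` class, the `build_statements` helper, the lakehouse and table names, and the two-part `lakehouse.table` naming are all hypothetical. The generated `OPTIMIZE ... ZORDER BY` and `VACUUM ... RETAIN n HOURS` statements follow standard Delta SQL syntax.

```python
from dataclasses import dataclass

# Guard rail: refuse retention below 7 days (Delta's default window).
MIN_RETENTION_HOURS = 168


@dataclass(frozen=True)
class TableTarget:
    """One table to maintain; all field values are supplied by configuration."""
    lakehouse: str
    table: str
    zorder_columns: tuple = ()
    retention_hours: int = MIN_RETENTION_HOURS


def build_statements(target):
    """Return the OPTIMIZE and VACUUM statements for a single target."""
    if target.retention_hours < MIN_RETENTION_HOURS:
        raise ValueError(
            f"retention of {target.retention_hours}h is below the "
            f"{MIN_RETENTION_HOURS}h minimum for {target.table}"
        )
    name = f"{target.lakehouse}.{target.table}"
    optimize = f"OPTIMIZE {name}"
    if target.zorder_columns:
        optimize += f" ZORDER BY ({', '.join(target.zorder_columns)})"
    vacuum = f"VACUUM {name} RETAIN {target.retention_hours} HOURS"
    return [optimize, vacuum]


# Hypothetical targets; in practice these would come from a control table.
targets = [
    TableTarget("sales_lh", "orders", zorder_columns=("customer_id",)),
    TableTarget("finance_lh", "ledger", retention_hours=336),
]

plan = [stmt for target in targets for stmt in build_statements(target)]
for stmt in plan:
    print(stmt)
```

In a notebook, each statement in `plan` could then be passed to `spark.sql(...)` by a single scheduled job, so that retention limits and Z-Order choices live in configuration rather than in per-team notebooks.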
A significant portion of the day is dedicated to performance diagnostics using lakehouse system tables. Attendees learn how to extract performance and workload signals, identify heavy queries, and map internal identifiers back to meaningful objects such as semantic model names and report names. This makes it possible to understand which reports and models are driving load and to turn performance troubleshooting into a repeatable process.
The training also covers critical lakehouse configuration choices that materially affect performance and cost. In particular, it explains V-Order in detail, including what it changes, why it is no longer a default decision, and how to plan where it should and should not be applied. Attendees learn how to validate V-Order impact and avoid blanket configurations that lead to unnecessary compute usage.
The day concludes with table design and query execution practices that complement maintenance activities. This includes partitioning strategies, Z-Order usage for specific access patterns, and practical PySpark query-writing techniques that avoid common performance pitfalls in Fabric. These topics tie together storage layout, maintenance, and query behavior into a single, coherent performance strategy.
📚 Topics covered (table of contents)
🧱 Delta format internals in Fabric
➡ Delta transaction log structure and lifecycle
➡ What different operations write to storage
➡ Small files, fragmentation, and performance decay
➡ Relationship between storage layout and query cost
🧹 OPTIMIZE and VACUUM: purpose and correct usage
➡ What OPTIMIZE actually changes at file level
➡ What VACUUM removes and how retention works
➡ Safe execution patterns and scheduling
➡ Common maintenance anti-patterns that increase cost
🏢 Enterprise-scale lakehouse maintenance
➡ Designing a centralized maintenance strategy
➡ Applying maintenance across multiple lakehouses and workspaces
➡ Parameterization and reuse of maintenance logic
➡ Scheduling without impacting interactive workloads
➡ Governance, standards, and observability
📈 Performance diagnostics with lakehouse system tables
➡ Extracting workload and query metrics
➡ Identifying heavy and recurring queries
➡ Mapping internal IDs to semantic models and reports
➡ Building actionable views for governance and capacity planning
⚙️ Critical configuration decisions (V-Order and beyond)
➡ What V-Order changes and why it requires planning
➡ Deciding where V-Order makes sense and where it does not
➡ Measuring and validating performance impact
➡ Avoiding configuration-by-default mistakes
🗂 Table design and query performance practices
➡ Partitioning strategies aligned with access patterns
➡ Z-Order usage and limitations
➡ PySpark query patterns that improve performance
➡ Common syntax and design mistakes that trigger expensive plans