

Nilanjan Chatterjee
Sr. Staff Data Architect
Austin, Texas, United States
I am Nilanjan Chatterjee, a seasoned Data Engineering leader and architect at AMD, with extensive experience rolling out 0-to-1 turnkey products and features across the Fintech, Semiconductor, and Telecom domains. With over 12 years of experience across Data Engineering, Data Science, Gen AI pipelines, and MLOps, I help businesses operationalise their data and ML strategy and find data-driven solutions to problems.
Topics
Who Needs a Warehouse When You've Got a Lakehouse?
The data landscape has changed significantly in recent years, moving from traditional data warehouses to more flexible and powerful lakehouse architectures. This shift represents not just a technical advancement but a fundamental rethinking of how organizations store, manage, and derive value from their data assets.
The traditional data warehouse worked well with structured data: it offered a defined schema, consistency, and strong BI performance. However, its high cost and inflexibility made it unsuitable for handling unstructured data, much less large volumes of it.
This is what gave birth to the data lake: an inexpensive place to store enormous volumes of highly varied datasets with schema-on-read flexibility. In practice, it often turned into a "data swamp" because of quality issues, poor analytical performance, and weak governance.
Then came the lakehouse architecture, which combines the strengths of both the warehouse and the lake.
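To make this concrete, here is a minimal PySpark sketch of the lakehouse idea: warehouse-style schema enforcement and transactional tables (via Delta Lake) layered directly over inexpensive object storage. The paths and table names are hypothetical, and the exact configuration depends on how the cluster is provisioned.

```python
# A minimal sketch, assuming a Spark cluster with the Delta Lake package available
# and a hypothetical lake path such as s3://my-lake/.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("lakehouse-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Warehouse-style guarantees (schema enforcement, ACID transactions) on lake storage.
events = spark.read.json("s3://my-lake/raw/events/")      # schema-on-read ingest
(events.write
    .format("delta")                                       # transactional table format
    .mode("append")
    .option("mergeSchema", "false")                        # reject writes that break the schema
    .save("s3://my-lake/tables/events"))

# BI-style SQL directly on the same files, with no separate warehouse copy.
spark.sql("SELECT count(*) FROM delta.`s3://my-lake/tables/events`").show()
```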
Quicker Analytics: Self-Serve Analytics to the Rescue
As an Architect evaluating our analytics transformation roadmap, I've identified self-serve analytics as the critical accelerator for our enterprise data strategy. Our current centralized BI bottleneck means insights take multiple sprints on average to deliver, a timeline incompatible with modern business velocity.
The proposed architecture implements a three-tier semantic modeling approach:
-- Core Data Layer: Leveraging our lakehouse medallion architecture with materialized views on Gold datasets, structured through domain-driven design principles
-- Semantic Modeling Tier: Implementing metric stores with SQL-based abstraction layers to decouple business logic from physical infrastructure
-- Visualization/Exploration Layer: Deploying governed tools that support both SQL-fluent analysts and business users who need a GUI-driven playground for the data
Performance benchmarks from our POC demonstrate a 95% reduction in time-to-insight, with 78% of previously centralized report requests now self-serviced using tools like Sigma and ThoughtSpot. Data mesh principles have been incorporated for domain-oriented ownership, while central governance is maintained through automated quality controls.
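As a rough illustration of the semantic modeling tier, the sketch below shows how a metric store decouples business logic from physical infrastructure: a metric is defined once and compiled to SQL on demand. The table and metric names are hypothetical; production metric stores (dbt metrics, Cube, and similar tools) use their own definition formats.

```python
# A minimal sketch of the semantic-layer idea: business metrics are defined once
# and compiled to SQL against Gold tables, so dashboards never hard-code the logic.
# Table and metric names here are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str
    expression: str           # aggregation over Gold-layer columns
    source_table: str

METRICS = {
    "net_revenue": Metric("net_revenue", "SUM(amount) - SUM(refunds)", "gold.fct_orders"),
    "active_users": Metric("active_users", "COUNT(DISTINCT user_id)", "gold.fct_events"),
}

def compile_metric_query(metric_name: str, group_by: list[str]) -> str:
    """Turn a governed metric definition into a SQL query any BI tool can run."""
    m = METRICS[metric_name]
    dims = ", ".join(group_by)
    return (
        f"SELECT {dims}, {m.expression} AS {m.name} "
        f"FROM {m.source_table} GROUP BY {dims}"
    )

print(compile_metric_query("net_revenue", ["order_date", "region"]))
```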
Data Observability and Reliability Engineering in a Real-Time World
Data observability and reliability engineering are rapidly emerging as foundational pillars of modern data engineering and MLOps; ensuring that data pipelines are robust, trustworthy, and capable of supporting critical business operations is now imperative.
Data observability is the comprehensive ability to monitor, track, and analyze data as it moves through pipelines, providing real-time insights into data health, quality, and system performance. It goes beyond traditional monitoring by offering a holistic, proactive approach to identifying and resolving issues before they impact downstream analytics or machine learning models.
Data reliability engineering focuses on ensuring that data is consistently accurate, available, and dependable over time. It leverages observability tools and practices to maintain high standards of data quality and system uptime, often borrowing principles from Site Reliability Engineering (SRE) such as Service Level Objectives (SLOs) and error budgets.
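A rough sketch of what an SRE-style SLO looks like when applied to data freshness is shown below; the threshold, error budget, and inputs are all hypothetical, and in practice the checks would be driven by an observability tool or warehouse metadata.

```python
# A minimal sketch of an SRE-style data SLO check with an error budget.
# The SLO values and inputs are hypothetical placeholders.
from datetime import datetime, timedelta, timezone

FRESHNESS_SLO = timedelta(minutes=30)      # data must land within 30 minutes
MONTHLY_ERROR_BUDGET = 0.01                # at most 1% of checks may breach the SLO

def check_freshness(last_loaded_at: datetime, prior_breaches: int, total_checks: int) -> dict:
    """Evaluate one freshness check and report how much error budget is used."""
    now = datetime.now(timezone.utc)
    breached = (now - last_loaded_at) > FRESHNESS_SLO
    breaches = prior_breaches + int(breached)
    budget_used = breaches / max(total_checks, 1)
    return {
        "breached": breached,
        "budget_used": budget_used,
        "budget_exhausted": budget_used > MONTHLY_ERROR_BUDGET,
    }

# Example: the last load finished 45 minutes ago; 3 breaches in 400 checks so far.
status = check_freshness(
    datetime.now(timezone.utc) - timedelta(minutes=45), prior_breaches=3, total_checks=400
)
print(status)   # flags the breach and whether the error budget is exhausted
```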
Walkthrough of Medallion: Why Organizations Need the Medallion Architecture
The medallion architecture is a data organization framework that has become crucial for organizations implementing data lakehouses. Here's why it's so valuable:
Structured Data Quality Management
The medallion approach (typically using Bronze, Silver, Gold layers) provides a systematic method to progressively improve data quality. Organizations can maintain raw data while ensuring downstream analytics use only validated, transformed data.
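As a rough sketch, assuming Delta tables and hypothetical paths and columns, the Bronze/Silver/Gold progression might look like this in PySpark; real pipelines would add orchestration, expectations, and incremental loads.

```python
# A minimal PySpark sketch of the Bronze -> Silver -> Gold progression.
# Paths, table names, and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-demo").getOrCreate()

# Bronze: raw data landed as-is, preserving completeness for replay and audits.
bronze = spark.read.json("s3://my-lake/landing/orders/")
bronze.write.format("delta").mode("append").save("s3://my-lake/bronze/orders")

# Silver: validated and conformed -- deduplicate, fix types, drop bad records.
silver = (
    spark.read.format("delta").load("s3://my-lake/bronze/orders")
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .filter(F.col("amount") >= 0)
)
silver.write.format("delta").mode("overwrite").save("s3://my-lake/silver/orders")

# Gold: business-level aggregates, modeled for query performance and BI.
gold = (
    silver.groupBy("region", F.to_date("order_ts").alias("order_date"))
          .agg(F.sum("amount").alias("daily_revenue"))
)
gold.write.format("delta").mode("overwrite").save("s3://my-lake/gold/daily_revenue")
```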
Clear Data Lineage
By organizing data through distinct processing stages, organizations gain transparent data lineage. This makes it significantly easier to trace how data flows through the system, troubleshoot issues, and satisfy regulatory compliance requirements.
Optimized Performance
The medallion architecture enables performance optimization at each layer. Organizations can structure their Gold layer for query performance, while maintaining Bronze layers for completeness and Silver for transformation logic.
Simplified Access Management
Different user groups require different data access levels. The medallion approach allows organizations to implement granular security policies—data scientists might access Silver data, while business analysts only work with curated Gold datasets.
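For illustration, layer-based access might be expressed with grants like the following. This assumes a Unity Catalog-style GRANT syntax and hypothetical schema and group names; the exact statements vary by platform.

```python
# A minimal sketch of layer-based access control via SQL grants.
# Schema and principal names are hypothetical; syntax varies by catalog/engine.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("medallion-access").getOrCreate()

spark.sql("GRANT SELECT ON SCHEMA lakehouse.silver TO `data_scientists`")
spark.sql("GRANT SELECT ON SCHEMA lakehouse.gold TO `business_analysts`")
# Bronze stays restricted to the ingestion service principal.
spark.sql("GRANT SELECT, MODIFY ON SCHEMA lakehouse.bronze TO `ingest_service`")
```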
Workload Isolation
Organizations can isolate intensive data processing jobs by layer, preventing resource contention. ETL processes on Bronze data won't impact analysts querying Gold datasets.
Accelerated Time-to-Insight
By providing pre-processed, validated data in the Gold layer, organizations dramatically reduce the time analysts spend preparing data, allowing them to focus on extracting insights instead.
Future-Proof Architecture
As data requirements evolve, organizations can adapt each layer independently without disrupting the entire pipeline, providing architectural flexibility for changing business needs.
The medallion architecture isn't just a technical implementation detail—it's a strategic approach that helps organizations balance data governance, performance, and accessibility in their lakehouse environments.
YouTube session and local Data Chapter
SQL Indexes - Boon or Bane?
SQL indexes are a powerful tool for optimizing database performance, but their effectiveness depends on platform-specific strengths and trade-offs. In Azure SQL, automated features like index tuning and columnstore indexes streamline analytical workloads, while managed maintenance reduces fragmentation risks. For PostgreSQL, flexibility shines with specialized indexes (e.g., GIN for JSONB, BRIN for time-series) and partial/expression-based indexing, enabling tailored optimizations. Both platforms enforce data integrity via unique indexes, and read-heavy systems benefit significantly. However, Azure’s automation can lead to unintended index drops, and columnstore indexes require partitioning discipline. PostgreSQL demands manual upkeep (e.g., VACUUM for bloat) and risks suboptimal plans without proper composite index design.
The downsides center on write overhead and cost. Azure SQL’s indexing increases DTU consumption and storage costs, especially in geo-replicated setups. PostgreSQL’s MVCC model causes index bloat, impacting distributed systems like Citus. Over-indexing in either system inflates storage: Azure’s tiered pricing penalizes excess, while PostgreSQL’s self-managed flexibility still demands cost-awareness. Ultimately, indexes are a boon when aligned with platform capabilities (e.g., Azure’s analytics focus, PostgreSQL’s data-type diversity) but a bane if applied generically without workload analysis and maintenance planning.
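To ground the PostgreSQL examples above, here is a small sketch that creates the index types mentioned (GIN on JSONB, BRIN for time-series, and a partial index) via psycopg2; the connection string, table, and column names are placeholders.

```python
# A minimal sketch of PostgreSQL's specialized index types, run through psycopg2
# against a hypothetical "events" table; connection details are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=analytics user=app password=secret host=localhost")
conn.autocommit = True

with conn.cursor() as cur:
    # GIN index to speed up containment queries on a JSONB payload column.
    cur.execute("CREATE INDEX IF NOT EXISTS idx_events_payload_gin "
                "ON events USING GIN (payload)")
    # BRIN index: tiny footprint, effective for append-only time-series scans.
    cur.execute("CREATE INDEX IF NOT EXISTS idx_events_created_brin "
                "ON events USING BRIN (created_at)")
    # Partial index: only index the rows that hot queries actually touch.
    cur.execute("CREATE INDEX IF NOT EXISTS idx_events_pending "
                "ON events (status) WHERE status = 'pending'")

conn.close()
```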
Intermediate (300) level: New Session
Data Summit 2025
https://www.dbta.com/DataSummit/2025/Nilanjan-Chatterjee.aspx
Session : Data Observability and Reliability Engineering in a Real-Time World
