
Supporting Tombstone Messages with Azure Data Lake Sink Connector

Saxo Bank has built a new Kafka-based operational plane using data mesh concepts. To enable analytics at scale, modelled domain data is made available to our data lake via the Confluent Azure Data Lake Storage (ADLS) Sink Connector. However, the connector has one major shortcoming that can lead to data divergence between the operational and analytical planes: it has no tombstone handler. In this talk I will describe the problem, why it does not affect more traditional connectors, the custom tombstone-handling SMT (Single Message Transform) we developed to overcome it, and how downstream analytical systems benefit. The talk will cover the following:


1. How we preserve tombstone messages and stream them to Azure Data Lake, so that no information coming from the source is lost.

2. How, in addition to preserving it, we enrich each tombstone message with the message key, timestamp, offset and partition before it is written to the Azure Data Lake sink in either Avro or Parquet format (a sketch of such an SMT follows this list).

3. How we enrich non-tombstone message values with the same metadata (key, timestamp, offset and partition) before writing them to the Azure Data Lake sink in either Avro or Parquet format.

4. How this metadata is captured for log-type (non-compacted) topics as well, not only for compacted ones.
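To make the enrichment concrete, here is a minimal sketch of what such a tombstone-handling SMT could look like; it is not the actual transform used at Saxo Bank, and the package, class name and schema fields are illustrative assumptions. It simply replaces a null record value with a small struct carrying the key, timestamp, partition and offset, so the ADLS sink can write a row for the deletion instead of dropping the record.

```java
package com.example.kafka.transforms; // hypothetical package

import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.transforms.Transformation;

/**
 * Illustrative sketch only: converts a tombstone (null value) into a struct
 * that records the key, timestamp, partition and offset, so the sink can
 * persist the deletion instead of silently dropping it.
 */
public class TombstoneHandler<R extends ConnectRecord<R>> implements Transformation<R> {

    private static final Schema TOMBSTONE_SCHEMA = SchemaBuilder.struct()
            .name("com.example.Tombstone")                 // assumed schema name
            .field("key", Schema.OPTIONAL_STRING_SCHEMA)
            .field("timestamp", Schema.OPTIONAL_INT64_SCHEMA)
            .field("partition", Schema.OPTIONAL_INT32_SCHEMA)
            .field("offset", Schema.OPTIONAL_INT64_SCHEMA)
            .field("deleted", Schema.BOOLEAN_SCHEMA)
            .build();

    @Override
    public R apply(R record) {
        if (record.value() != null) {
            // Non-tombstones pass through untouched in this sketch; enriching
            // them with the same metadata would follow the same pattern.
            return record;
        }

        // The offset is only available once the record has reached the sink side.
        Long offset = (record instanceof SinkRecord)
                ? ((SinkRecord) record).kafkaOffset()
                : null;

        Struct value = new Struct(TOMBSTONE_SCHEMA)
                .put("key", record.key() == null ? null : record.key().toString())
                .put("timestamp", record.timestamp())
                .put("partition", record.kafkaPartition())
                .put("offset", offset)
                .put("deleted", true);

        return record.newRecord(record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(),
                TOMBSTONE_SCHEMA, value,
                record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // No configuration needed for this sketch.
    }

    @Override
    public void close() {
    }
}
```

In a real deployment, a transform like this would be registered on the sink connector through its transforms configuration chain, so the enriched records flow into the Avro or Parquet files written to ADLS.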

This approach has helped us with both compacted and log-type topics: we can now determine whether any keys were deleted at the source, which makes it easier for business analysts and data analysts to query the correct state of the data in Snowflake.

Rahul Gulati

Principal Data Platform Engineer
