![Rahul Gulati](https://sessionize.com/image/9898-400o400o2-QdTnHtQxGyGsy8kNnJvYms.jpg)
Rahul Gulati
Principal Data Platform Engineer
Working as a Principal Data Platform Engineer, building a self-service Data Mesh with Kafka and Confluent technologies. Expert in Kafka Connect, Schema Registry, Buf Schema Registry, Kafka security (LDAP, SSL, Kerberos, OAuth, etc.), Azure Cloud, data lakes, Snowflake, Kubernetes, Helm, and Flux CD.
Data Mesh champion, currently working on extending the Data Mesh to a data lake hosted in Azure.
Supporting Tombstone Messages with Azure Data Lake Sink Connector
Saxo Bank has built a new Kafka-based operational plane using data mesh concepts. To enable analytics at scale, modelled domain data is made available to our data lake with the help of the Confluent ADLS Connector. However, the ADLS connector has one major shortcoming that can lead to data divergence between the operational and analytical planes: the lack of a tombstone handler. In this talk I will describe the problem, why it doesn't apply to more traditional connectors, the custom tombstone-handling SMT that we developed to overcome this shortcoming, and finally how downstream analytic systems benefit. The talk describes the following:
1. How we have been able to preserve and stream tombstone messages successfully to Azure Data Lake and avoid losing any information coming from the source.
2. In addition to preserving the message, how we enrich the tombstone message with the message key, timestamp, offset and partition before writing it to the Azure Data Lake sink in either Avro or Parquet format.
3. How we enrich the non-tombstone message value with the key, timestamp, offset and partition before writing it to the Azure Data Lake sink in either Avro or Parquet format.
4. The ability to log this information for log-type topics as well.
This has helped us with both compacted and log-type topics, as it lets us determine whether any keys were deleted at the source, which has made it easier for BAs and Data Analysts to query the correct state of the data in Snowflake.
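For illustration, below is a minimal sketch of what such a tombstone-handling SMT could look like, assuming a custom Kafka Connect `Transformation` applied on the sink side; the class name `TombstoneHandler` and the envelope field names are hypothetical, not Saxo Bank's actual implementation.

```java
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.transforms.Transformation;

/**
 * Illustrative SMT: when a tombstone (null value) arrives, replace it with a
 * small envelope struct carrying the key, partition, offset and timestamp so
 * the delete event survives the trip into the data lake sink.
 */
public class TombstoneHandler implements Transformation<SinkRecord> {

    // Hypothetical envelope schema; field names are assumptions for illustration.
    private static final Schema ENVELOPE_SCHEMA = SchemaBuilder.struct()
            .name("TombstoneEnvelope")
            .field("key", Schema.OPTIONAL_STRING_SCHEMA)
            .field("partition", Schema.INT32_SCHEMA)
            .field("offset", Schema.INT64_SCHEMA)
            .field("timestamp", Schema.OPTIONAL_INT64_SCHEMA)
            .field("deleted", Schema.BOOLEAN_SCHEMA)
            .build();

    @Override
    public SinkRecord apply(SinkRecord record) {
        if (record.value() != null) {
            return record; // non-tombstones pass through (their enrichment is omitted here)
        }
        Struct envelope = new Struct(ENVELOPE_SCHEMA)
                .put("key", record.key() == null ? null : record.key().toString())
                .put("partition", record.kafkaPartition())
                .put("offset", record.kafkaOffset())
                .put("timestamp", record.timestamp())
                .put("deleted", true);
        return record.newRecord(record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(),
                ENVELOPE_SCHEMA, envelope, record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }

    @Override
    public void close() {
    }
}
```

The key design choice is that a delete is never dropped: it is turned into a self-describing record, so the data lake (and ultimately Snowflake) can reflect deletions instead of silently diverging from the operational plane.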
How we eased our security journey with OAuth in Production and retired Kerberos
Saxo Bank is on a growth journey and Kafka is a critical component of that success. Securing our financial event streams is a top priority for us, and we initially started with an on-prem Kafka cluster secured with (the de-facto) Kerberos. However, as we modernize and scale, the demands of hybrid cloud, multiple domains, polyglot computing and Data Mesh require us to also modernize our approach to security. In this talk, we will describe how we took the default (non-production-ready) Kafka OAuth implementation and productionized it to work with Kafka in Azure Cloud, including the Kafka stack and clients. By enabling both Kerberos and OAuth running on-prem and in the cloud, we now plan to gracefully retire Kerberos from our estate.
Talking points: 1-8
1. Writing production-ready OAuth authentication backed by Azure Active Directory.
2. Running Kafka with the OAuth authentication mechanism alongside SSL and ACLs (a client-side configuration sketch follows below).
3. Where to find the code (GitHub repo location).
.....
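As a rough client-side illustration of points 1 and 2, a Kafka producer configured for SASL/OAUTHBEARER over SSL could look like the sketch below; the broker address and the Azure AD login callback handler class name are placeholders, not the productionized code referenced in the talk.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.config.SaslConfigs;

public class OAuthProducerConfigExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1.example.com:9093");

        // OAuth over TLS: SASL_SSL + OAUTHBEARER.
        props.put("security.protocol", "SASL_SSL");
        props.put(SaslConfigs.SASL_MECHANISM, "OAUTHBEARER");
        props.put(SaslConfigs.SASL_JAAS_CONFIG,
                "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required;");

        // Custom login callback handler that fetches a token from Azure Active Directory.
        // The class name below is a placeholder for the kind of handler described
        // in the talk, not a published artifact.
        props.put(SaslConfigs.SASL_LOGIN_CALLBACK_HANDLER_CLASS,
                "com.example.kafka.security.AzureAdOAuthLoginCallbackHandler");

        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // ... produce records as usual; token acquisition and refresh are
            // handled by the login callback handler.
        }
    }
}
```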
Designing a Data Mesh with Kafka
“Data Mesh objective is to create a foundation for getting value from analytical data and historical facts at scale” [Dehghani, Data Mesh founder]
If the central concern of a Data Mesh is about enabling analytics, then how is Kafka relevant?
In this talk we will describe how we managed to apply the Data Mesh founding principles to our operational plane, based on Kafka. Consequently, we have gained value from these principles more broadly than just analytics. An example of this is treating data as a product, i.e. data that is discoverable, addressable, trustworthy and self-describing.
We will then describe our implementation, which includes deep dives into Cluster Linking, Connectors and SMTs. Finally, we will discuss the dramatic simplification of our analytical plane and consuming personas.
Agenda
• Saxo Bank’s implementation of the Data Mesh
• Cluster Linking - Why? How?
• Data lake connectors – configuration and auto-deployment strategy (see the sketch after this agenda)
• Mapping Kafka to the data lake infrastructure
• Enabling analytics, data warehousing and production support
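As a rough sketch of the connector configuration and auto-deployment idea from the agenda above, the snippet below posts a connector definition to the standard Kafka Connect REST API. The ADLS connector class, format class, connector name and hostnames are placeholder assumptions, and in practice such definitions would be templated and applied by GitOps tooling (Flux CD/Helm) rather than hand-posted.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DeployAdlsSinkConnector {

    public static void main(String[] args) throws Exception {
        // Hypothetical connector definition; the connector and format classes are
        // placeholders (check the Confluent ADLS Gen2 sink docs for the exact names),
        // and ADLS credentials plus sizing settings are omitted for brevity.
        String connectorJson = """
                {
                  "name": "adls-sink-trades",
                  "config": {
                    "connector.class": "io.confluent.connect.azure.datalake.gen2.AzureDataLakeGen2SinkConnector",
                    "topics": "trades",
                    "format.class": "io.confluent.connect.azure.storage.format.parquet.ParquetFormat",
                    "transforms": "tombstones",
                    "transforms.tombstones.type": "com.example.connect.smt.TombstoneHandler"
                  }
                }
                """;

        // Standard Kafka Connect REST API call to create the connector.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://connect.example.com:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Note how the hypothetical `TombstoneHandler` SMT from the earlier session is wired in via the standard `transforms` configuration, which is what lets the same enrichment apply uniformly across auto-deployed data lake connectors.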