Speaker

Simona Meriam

Senior Data Software Engineer

Tel Aviv, Israel

Simona Meriam is a Senior Data Software Engineer at Qwak, where she specializes in research and development of the first ML platform's feature store. In her previous positions as a Big Data Engineer at Nielsen and Aidoc, she researched and developed big data solutions using cutting-edge technologies such as Spark, Kafka, and Elasticsearch.

Area of Expertise

  • Media & Information
  • Information & Communications Technology

Topics

  • Apache Spark
  • Apache Airflow
  • Apache Kafka
  • Scala Programming
  • Big Data
  • PostgreSQL
  • SQL
  • Music
  • Japan
  • NoSQL
  • query optimization
  • Beats
  • ElasticSearch
  • Elastic Stack
  • All things data
  • data engineering
  • AWS Data
  • AWS Lambda
  • AWS S3
  • Analytics and Big Data
  • Spark
  • Kafka
  • Traveling
  • MedTech
  • Media & Information
  • Delta Lake
  • Data lake architecture
  • Feature Stores
  • Qwak
  • python

Logging Apache Spark - How we made it easy

Looking at our metrics on Graphite is pretty nice, but what about our logs? How do you improve the visibility of your logs while running Spark on EMR? If you're tired of ssh-ing into your servers and searching log files, this architecture design is exactly for you.

Auditing your data and answering the lifelong question: is it the end of the day yet?

Over here at Nielsen, data is very important to us. Being the core of our business, we love it and there’s lots of it. We don’t want to lose it, and at the same time, we don’t want to duplicate it.
Our data flows through a robust Kafka architecture into several ETLs that receive, transform, and store it.
While we clearly understood our ETLs’ workflow, we had no visibility into what parts of the data, if any, were lost or duplicated, and in which stage or stages of the workflow, from source to destination.

But how much do we know about the way our data makes its way through our systems? And what about the lifelong question: is it the end of the day yet?

In this talk I’m going to present the design process behind our data auditing system, Life Line: from tracking and producing to analysing and storing auditing information, using technologies such as Kafka, Avro, Spark, Lambda functions and complex SQL queries. We’re going to cover:
* Avro audit header
* Auditing heartbeat - designing your metadata
* Designing and optimising your auditing table - what does this data look like anyway?
* Creating an alert-based monitoring system
* Answering the most important question of all - is it the end of the day yet?

Kafka Summit London 2022 Sessionize Event

April 2022 London, United Kingdom

NDC Porto 2022 Sessionize Event

April 2022 Porto, Portugal

Subsurface LIVE Winter 2022 Sessionize Event

March 2022

NDC Oslo 2021 Sessionize Event

November 2021 Oslo, Norway

Build Stuff 2021 Lithuania Sessionize Event

November 2021 Vilnius, Lithuania

Open Source Experience Sessionize Event

November 2021 Paris, France

Big Mountain Data and Dev Conference Sessionize Event

October 2021

DataEngBytes 2021 Sessionize Event

October 2021

Music City Tech 2021 Sessionize Event

September 2021

Data Geeks Saturday Conference Sessionize Event

August 2021
