Anomaly detection at Apple for large scale data using Apache Spark and Flink

Anomaly detection in time series data is crucial for identifying unusual patterns and trends, enabling better alerting and action when data deviates from normal. Most anomaly detection algorithms perform adequately on a single-node machine with public datasets, but do not scale well on the distributed processing frameworks used in modern big data environments. This talk focuses on how we scaled anomaly detection to large datasets using Apache Spark and Flink for both batch and near-real-time use cases. We will discuss how we used Apache Spark to parallelize common anomaly detection algorithms so that they support large-scale data processing. We will highlight some of the challenges we faced and how we resolved them to make the framework useful for massive datasets with varying degrees of anomalies. Finally, we will demonstrate how our anomaly detection framework works in batch mode on petabytes of data and in streaming mode at hundreds of thousands of transactions per second.

Himadri Pal

Principal Software Engineer - Data and AI at Apple

Cupertino, California, United States
