Optimizing Metrics and Alerts Management with Thanos

Imagine your organization runs a wide variety of IT components: multi-cloud environments, storage systems, hypervisors, container orchestrators, and applications that expose their own metrics. Each component may come from a different open-source project or third-party vendor. As a DevOps team, you have to manage thousands of exporter endpoints and Prometheus nodes, plus millions of alert rules, all of which must be maintained, updated, and kept working correctly.

Our company faced the same problems until we adopted Thanos. Thanos let us centralize the metrics of many independent components into a single cluster. We also use ArgoCD to apply GitOps and change management to all of our alert rule and metric endpoint configuration.

This talk presents a case study of how we solved these problems. It gives an overview of the architecture and the scalability we achieved by combining Thanos and GitOps, explains how we rolled these solutions out across all our subsidiary teams, and discusses the trade-offs we encountered.

Sang Tran Quoc

Deputy Director of Cloud Infrastructure Service Development Center - FPT Smart Cloud

Ho Chi Minh City, Vietnam
