

Shuangkun Tian
Alibaba Cloud Technical Expert
Shuangkun Tian is a software engineer at Alibaba Cloud, specializing in scheduling, elasticity, workflow orchestration, and performance tuning. He is a maintainer of the CNCF graduated project Argo Workflows.
Why Migrate Large-Scale Quantitative Backtesting from Slurm to Argo Workflows?
Slurm and Argo Workflows are among the most popular tools for scheduling and orchestrating parallel tasks in the HPC and cloud-native fields, respectively. However, Slurm has some limitations when facing scenarios that require high concurrency and scalability.
In this talk, I will share the performance, cost, and flexibility challenges we encountered when running large-scale quantitative backtesting tasks (processing more than 4,000 stocks in parallel) on Slurm. I will also explain why Argo Workflows was chosen as an alternative and how to migrate to it to solve these challenges while maintaining performance and improving the flexibility and scalability of the system. Finally, I will share some best practices for running tens of thousands of pods in parallel in Argo Workflows, e.g. configuration of the workflow controller and the Kubernetes control plane.
Revolutionizing Scientific Simulations with Argo Workflows
DP Technology provides scientific simulation platforms for research in biomedicine, energy, materials, and other industries. Scientific simulation workflows are inherently complex and resource-intensive, and manual deployment is error-prone. After adopting Argo Workflows to orchestrate scientific simulations, we improved productivity by 100%. In this talk, we will explain why we chose Argo Workflows, how we orchestrate large-scale scientific simulation tasks, and how we make the whole system scalable and reliable. In particular, we will share best practices for managing very large workflows (thousands of tasks), configuring sensible workflow retries, using memoization to reduce runtime and compute cost, and interacting with HPC systems. We have also contributed to the Argo community to enhance functionality and improve reliability. Additionally, we'll introduce DFlow, our open-source Python SDK designed for the seamless orchestration of scientific simulations with Argo Workflows.
KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 Sessionize Event