Philippe Gagnon
Senior Solutions Architect at Astronomer, Inc.
Montréal, Canada
Philippe is an architect with the solutions engineering team at Astronomer, where he helps enterprises adopt Apache Airflow for their various data processing needs.
Investigating the Many Loops of the Airflow Scheduler
The scheduler is arguably the most important component of an Airflow cluster. It is also the most complex, and the one most often misunderstood by practitioners and administrators alike.
In this talk, we will follow the path that a task instance takes to progress from creation to execution, and discuss the various configuration settings allowing users to tune the scheduler and executor to suit their workload patterns. Finally, we will dive deep into critical sections of the Airflow codebase and explore opportunities for optimization.
An Introduction to Airflow Cluster Policies
Cluster Policies are an advanced Airflow feature composed of a set of hooks that allow cluster administrators to implement checks and mutations against certain core Airflow constructs (DAGs, Tasks, Task Instances, Pods).
In this talk, we will discuss how cluster administrators can leverage these functions in order to better govern the workloads that are running in their environments.
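As a rough illustration of the two kinds of hooks described above, the sketch below shows one check and one mutation. In a real deployment these functions would typically live in an `airflow_local_settings.py` module; the tag requirement and the `MAX_TIMEOUT` limit are hypothetical rules chosen for the example, and the fallback exception class exists only so the snippet runs without Airflow installed.

```python
from datetime import timedelta
from types import SimpleNamespace

try:
    from airflow.exceptions import AirflowClusterPolicyViolation
except ImportError:
    # Fallback so this sketch can run where Airflow is not installed.
    class AirflowClusterPolicyViolation(Exception):
        pass

# Hypothetical cluster-wide limit, not an Airflow default.
MAX_TIMEOUT = timedelta(hours=24)


def dag_policy(dag):
    """Check: reject any DAG that declares no tags (hypothetical rule)."""
    if not dag.tags:
        raise AirflowClusterPolicyViolation(
            f"DAG {dag.dag_id} must declare at least one tag."
        )


def task_policy(task):
    """Mutation: cap every task's execution timeout at MAX_TIMEOUT."""
    if task.execution_timeout is None or task.execution_timeout > MAX_TIMEOUT:
        task.execution_timeout = MAX_TIMEOUT


# Stand-in objects exposing only the attributes the policies read.
demo_task = SimpleNamespace(execution_timeout=None)
task_policy(demo_task)
print(demo_task.execution_timeout)
```

The stand-in objects only mimic the attributes used here (`dag_id`, `tags`, `execution_timeout`); Airflow itself would pass real DAG and operator objects to these functions.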
Using Trino with Apache Airflow for (almost) all your data problems
Trino is incredibly effective at enabling users to quickly extract insights from large amounts of data located in dispersed, heterogeneous, federated data systems.
However, some business data problems are more complex than interactive analytics use cases, and are best broken down into a sequence of interdependent steps, a.k.a. a workflow. For these use cases, dedicated software is often required in order to schedule and manage these processes with a principled approach.
In this session, we will look at how Apache Airflow can orchestrate Trino queries into complex workflows that solve practical batch processing problems while avoiding repetitive, redundant data movement.
A look under the hood of the Airflow logging subsystem
The task logging subsystem is one of the most flexible, yet complex and misunderstood, components of Airflow.
In this talk, we will take a look at the various task log handlers that are part of the core Airflow distribution, dig deeper into the interfaces they implement, and discuss how those can be used to roll your own logging implementation.
Airflow Summit 2024 Sessionize Event
Airflow Summit 2023 Sessionize Event
Trino Summit 2022 Sessionize Event
Airflow Summit 2022 Sessionize Event