
Shubham Raj
Software Engineer II at Cloudera
Bengaluru, India
Actions
Shubham Raj, a Software Engineer II at Cloudera, is a full-stack engineer building a data engineering platform on Kubernetes in a private cloud environment.
An Apache Airflow committer, he enjoys sharing knowledge and actively learning from the open-source community.
Links
Area of Expertise
Topics
Overcoming Custom Python Package Hurdles in Airflow
DAG Authors, while constructing DAGs, generally use native libraries provided by Airflow in conjunction with python libraries available over public PyPI repositories.
But sometimes, DAG authors need to construct DAG using libraries that are either in-house or not available over public PyPI repositories. This poses a serious challenge for users who want to run their custom code with Airflow DAGs, particularly when Airflow is deployed in a cloud-native fashion.
Traditionally, these packages are baked in Airflow Docker images. This won’t work post deployment and is super impractical if your library is under development.
We propose a solution that creates a dedicated Airflow global python environment that dynamically generates the requirements, establishes a version-compatible pyenv adhering to Airflow’s policies, and manages custom pip repository authentication seamlessly. Importantly, the service executes these steps in a fail-safe manner, not compromising core components.
Join us as we discuss the solution to this common problem, touching upon the design, and seeing the solution in action. We also candidly discuss some challenges, and the shortcomings of the proposed solution.
Sketching Pipelines using DAG Authoring UI
Cloudera Data Engineering (CDE) is a serverless service for Cloudera Data Platform that allows you to submit various Spark jobs and Airflow DAGs to an auto-scaling cluster.
Running your workloads as Python DAG files may be the usual, but not the most convenient way for some users as it involves a lot of background around syntaxes, the programming language, aesthetics of Airflow, etc.
The DAG Authoring UI is a tool built on top of Airflow APIs to allow one to use a graphical user interface to create, manage, and destroy complex DAGs. The DAG authoring UI will give one the ability to perform tasks on Airflow without really having to know DAG structure, Python programming language, and the internals of Airflow.
CDE has identified multiple operators to perform various tasks on Airflow by carefully categorising the use cases. The operators range from BashOperator, PythonOperator, CDEJobRunOperator, CDWJobRunOperator
Most use cases can be run as combinations of the operators provided.
Airflow Summit 2024 Sessionize Event
Airflow Summit 2023 Sessionize Event
Please note that Sessionize is not responsible for the accuracy or validity of the data provided by speakers. If you suspect this profile to be fake or spam, please let us know.
Jump to top