Session

DevOps for the humble data engineer

A data engineer stores data in warehouses, lakehouses, or even a simple database. We hardly need to tell you that; it's the natural order of things. But a data engineer also needs things that do not belong in that kind of storage. Where do the SQL scripts, the workbooks, or that little Python program go?

If you don't know better, they go on your local hard drive. If you are very bold, you might store them in Microsoft SharePoint. Or you decide that what's good enough for your data is good enough for your scripts; after all, this saves on storage costs. Luckily, there is a better option: Git.

With Git, the scripts and workbooks are stored in a repository. Git is a distributed version control system originally developed for the Linux kernel. It makes it easy to version your files and to collaborate on the same workbook. Git has many amazing features, but like any tool it takes some knowledge to use. Take your first steps with Git as a data engineer here and learn what Git can do for you when you develop a data solution.

OK, now that your stored procedure SQL script and your Databricks/Fabric/Synapse workbook are fully versioned with Git, how do you get them from the Git repository onto the SQL database or into the Databricks workspace? Of course you could copy them manually, but wouldn't it be nicer if they just deployed automagically whenever a change occurs? The DevOps world has a solution: pipelines. Learn how a pipeline can take your workbook and deploy it to dev, staging, and prod environments, automatically adapting connection strings and other parameters to match each environment. Pipelines are also a fully integrated part of Microsoft Fabric, which we will use for a practical demonstration.
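To make the idea of per-environment parameters a little more concrete, here is a minimal sketch in Python. It is not how Fabric or any particular pipeline product does it; it only illustrates the principle of keeping one template in the repository and filling in values per environment. The server names, database names, and the helper render_connection_string are all hypothetical.

```python
from string import Template

# Hypothetical per-environment settings; names and values are illustrative only.
ENVIRONMENTS = {
    "dev": {"server": "sql-dev.example.com", "database": "sales_dev"},
    "staging": {"server": "sql-staging.example.com", "database": "sales_staging"},
    "prod": {"server": "sql-prod.example.com", "database": "sales"},
}

# The template as it might be stored in the Git repository, with placeholders
# instead of environment-specific values.
CONNECTION_TEMPLATE = Template("Server=$server;Database=$database;")


def render_connection_string(environment: str) -> str:
    """Fill the placeholders with the values for one environment."""
    return CONNECTION_TEMPLATE.substitute(ENVIRONMENTS[environment])


if __name__ == "__main__":
    # A pipeline stage would do this once, for the environment it deploys to.
    for name in ENVIRONMENTS:
        print(name, "->", render_connection_string(name))
```

In a real pipeline the environment values would usually come from the pipeline's own variables or secret store rather than from a dictionary in the code.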

Git and DevOps pipelines have helped software engineering immensely, and they might just do the same for data. At the very least, they might ease your daily life. Isn't that possibility reason enough to learn more about Git and pipelines?

Marisol Steinau

Data Solution Architect

Tuttlingen, Germany
