Want a cup of Java? API and Data Mocking for Python projects

When working in the embedded and observability domains, I’ve used Python scripts to retrieve and pre-process data from external sources. One recurring issue is how hard it is to test data pipelines reliably against external services: API rate limits, pay-per-use costs, service outages, and so on. So, can we model (aka “mock”) those services to test our data ingestion pipelines reliably? Sure we can!
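
To make the problem concrete: if a pipeline step receives its data-fetching function as a parameter, a test can swap in a stand-in that reproduces rate limits without ever calling the real service. The sketch below uses only the standard library's `unittest.mock`. It is an in-process illustration, not the container-based approach from the talk, and the `ingest` function, the `RateLimited` error, the record fields, and the retry policy are all hypothetical.

```python
from unittest.mock import Mock


class RateLimited(Exception):
    """Hypothetical error raised when the external API rejects a request (e.g. HTTP 429)."""


def ingest(fetch, retries=2):
    """Hypothetical pipeline step: fetch raw records, then pre-process them.

    `fetch` is any callable returning a list of dicts; injecting it lets
    tests replace the real API client with a mock.
    """
    for attempt in range(retries + 1):
        try:
            raw = fetch()
            break
        except RateLimited:
            if attempt == retries:
                raise  # give up after the last retry
    # Pre-processing: drop incomplete readings, convert milliseconds to seconds.
    return [
        {"sensor": r["sensor"], "latency_s": r["latency_ms"] / 1000}
        for r in raw
        if r.get("latency_ms") is not None
    ]


# The mock fails once with a rate-limit error, then returns canned data.
fake_api = Mock(side_effect=[
    RateLimited(),
    [{"sensor": "a", "latency_ms": 250}, {"sensor": "b", "latency_ms": None}],
])

print(ingest(fake_api))     # [{'sensor': 'a', 'latency_s': 0.25}]
print(fake_api.call_count)  # 2
```

Because exceptions listed in `side_effect` are raised instead of returned, the same mock can script a whole sequence of failures and successes, which is hard to trigger on demand against a live service.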

In this talk, I will show a few ways to build test services, databases, and API providers with Testcontainers and WireMock, both of which are available for Python thanks to container technology. Then, we will extend the approach with fake data generated by Faker libraries or Synthesized, covering both relational data and data sequences.
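
The talk relies on WireMock (running in a container) to serve canned API responses and on Faker or Synthesized to generate the data. As a rough, dependency-free illustration of the same idea, the sketch below starts a throwaway HTTP stub with Python's standard `http.server` and fills its responses with seeded random records. It does not use WireMock's or Faker's API: the `/api/readings` path, the record fields, and the helpers `fake_readings` and `start_stub` are assumptions made for this example. With the real tools, the stub would run in a container managed by Testcontainers rather than in a thread.

```python
import json
import random
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_readings(n, seed=42):
    """Generate n fake sensor readings; a fixed seed keeps test data reproducible."""
    rng = random.Random(seed)
    return [
        {"sensor": f"sensor-{rng.randint(1, 9)}", "latency_ms": rng.randint(10, 500)}
        for _ in range(n)
    ]


class StubHandler(BaseHTTPRequestHandler):
    """Answers GET /api/readings with canned JSON, similar to a stubbed API mapping."""

    def do_GET(self):
        if self.path == "/api/readings":
            body = json.dumps(fake_readings(5)).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)  # unknown paths behave like an unmatched stub

    def log_message(self, *args):
        pass  # keep test output quiet


def start_stub():
    """Start the stub on a free local port and return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}"


server, base_url = start_stub()
with urllib.request.urlopen(f"{base_url}/api/readings") as resp:
    readings = json.load(resp)
server.shutdown()
server.server_close()

print(len(readings))                 # 5
print(readings == fake_readings(5))  # True
```

Pointing the pipeline at `base_url` instead of the production endpoint gives every test run the same data, with no API quota, cost, or outage involved; changing the seed is a cheap way to exercise other inputs.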

Outline:

1. What is a non-Python non-developer doing here?
2. APIs in Data Pipelines, and the problem of integration testing
3. Shifting integration tests left with API modeling
4. Introduction to Testcontainers and WireMock, with examples for unittest and Robot Framework
5. Capturing and randomizing data with WireMock
   - Mocking data with Data Faker libraries
6. Example for Pytest, with Dev Containers
7. Advanced vector and relational data randomization with Synthesized
8. My wishlist for Python tooling

The talk was first presented at PyData Lausanne in November 2023.

Oleg Nenashev

Community Builder, CNCF Ambassador, Jenkins core maintainer

Neuchâtel, Switzerland
