Marcin Szymaniuk

CEO, Data Engineers at TantusData

Berlin, Germany

Marcin is the CEO of TantusData and a hands-on Data Engineer. He has extensive experience with technical problems in Big Data (clusters with hundreds of nodes), as well as practical knowledge of business data analysis and Machine Learning. Companies Marcin has worked or consulted for include Spotify, Apple, Telia, and small startups.

Area of Expertise

Business & Management
Information & Communications Technology
Physical & Life Sciences

Topics

Artificial Intelligence
Machine Learning and Artificial Intelligence
BigData and Machine Learning
BigData
Project Management
Product Owner
Product Manager

Spark Performance tuning in Spark 3 - is it still needed?

The aim is to pinpoint Spark troubleshooting and performance tuning techniques that are tricky and not well understood, yet remain relevant even in the newest versions of Spark.

Are some technical aspects of Apache Spark tricky? Are you struggling with performance or troubleshooting? Did you expect Spark 3.x to solve all your problems, only to find that it didn’t?
We’ll highlight the nitty-gritty details beyond the SQL, in a digestible manner, to help you resolve your top Apache Spark issues and get the most out of your ecosystem. In brief, we’ll share the top takeaways on avoiding failures from our long-standing experience with Spark. Skewed data, Cartesian joins, executor fountaining: we will cover it all.

LLMs and LangChain - Getting Started Guide

Would you like to do something more than just create prompts for ChatGPT? How about building an application? An application that utilizes the power of generative AI. From scratch! All you need is Python knowledge, your laptop, and an open mind.
Sounds great? Or maybe you are afraid there is too much hype about ChatGPT? If so - that's good - we will cover the tricky bits and pitfalls as well!
Who is this course for?
The course is for people who have some programming experience, preferably in Python, who want to get started with the emerging development of ChatGPT-powered LLM applications.
Agenda (details omitted because of the form's 1,000-character limit):
• Introduction
• Learn how to use Chains and Agents - hands-on exercises
• Question Answering over Documents: apply ChatGPT to your own data - hands-on exercises
• Create powerful reasoning Agents - hands-on exercise
• Summary

AI Chats: Challenges in Business Integration

Would you like to do more than just create prompts for ChatGPT?
ChatGPT’s popularity is unmatched, yet few companies successfully integrate it into systems beyond a single prompt. Integration requires specific knowledge and effort, but that’s not all. When connecting a production system with a ChatGPT-like tool, you must consider data privacy and costs. Analyzing what data you send, where it’s processed, and the cost implications is crucial for defining a business case with a solid return on investment.

In this presentation, I’ll walk you through key challenges with LLMs. I’ll cover direct GPT API integration, cost calculation, and optimization strategies. Additionally, I’ll discuss privately hosted models and how to tune them pragmatically—because most companies don’t have the resources of Google or Microsoft.

DataFrames in Spark - the analyst's perspective.

Are you a data analyst who works with Spark and often gets confused by failures you don’t understand? Have you seen a bunch of presentations or blog posts about Spark performance but you are still not certain how to apply the hints you have been given in practice?

Spark is commonly used by people who are not expert programmers but who know SQL and sometimes basic Python. They treat Spark as a tool for getting business value from the data. And that is how it should be! Yet the queries they run often fail for no obvious reason. This talk is designed for such Spark users and focuses on common problems with Spark (especially DataFrames and SQL) that can be solved by anyone familiar with SQL. You don’t need to read bytecode to understand the techniques presented and apply them in practice!
This talk will be a case study of multiple DataFrame queries in Spark that initially do not work. I will not only explain how to fix them but also walk through each solution step by step, so you will learn what to pay attention to and how to apply similar techniques to your own codebase!

Go big or go … well not one too many. Aka applying machine learning in production.

We’ll quickly define a ‘model in production’. There is a myriad of definitions in use right now, and that variance is completely justified, because the exact definition depends on, among other factors, the size of the company, the number of models, and the properties of the data.
While we’re at it, we’ll also answer some pressing questions, such as: Is it ever OK to duct-tape the model deployment process? What about taking shortcuts and opting for some manual work? The answers will be pithy. We’ll pause to take questions. If there are too many to fit, we’ll provide contact information and make sure to address all enquiries after the presentation.

Practice makes perfect, or at least brings us nearer to perfection. So we’ll briefly present two solutions to the same problem for two clients, implemented in vastly different ways, and explain why. Both solutions required optimising a search engine for results that translate into higher revenue. Yet the companies are on opposite ends of the scale: one is a huge, mature retailer, the other a much younger and smaller online-booking business.
Cost must always be justifiable. So, using these cases, we’ll succinctly show how to fit a solution to the context of the organisation and how to utilise it well. Next, going quickly through some details of the models and infrastructure, we’ll explain the reasoning behind the critical decisions and highlight the pros and cons of both approaches.
