Speaker

Jodie Burchell

Developer Advocate in Data Science

Berlin, Germany

Dr. Jodie Burchell is the Developer Advocate in Data Science at JetBrains, and was previously the Lead Data Scientist in audience generation at Verve Group Europe. She completed a PhD in clinical psychology and a postdoc in biostatistics before leaving academia for a career in data science. She has worked for seven years as a data scientist in Australia and Germany, developing a range of products including recommendation systems, analysis platforms, search engine improvements and audience profiling. Her responsibilities have spanned everything from data analytics to maintaining machine learning solutions in production. She is a long-time content creator in data science, with conference and user group presentations, books, webinars, and posts on both her own blog and the JetBrains blog.

Area of Expertise

  • Humanities & Social Sciences
  • Information & Communications Technology

Topics

  • Data Science
  • Machine Learning
  • Python
  • Data Visualization
  • Data Science & AI
  • Big Data

Separating fact from fiction in a world of AI fairytales

If you've been remotely tuned in to the developments in generative AI over the past year, you've likely been inundated with news, ranging from claims that these models will replace numerous white-collar jobs to declarations of sentience and an impending AI apocalypse. At this stage, the hype surrounding AI has far surpassed the actual useful information available.

In this presentation, we’ll cut through the noise and delve into the current applications, risks, and limitations of these generative AI models. We will start with the early research endeavours aimed at creating an "artificial brain" and trace the path that has led us to today's sophisticated models. Along the way, we will address the common misconception that these models are intelligent systems, shed light on what would actually be required to develop true artificial general intelligence, and see how far we still seem to be from that goal. We will also highlight how an excessive focus on topics like the sentience of these systems has overshadowed their genuine issues. By shifting our attention towards these real limitations, we will see how to better maximise the potential of these exciting models.

Why humans are still essential for machine learning

Models have made such astounding leaps in text and image generation over the past year that some believe AI can now learn and create meaningful outputs independently of human intervention. It has even been claimed that models like LaMDA and ChatGPT demonstrate artificial general intelligence, or that they can fully replace the work of human designers and writers.

However, if you scratch the surface, such models still rely heavily on human intervention at every stage. In this talk, we’ll cover three tasks which cannot yet be automated since they rely on human judgment and expertise.

Firstly, we’ll go over how human intervention is needed when selecting and screening the data used to train these models, especially when it comes to spotting and removing bias and other quality issues. Secondly, we’ll talk about how people are needed to assess the ethical implications of models before they are built and deployed, from whether there is consent to use the training data to whether certain models should be created at all. Finally, we’ll discuss how only human judgment can determine the most feasible uses of these models, and cover some real-world examples of where their application has been more, and less, successful.

Text to … vectors? How feature engineering works in natural language processing

Do you have an interest in starting your own natural language processing project, but feel overwhelmed by all the talk of attention-based models and text embeddings? Would you like to understand how you can transform a set of texts into features for a model? In this talk, I’ll give you a practical demonstration of how meaningful features are created from text data, starting with the simplest approaches and working up to cutting-edge techniques such as BERT. I’ll demonstrate how to do this using some of the most popular Python packages for NLP, including scikit-learn, nltk, gensim and transformers. At each step, we’ll discuss why the technique works, what meaning it extracts from the text and what it leaves behind, and the advantages and disadvantages of each.
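
To make the leap from raw text to vectors concrete, here is a minimal sketch of the simplest approach covered above, using scikit-learn's CountVectorizer and TfidfVectorizer. This is my own illustration rather than code from the talk, and the example sentences are invented.

    # Bag-of-words and TF-IDF features with scikit-learn (illustrative example).
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    texts = [
        "The cat sat on the mat",
        "The dog sat on the log",
        "Cats and dogs are popular pets",
    ]

    # Bag-of-words: each text becomes a vector of raw token counts.
    bow = CountVectorizer()
    bow_features = bow.fit_transform(texts)
    print(bow.get_feature_names_out())
    print(bow_features.toarray())

    # TF-IDF: counts are reweighted so that words shared by every text count less.
    tfidf = TfidfVectorizer()
    tfidf_features = tfidf.fit_transform(texts)
    print(tfidf_features.toarray().round(2))

Techniques further along the spectrum, such as word2vec (via gensim) and BERT (via transformers), replace these sparse count vectors with dense embeddings learned from large corpora, but the overall workflow of turning texts into numerical features stays the same.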

Vectorise all the things! How basic linear algebra can speed up your data science code

Have you found that your data science code works beautifully on a few dozen test rows, but leaves you wondering how to spend the next couple of hours after you start looping through your full data set? Are you only familiar with Python, and wish there was a way to speed things up without subjecting yourself to learning C? In this talk, I will show you some simple tricks, borrowed from linear algebra, which can give you significant performance gains in your Python data science code. I will gently take you through the basics of linear algebra, explaining core operations such as matrix addition, subtraction and multiplication, scalar multiplication and the Hadamard power. I will then show you some examples of how you can easily utilise these concepts in your machine learning code to speed up common data science operations such as distance calculations, classification tasks and finding nearest neighbours.
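
As a taste of the kind of speed-up described above, here is a minimal NumPy sketch (my own illustration, not code from the talk) comparing a pure-Python loop with a single vectorised expression for computing Euclidean distances, one of the operations mentioned in the abstract.

    # Distances from one query point to many points: loop vs. vectorised NumPy.
    import numpy as np

    rng = np.random.default_rng(42)
    points = rng.random((100_000, 3))   # 100,000 points in 3D
    query = rng.random(3)               # a single query point

    # Loop version: one distance at a time in pure Python.
    distances_loop = [np.sqrt(((p - query) ** 2).sum()) for p in points]

    # Vectorised version: broadcasting subtracts the query from every row at once,
    # and the sum and square root then run in compiled NumPy code.
    distances_vec = np.sqrt(((points - query) ** 2).sum(axis=1))

    assert np.allclose(distances_loop, distances_vec)

    # Index of the nearest neighbour of the query point.
    print(distances_vec.argmin())

The two versions give identical results, but the vectorised one typically runs orders of magnitude faster on data of this size, because the per-element work happens inside NumPy rather than in the Python interpreter.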

Garbage data in, garbage models out: how to select the right data for robust machine learning

A lot of focus in machine learning is on the predictive performance of models, with articles touting how new accuracy benchmarks are being broken by cutting-edge deep learning or gradient boosting algorithms. However, what if I told you that machine learning's dirty secret is that models are only as good as the data you feed into them? In this talk, I'll describe common pitfalls when creating a dataset for machine learning work, and how to spot and avoid them. We'll talk about how to make sure the fields you've selected are actually measuring what you think they're measuring, how to make sure the data you've selected can actually support a good model, how to make sure your model will be able to generalise to new data, and how to design experiments that answer the questions you're actually asking. This talk presents these techniques in a language-agnostic manner so that you can apply them in your language or framework of choice.
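
As one concrete, language-specific illustration of the generalisation point above, the sketch below holds out a test set and cross-validates on the training data so that reported accuracy reflects unseen data rather than the data the model was fit on. The dataset and model here are placeholders chosen for this example, not content from the talk.

    # Estimating generalisation with a held-out test set and cross-validation.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score, train_test_split

    X, y = load_breast_cancer(return_X_y=True)

    # Hold out a test set that the model never sees during development.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y
    )

    model = RandomForestClassifier(random_state=0)

    # Cross-validation on the training data estimates generalisation
    # before the held-out test set is ever touched.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5)
    print("Cross-validated accuracy:", round(cv_scores.mean(), 3))

    # Fit on all the training data, then evaluate on unseen data exactly once.
    model.fit(X_train, y_train)
    print("Held-out test accuracy:", round(model.score(X_test, y_test), 3))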
