Garbage data in, garbage models out: how to select the right data for robust machine learning

Much of the focus in machine learning is on the predictive performance of models, with articles touting how cutting-edge deep learning or gradient boosting algorithms are breaking accuracy benchmarks. However, what if I told you that machine learning's dirty secret is that models are only as good as the data you feed into them? In this talk, I'll describe common pitfalls when creating a dataset for machine learning work, and how to spot and avoid them. We'll talk about how to check that the fields you've selected are actually measuring what you think they're measuring, how to check that the data you've selected will actually make a good model, how to make sure your model can generalise to new data, and how to design experiments that give you the results you want. The talk presents these techniques in a language-agnostic manner so that you can apply them in your language or framework of choice.

Jodie Burchell

Developer Advocate in Data Science

Berlin, Germany

