Dr. Daniela Braga is the founder and CEO of DefinedCrowd, one of the fastest-growing startups in the AI space. With eighteen years of experience in Speech Technology across academia and industry in Portugal, Spain, China, and the US, Dr. Braga has deep expertise in Speech Science and is one of the world's leaders in Crowdsourcing adoption in large enterprises. Previously, at Microsoft, she worked across nearly every layer of the Speech Technology stack, shipped 26 languages for Exchange 14 and 10 TTS voices in Windows 8, and was involved in Cortana. At Voicebox Technologies, Dr. Braga created the Data Science team, shipped voice-enabled products for clients such as Samsung and Toyota, introduced Crowdsourcing for big data solutions, and restructured the Engineering infrastructure around data collection, processing, ingestion, instrumentation, and discoverability. Dr. Braga is a frequent guest lecturer at the University of Washington, USA, and is the author of more than 90 scientific papers and several patents.
Artificial Intelligence needs enormous amounts of data to be trained. Currently, it is estimated that 15-20% of the data used by data scientists is garbage and that 80% of their time is spent scrubbing and cleaning it. As a result, high-quality data is often hard to obtain, expensive, and hard to scale to new markets. One emerging challenge is biased datasets, which result from wrong or disproportionate sources and from gaps in data collection. Humans are also involved in collecting data and building algorithms, so these will, consciously or unconsciously, reflect their preconceptions, including their cultural and socioeconomic backgrounds. In this talk and demo, we will discuss how to reduce bias in AI and how to build bias-free datasets that ultimately yield high-quality training data.