Session

Copycats and the Commons: Governing Open Data for Trustworthy AI

The proliferation of open data on Community-Contributed Platforms (CCPs) is often championed as a democratizing force for AI innovation. In high-stakes domains like healthcare, this narrative of data as an inherent public good (Zuiderwijk & Janssen, 2014) obscures the complex socio-technical and legal realities of data governance. This talk critically examines the lifecycle of publicly available medical imaging (MI) datasets, investigating the "copycat" phenomenon: the uncontrolled duplication and modification of datasets. Our research reveals systemic governance failures, including vague licenses (Longpre et al., 2023), missing persistent identifiers, and the loss of critical metadata. We argue these are symptoms of a model that ignores the essential, ongoing "data work" (Sambasivan et al., 2021) and stewardship required to maintain a healthy data ecosystem (Avlona, 2025, forthcoming; Jiménez-Sánchez et al., 2024).

While normative frameworks like the FAIR principles (Wilkinson et al., 2016) and documentation standards (Gebru et al., 2021) provide valuable guidance, their implementation is a resource-intensive achievement, not a default state. The current CCP ecosystem relies on this hidden labor without providing the necessary infrastructure or incentives to support it, leading to a deterioration of data as a common good and posing significant risks, such as data cascades (Sambasivan et al., 2021), for the development of safe and equitable AI (Avlona, 2025, forthcoming). This talk concludes by proposing a shift from "passive" open data hosting to active, Commons-based governance (Hess & Ostrom, 2003; Purtova, 2015). We advocate for sustainable stewardship models to make the work of maintaining data quality visible, accountable, and properly resourced, thereby ensuring the public value of open data is actively produced and protected (Jiménez-Sánchez et al., 2024).

Natalia-Rozalia Avlona

University of Copenhagen, Postdoctoral Researcher & Lawyer
