DMLR: Data-centric Machine Learning Research – Past, Present and Future

Abstract: 

Drawing on discussions at the inaugural DMLR workshop at ICML 2023 and at earlier meetings, this report outlines the relevance of community engagement and infrastructure development to the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods towards positive scientific, societal and business impact.

Authors:
Luis Oala
Manil Maskey
Lilith Bat-Leah
Alicia Parrish
Nezihe Merve Gürel
Tzu-Sheng Kuo
Yang Liu
Rotem Dror
Danilo Brajovic
Xiaozhe Yao
Max Bartolo
William A Gaviria Rojas
Ryan Hileman
Rainier Aliment
Michael W Mahoney
Meg Risdal
Matthew Lease
Wojciech Samek
Debojyoti Dutta
Curtis G Northcutt
Cody Coleman
Braden Hancock
Bernard Koch
Girmaw Abebe Tadesse
Bojan Karlaš
Adji Bousso Dieng
Natasha Noy
Vijay Janapa Reddi
James Zou
Praveen Paritosh
Mihaela van der Schaar
Kurt Bollacker
Lora Aroyo
Ce Zhang
Joaquin Vanschoren
Isabelle Guyon
Peter Mattson
Publication date: 
May 28, 2024
Publication type: 
Journal of Data-centric Machine Learning Research (DMLR)