Judging facts, judging norms: Training machine learning models to judge humans requires a modified approach to labeling data
Published in Science Advances, 2023
We find that using factual labels to train models intended for normative judgments introduces notable measurement error. Models trained on factual labels yield significantly different judgments than those trained on normative labels, and the impact of this effect on model performance can exceed that of other factors (e.g., dataset size) that routinely attract attention from ML researchers and practitioners.
Recommended citation: Aparna Balagopalan, David Madras, David H Yang, Dylan Hadfield-Menell, Gillian K Hadfield, Marzyeh Ghassemi. Judging facts, judging norms: Training machine learning models to judge humans requires a modified approach to labeling data. Sci. Adv. 9, eabq0701 (2023). DOI:10.1126/sciadv.abq0701