SIO Research Noon (Midi de la recherche SIO) with Jeffrey Parsons
Date: May 19, 2023
Time: 12:30 p.m. to 2:00 p.m. EDT
Location: On site or online
Ludger St-Pierre Room (1307)
Pavillon Palasis-Prince
Free event
About the event
The Department of Organizational Information Systems (Département de systèmes d’information organisationnels) invites you to a presentation by Jeffrey Parsons of Memorial University on his article "Training-induced bias in crowdsourced data: Evidence and implications for machine learning."
The presentation will be given in English.
A free boxed lunch will be provided to in-person attendees. The event will also be livestreamed; register to receive the viewing link.
Speaker

Jeffrey Parsons
Professor
Memorial University
Abstract
Crowdsourcing is a popular way to collect data outside traditional organizational boundaries. A common crowdsourcing strategy is to train contributors in a specific data collection task, such as entity classification. However, as people learn the characteristics of an entity needed to identify it, selective attention theory suggests they will focus on only those characteristics and ignore other observable characteristics. This behaviour can induce bias in the data. We investigate the effect of training-induced data collection bias on the quality of supervised and unsupervised data classifications in human-machine systems.
A sample of undergraduate students, randomly assigned to three experimental groups (explicitly trained, implicitly trained, and untrained), participated in a classification task. Drawing on the theory of selective attention, we hypothesize that untrained contributors will report less biased data than trained contributors, and that these data, when classified using machine learning, will generate purer clusters than data provided by trained contributors. In addition, we expect that interpretations of clustering results from data provided by untrained contributors will be more congruent with the original observations than interpretations obtained from data provided by trained contributors.
Our results show that the data from explicitly trained contributors are the most biased, lead to the most impure classifications, and are the least interpretable. In contrast, data from the untrained group are the least biased, lead to the purest classifications, and are better aligned with the original observations than data from the trained groups. We discuss implications of our findings for understanding the impact of bias in human-machine systems.