From Human to Data to Dataset: Mapping the Traceability of Human Subjects in Computer Vision Datasets

Author(s): Morgan Klaus Scheuerman¹, Katy Weathington¹, Tarun Mugunthan², Emily Denton³, Casey Fiesler¹

Publication date (Electronic): April 16 2023

Journal: Proceedings of the ACM on Human-Computer Interaction

Publisher: Association for Computing Machinery (ACM)

Abstract

Computer vision is a "data hungry" field. Researchers and practitioners who work on human-centric computer vision, like facial recognition, emphasize the necessity of vast amounts of data for more robust and accurate models. Humans are seen as a data resource which can be converted into datasets. The necessity of data has led to a proliferation of gathering data from easily available sources, including "public" data from the web. Yet the use of public data has significant ethical implications for the human subjects in datasets. We bridge academic conversations on the ethics of using publicly obtained data with concerns about privacy and agency associated with computer vision applications. Specifically, we examine how practices of dataset construction from public data (not only from websites, but also from public settings and public records) make it extremely difficult for human subjects to trace their images as they are collected, converted into datasets, distributed for use, and, in some cases, retracted. We discuss two interconnected barriers current data practices present to providing an ethics of traceability for human subjects: awareness and control. We conclude with key intervention points for enabling traceability for data subjects. We also offer suggestions for an improved ethics of traceability to enable both awareness and control for individual subjects in dataset curation practices.

Author and article information

Journal

Title: Proceedings of the ACM on Human-Computer Interaction

Abbreviated Title: Proc. ACM Hum.-Comput. Interact.

Publisher: Association for Computing Machinery (ACM)

ISSN (Electronic): 2573-0142

Publication date Created: April 14 2023

Publication date (Electronic): April 16 2023

Publication date (Print): April 14 2023

Volume: 7

Issue: CSCW1

Pages: 1-33

Affiliations

[1] University of Colorado Boulder, Boulder, CO, USA

[2] University of California, Berkeley, Berkeley, CA, USA

[3] Google, New York, NY, USA

Article

DOI: 10.1145/3579488

SO-VID: ecf0282e-b0a6-4ca5-9361-06f945a3efad

Copyright © 2023

License:

http://www.acm.org/publications/policies/copyright_policy#Background
