Cue Integration in Categorical Tasks: Insights from Audio-Visual Speech Perception

Abstract

Previous cue integration studies have examined continuous perceptual dimensions (e.g., size) and have shown that human cue integration is well described by a normative model in which cues are weighted in proportion to their sensory reliability, as estimated from single-cue performance. However, this normative model may not be applicable to categorical perceptual dimensions (e.g., phonemes). In tasks defined over categorical perceptual dimensions, optimal cue weights should depend not only on the sensory variance affecting the perception of each cue but also on the environmental variance inherent in each task-relevant category. Here, we present a computational and experimental investigation of cue integration in a categorical audio-visual (articulatory) speech perception task. Our results show that human performance during audio-visual phonemic labeling is qualitatively consistent with the behavior of a Bayes-optimal observer. Specifically, we show that the participants in our task are sensitive, on a trial-by-trial basis, to the sensory uncertainty associated with the auditory and visual cues during phonemic categorization. In addition, we show that while sensory uncertainty is a significant factor in determining cue weights, it is not the only one: participants' performance is consistent with an optimal model in which environmental, within-category variability also plays a role in determining cue weights. Furthermore, we show that in our task, the sensory variability affecting the visual modality during cue combination is not well estimated from single-cue performance, but can be estimated from multi-cue performance. The findings and computational principles described here represent a principled first step towards characterizing the mechanisms underlying human cue integration in categorical tasks.
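The contrast the abstract draws between continuous and categorical tasks can be illustrated with a minimal sketch. It is not the authors' model or code. It assumes two categories that are Gaussian along each cue dimension with equal within-category variance, independent Gaussian sensory noise, and equal category priors. All function and parameter names (`continuous_weights`, `categorical_weights`, `log_odds`, `sensory_vars`, `category_vars`) are hypothetical, chosen only for illustration.

```python
def continuous_weights(sensory_vars):
    """Inverse-variance (reliability) weights for a continuous estimate.

    Each cue's weight is proportional to 1 / sensory variance,
    normalized so the weights sum to 1.
    """
    reliabilities = [1.0 / v for v in sensory_vars]
    total = sum(reliabilities)
    return [r / total for r in reliabilities]


def categorical_weights(sensory_vars, category_vars):
    """Illustrative weights for a categorical task (assumption-laden sketch).

    Under the stated Gaussian assumptions, the variance that matters for
    each cue is the sum of its sensory variance and the within-category
    (environmental) variance along that cue dimension.
    """
    total_vars = [s + c for s, c in zip(sensory_vars, category_vars)]
    return continuous_weights(total_vars)


def log_odds(x, mu1, mu2, sensory_vars, category_vars):
    """Log likelihood ratio of category 1 vs. category 2 for cue values x.

    With equal priors and equal total variance per cue, each cue contributes
    (mu1 - mu2) * (x - midpoint) / (sensory variance + category variance).
    A positive result favors category 1.
    """
    total = 0.0
    for xi, m1, m2, s, c in zip(x, mu1, mu2, sensory_vars, category_vars):
        midpoint = 0.5 * (m1 + m2)
        total += (m1 - m2) * (xi - midpoint) / (s + c)
    return total


if __name__ == "__main__":
    # Hypothetical numbers: the first cue has low sensory noise but high
    # within-category variability; the second cue is the reverse.
    sensory = [1.0, 4.0]
    category = [3.0, 0.0]
    print(continuous_weights(sensory))
    print(categorical_weights(sensory, category))
    print(log_odds([1.0, 1.0], [1.0, 1.0], [-1.0, -1.0], sensory, category))
```

In this hypothetical setting, the first cue dominates when only sensory variance is considered, but once within-category variance is added the two cues have equal total variance, so they are weighted equally. This is the sense in which sensory reliability alone may not determine optimal weights in a categorical task.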

Author and article information

Contributors

: Role: Editor

Journal

Journal ID (nlm-ta): PLoS One

Journal ID (publisher-id): plos

Journal ID (pmc): plosone

Title: PLoS ONE

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Electronic): 1932-6203

Publication date (Collection): 2011

Publication date (Electronic): 26 May 2011

Volume: 6

Issue: 5

Electronic Location Identifier: e19812

Affiliations

[1]Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York, United States of America

New York University, United States of America

Author notes

* E-mail: vrao@bcs.rochester.edu

Conceived and designed the experiments: VRB MC DCK RNA. Performed the experiments: VRB. Analyzed the data: VRB. Wrote the paper: VRB MC DCK RNA.

[¤] Current address: Department of Linguistics, School of Communication Sciences and Disorders, McGill University, Montreal, Quebec, Canada

Article

Publisher ID: PONE-D-11-01037

DOI: 10.1371/journal.pone.0019812

PMC ID: 3102664

PubMed ID: 21637344

SO-VID: be522ed1-8917-4768-9a9e-75b6777e4a21

Copyright © Bejjanki et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 5 January 2011

Date accepted: 4 April 2011

Page count

Pages: 12
