
      Target Speaker Detection with Concealed EEG Around the Ear

      research-article


          Abstract

Target speaker identification is essential for speech enhancement algorithms in assistive devices aimed at helping the hearing impaired. Several recent studies have reported that target speaker identification is possible through electroencephalography (EEG) recordings. If the EEG system could be reduced to an acceptable size while retaining signal quality, hearing aids could benefit from integration with concealed EEG. To compare the performance of a multichannel around-the-ear EEG system with high-density cap EEG recordings, an envelope tracking algorithm was applied in a competitive speaker paradigm. Data from 20 normal-hearing listeners were collected concurrently with a traditional state-of-the-art wired laboratory EEG system and a wireless mobile EEG system with two bilaterally placed around-the-ear electrode arrays (cEEGrids). The results show that the cEEGrid ear-EEG technology captured neural signals that allowed identification of the attended speaker above chance level, with 69.3% accuracy, while cap-EEG signals resulted in an accuracy of 84.8%. Further analyses investigated the influence of ear-EEG signal quality and revealed that the envelope tracking procedure was unaffected by variability in channel impedances. We conclude that concealed ear-EEG recordings as acquired with the cEEGrid array have the potential to be used for brain-computer interface steering of hearing aids.
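The envelope tracking approach described in the abstract can be illustrated with a brief sketch. This is a minimal, hypothetical example rather than the authors' implementation: it assumes a linear backward model (decoder) already trained with regularized regression to reconstruct the attended speech envelope from time-lagged multichannel EEG, and it labels as attended whichever speaker's envelope correlates more strongly with the reconstruction. All names, shapes, and lags below are illustrative.

```python
import numpy as np

def reconstruct_envelope(eeg, decoder, lags):
    """Apply a linear backward model: reconstruct the attended speech envelope
    from time-lagged multichannel EEG.

    eeg     : (n_samples, n_channels) band-pass filtered, downsampled EEG
    decoder : (n_lags, n_channels) weights learned on training data
    lags    : sample lags used when the decoder was trained
    """
    recon = np.zeros(eeg.shape[0])
    for i, lag in enumerate(lags):
        # EEG lags behind the stimulus, so shift the EEG back by `lag` samples.
        # np.roll wraps around at the edges; in practice the edges would be trimmed.
        shifted = np.roll(eeg, -lag, axis=0)
        recon += shifted @ decoder[i]
    return recon

def attended_speaker(eeg, env_a, env_b, decoder, lags):
    """Return 0 if speaker A is identified as attended, 1 for speaker B."""
    recon = reconstruct_envelope(eeg, decoder, lags)
    r_a = np.corrcoef(recon, env_a)[0, 1]  # Pearson r with speaker A's envelope
    r_b = np.corrcoef(recon, env_b)[0, 1]  # Pearson r with speaker B's envelope
    return 0 if r_a > r_b else 1
```

In a competitive speaker paradigm, such a decision is typically made per trial or per analysis window, and the proportion of correct decisions yields identification accuracies of the kind reported above (69.3% for ear-EEG, 84.8% for cap-EEG).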


Most cited references (26)


          How about taking a low-cost, small, and wireless EEG for a walk?

To build a low-cost, small, and wireless electroencephalogram (EEG) system suitable for field recordings, we merged consumer EEG hardware with an EEG electrode cap. Auditory oddball data were obtained while participants walked outdoors on a university campus. Single-trial P300 classification with linear discriminant analysis revealed high classification accuracies for both indoor (77%) and outdoor (69%) recording conditions. We conclude that good quality, single-trial EEG data suitable for mobile brain-computer interfaces can be obtained with affordable hardware. Copyright © 2012 Society for Psychophysiological Research.
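The single-trial P300 classification mentioned in this reference can be sketched, under assumptions, as a shrinkage-regularized linear discriminant analysis on flattened epoch features. The snippet below uses scikit-learn with hypothetical variable names; it is not the study's actual pipeline.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def p300_accuracy(epochs, labels):
    """Cross-validated single-trial classification accuracy.

    epochs : (n_trials, n_channels, n_times) EEG cut around each oddball stimulus
    labels : (n_trials,) with 1 = target (rare) and 0 = standard (frequent)
    """
    X = epochs.reshape(len(epochs), -1)  # flatten channels x time into one feature vector
    clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")  # shrinkage helps with many features
    return cross_val_score(clf, X, labels, cv=5).mean()
```

The indoor (77%) and outdoor (69%) accuracies quoted above are the kind of cross-validated figures such a pipeline produces.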

            Reconstructing Speech from Human Auditory Cortex

Spoken words and sentences were presented to 15 patients undergoing neurosurgical procedures while cortical surface field potentials were recorded from non-penetrating multi-electrode arrays placed over the lateral temporal cortex, including the posterior superior temporal gyrus (pSTG). A stimulus reconstruction approach was used to determine which spectro-temporal features of speech are preserved in these nonprimary auditory areas: regularized linear models were fit to map population neural responses back onto auditory representations of the stimulus, and reconstruction accuracy was quantified as the correlation (Pearson's r) between the original and reconstructed representations. Linear spectrogram reconstructions from single-trial responses captured major spectro-temporal features such as energy at vowel harmonics and high-frequency fricative components, with the most informative electrodes largely confined to pSTG and the predictive information carried mainly by high gamma band power (approximately 70–150 Hz). Reconstruction of fast temporal modulations, however, required a nonlinear model based on temporal modulation energy, consistent with a dual coding scheme in which slow fluctuations are encoded by envelope-locked responses and fast fluctuations by modulation-energy responses. Individual words could be identified from the reconstructions, using a dynamic-time-warping speech recognition procedure, at rates substantially above chance, indicating that the temporal cortex representation of speech retains sufficient spectro-temporal detail to support decoding of speech content.
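The reconstruction procedure summarized above, a regularized linear mapping from time-lagged neural responses to the stimulus spectrogram evaluated by the correlation between original and reconstructed spectrograms, can be sketched as follows. This is a simplified illustration with assumed data shapes and a plain ridge solution, not the published fitting code (which used regularized regression with cross-validation and, for the nonlinear variant, a modulation-energy representation).

```python
import numpy as np
from numpy.linalg import solve

def lag_features(resp, n_lags):
    """Stack time-lagged copies of the neural responses (n_samples, n_sites)
    into a design matrix of shape (n_samples, n_sites * n_lags)."""
    # Negative shifts align later neural responses with the current stimulus sample;
    # np.roll wraps at the edges, which would be trimmed in a real analysis.
    cols = [np.roll(resp, -lag, axis=0) for lag in range(n_lags)]
    return np.concatenate(cols, axis=1)

def fit_reconstruction(resp, spec, n_lags=10, alpha=1.0):
    """Ridge regression from lagged responses (e.g., high gamma power) to a
    spectrogram of shape (n_samples, n_freqs); returns the weight matrix."""
    X = lag_features(resp, n_lags)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])  # L2 regularization
    return solve(XtX, X.T @ spec)

def reconstruction_accuracy(resp, spec, weights, n_lags=10):
    """Mean Pearson r between original and reconstructed spectrogram channels."""
    pred = lag_features(resp, n_lags) @ weights
    rs = [np.corrcoef(pred[:, f], spec[:, f])[0, 1] for f in range(spec.shape[1])]
    return float(np.mean(rs))
```

Fitting on a training set of words and evaluating on a held-out word, as in the leave-one-out procedure described above, amounts to calling fit_reconstruction on the training data and reconstruction_accuracy on the held-out trial.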

              Temporal coherence and attention in auditory scene analysis.

              Humans and other animals can attend to one of multiple sounds and follow it selectively over time. The neural underpinnings of this perceptual feat remain mysterious. Some studies have concluded that sounds are heard as separate streams when they activate well-separated populations of central auditory neurons, and that this process is largely pre-attentive. Here, we argue instead that stream formation depends primarily on temporal coherence between responses that encode various features of a sound source. Furthermore, we postulate that only when attention is directed towards a particular feature (e.g. pitch) do all other temporally coherent features of that source (e.g. timbre and location) become bound together as a stream that is segregated from the incoherent features of other sources. Copyright © 2010 Elsevier Ltd. All rights reserved.

                Author and article information

                Contributors
Journal
Frontiers in Neuroscience (Front. Neurosci.)
Frontiers Media S.A.
ISSN: 1662-4548; eISSN: 1662-453X
Published: 27 July 2016
Volume: 10
Article: 349
Affiliations
1. Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
2. Cluster of Excellence "Hearing4all", Oldenburg, Germany
3. Department of Engineering, Institute of Biomedical Engineering, University of Oxford, Oxford, UK
4. Research Center Neurosensory Science, University of Oldenburg, Oldenburg, Germany
                Author notes

                Edited by: Sonja A. Kotz, Maastricht University, Netherlands; Max-Planck Institute for Human Cognitive and Brain Sciences, Germany

                Reviewed by: Clément François, University of Barcelona, Spain; Dan Zhang, Tsinghua University, China

*Correspondence: Bojana Mirkovic, bojana.mirkovic@uni-oldenburg.de

                This article was submitted to Auditory Cognitive Neuroscience, a section of the journal Frontiers in Neuroscience

Article
DOI: 10.3389/fnins.2016.00349
PMCID: PMC4961688
PMID: 27512364
                Copyright © 2016 Mirkovic, Bleichner, De Vos and Debener.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History
Received: 22 April 2016
Accepted: 12 July 2016
                Page count
                Figures: 7, Tables: 0, Equations: 3, References: 42, Pages: 11, Words: 8613
                Categories
                Psychology
                Original Research

                Neurosciences
Keywords: EEG, cEEGrid, around-the-ear EEG, mobile EEG, selective attention, speech decoding, cocktail party, attended speaker
