      Is Open Access

      Transformation of a temporal speech cue to a spatial neural code in human auditory cortex

      research-article


          Abstract

          In speech, listeners extract continuously-varying spectrotemporal cues from the acoustic signal to perceive discrete phonetic categories. Spectral cues are spatially encoded in the amplitude of responses in phonetically-tuned neural populations in auditory cortex. It remains unknown whether similar neurophysiological mechanisms encode temporal cues like voice-onset time (VOT), which distinguishes sounds like /b/ and /p/. We used direct brain recordings in humans to investigate the neural encoding of temporal speech cues with a VOT continuum from /ba/ to /pa/. We found that distinct neural populations respond preferentially to VOTs from one phonetic category, and are also sensitive to sub-phonetic VOT differences within a population’s preferred category. In a simple neural network model, simulated populations tuned to detect either temporal gaps or coincidences between spectral cues captured encoding patterns observed in real neural data. These results demonstrate that a spatial/amplitude neural code underlies the cortical representation of both spectral and temporal speech cues.

          Most cited references (112)

          The origin of extracellular fields and currents--EEG, ECoG, LFP and spikes.

          Neuronal activity in the brain gives rise to transmembrane currents that can be measured in the extracellular medium. Although the major contributor of the extracellular signal is the synaptic transmembrane current, other sources--including Na(+) and Ca(2+) spikes, ionic fluxes through voltage- and ligand-gated channels, and intrinsic membrane oscillations--can substantially shape the extracellular field. High-density recordings of field activity in animals and subdural grid recordings in humans, combined with recently developed data processing tools and computational modelling, can provide insight into the cooperative behaviour of neurons, their average synaptic input and their spiking output, and can increase our understanding of how these processes contribute to the extracellular signal.

            Cortical oscillations and speech processing: emerging computational principles and operations.

            Neuronal oscillations are ubiquitous in the brain and may contribute to cognition in several ways: for example, by segregating information and organizing spike timing. Recent data show that delta, theta and gamma oscillations are specifically engaged by the multi-timescale, quasi-rhythmic properties of speech and can track its dynamics. We argue that they are foundational in speech and language processing, 'packaging' incoming information into units of the appropriate temporal granularity. Such stimulus-brain alignment arguably results from auditory and motor tuning throughout the evolution of speech and language and constitutes a natural model system allowing auditory research to make a unique contribution to the issue of how neural oscillatory activity affects human cognition.
              Is Open Access

              Different Origins of Gamma Rhythm and High-Gamma Activity in Macaque Visual Cortex

              Introduction Neuronal oscillations exist in the brain over a wide range of frequencies, including the delta (1–3 Hz), theta (4–8 Hz), alpha (9–12 Hz), beta (12–30 Hz), and gamma (30–80 Hz) bands, and are thought to reflect cortical processing [1]. In addition to the modulation of power in some of the aforementioned frequency bands, there is often an increase in power in a broad frequency range above 80 Hz, called the “high-gamma” band (80–200 Hz). This increase in high-gamma power has been most consistently observed in electrocorticogram (ECoG) studies in humans [2]–[5] but is also observed in local field potentials (LFPs; [6]–[8]) and magnetoencephalogram (MEG; [9],[10]). It has been seen in several cortical areas, under diverse stimulus conditions and a range of cognitive states (for a review see [11]). The functional significance of high-gamma activity and its relationship to gamma rhythms that are typically observed at lower frequencies (30–80 Hz) remain unclear. One suggested role of the gamma band is to provide communication channels between cortical areas [12],[13]. Within this framework, there could be multiple frequency bands for communication [14], so the high-gamma band could serve as a distinct channel [15], possibly nested within a low frequency rhythm [3],[5],[16]. On the other hand, several studies have shown that spiking activity is coupled to power in the high-gamma range [6]–[8],[17]–[19]. Because under many conditions the gamma power and firing rates are correlated (for example, during attentional modulation), it is difficult to distinguish between the two possibilities described above. We addressed this issue by studying the LFP power spectrum in V1 of monkeys while manipulating the stimulus size, because increasing the size decreases the firing rate but increases the strength of the gamma rhythm (i.e., the two are anti-correlated; [20]), permitting a dissociation. 
Using a signal processing technique called Matching Pursuit (MP) that imposes minimal a priori assumptions on LFP decomposition and can simultaneously resolve both transient and oscillatory components in the LFP [8],[18], we studied the relationship between spiking activity and high-gamma power under conditions when the gamma rhythm was absent (no stimulus), weak (small stimulus size), or strong (large size). We found that high-gamma activity was strongly correlated with the multiunit spiking, under different manipulations of stimulus size and temporal frequency. Our results show that multiunit activity can be reliably estimated from the high-gamma power. Further, while investigating the role of high-gamma band in communication or coding, it is important to account for population spiking activity because it may also modulate power in the high-gamma band. Results Recordings were made from an array of 96 electrodes (Blackrock Systems) that was chronically implanted in the right hemisphere of V1 in two monkeys. The receptive fields were in the lower left visual quadrant at an eccentricity of 3–5°. The monkeys performed an orientation change detection task (Figure S1A), where they attended to a Gabor stimulus outside the receptive field while a series of gratings of six different sizes and orientations were presented inside the receptive field of one of the recording sites (new location for each session) for 400 ms with an interstimulus interval of 600 ms (see Materials and Methods for further details). Analysis was restricted to sites whose receptive field centers were within 0.2° from the stimulus center and for which the firing rate was at least 1 spike/s for each of the six sizes. This yielded 15 and 104 sites from Monkeys 1 and 2. 
Unless stated otherwise, the results shown below were obtained after pooling the data across orientations to increase the statistical power, although similar results were obtained when the analysis was performed only on the preferred orientation. Four analyses were performed. First, we studied the correlation between firing rates and LFP power (as a function of frequency) while varying the stimulus size. For this analysis, firing rates and LFP power were averaged over time (between 200 and 400 ms after stimulus onset) as well as over stimulus repetitions. Second, we computed the correlation between two time-series: firing rates and the average LFP power in different frequency bands, both computed in 2 ms bins and averaged over stimulus repetitions. Third, we computed the trial-by-trial co-variability in firing rates and LFP power in different frequency bands under identical stimulus conditions. Finally, we performed a spike-triggered analysis in two dimensions (time and frequency) to estimate the temporal and spectral components in the LFP that were locked to spikes. Spectral analyses were performed using the MP algorithm (see Materials and Methods for details) and were compared with the more traditional multitaper method [21],[22] in the Supporting Information section. Correlation between Firing Rates and LFP Power as a Function of Frequency Figure 1A shows the average multiunit firing rate of a typical recording site from Monkey 1 when gratings of six different radii (values shown in the inset of Figure 1C) were presented between 0 and 400 ms. The inset shows the firing rate between 200 and 400 ms (thick horizontal black line on the time axis), as a function of stimulus size. As expected, increasing the stimulus size increased the strength of the inhibitory surround, which decreased the firing rate. 
Figure 1B shows the change in LFP power relative to a baseline period (defined as 0 to 300 ms before stimulus onset) for three different sizes (radii of 0.3°, 1.14°, and 2.4°, shown in Figure S1B). These time-frequency energy difference spectra showed a large broadband increase in power in the first 100 ms after stimulus onset, coinciding with the transient increase in firing rate (Figure 1A). The gamma rhythm, represented by a horizontal band at ∼50 Hz in the time-frequency spectrum, appeared after the initial transient and continued until the stimulus was turned off at 400 ms. Consistent with the results shown in [20], gamma rhythm amplitude increased with increasing stimulus size. We also observed an increase in power over a broad frequency range above the gamma range (>60 Hz). However, power in this band showed the opposite trend—it decreased with increasing stimulus size, similar to the decrease observed in the firing rates. Figure 1C shows the energy between 200 and 400 ms (indicated by thick black lines on the time axes of Figure 1B), as a function of frequency, for the six stimulus sizes (colored traces) as well as the pre-stimulus baseline (black trace). While the power in the gamma range (40–60 Hz, peak at ∼50 Hz) increased with size, beyond the gamma range there was a clear decrease in power with increasing size. Figure 1D–F and G–I show the population average of LFP recordings from all the sites in Monkeys 1 and 2 (15 and 104, respectively). The firing rates were normalized by dividing by the maximum firing rate for each site before averaging (Figure 1D and G). The time-frequency power difference spectra (Figure 1E, H) and the power versus frequency spectra (Figure 1F, I) were averaged across sites on a log scale (see Materials and Methods for details). Note that Monkey 2 showed a second gamma rhythm at ∼90 Hz (also observed by [20]), and therefore the relative decrease in LFP power with increasing stimulus size could be observed only above ∼100 Hz. 
10.1371/journal.pbio.1000610.g001 Figure 1 Dissociation of the gamma rhythm and high-gamma activity by manipulating stimulus size. (A) Average multiunit recorded from a single site in Monkey 1 during the presentation of a static grating (0 to 400 ms) at six different sizes, shown in different colors. The inset shows the average firing rate between 200 and 400 ms, indicated by a thick black line on the abscissa. (B) Time-frequency energy difference plots (in dB) showing the difference in energy from baseline energy (−300 to 0 ms, 0 denotes the stimulus onset, difference computed separately for each frequency) for the smallest (radius of 0.3°, left panel), medium (1.14°, middle), and largest (2.4°, right) sizes. The gamma rhythm at ∼50 Hz increases with size, while the high-gamma activity above the gamma band decreases with size. (C) The LFP energy between 200 and 400 ms (denoted by a thick black line on the abscissa in B) as a function of frequency for the six sizes, whose radii are listed in the legend. The black line shows the LFP energy in the baseline period. (D–F) and (G–I) show corresponding population responses of 15 and 104 sites from Monkeys 1 and 2, respectively. For (D) and (G), the responses are normalized by dividing by the maximum firing rate for each site. Monkey 2 showed two distinct gamma bands at ∼50 and ∼90 Hz. Although gamma rhythm and high-gamma activity are usually distinguished solely on the basis of frequency (30–80 Hz versus 80–200 Hz), it is critical to note that these two phenomena have very different spectral profiles and there could be considerable overlap between the frequency ranges. The gamma rhythm is “band-limited,” with a bandwidth of ∼20 Hz, and is visible in the power spectrum as a distinct “bump.” Typically the center frequency of gamma rhythm is between 30 and 80 Hz, but occasionally there is a second peak at higher frequencies also (Monkey 2). 
In contrast, high-gamma activity is “broadband,” represented by an elevation in power over a broad frequency range without any obvious bumps. Although high-gamma activity is more prominent at frequencies above ∼80 Hz, it can be observed at frequencies as low as ∼50 Hz when gamma rhythm is absent (see the “spike-triggered average analysis” section below). Figure 2A shows the mean change in power (200–400 ms after stimulus onset) from baseline, as a function of frequency (obtained by subtracting the black trace from the colored traces in Figure 1F and 1I). To relate the changes in power with stimulus size with corresponding changes in firing rates, we computed the Spearman rank correlation, for each site and at each frequency, between the six power values and firing rate values (one value for each stimulus size, all values computed between 200 and 400 ms after stimulus onset). Figure 2B shows the mean (solid black line) and SEM (gray line) of the Spearman rank correlation of 15 and 104 sites in Monkeys 1 and 2, as a function of frequency. Correlation values significantly different from zero are shown in green (p 100 Hz) followed the same pattern as the firing rates, with a Spearman correlation of more than 0.85 (shown at the upper-left corner of each plot). Similar trends were observed for a temporal frequency of 5 Hz (Figure 5B and 5D) as well as 10 and 20 Hz for Monkey 2 (Figure S4). Beyond 20 Hz, the firing rates did not follow the temporal frequency and the correlation estimates were noisier. These results agree well with our earlier observation that LFP power above ∼100 Hz closely tracks the changes in firing rates. Note that at temporal frequencies of 5 Hz and above, alpha and gamma bands contained harmonics of the stimulus frequencies, which made their estimation of power inaccurate. 10.1371/journal.pbio.1000610.g005 Figure 5 Correlations between firing rate and LFP power in different frequency bands for stimuli with different temporal frequency profiles. 
(A) The left panel shows the average time-frequency energy difference spectrum of 19 sites in Monkey 1 when the stimulus was presented with a counter-phasing temporal frequency of 2.5 Hz. The contrast profile is shown in red on top of the right panels. The right panels show the relative change in power in different frequency bands as well as in the firing rates, as a function of time. Same format as in Figure 4. The Spearman correlation values between the firing rate and power traces are shown in the top left corner. (B) Same as panel (A), for a temporal frequency of 5 Hz. (C, D) Same as (A, B) for 66 sites in Monkey 2. Trial-by-Trial Co-variations in Firing Rates and LFP Power In the previous analyses we studied the relationship between LFP power and firing rates under different stimulus conditions (different sizes). Under these circumstances, it is difficult to determine whether the changes in firing rates and LFP power are due to the same biological mechanism, because changing the stimulus may lead to several changes in the neuronal network. A partial way to address this concern is to study the trial-by-trial covariation in firing rates and LFP power in different frequency bands when the stimulus conditions are identical across trials. For this analysis, we first computed the firing rates and LFP power between 200 and 400 ms for each stimulus presentation. LFP power was computed in a 25 Hz band, in steps of 10 Hz. The Spearman rank correlation between firing rate and LFP power at each frequency was computed individually for each site, orientation, and size. Similar analysis was also performed before stimulus onset (−300 to 0 ms). Figure 6A and 6C show the median Spearman rank correlation, averaged across days and orientations, for Monkeys 1 and 2. The first column shows the correlation during the baseline period (indicated by a black horizontal line below the x-axis; the correlation values were averaged across sizes). 
The other columns show the correlation during the stimulus period (each column represents a different size, indicated by a colored line below the x-axis). Correlation became stronger with increasing frequency during baseline as well as stimulus period. The smallest stimulus (second column) that produced the highest firing rate had the highest correlation, which is expected because correlations are difficult to detect when firing rates are low. 10.1371/journal.pbio.1000610.g006 Figure 6 Trial-by-trial Spearman correlation between firing rates and LFP power at different frequencies when stimulus conditions are identical. (A) The median Spearman rank correlation between LFP power at different frequency bins (size of 25 Hz, computed in steps of 10 Hz) and firing rates, both computed between 200 and 400 ms after stimulus onset, for 15 sites in Monkey 1. The correlations were computed separately for each size, site, and orientation, so that the stimulus conditions were identical. The first column shows the median correlations during the pre-stimulus period (denoted “BL” for baseline). The remaining six columns represent the six stimulus sizes (denoted by the respective color below the x-axis). (B) Median Spearman correlation, computed for the four frequency bands used in Figures 4 and 5. Correlations significantly different from zero (p 200 Hz) in the rat barrel cortex [26]–[28]. Indeed, the oscillation frequency of a network critically depends on excitation-inhibition balance [23],[29],[30] and could exceed 100 Hz [29]. Therefore, 100 Hz should not be thought of as a “strict boundary” separating oscillatory and broadband activity. Could the broadband high-gamma power reflect many narrowband oscillators operating at many different center frequencies? While this possibility is difficult to rule out completely, several problems arise with this hypothesis. 
First, the MP algorithm has different types of functions to represent oscillatory and transient signals; the type of function chosen depends on the properties of the signal itself. We found that most of the energy in the high-gamma range is captured by transient functions, which further are tightly coupled to the occurrence of spikes (Figures 7 and 8), rather than extended oscillations. Second, the time-frequency uncertainty principle limits the number of different frequency bands that can be used over a given period (for example, if computation takes place over 100 ms, the center frequencies must be more than 10 Hz apart). In addition, elaborate filtering mechanisms would be required for such coding schemes. Finally, because the LFP power follows a “1/f” spectrum (Figure 1C, F, and I), the raw power at high-gamma frequencies is extremely small, typically less than 1% of the total signal energy. So if the broadband response is due to multiple oscillators at different frequencies, their power will be too weak to support reliable communication channels. Relationship between Spikes and High-Gamma Power Our results are consistent with several studies that have reported a correlation between spiking activity and LFP power at gamma and high-gamma frequencies [6]–[8],[17]–[19]. In addition, recent studies have revealed that low-frequency ( 10 Hz), including the gamma range, in the first 100 ms after stimulus onset, before the onset of the gamma rhythm at ∼50 Hz. This broadband power might reflect synaptic activity [47]–[49], spike afterpotentials [50], or the spiking activity at higher frequencies (as shown in Figure 7, also discussed in the next section). Thus, it is crucial to dissociate between the band-limited gamma rhythm and the broadband increase in power (which includes the gamma band) due to synaptic and spiking events when assessing a functional role of gamma rhythms in cognitive processing. 
Nested Cross-Frequency Coupling Several studies have shown that oscillations in different frequency bands of the LFP may be correlated with each other (for example, high-frequency power could be correlated with the phase at a lower frequency) and have hypothesized that this coupling could facilitate cortical processing simultaneously at several distinct timescales [3],[16],[51]–[53]. A recent study even shows complex phase-amplitude cross-frequency interactions in the absence of oscillatory peaks in the signal [5]. Our results are not inconsistent with these hypotheses, especially when one of the rhythms is at a lower frequency such as in the delta or theta range. However, at higher frequencies, such as gamma or high-gamma bands, it is important to distinguish between a “nested gamma rhythm” and possible contributions from phase-locked spikes. An important issue here is the way the LFP signal is decomposed for time-frequency analysis. Most analysis techniques (Short Time Fourier Transform, multitaper analysis, etc.) necessarily decompose the LFP into a series of narrow band signals at various frequencies. In MP analysis, we start with an over-complete dictionary of functions that include both oscillatory (narrow-band) as well as transient (broadband) functions and find those that best represent the signal. We find that the LFP has several “broadband” components, such as the transients observed in the first ∼100 ms after stimulus onset or the sharp negativity associated with spikes, which are best described by either delta functions or a Gaussian with a small sigma. However, if such components are decomposed using traditional methods, we obtain a series of oscillatory functions whose amplitude and phase values are correlated (for example, the Fourier Transform of a delta function gives constant amplitude and zero phase at all frequencies). 
In other words, broadband components associated with spiking, stimulus onset, or eye movements [54], if decomposed into a series of oscillatory components, can lead to spurious correlations between those components. Population Dynamics at Fine Spatial Scales Recent studies have argued that LFP has a spatial spread of ∼250 µm in cortex [55],[56]. Coupled with our results, this suggests that high-gamma activity is a sensitive measure of population firing rate of a small region near a microelectrode. Further, Figure 8 shows that changes in correlation in the neural population (in this case, the degree of gamma phase-locking) could also be reflected in the high-gamma range. The dependence of high-gamma power on the degree of synchronization/correlation in the neural population is expected to increase with the size of the neural population [8],[49]. Several cognitive mechanisms, such as selective attention, change the degree of correlation in the neural population [57],[58]; high-gamma activity potentially could be used to study these network dynamics at a fine spatial scale. Materials and Methods Two separate datasets were used in this article. The first set was used to study the effect of size (the “size study,” all figures except Figure 5) on LFP power. The second set was used to study the effect of temporal frequency (the “temporal frequency study,” Figure 5). The behavioral task (described below) was the same for both datasets. Behavioral Task and Recording The animal protocols used in this study were approved by the Institutional Animal Care and Use Committee of Harvard Medical School. Recordings were made from two male rhesus monkeys (Macaca mulatta, 11 and 14 kg). Before training, a scleral search coil and a head post were implanted under general anesthesia. 
After monkeys learned the behavioral task (∼4 mo), we implanted a 10×10 array of microelectrodes (Blackrock Microsystems, 96 active electrodes) in the right V1 (about 15 mm anterior to the occipital ridge and 15 mm lateral to the midline). The microelectrodes were 1 mm long and 400 µm apart, with impedance between 0.3 and 1 MΩ at 1 kHz. The entire length of the microelectrodes was inserted into cortex; we expect them to be in layer 2/3 or 4. Histology has not been performed. The receptive fields of the neurons recorded from the microelectrodes were in the lower left quadrant of the visual field at an eccentricity of about 3–5°. Each monkey was trained to do an orientation-change detection task (Figure S1A). The monkey was required to hold its gaze within 1° of a small central dot (0.05–0.10° diameter) located at the center of a CRT video display (100 Hz refresh rate, 1,280×768 pixels, gamma corrected), while two achromatic odd-symmetric stimuli were synchronously flashed for 400 ms with an interstimulus period of 600 ms. For the size study, the stimulus in the left hemifield was a grating of variable size centered on the receptive field of one of the recording sites (new location for each session); the second stimulus was a Gabor stimulus with an SD of 0.5° located at an equal eccentricity on the opposite side of the fixation point. The monkey was cued to attend to the Gabor stimulus outside the receptive field, whose contrast was fixed at a low value to make the task demanding. Stimulus features (size and orientation) at the unattended location inside the receptive field were varied for each stimulus presentation in a pseudo-random order. At an unsignaled time drawn from an exponential distribution (mean 3,000 ms, range 0 to 7,000 ms for Monkey 1; 1,000 to 7,000 ms for Monkey 2), the orientation of the stimulus at the cued location changed by 90°. 
The monkey was rewarded with a drop of juice for making a saccade to the location of the changed stimulus within 500 ms of the orientation change. To account for saccade latency and to avoid rewarding the monkey for guessing, the monkey was rewarded only for saccades beginning at least 70 ms after the orientation change. Trials were truncated at 8,000 ms if the target had not appeared (∼5% of trials), in which case the animal was rewarded for maintaining fixation up to that time. For the size study, the gratings were static with a spatial frequency of 4 cycles/degree (CPD), ∼100% contrast, located at the center of the receptive field of one of the sites (different recording site each session), at one of six different orientations (0°, 30°, 60°, 90°, 120°, and 150°) and six different radii (0.3°, 0.72°, 1.14°, 1.56°, 1.98°, and 2.4°), chosen pseudo-randomly. The Gabor stimulus outside the receptive field was also static, with a spatial frequency of 4 CPD, a fixed orientation (typically the preferred orientation of the recorded site) and size (SD: 0.5°), and an average contrast of ∼6% and ∼4.3% for Monkeys 1 and 2. The two monkeys performed the task in 10 and 24 recording sessions. For the temporal frequency study (Figure 5), we used a counter-phasing Gabor stimulus inside the receptive field, with a spatial frequency of 4 CPD, preferred orientation, ∼100% contrast, SD of 0.8° and 1° for Monkeys 1 and 2, at five temporal frequencies—0, 0.62, 1.25, 2.5, and 5 Hz—for Monkey 1, and nine frequencies—0, 0.62, 1.25, 2.5, 5, 10, 20, 40, and 50 Hz—for Monkey 2. The Gabor stimulus outside the receptive field was static, with a spatial frequency of 4 CPD, preferred orientation, SD of 0.5°, and an average contrast of ∼3% and ∼7% for Monkeys 1 and 2. The two monkeys performed the task in 7 and 16 recording sessions. Only correct trials were used for analysis. Catch trials (trials in which the orientation did not change) were excluded. 
For each correct trial, only the second stimulus through to the last stimulus before the target were used for analysis, so that the stimulus conditions were identical for the entire dataset. The first stimulus in each correct trial, which typically produced a stronger response, was analyzed separately, and similar results were obtained. For the size study, the average number of repetitions for each combination of size and orientation was 19 (range 6 to 36) for Monkey 1 and 15 (range 7 to 28) for Monkey 2. For the temporal frequency study (Figure 5), the average number of repetitions per temporal frequency was 82 (range 31 to 169) and 14 (range 6 to 40) for Monkeys 1 and 2. Local field potential (LFP) and multiunits were extracted using commercial hardware and software (Blackrock System). Raw data were filtered between 0.3 Hz (Butterworth filter, 1st order, analog) and 500 Hz (Butterworth filter, 4th order, digital) to extract the LFP, and digitized at 2 kHz (16 bit resolution). Multiunits were extracted by filtering the raw signal between 250 Hz (Butterworth filter, 4th order, digital) and 7,500 Hz (Butterworth filter, 3rd order, analog) followed by an amplitude threshold. Receptive Field Mapping and Electrode Selection Receptive fields were estimated by flashing small Gabor stimuli (SD of 0.05–0.1°) on a 9×9 (Monkey 1) or 11×11 (Monkey 2) rectangular grid that spanned the receptive fields of all the electrodes, while the monkeys attended to a Gabor stimulus outside the receptive field. The evoked LFP responses and the multiunit responses at different stimulus locations were fitted separately with a 2-D Gaussian to estimate the receptive field centers and sizes. Receptive fields obtained from multiunit and LFP responses were very similar. As the multiunit activity was more variable across days (and sometimes absent), we used the receptive field estimates from evoked LFP responses for analysis. 
For Monkey 1, the upper half of the grid did not yield any responses at all. Stable estimates of the receptive field centers (SD less than 0.1° across days) were obtained from 27 electrodes in Monkey 1 and 66 electrodes in Monkey 2. The remaining electrodes yielded weak and inconsistent evoked responses and were excluded from analysis. For each recording session only the electrodes with receptive field centers within 0.2° of the stimulus center were used for analysis. For the size study, this yielded 56 electrodes (24 unique electrodes—many electrodes were recorded on multiple sessions) for Monkey 1 and 138 electrodes (66 unique) for Monkey 2. Out of these, we selected electrodes for which the average firing rate between 200 and 400 ms (the period over which analysis was done, see below) was at least 1 spike/s for all stimulus sizes, and the signal-to-noise ratio of the isolation was greater than 1.5. This yielded 15 (11 unique) and 104 (58 unique) “spike” electrodes for Monkeys 1 and 2, respectively. For the temporal frequency study, 44 (22 unique) and 90 (59 unique) electrodes had receptive fields within 0.2 degrees of the stimulus center. Out of these, we selected electrodes for which the average firing rate between 200 and 400 ms was at least 1 spike/s at zero temporal frequency, and the signal-to-noise ratio of the isolation was greater than 1.5. This yielded 19 (13 unique) and 66 (42 unique) spike electrodes for Monkeys 1 and 2, respectively. To account for the multiplicity of some electrodes in our dataset, all analyses were repeated after pooling the data from the same electrode across days. Similar results were obtained. Data Analysis Time-frequency analysis Time-frequency decomposition was performed using the MP algorithm [59]. 
MP is an iterative procedure that decomposes a signal into a linear combination of members of a specified family of functions gγn. These are usually chosen to be sine-modulated Gaussians (that is, Gabor functions or “Gabor atoms”) because they give the best compromise between frequency and time resolution. In this algorithm, a large overcomplete dictionary of Gabor atoms is first created. In the first iteration, the atom gγ0 that best describes the signal f(t) (i.e., has the largest inner product with it) is chosen from the dictionary, and the projection of the signal onto that atom is subtracted from the signal. The procedure is repeated iteratively with the residual replacing the signal. Thus, during each of the subsequent iterations, the waveform gγn is matched to the signal residue Rnf, which is the residue left after subtracting the results of previous iterations. Mathematical details of this method are presented elsewhere [18]. Time-frequency plots were obtained by calculating the Wigner distribution of individual atoms and taking the weighted sum [59]. Choosing a large dictionary of Gabor atoms imposes fewer a priori limitations on the decomposition and provides more free parameters than other methods, which allows local patterns in the signal to be detected with the best possible compromise between time and frequency resolution. Because the overcomplete dictionary has basis functions with a wide range of time and frequency support, we can represent rhythms (alpha, gamma, etc.) with functions that are extended in time but narrow in frequency, as well as transients (due to spiking or stimulus onset) with functions that are brief in time but broad in frequency. The availability of “broadband” basis functions that can capture the sharp transients associated with spikes (Figures 7 and 8) is critical for the results shown in this article. Further details about this method and its advantages over traditional methods such as the Short Time Fourier Transform are discussed elsewhere [8],[18].
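The iterative procedure described above can be sketched in a few lines. This is a minimal illustration of greedy matching pursuit over a small dictionary of real Gabor atoms, not the software used in the study. The function names, the dictionary layout (two scales, a coarse grid of centers, three frequencies, two phases), and the parameter values are assumptions chosen only to keep the example small; practical dictionaries are far larger.

```python
import math


def gabor_atom(n, center, scale, freq, phase):
    """Unit-norm sine-modulated Gaussian sampled at n points."""
    atom = [
        math.exp(-math.pi * ((t - center) / scale) ** 2)
        * math.cos(2 * math.pi * freq * (t - center) + phase)
        for t in range(n)
    ]
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]


def build_dictionary(n):
    """Small illustrative dictionary (hypothetical parameter grid)."""
    atoms = []
    for scale in (8, 32):                      # temporal width in samples
        for center in range(0, n, 16):         # atom position in samples
            for freq in (0.05, 0.1, 0.2):      # cycles per sample
                for phase in (0.0, math.pi / 2):
                    atoms.append(gabor_atom(n, center, scale, freq, phase))
    return atoms


def inner(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy decomposition: returns [(atom_index, coefficient)] and the residual."""
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        coeffs = [inner(residual, g) for g in atoms]
        # Choose the atom with the largest absolute inner product.
        best = max(range(len(atoms)), key=lambda k: abs(coeffs[k]))
        c = coeffs[best]
        # Subtract the projection of the residual onto the chosen atom.
        residual = [r - c * g for r, g in zip(residual, atoms[best])]
        picks.append((best, c))
    return picks, residual


atoms = build_dictionary(128)
signal = [2.0 * a + 0.5 * b for a, b in zip(atoms[10], atoms[40])]
picks, residual = matching_pursuit(signal, atoms, n_iter=5)
```

Because the atoms have unit norm, each iteration removes exactly the squared coefficient from the residual energy, so the residual energy never increases from one iteration to the next.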
Why this method is much better suited to studying high-gamma activity is discussed in Supplementary Discussion 3 of [8]. We have made the software used for MP computation available online at http://erl.neuro.jhmi.edu/mpsoft. MP was performed on signals of length 4,096 (−1,148 ms to 900 ms at 0.5 ms resolution, where zero denotes the time of stimulus onset), yielding a 4,096×4,096 array of time-frequency energy values (with a time resolution of 0.5 ms and a frequency resolution of 2,000/4,096 Hz ≈ 0.5 Hz).

Power versus frequency plots (Figure 1C, 1F, 1I) were generated by averaging the energy within a time period at a given frequency:

$$P(\omega) = \frac{1}{T}\sum_{t=t_0}^{t_0+T} E(t,\omega) \qquad (1)$$

where E(t,ω) is the mean energy averaged over trials at time t and frequency ω obtained from the MP algorithm. When showing population data (Figure 1F, 1I), we averaged the log10(P(ω)) values of individual sites. The power was shown either between 200 and 400 ms (t0 = 200, T = 200) or during baseline (t0 = −300, T = 300). Time-frequency difference plots (Figures 1B, 1E, 1H, 4A, 4C, 5A, 5C) were obtained using the following equation:

$$D(t,\omega) = 10\left(\log_{10} E(t,\omega) - \log_{10} B(\omega)\right) \qquad (2)$$

where B(ω) is the baseline energy as defined in equation 1 with t0 = −300 ms, T = 300 ms. For the population data, we averaged the D(t,ω) values of individual sites.

Multitaper analysis [60] was performed with three tapers, implemented in Chronux 2.0 [61], an open-source data analysis toolbox available at http://chronux.org. Spectrum and spectrogram were computed using the functions mtspectrumc and mtspecgramc in Chronux, respectively. Essentially, the multitaper method reduces the variance of spectral estimates by pre-multiplying the data with several orthogonal tapers known as Slepian functions. Details and properties of this method can be found in [21],[22].
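The baseline-normalized power computation referred to as equations 1 and 2 can be sketched as follows. This is only an illustration under stated assumptions: the energy array is taken to be indexed as energy[time][frequency], the averaging window is treated as the half-open interval [t0, t0 + T), and the difference is assumed to be expressed in decibels as ten times the log10 ratio of energy to baseline. The function names and the toy values are hypothetical.

```python
import math


def mean_power(energy, times, t0, T):
    """P(w): average of E(t, w) over times in [t0, t0 + T), per frequency.

    energy[i][k] is the trial-averaged energy at times[i] and frequency k.
    The half-open window is an assumption made for this sketch.
    """
    rows = [row for t, row in zip(times, energy) if t0 <= t < t0 + T]
    n_freq = len(energy[0])
    return [sum(row[k] for row in rows) / len(rows) for k in range(n_freq)]


def difference_db(energy, baseline):
    """D(t, w) = 10 * (log10 E(t, w) - log10 B(w)), assumed to be in decibels."""
    return [
        [10 * (math.log10(e) - math.log10(b)) for e, b in zip(row, baseline)]
        for row in energy
    ]


# Toy example: four time points (ms) and two frequencies.
times = [-2, -1, 0, 1]
energy = [[1.0, 2.0], [1.0, 2.0], [10.0, 2.0], [100.0, 4.0]]
baseline = mean_power(energy, times, t0=-2, T=2)   # uses the two pre-stimulus rows
diff = difference_db(energy, baseline)
```

For population data, the D(t,ω) arrays of individual sites would then be averaged element by element, as stated in the text.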
Cross-correlation analysis (Figures 4 and 5)

We adopted the method used by Womelsdorf and colleagues [12], based on Spearman rank correlation, to compute the cross-correlation between firing rates and power in different frequency bands. As a measure of the cross-correlation at time lag L, we computed the Spearman rank correlation between the power between −100 and 500 ms and the firing rates from L−100 to L+500 ms (both quantities were computed with a time resolution of 2 ms; thus, we obtained 300 data pairs). This method is approximate because the power and rate values are not independent across time. However, the Spearman rank correlation analysis avoids assumptions about the underlying distributions [12]. Note that the correlation does not change if values are scaled by a constant. For example, scaling down the green traces shown in Figure 4B and 4D (for the gamma band) would appear to improve their alignment with the black trace, but it would not change the Spearman correlation. We obtained cross-correlation functions for time lags (L) between −20 and 20 ms. Because the results did not vary greatly as a function of L and were maximal near L = 0, we report only the values for L = 0 in the main text.

Behavior and eye positions

The behavioral task was demanding and required sustained attention on the stimulus. Monkey 1 was correct in 78% of the completed trials (5% missed, 17% false alarms) for the size study and 78% (6% missed, 16% false alarms) for the temporal frequency study. Monkey 2 was correct in 93% of the completed trials (4% missed, 3% false alarms) for the size study and 90% (6% missed, 4% false alarms) for the temporal frequency study. Average eye positions, monitored at 200 Hz using a scleral search coil, differed by less than 0.03° across conditions, for both the size and temporal frequency studies, for both monkeys.

Supporting Information

Figure S1 Task and stimuli. (A) Task design. Monkeys were trained to perform an orientation-change detection task.
The monkey was required to hold its gaze within 1° of a small central dot (white central dot), while two achromatic odd-symmetric stimuli were synchronously flashed for 400 ms with an interstimulus period of 600 ms. One was a grating of different sizes and orientations, centered on the receptive field of one of the recording sites (red circle; receptive fields of all the electrodes were in the lower left quadrant at an eccentricity of 3–5°); the second stimulus was a Gabor with a fixed size and orientation located at an equal eccentricity in the other hemifield. The monkey was cued to attend to the Gabor stimulus outside the receptive field. At an unsignaled time drawn from an exponential distribution, the orientation of this stimulus changed by 90°. The monkey was rewarded with a drop of juice for making a saccade to this stimulus within 500 ms of the orientation change. (B) The three gratings whose time-frequency plots are shown in Figure 1B, along with the mean receptive field size of the sites (red ellipse). (0.72 MB TIF)

Figure S2 Same analysis as Figure 2, when the spectra in (A) are computed using the multitaper method (with three tapers). The signal is taken between 200 and 400 ms with no zero padding, which yields a frequency resolution of 5 Hz. (0.24 MB TIF)

Figure S3 Same analysis as shown in Figure 4, when the time-frequency power spectra were computed using the multitaper method. The windows were 128 ms (A and C) or 64 ms (B and D) long and were shifted by 2 ms. (2.21 MB TIF)

Figure S4 Correlations between firing rate and LFP power in different frequency bands for stimuli presented at high temporal frequencies.
(A) Average time-frequency energy difference plots (left panel) and changes in LFP power as well as firing rates from baseline (right panels), for a stimulus frequency of 10 Hz (contrast profile shown in red on top of the right panels), for 66 sites in Monkey 2. Same format as in Figure 5. (B) Same as (A) but for a temporal frequency of 20 Hz. (1.63 MB TIF)

Figure S5 Same analysis as in Figure 6, done using the multitaper method with three tapers. (0.64 MB TIF)

Figure S6 Spike-field coherence (SFC), computed between 150 and 406 ms after stimulus onset, for the six stimulus sizes. (A) Average SFC when spikes and LFP were taken from the same electrode, for 15 pairs in Monkey 1. (B) Average SFC of 85 spike-LFP pairs in Monkey 1, taken from separate electrodes. Both electrodes were within 0.2° of the stimulus center. (C–D) Same as (A–B), but for 104 and 563 spike-LFP pairs for Monkey 2. (0.25 MB TIF)

Figure S7 STTFA analysis using the multitaper method. (A) The left plot shows the nSTTFA for Monkey 1 during the baseline period (similar to the right column in Figure 7B), when the time-frequency power spectrum is computed using the multitaper method (window length = 64 ms, window shift = 0.5 ms). The right plot shows the nSTTFA computed from spikes between 200 and 400 ms when the largest stimulus was presented (similar to the right column in Figure 8B). (B) Same analysis as (A), with a window of 32 ms. (C,D) Same as (A,B), for Monkey 2. (1.66 MB TIF)

Figure S8 Evoked LFP response, computed by averaging the LFP traces locked to the stimulus onset. The black horizontal line represents the stimulus period. The low-magnitude high-frequency oscillations observed in some of the traces are due to the refresh rate of the monitor at 100 Hz. (0.25 MB TIF)

                Author and article information

                Contributors
                Role: Reviewing Editor
                Role: Senior Editor
                Journal
                eLife
                eLife Sciences Publications, Ltd
                ISSN: 2050-084X
                Published: 25 August 2020
                Volume: 9
                Article number: e53051
                Affiliations
                [1] Department of Neurological Surgery, University of California, San Francisco, San Francisco, United States
                [2] Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, Netherlands
                [3] Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
                [4] Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, United States
                Washington University in St. Louis, United States
                Carnegie Mellon University, United States
                Washington University in St. Louis, United States
                University of Oregon, United States
                Author information
                https://orcid.org/0000-0003-0298-3664
                https://orcid.org/0000-0002-8530-880X
                https://orcid.org/0000-0003-2480-4700
                Article
                53051
                DOI: 10.7554/eLife.53051
                PMCID: PMC7556862
                PMID: 32840483
                © 2020, Fox et al

                This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

                History
                Received: 25 October 2019
                Accepted: 21 August 2020
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100000002, National Institutes of Health;
                Award ID: R01-DC012379
                Funded by: FundRef http://dx.doi.org/10.13039/100000002, National Institutes of Health;
                Award ID: F32-DC015966
                Funded by: FundRef http://dx.doi.org/10.13039/501100000780, European Commission;
                Award ID: FP7-623072
                Funded by: FundRef http://dx.doi.org/10.13039/100003194, New York Stem Cell Foundation;
                Funded by: FundRef http://dx.doi.org/10.13039/100010246, William K. Bowes, Jr. Foundation;
                Funded by: FundRef http://dx.doi.org/10.13039/100000011, Howard Hughes Medical Institute;
                Funded by: FundRef http://dx.doi.org/10.13039/100010319, Shurl and Kay Curci Foundation;
                The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
                Categories
                Research Article
                Neuroscience
                Custom metadata
                The human brain has a spatial code for representing temporal phonetic distinctions in speech.

                Life sciences
                speech, electrocorticography, auditory cortex, temporal processing, categorical perception, voice-onset time (VOT), human
