Introduction Genetic aberrations of FOXP2 cause developmental verbal dyspraxia (DVD), which is characterized by impaired production of sequenced mouth movements and by both expressive and receptive language deficits [1–4]. Brain imaging studies in adult FOXP2 patients implicate the basal ganglia as key affected regions [5–7], and FOXP2 is prominently expressed in the developing human striatum [8]. These findings raise the question of whether the speech and language abnormalities observed in individuals with DVD result from erroneous brain development, from impaired function of differentiated neural circuits in the postnatal brain, or from a combination of both. Human speech and learned vocalizations in oscine birds share behavioral and neural parallels [9]. Thus, songbirds are a suitable model for studying the neural mechanisms of imitative vocal learning, including speech and its pathologies. The FoxP2 expression patterns in songbird and human brains are very similar, with strong expression in the basal ganglia, thalamus, and cerebellum [8,10,11]. Moreover, FoxP2 expression in the basal ganglia song nucleus Area X, which is important for normal song development [12,13], transiently increases at the time when young zebra finches learn to sing. In adult canaries, FoxP2 expression in Area X is elevated during the late summer months, coinciding with the incorporation of most new syllables into their seasonally changing song [10]. FoxP2 is down-regulated in Area X when adult zebra finches sing slightly variable, undirected song, but not when they sing more stereotyped, female-directed song [14]. Together, these correlative findings raise the question of whether FoxP2 and vocal plasticity are causally related. Using lentivirus-mediated RNA interference (RNAi) during song development, we now show that zebra finches with reduced FoxP2 expression levels in Area X imitated tutor songs incompletely and inaccurately. This effect was already evident during vocal practice in young birds.
Moreover, the acoustic structure and the duration of song syllables in adults were abnormally variable, similar to word production in children with DVD [15]. These findings are consistent with a role of FoxP2 during auditory-guided vocal motor learning in songbird basal ganglia. Results Establishing Lentiviral-Mediated RNAi in the Zebra Finch Vocal learning in zebra finches proceeds through characteristic stages. In the sensory phase, which commences around 25 d after hatching (post-hatch day [PHD]), young males memorize the song of an adult male tutor. Concomitantly, they start vocalizing the so-called “subsong,” consisting of quietly uttered, poorly articulated, and nonstereotypically sequenced syllables [16]. Following intensive vocal practice and improvement toward matching the tutor song during the period of “plastic song,” they eventually imitate the song of their tutor with remarkable fidelity around PHD90. The structural and temporal characteristics of adult “crystallized” song remain essentially stable throughout adult life. To study the function of FoxP2 during song learning in zebra finches, we reduced the levels of FoxP2 expression bilaterally in Area X in vivo, using lentivirus-mediated RNAi. In this approach, short interfering hairpin RNAs (shRNAs), which contain sense and antisense sequences of the target gene connected by a hairpin loop, are expressed from a viral vector. The virus stably integrates into the host genome, enabling expression throughout the life of the animal [17]. We designed two different shRNAs (shFoxP2-f and shFoxP2-h) targeting different sequences in the FoxP2 gene. Both hairpins strongly reduced the levels of overexpressed FoxP2 protein in vitro (Figure 1F), but did not change the levels of overexpressed FoxP1, the closest homolog of FoxP2. For further control experiments, we generated an shRNA designed not to target any zebra finch gene (shControl).
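To make the sense-loop-antisense layout concrete, the following Python sketch assembles the DNA insert that would encode such a hairpin. The 19-nucleotide target is a made-up placeholder, not one of the FoxP2 target sequences used here, and the loop `TTCAAGAGA` is an assumption (a loop often used in shRNA vectors), not necessarily the loop in shFoxP2-f, shFoxP2-h, or shControl.

```python
# Illustrative sketch only: the loop and the example target are assumptions,
# not the sequences of the constructs used in this study.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_hairpin(target: str, loop: str = "TTCAAGAGA") -> str:
    """Build a sense-loop-antisense insert for a short hairpin RNA.

    The sense half matches the target mRNA; the antisense half is its
    reverse complement, so the transcript can fold back into a stem
    closed by the loop.
    """
    target = target.upper()
    if not 19 <= len(target) <= 21 or set(target) - set("ACGT"):
        raise ValueError("expected a 19-21 nt DNA target sequence")
    return target + loop.upper() + reverse_complement(target)


if __name__ == "__main__":
    sense = "GACTACGTTAGCATGCAAT"  # hypothetical placeholder target
    print(design_hairpin(sense))
```

Because the antisense half is the reverse complement of the sense half, the transcript can pair with itself into a double-stranded stem; processing of that stem in the cell yields the short interfering RNA that guides degradation of the matching mRNA.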
As expected, this nontargeting shRNA did not affect expression of either FoxP2 or FoxP1 in vitro (Figure 1F). Since shFoxP2-f and shFoxP2-h targeted FoxP2 with similar efficiency, they were used interchangeably in subsequent in vivo experiments (shFoxP2-f/-h). Figure 1 Establishing Lentivirus-Mediated Knockdown of FoxP2 in Zebra Finch Area X (A) Phase contrast image of a sagittal 50-μm brain section from a male zebra finch. Area X is outlined by white arrows (scale bar indicates 1 mm). The microinjection into Area X is schematized in the inset. (B) Fluorescence microscopy image of (A). Virus-infected cells expressed GFP (green). (C) FoxP2 immunostaining (red; scale bar indicates 10 μm). (D) The neuron shown in (C) also expressed viral GFP from injection with the nontargeting shControl virus. (E) Overlay of (C) and (D). (F) Overexpression of zebra finch FoxP2 (left panel) or FoxP1 (right panel), each tagged with the V5 epitope, together with one of the hairpin constructs (shFoxP2-f, shFoxP2-h, or shControl) in HEK293T cells. Western blot analysis using a V5 antibody revealed that shFoxP2-f and shFoxP2-h, but not shControl, efficiently reduced FoxP2 levels. FoxP1 protein levels were unaffected by any of the shRNAs. Immunoblotting with an actin antibody shows comparable loading of protein samples. (G) Knockdown of FoxP2 in vivo. Immunofluorescent staining with an antibody against FoxP2 on 50-μm brain sections from birds injected with shFoxP2-f/-h in one hemisphere and shControl in the contralateral hemisphere 30 d prior to analysis revealed lower fluorescence levels and fewer labeled cells in knockdown (upper panel) compared to control sections (lower panel). FoxP2-positive cells appear red; virally infected cells express GFP, visible in green (scale bar indicates 20 μm). (H) Quantification of in vivo knockdown efficiency.
The fluorescence intensity of FoxP2 immunostaining was measured in images from brain sections injected with shFoxP2-f/-h in one hemisphere and shControl in the contralateral hemisphere 30 d prior to analysis. All antibody incubations were performed simultaneously, and pictures were taken with identical camera settings. Bars represent average intensity levels normalized to the shControl-injected hemisphere (± standard error of the mean [SEM]; two-tailed Mann-Whitney U test, **p 0.5; shFoxP2-f/-h, n = 6, shControl, n = 7). Quantification of Area X volume targeted by virus injection in an equally treated group of birds, but sacrificed at PHD50, confirmed the results obtained for PHD90 (mean volume 20.4% ± SD 4.0%; two-tailed Mann-Whitney U test, p > 0.6; shControl, n = 3 hemispheres from 3 animals; shFoxP2-f/-h, n = 3 hemispheres from 3 animals). To quantify the proportion of neurons among lentivirus-infected cells in Area X, we used immunohistochemical staining with the neuronal marker Hu [18] (Figure S2). Of all virus-infected cells, 78.5% ± 3.5% were neurons (mean ± SEM; no significant difference between shFoxP2 and shControl, two-tailed Mann-Whitney U test, p > 0.7; shControl injections, n = 3 hemispheres from 3 animals; shFoxP2 injections, n = 4 hemispheres from 4 animals). This result is consistent with Wada et al. [19], who used the same viral constructs in the zebra finch brain in vivo. Among the infected cells were FoxP2-positive spiny neurons, which are assumed to be the most common cell type in Area X [20] (Figure 1C–1E). To quantify FoxP2 knockdown in vivo, we determined FoxP2 protein levels in Area X on PHD50, the time of peak FoxP2 expression [10], in birds injected on PHD23 with shFoxP2-f/-h in one hemisphere and shControl in the contralateral hemisphere. The signal of the immunofluorescent staining with a FoxP2 antibody was significantly lower in knockdown Area X than in control Area X (Figure 1G and 1H).
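Most group comparisons in this study are two-tailed Mann-Whitney U tests on small samples (for example, n = 6 versus n = 7 hemispheres). As an illustration of what such a test computes, the Python sketch below enumerates every possible assignment of the pooled observations to the two groups to obtain an exact two-sided p-value. It is a didactic sketch, not the software the authors used; a tested routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be preferred. Tied values receive midranks, and the two-sided p-value is taken as twice the smaller tail probability, capped at 1 (an assumed, common convention).

```python
from itertools import combinations


def midranks(values):
    """Rank values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U for x, exact two-sided p) by enumerating all group splits."""
    pooled = list(x) + list(y)
    n1 = len(x)
    ranks = midranks(pooled)
    offset = n1 * (n1 + 1) / 2
    u_obs = sum(ranks[:n1]) - offset
    # Null distribution of U over every way to pick n1 of the pooled items.
    null = [sum(ranks[i] for i in idx) - offset
            for idx in combinations(range(len(pooled)), n1)]
    p_low = sum(u <= u_obs for u in null) / len(null)
    p_high = sum(u >= u_obs for u in null) / len(null)
    return u_obs, min(1.0, 2 * min(p_low, p_high))


if __name__ == "__main__":
    print(mann_whitney_exact([1, 2, 3], [4, 5, 6]))  # toy data
```

With n = 6 and n = 7 there are only C(13, 6) = 1,716 possible splits, so full enumeration is cheap; for much larger groups a normal approximation or a library routine is more practical.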
We also assessed FoxP2 mRNA levels after knockdown in Area X. Birds were injected on PHD23 with shFoxP2-f/-h in one hemisphere and shControl in the contralateral hemisphere. On PHD50, we punched out Area X of injected birds and measured FoxP2 mRNA levels by real-time PCR. FoxP2 levels were normalized to the mRNAs of two independent housekeeping genes, Hmbs and Pfkp. FoxP2 mRNA was reduced on average by approximately 70% in the shFoxP2-infected region of Area X compared to the shControl-infected region of Area X (Figure 1I). Of note, this degree of RNAi-mediated knockdown approximates FOXP2 levels in DVD patients, since haploinsufficiency, a 50% reduction of functional FOXP2 protein, is apparently the common feature of all reported human FOXP2 mutations [4,21]. To demonstrate that RNAi-mediated gene knockdown can persist in vivo throughout the entire song-learning phase, we used a virus expressing an shRNA against the viral reporter GFP (shGFP) in conjunction with the virus expressing the nontargeting shRNA (shControl). We injected young zebra finches on PHD23 with equal amounts of equally infectious shGFP and shControl virus in the left and right hemisphere, respectively. More than 3 mo later, on PHD130, the GFP signal in the shGFP-injected hemisphere was still 70.5% ± 5.8% less intense than in the shControl-injected hemisphere (mean ± SEM; n = 2; Figure 1J). To rule out potential side effects of FoxP2 knockdown on cellular survival in Area X, we investigated apoptosis in Area X 6 d after surgery with terminal deoxynucleotidyl transferase-mediated dUTP nick end labeling (TUNEL). The TUNEL method detects genomic DNA double-strand breaks characteristic of apoptotic cells. Of 1,149 GFP-positive cells counted in six hemispheres from three animals, only five were TUNEL-positive (Figure S3). ShControl-injected and uninjected animals had similarly low levels of apoptotic cells (unpublished data).
Thus, FoxP2 does not appear to be essential for the short-term survival of postmitotic neurons. Since the TUNEL method does not capture long-term changes in neuronal viability that might follow a reduction of FoxP2, we used the neuronal marker Hu to determine neuronal densities in Area X 30 d after injecting either shFoxP2-f/-h or shControl virus (Figure S4). Neuronal densities in the infected region of Area X did not differ between knockdown and shControl-injected birds (two-tailed Mann-Whitney U test, p > 0.39; shControl, n = 4 hemispheres; shFoxP2-f/-h, n = 3 hemispheres). Neuronal densities were also comparable inside and outside of the virus-infected region of Area X for all viruses (two-tailed Mann-Whitney U test, p > 0.6 for both shFoxP2-f/-h and shControl). In sum, these data demonstrate that virus-mediated RNAi can induce specific, long-lasting knockdown of gene expression in zebra finch Area X without causing cell death. Song Imitation of FoxP2 Knockdown Zebra Finches Adult zebra finch song consists of different sound elements, here called syllables, that are separated by silent intervals. Syllables are rendered in a stereotyped sequential order, constituting a motif. During a song bout, a variable number of motifs are sung in short succession. To obtain a first descriptive account of the songs of knockdown and control pupils, we measured mean acoustic features for all syllables recorded from all pupils using the software Sound Analysis Pro (SAP) [22]. The features extracted were mean pitch, mean frequency, mean frequency modulation (FM; change of frequency over time), mean entropy, and mean pitch goodness (PG; periodicity of sound), as well as mean duration. The distributions of these features across the syllable repertoires of knockdown pupils, control pupils, and tutors did not differ significantly, indicating that all three groups sang syllables with similar acoustic features (Figure S5).
Next, we analyzed the behavioral consequences of bilateral FoxP2 knockdown in Area X for the outcome of song learning at PHD90. When a juvenile male finch is tutored individually by one adult male, the pupil learns to produce a song that strongly resembles that of his tutor [23]. We therefore determined learning success by the degree of acoustic similarity between pupil and tutor songs. Analysis of song recorded at PHD90 revealed that pupils with experimentally reduced FoxP2 levels in Area X imitated tutor songs with less fidelity than control animals did (see also Audio S1–S6). The comparison of sonograms from shControl-injected (Figure 2A) and shFoxP2-injected pupils (Figure 2B and 2C) with their respective tutors shows the characteristic effects caused by reduction of FoxP2. Typical features of FoxP2 knockdown pupils included syllable omissions (Figure 2B, syllables C, D, F, and G; Figure 2C, syllable B), imprecise copying of syllable duration (Figure 2B, syllable E longer; Figure 2C, syllable D shortened), and inaccurate imitation of spectral characteristics (Figure 2B, syllable E; Figure 2C, syllable D). In addition, in four out of seven knockdown pupils, the motif contained repetitions of individual syllables or syllable pairs (e.g., see Figure 2B and 2C). In contrast, none of the control or tutor motifs contained repeated syllables. Pupils did not reverse the sequential order of syllables in the tutor motifs, except for one control (unpublished data) and one FoxP2 knockdown pupil (Figure 3A). Figure 2 Incomplete Tutor Song Imitation by FoxP2 Knockdown Pupils Each sonogram depicts a typical motif of one animal (scale bars indicate 100 ms, frequency range 0–8,600 Hz). Tutor syllables are underlined with black bars and identified by letters. The identity of pupil syllables was determined by similarity comparison to tutor syllables using SAP software. Imprecisely copied pupil syllables are designated with red italic letters. 
(A–C) (A) tutor #38 and shControl-injected pupil, (B) tutor #396 and shFoxP2-injected pupil, and (C) tutor #414 and shFoxP2-injected pupil. ShFoxP2-injected pupils copied fewer syllables, and the fidelity of syllable imitation was worse than in shControl pupils, as reflected by lower SAP scores (similarity/accuracy indicated vertically at the right edge of the sonograms). (D) The mean similarity scores between tutor and pupil motifs were significantly lower in shFoxP2-injected pupils than in shControl- and shGFP-injected pupils (± SEM; two-tailed Mann-Whitney U test, **p 0.5). Figure 3 Inaccurate Tutor Song Imitation by FoxP2 Knockdown Pupils (A) Representative sonograms of FoxP2 knockdown and control pupils, both tutored by male 388 (scale bars indicate 100 ms, frequency range = 0–8,600 Hz). Syllables are underlined with black bars and identified by letters. The identity of pupil syllables was determined by similarity comparison to tutor syllables using SAP software. Red italic letters denote imprecisely copied syllables. Inaccurate imitation is particularly evident in the second element of syllable A and the first element of syllable B. Similarity and accuracy scores are indicated vertically at the right edge of the sonograms. (B) Average motif accuracy was significantly lower in shFoxP2 knockdown pupils than in control pupils, indicating that they imitated their tutors less precisely (± SEM; two-tailed Mann-Whitney U test, **p 0.4). (C) The frequency distribution of identity scores of all syllables from FoxP2 knockdown pupils (dark grey bars) was shifted towards lower scores compared to control pupils (light grey bars). This suggests that all syllable types were affected. (Identity scores were obtained from comparison of pupil/tutor syllable pairs; shFoxP2, n = 24 syllables from 7 animals; shControl, n = 26 syllables from 7 animals.)
(D) Comparison of syllable duration and mean acoustic feature values (FM and PG) between pupil syllables and their respective tutor syllables. The divergence of imitated syllables from the tutor model tended to be larger for all acoustic measures in the FoxP2 knockdown pupils (dark grey bars) than in the controls (light grey bars). For average syllable duration and mean entropy measures, the difference was significant (± SEM; two-tailed Mann-Whitney U test, **p 0.8; tutors n = 6, on average 5 syllables per animal; shControl n = 7 animals, on average 4 syllables per animal; shGFP n = 3, on average 5 syllables per animal). (D) Syllable duration varied more from rendition to rendition in knockdown pupils (shFoxP2) than in controls (shControl and shGFP) and tutors, as indicated by a higher mean coefficient of variation of syllable duration (± SEM; two-tailed Mann-Whitney U test, **p 0.7, same animals as [C]). Next, we quantified the variability of syllable duration between different renditions of the same syllable. The coefficient of variation of syllable duration was significantly higher in knockdown than in control pupils and tutors, suggesting imprecise motor coordination on short timescales (Figure 4D). Notably, the timing of syllables in control pupils (shControl and shGFP) was as stable as in tutors (Figure 4D). The variability of syllable duration in tutor and control birds was in the same range as reported previously [30], emphasizing how tightly adult zebra finches normally control syllable duration. Finally, we analyzed the sequential order of syllables over the course of many motifs. To this end, we first annotated sequences of 300 user-defined syllables with their positions in their respective motifs. We then measured the stereotypy of a motif by calculating, for each syllable, the entropy of its transition distribution. Based on this entropy measure, we generated a sequence consistency score (1 − entropy), which reflects song stereotypy.
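The sequence consistency score can be made concrete with a short Python sketch. It builds first-order transition counts from a sequence of motif positions, computes the Shannon entropy (in bits) of each syllable's outgoing transitions, rescales by an assumed maximal entropy of log2(k) for a motif with k syllable types, and reports 1 minus the mean rescaled entropy. The function name and the choice of log2(k) as the normalizer are assumptions; the paper does not specify these implementation details.

```python
from collections import Counter, defaultdict
from math import log2


def sequence_consistency(labels, motif_length):
    """Return 1 - mean normalized transition entropy over syllables.

    labels: motif positions (e.g., integers 1..motif_length) in sung order.
    motif_length: number of syllable types k; log2(k) is assumed to be the
    maximal possible entropy of one syllable's transition distribution.
    """
    if motif_length < 2:
        raise ValueError("motif must contain at least two syllable types")
    # First-order Markov chain: count transitions current -> next.
    transitions = defaultdict(Counter)
    for current, nxt in zip(labels, labels[1:]):
        transitions[current][nxt] += 1
    max_entropy = log2(motif_length)
    fractions = []
    for counts in transitions.values():
        total = sum(counts.values())
        h = -sum((c / total) * log2(c / total) for c in counts.values())
        fractions.append(h / max_entropy)
    return 1.0 - sum(fractions) / len(fractions)


if __name__ == "__main__":
    fixed = [1, 2, 3, 4] * 75  # 300 syllables in a perfectly fixed order
    print(sequence_consistency(fixed, 4))
```

A perfectly repeated motif gives deterministic transitions (entropy 0) and hence a score of 1, while transitions spread evenly over all syllable types give a score of 0.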
A sequence consistency score of 0 indicates random syllable order, whereas a score of 1 reflects a fixed syllable order. The mean sequence consistency was similar in shControl and shFoxP2-f/-h animals (Figure S8). Because stereotypy of motif delivery is a hallmark of “crystallized” adult song, it seems plausible that both knockdown animals and controls had reached the end of the sensory-motor learning period [31]. To investigate this question in more detail, we next analyzed the songs of knockdown and control pupils recorded at earlier stages of song development. Song Development in FoxP2 Knockdown Zebra Finches To explore the developmental trajectory of song learning in knockdown and control pupils, we analyzed songs recorded during plastic song at PHD65 and towards the end of the learning phase at PHD80. Since syllables are not yet rendered in a stereotyped motif structure at PHD65, we quantified song imitation success and vocal variability at the level of syllables only. To avoid having to identify individual syllables by their morphology, we used an automated procedure provided by SAP to compare all song material from a given day to the tutor's typical motif. The vocalizations of pupils were first segmented into syllables. All segments were subsequently compared to the typical motif of the tutor in a pairwise fashion (between 1,000 and 3,000 comparisons per pupil per day). The output variable of these measurements is an accuracy score, which describes the extent to which the pupil's sounds match those of the tutor (see Materials and Methods and [22] for further details). We found that knockdown pupils imitated their tutors less accurately than control pupils already at PHD65 (Figure 5A). The frequency distribution of accuracy values also suggests that imprecise syllable imitation was not skewed towards particular syllables or syllable types (Figure S9). This result is in line with the observation made earlier for the syllables at PHD90 (Figure 3C).
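Per-day summaries of the kind plotted in Figure 5 (the mean accuracy of a pupil's segments, the variance of those accuracy values, and group means ± SEM across pupils) can be sketched in Python as follows. The sketch assumes each segment's SAP accuracy score has been exported as a number, with `None` marking segments that received no accuracy value because they were less similar to the tutor than random zebra finch sounds. The function names and the use of the sample (n - 1) variance are illustrative assumptions, not the authors' procedure.

```python
from math import sqrt
from statistics import mean, stdev, variance
from typing import Optional, Sequence, Tuple


def summarize_day(scores: Sequence[Optional[float]]) -> dict:
    """Summarize one pupil's per-segment accuracy scores for one day.

    Segments without an accuracy value (None) are excluded and counted.
    """
    if not scores:
        raise ValueError("no segments")
    scored = [s for s in scores if s is not None]
    if len(scored) < 2:
        raise ValueError("need at least two scored segments")
    return {
        "mean_accuracy": mean(scored),
        "accuracy_variance": variance(scored),  # sample variance (n - 1)
        "excluded_fraction": (len(scores) - len(scored)) / len(scores),
    }


def group_mean_sem(per_pupil_values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean across pupils."""
    n = len(per_pupil_values)
    return mean(per_pupil_values), stdev(per_pupil_values) / sqrt(n)


if __name__ == "__main__":
    print(summarize_day([80.0, 90.0, None, 70.0]))  # toy scores
```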
In contrast to control pupils, knockdown pupils did not improve in accuracy after PHD80, suggesting they had reached the end of the learning phase (Figure 5A). Figure 5 Differences in Song Development of FoxP2 Knockdown and Control Pupils (A) We measured the accuracy of syllable imitation in song recordings of the same pupils made at three different ages (PHD65, PHD80, and PHD90), using the automated batch procedure in Sound Analysis Pro. Data points represent mean values (± SEM) of 1,000–3,000 pairwise comparisons between pupil recordings and the tutor model (shFoxP2-f/-h, n = 7 animals for all ages; shControl, n = 5 animals for PHD65, n = 6 animals for PHD80, and n = 7 animals for PHD90). Syllable imitation was already less accurate in FoxP2 knockdown pupils by PHD65 (two-tailed Mann-Whitney U test, PHD65 *p 0.5). The dashed line connecting the data points illustrates the direction of change over time but does not imply a linear relationship. (B) Variance of syllable accuracy values increased with age in knockdown pupils, but not in controls (two-tailed Wilcoxon signed-rank test, PHD65 to PHD90 *p 0.4). This led to a significantly higher variance at PHD90 in knockdown pupils compared to control pupils (two-tailed Mann-Whitney U test, PHD90 *p 0.9; n = 5 for shFoxP2-f/-h and n = 7 for shControl), suggesting that up to this age, syllable imitation followed largely similar dynamics. However, from PHD80 to PHD90, accuracy of syllable imitation continued to improve only in control, but not in knockdown pupils (two-tailed Mann-Whitney U test, p 0.8 for both similarity and accuracy). Quantification of FoxP2 knockdown. Young male zebra finches received an injection of shFoxP2-f/-h virus in one hemisphere and an injection of control virus (shControl) in the contralateral hemisphere on PHD23 as described above.
For the quantification of protein levels after FoxP2 knockdown, we performed immunohistochemical staining with the FoxP2 antibody on 50-μm sections 30 d after virus injection. Immunohistochemical staining was performed as described [10], but using an antibody dilution of 1:5,000. All sections were processed at the same time with the same batch of antibody solution. Images of stained brain sections were taken with a digital camera using the Simple PCI software (Compix) at 40× magnification. For each section, we acquired multiple Z-stacked images of the virus-infected area (230.3 μm × 230.3 μm) and reconstructed a maximal projection. All images from the same bird were taken with the same microscope and software settings. Finally, we quantified fluorescence intensity levels in the images. The intensity of the green fluorescence from the viral GFP was not significantly different between shFoxP2-f/-h–injected and shControl-injected hemispheres (two-tailed Mann-Whitney U test, p > 0.3). For the quantification of FoxP2 mRNA levels after knockdown, young male zebra finches were injected with shFoxP2-f/-h virus in one hemisphere and control virus (shControl) in the contralateral hemisphere on PHD23, as described above. This permitted analysis of FoxP2 knockdown within the same bird while avoiding confounding differences in gene expression levels between birds. On PHD50, we sacrificed the birds and excised the GFP-expressing brain area with a 1-mm–diameter glass capillary (Brand) under a fluorescence dissecting microscope. RNA was extracted with TRIZOL (Invitrogen); yield was determined by UV spectroscopy at 260/280 nm with a Nanodrop device. FoxP2 expression was quantified by real-time PCR using SybrGreen (Applied Biosystems).
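Relative quantification by real-time PCR with two reference genes can be illustrated with the comparative Ct (2^-ΔΔCt) calculation. In this Python sketch, the FoxP2 Ct in each hemisphere is normalized to the mean Ct of the two reference genes (Hmbs and Pfkp in this study), and the knockdown hemisphere is then expressed relative to the shControl hemisphere of the same bird. The default assumes an amplification efficiency of 100% (doubling per cycle); the function names and example Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_ct: float, reference_cts: Sequence[float]) -> float:
    """Target Ct minus the mean Ct of the reference genes.

    Averaging Ct values of several reference genes corresponds to using the
    geometric mean of their expression levels (at equal efficiency).
    """
    return target_ct - mean(reference_cts)


def relative_expression(kd_target: float, kd_refs: Sequence[float],
                        ctrl_target: float, ctrl_refs: Sequence[float],
                        efficiency: float = 1.0) -> float:
    """Knockdown-hemisphere expression relative to the control hemisphere.

    efficiency = 1.0 assumes the product doubles every cycle (base 2).
    """
    ddct = delta_ct(kd_target, kd_refs) - delta_ct(ctrl_target, ctrl_refs)
    return (1.0 + efficiency) ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values: FoxP2, then [Hmbs, Pfkp], for each hemisphere.
    print(relative_expression(26.0, [20.0, 22.0], 24.0, [20.0, 22.0]))
```

Under the doubling assumption, a reduction of roughly 70% corresponds to a ratio of about 0.3, that is, a ΔΔCt of roughly 1.7 cycles.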
We determined relative FoxP2 expression levels through normalization to the expression levels of two internal control genes, which were identified in a BLAST homology search for the mouse housekeeping genes Hmbs and Pfkp in the database from the Songbird Neurogenomics Initiative (http://titan.biotec.uiuc.edu/songbird/) and the Songbird Brain Transcriptome Database (http://songbirdtranscriptome.net/). The expression of Hmbs and Pfkp in the left and right hemisphere was equivalent in both injected and untreated animals (numbers indicate fold change between left and right hemispheres; untreated: Hmbs = 1.4 ± 0.5 and Pfkp = 1.3 ± 0.6; injected: Hmbs = 1.0 ± 0.4 and Pfkp = 1.1 ± 0.4; n = 5 birds). Relative expression levels were determined with the comparative threshold cycle (Ct) method. All primers used in this study amplified the cDNA with similar efficiency (E = 1 ± 5%) in a validation experiment. Normalized Ct values from the same animal were calibrated to the shControl-injected hemisphere. FoxP2 expression levels are thus presented as the ratio of expression in the shFoxP2-injected to the shControl-injected hemisphere. Song recording and analysis. Vocalizations were recorded between 9 am and 4 pm on PHD65, PHD80, and PHD90–93 in the absence of the tutor. Quantitative song analysis was performed using the SAP software, version 1.04 [22,51]. We analyzed song at the level of syllables, motifs, and syntax. We define a “syllable” as a continuous sound element surrounded by silent intervals. The “typical song motif” was defined as the succession of syllables that includes all syllable types (except introductory notes) and occurs repeatedly during a song bout. Syntax refers to the sequence of syllables in many successive motifs. Motif analysis. We quantified how well pupils had copied the motif of their tutor using a similarity score and an accuracy score obtained in SAP from ten asymmetric pairwise comparisons of the pupil's typical motif with the tutor motif.
In asymmetric comparisons, the most similar sound elements of two motifs are compared, independent of their position within a motif. The smallest unit of comparison is a 9.26-ms–long sound interval (FFT window). Each interval is characterized by measures of five acoustic features: pitch, FM, amplitude modulation (AM), Wiener entropy, and PG. SAP calculates the Euclidean distance between all interval pairs from two songs over the course of the motif and determines a p-value for each interval pair. These p-values are estimated from the cumulative distribution of Euclidean distances across 250,000 sound-interval pairs, obtained from 25 random pairs of zebra finch songs. Neighboring intervals that pass the p-threshold value (p = 0.1 in this study) form larger similarity segments (70 ms). The amount of sound from the tutor's motif that was included in the similarity segments represents the similarity score; it thus reflects how much of the tutor's song material was found in the pupil's motif. To measure how accurately pupils copied the sound elements of the tutor motif, we used the accuracy score from SAP. The accuracy score is computed locally, across short (9 ms) FFT windows, and indicates how well the pupil's sound matched the corresponding sound in the tutor song. SAP calculates an average accuracy value of the motif by averaging all accuracy values across the similarity segments. Syllable analysis—manual counting of imitated syllables. For manual counting of imitated syllable types, two individuals who were blind to treatment counted all syllables that matched a tutor syllable by visual inspection of sonograms. Their interobserver reliability was 80%. Syllable analysis—syllable acoustic features. We extracted the mean pitch, mean FM, mean entropy, and mean PG, as well as mean duration, from 25 renditions of each syllable.
To compare individual spectral features between pupil and tutor syllables, we subtracted the mean feature value of each tutor syllable from the mean feature value of the corresponding pupil syllable. Next, we normalized the absolute difference to the value of the tutor syllable to obtain the deviation of a pupil syllable from the tutor syllable in a given feature, in percent. To describe the variability of syllable duration between different renditions, we calculated the coefficient of variation of duration values among 25 renditions of each syllable. Syllable analysis—syllable identity score. We quantified the acoustic similarity between different syllables using symmetric comparisons to obtain syllable identity scores. In contrast to asymmetric comparisons, no similarity segments are identified during symmetric comparisons. Instead, the FFT windows are compared sequentially from the beginning to the end of the two sounds. Thus, similarity reflects how many sound intervals passed the p-threshold, and accuracy indicates the average (1 − p-value). To capture the acoustic similarity between syllables in a single measure, we used the product of similarity and accuracy as the syllable identity score. As for the motif analysis, the p-threshold value was set to p = 0.1. To quantify how accurately pupils learned individual syllables, we performed ten symmetric comparisons of each pupil syllable with its corresponding tutor syllable. To assess how variably the same pupil performed a particular syllable across multiple renditions of his motif, we compared 20 renditions of each syllable, two at a time. Because minute temporal shifting of FFT windows is allowed in symmetric comparisons (10 ms in this study), the more variable duration of syllables in FoxP2 knockdown animals did not bias the identity score. The syllable identity score instead reflects spectral differences between syllables.
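The syllable-level measures described above can be sketched in Python. The sketch computes the percent deviation of a pupil feature from the tutor value, the coefficient of variation of syllable duration, and a simplified symmetric comparison in which each pair of aligned FFT windows receives an empirical p-value from a null distribution of Euclidean feature distances (analogous to SAP's distribution from random song pairs); the identity score is then similarity × accuracy. This is not SAP's implementation: it assumes the features are already scaled to comparable units, compares windows strictly in order without the 10-ms temporal shift, and truncates to the shorter syllable. All names are hypothetical.

```python
from bisect import bisect_right
from math import dist
from statistics import mean, stdev
from typing import Sequence, Tuple

Frame = Tuple[float, ...]  # one FFT window: a vector of acoustic features


def percent_deviation(pupil_mean: float, tutor_mean: float) -> float:
    """Absolute pupil-tutor difference as a percentage of the tutor value."""
    return abs(pupil_mean - tutor_mean) * 100.0 / abs(tutor_mean)


def coefficient_of_variation(durations: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean (e.g., 25 renditions)."""
    return stdev(durations) / mean(durations)


def empirical_p(distance: float, null_sorted: Sequence[float]) -> float:
    """Fraction of null (random-pair) distances <= the observed distance."""
    return bisect_right(null_sorted, distance) / len(null_sorted)


def identity_score(pupil: Sequence[Frame], tutor: Sequence[Frame],
                   null_sorted: Sequence[float],
                   p_threshold: float = 0.1) -> float:
    """Simplified symmetric comparison: windows compared in order.

    similarity = fraction of window pairs with p below p_threshold
    accuracy   = mean of (1 - p) over all window pairs
    identity   = similarity * accuracy
    """
    pairs = list(zip(pupil, tutor))  # no temporal shifting (simplification)
    if not pairs:
        raise ValueError("empty syllable")
    ps = [empirical_p(dist(a, b), null_sorted) for a, b in pairs]
    similarity = sum(p < p_threshold for p in ps) / len(ps)
    accuracy = mean(1.0 - p for p in ps)
    return similarity * accuracy
```

Identical syllables score 1, and syllables whose windows are all farther apart than any random pair score 0.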
Syntax analysis. For each pupil, we manually annotated sequences of 300 user-defined syllables with their positions in their respective motifs; that is, each syllable of a motif was given a unique integer. Based on these data, we computed the Markov chain for each pupil, i.e., all transition probabilities between syllables. To measure the stereotypy of a motif, we calculated for each syllable the entropy of its transition distribution [52]. Because motif duration differed between birds, these entropy values were rescaled by the maximal possible entropy for each given motif duration. The entropy score for a pupil was then the average of these fractions of maximal entropy over all syllables. Based on this entropy measure, we generated a sequence consistency score (1 − entropy measure), which reflects song stereotypy. A sequence consistency score of 0 indicates random syllable order, whereas a score of 1 reflects a fixed syllable order. Analysis of song development. To determine tutor similarity and vocal variability during plastic song and towards the end of the learning phase, we analyzed songs recorded on PHD65, PHD80, and PHD90–93 (PHD ± 1 d; in one control pupil, recordings were only available from PHD75 instead of PHD80). First, all sound files from one day were segmented into sounds in the feature batch mode of SAP. Here, the pupil's vocalizations are separated from background noise using two thresholds (Wiener entropy and amplitude). The thresholds were adjusted individually for each pupil to obtain optimal segmentation. We validated the segmentation for each pupil by visual inspection of the segments and confirmed that segments corresponded to syllables. Next, all segments from a given day (between 1,000 and 3,000 segments) were automatically compared to the tutor motif. That is, in each comparison, SAP identifies the best possible match to the tutor motif for each segment.
Of all segments analyzed from PHD65, PHD80, and PHD90, 11.0% ± 0.9% were less similar to the tutor model than two random zebra finch sounds are to each other and thus did not receive any accuracy value in SAP. Most of these sounds were cage noise. The proportion of excluded sounds did not differ between knockdown and control pupils at any age (two-tailed Mann-Whitney U test, p > 0.9 for PHD65; p > 0.8 for PHD80; p > 0.7 for PHD90). Supporting Information Audio S1 Example of Song Motif from Tutor #414 (199 KB WMA) Audio S2 Example of Song Motif from Pupil of Tutor #414 (123 KB WMA) Audio S3 Example of Song Motif from Tutor #38 (166 KB WMA) Audio S4 Example of Song Motif from Pupil of Tutor #38 (223 KB WMA) Audio S5 Example of Song Motif from Tutor #396 (245 KB WMA) Audio S6 Example of Song Motif from Pupil of Tutor #396 (370 KB WMA) Figure S1 Timeline of Experiments By PHD20, fathers and older male siblings were removed from family cages to prevent experimental zebra finches from receiving instructive auditory experience prior to the onset of tutoring. At the beginning of the sensory learning period at PHD23, virus was injected bilaterally into Area X. From PHD30 on, injected birds were housed individually in sound-recording chambers together with an adult male zebra finch as tutor. We recorded the song of pupils on PHD65, PHD80, and between PHD90 and 93 using an automated recording system, in the absence of the tutor. (31 KB PDF) Figure S2 Immunohistochemical Staining with the Neuronal Marker Hu Identified Virus-Infected Neurons Expressing GFP (A) shows neuronal marker Hu, and (B) shows virus-infected neurons expressing GFP.
Double-labeled neurons appear yellow in the merged image (C) (scale bar, 20 μm). (288 KB PDF)

Figure S3 Infection with shFoxP2-Virus Did Not Induce Apoptosis
(A) We labeled apoptotic cells in 50-μm sagittal sections from the brains of PHD29 male zebra finches injected with shFoxP2 or shControl virus on PHD23. DNA double-strand breaks characteristic of apoptotic cells were detected using the TUNEL method and visualized with an Alexa568 secondary antibody (red). The filled white arrow points to a TUNEL-labeled cell not infected by shFoxP2-f. (B) The open white arrow points to a shFoxP2-infected cell that expresses the viral reporter GFP but shows no TUNEL labeling in (A). (C) DAPI staining identifies cell nuclei. The apoptotic cell (filled white arrow) contains fragmented DNA typical of apoptosis. (D) Overlay of (A–C). (E) As a positive control for the TUNEL method, we treated a section adjacent to that shown in (A–D) with DNase for 10 min to induce DNA double-strand breaks artificially. (E–H) Numerous TUNEL-labeled cells were then detected, among them a virally infected cell expressing GFP (white arrow in [E–H]). Colors as in (A–D). Scale bar in (A), 10 μm. (658 KB PDF)

Figure S4 Neuronal Densities Were Similar in Area X Injected with Either shFoxP2 or shControl
Neuronal densities in Area X were measured using the neuronal marker Hu 30 d after injection of either shFoxP2-f/-h or shControl virus. Bars represent the number of neurons/mm². Neuronal densities in the virus-infected region of Area X were similar in knockdown and shControl-injected birds (two-tailed Mann-Whitney U test, p > 0.39; shControl, n = 4 hemispheres; shFoxP2-f/-h, n = 3 hemispheres). Moreover, densities did not differ between the inside and the outside of the injection site for either virus (two-tailed Mann-Whitney U test, p > 0.6 for both shFoxP2-f/-h and shControl). (54 KB PDF)
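Several comparisons in this section, such as the neuronal densities in Figure S4, use a two-tailed Mann-Whitney U test with very small groups (n = 4 and n = 3 hemispheres). With groups this small, the exact null distribution can be enumerated directly, because there are only 35 ways to assign 3 of 7 pooled values to one group. The sketch below illustrates that enumeration under stated assumptions: the legends do not say whether an exact test or a normal approximation was used, ties are handled with midranks, and the helper names and example values are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Rank values 1..N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U of x, exact two-tailed p) by enumerating every split of the pooled ranks."""
    n_x, n_y = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    offset = n_x * (n_x + 1) / 2  # U = rank sum of x minus its minimum possible value
    centre = n_x * n_y / 2  # expected U under the null hypothesis
    u_obs = sum(ranks[:n_x]) - offset
    n_extreme = n_total = 0
    for idx in combinations(range(n_x + n_y), n_x):
        u = sum(ranks[i] for i in idx) - offset
        n_total += 1
        # Two-tailed: count splits at least as far from the centre as observed.
        if abs(u - centre) >= abs(u_obs - centre) - 1e-9:
            n_extreme += 1
    return u_obs, n_extreme / n_total
```

For example, `mann_whitney_exact([1, 2, 3], [4, 5, 6, 7])` gives U = 0 and an exact two-tailed p = 2/35 ≈ 0.057, which is the smallest p-value attainable with groups of 3 and 4.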
Figure S5 Syllables from Knockdown and Control Zebra Finches Were Similar in the Distribution of Their Acoustic Features and Their Duration
Box plots represent the distribution of mean pitch (A), mean frequency (B), mean frequency modulation (FM) (C), mean entropy (D), mean goodness of pitch (PG) (E), and mean duration (F) across all syllables from tutors and from each experimental group (shControl-, shGFP-, and shFoxP2-injected zebra finches). Boxes indicate the interquartile range (IQR) of the distribution; circles and asterisks mark individual values lying beyond the inner (1.5 × IQR) and outer (3 × IQR) fences, respectively (n = 40 syllables for tutors; n = 31 syllables for shControl; n = 15 syllables for shGFP; n = 31 syllables for shFoxP2). Mean syllable acoustic features and syllable duration (each averaged per animal) did not differ significantly between groups (ANOVA; n = 6 tutors; n = 7 birds for shControl; n = 3 birds for shGFP; n = 7 birds for shFoxP2-f/-h). (60 KB PDF)

Figure S6 Manual Counting of Syllables Copied by Knockdown and Control Animals
All syllables that matched a tutor syllable on visual inspection of a sonogram were counted for shFoxP2- and shControl-injected animals. Bars represent the mean percentage of tutor syllables copied by the pupils (± SD; two-tailed Mann-Whitney U test, **p = 0.004; n = 7 animals each for shControl and shFoxP2-f/-h). (54 KB PDF)

Figure S7 Both Hairpin Constructs Targeting FoxP2 Affected Song Imitation to the Same Degree
Bars indicate the mean similarity and accuracy scores of zebra finches injected with either shFoxP2-f or shFoxP2-h (± SEM; two-tailed Mann-Whitney U test, p > 0.6 for similarity and p > 0.4 for accuracy). (54 KB PDF)
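The outlier convention used in the Figure S5 box plots (circles beyond the inner fences at 1.5 × IQR, asterisks beyond the outer fences at 3 × IQR) can be expressed as a small classification routine. This is a sketch under one assumption: quartiles are computed with Python's `statistics.quantiles(..., method="inclusive")`. The plotting software used for the figure may use another quartile definition (e.g., Tukey's hinges) and could therefore flag slightly different points. The function name and example values are hypothetical.

```python
import statistics


def classify_by_fences(values):
    """Label each value 'inside', 'mild' (circle) or 'extreme' (asterisk).

    Inner fences lie 1.5 x IQR and outer fences 3 x IQR beyond the quartiles.
    Assumption: quartiles use statistics.quantiles(method="inclusive").
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr
    labels = []
    for v in values:
        if v < outer_lo or v > outer_hi:
            labels.append("extreme")  # drawn as an asterisk in Figure S5
        elif v < inner_lo or v > inner_hi:
            labels.append("mild")  # drawn as a circle in Figure S5
        else:
            labels.append("inside")  # within the inner fences
    return labels
```

With the values 1 to 9 plus 20 and 100, the quartiles are 3.5 and 8.5 (IQR = 5), so 20 lies between the upper inner and outer fences (16 and 23.5) and would be drawn as a circle, while 100 would be drawn as an asterisk.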
Figure S8 The Syllable Sequence within Motifs Was Highly Stereotyped across Many Different Renditions, Both in shFoxP2-Injected and shControl-Injected Birds
This is reflected by high sequence consistency scores, which did not differ significantly between shFoxP2-f/-h and shControl birds (two-tailed Mann-Whitney U test, p > 0.6; n = 7 animals each for shFoxP2-f/-h and shControl). The sequence consistency score (1 − entropy) was calculated from the entropy of syllable transitions in sequences of 300 successive syllables. (54 KB PDF)

Figure S9 Frequency Distribution of Syllable Accuracy Scores Was Shifted towards Lower Values in FoxP2 Knockdown Pupils
Zebra finch vocalizations recorded on PHD65 were first segmented into sounds corresponding to syllables. The segments from each bird were then compared pairwise with the respective tutor motif, yielding one accuracy score per sound segment. To obtain a balanced dataset, we randomly extracted 800 accuracy scores from each bird. Bars represent the relative frequency of accuracy scores. (56 KB PDF)

Accession Numbers
The GenBank (http://www.ncbi.nlm.nih.gov/Genbank) accession numbers for the genes and gene products discussed in this paper are FoxP1 (AY549152), FoxP2 isoform I (AY549148), FoxP2 isoform IV (AY549151), Hmbs (NM_013551), and Pfkp (NM_019703). The Online Mendelian Inheritance in Man (OMIM; http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM) accession number for FOXP2 is 605317.
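The balancing step described in the Figure S9 legend (800 randomly drawn accuracy scores per bird, followed by relative frequencies) can be sketched as follows. Several details are assumptions not stated in the legend: sampling is without replacement, accuracy scores lie on a 0 to 100 scale, and bins are 10 units wide. The function names, random seed, and bin width are hypothetical.

```python
import random


def balanced_sample(scores_by_bird, k=800, seed=0):
    """Draw k scores per bird without replacement so each bird contributes equally."""
    rng = random.Random(seed)  # fixed seed (hypothetical) for reproducibility
    pooled = []
    for bird, scores in scores_by_bird.items():
        if len(scores) < k:
            raise ValueError(f"bird {bird!r} has only {len(scores)} scores")
        pooled.extend(rng.sample(list(scores), k))
    return pooled


def relative_frequencies(scores, bin_width=10, lo=0, hi=100):
    """Fraction of scores per bin [lo, lo+w), [lo+w, lo+2w), ...; hi joins the last bin."""
    n_bins = (hi - lo) // bin_width
    counts = [0] * n_bins
    for s in scores:
        if not lo <= s <= hi:
            raise ValueError(f"score {s} outside [{lo}, {hi}]")
        idx = min(int((s - lo) // bin_width), n_bins - 1)
        counts[idx] += 1
    return [c / len(scores) for c in counts]
```

Drawing the same number of scores from every bird keeps birds with many recorded segments (up to 3,000 per day) from dominating the pooled histogram.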