Timbre is the attribute of sound that allows humans and other animals to distinguish among different sound sources. Studies based on psychophysical judgments of musical timbre, ecological analyses of sound's physical characteristics as well as machine learning approaches have all suggested that timbre is a multifaceted attribute that invokes both spectral and temporal sound features. Here, we explored the neural underpinnings of musical timbre. We used a neuro-computational framework based on spectro-temporal receptive fields, recorded from over a thousand neurons in the mammalian primary auditory cortex as well as from simulated cortical neurons, augmented with a nonlinear classifier. The model was able to perform robust instrument classification irrespective of pitch and playing style, with an accuracy of 98.7%. Using the same front end, the model was also able to reproduce perceptual distance judgments between timbres as perceived by human listeners. The study demonstrates that joint spectro-temporal features, such as those observed in the mammalian primary auditory cortex, are critical to provide the rich-enough representation necessary to account for perceptual judgments of timbre by human listeners, as well as recognition of musical instruments.
Music is a complex acoustic experience that we often take for granted. Whether sitting at a symphony hall or enjoying a melody over earphones, we have no difficulty identifying the instruments playing, following various beats, or simply distinguishing a flute from an oboe. Our brains rely on a number of sound attributes to analyze the music in our ears. These attributes can be straightforward like loudness or quite complex like the identity of the instrument. A major contributor to our ability to recognize instruments is what is formally called ‘timbre’. Of all perceptual attributes of music, timbre remains the most mysterious and least amenable to a simple mathematical abstraction. In this work, we examine the neural underpinnings of musical timbre in an attempt to both define its perceptual space and explore the processes underlying timbre-based recognition. We propose a scheme based on responses observed at the level of mammalian primary auditory cortex and show that it can accurately predict sound source recognition and perceptual timbre judgments by human listeners. The analyses presented here strongly suggest that rich representations such as those observed in auditory cortex are critical in mediating timbre percepts.