Corticospinal excitability (CSE) in humans, as measured with transcranial magnetic stimulation (TMS), is generally increased by the perception of other people's actions. This perception can be unimodal (visual or auditory) or multimodal (visual and auditory). The increase in TMS-measured CSE is typically prominent in muscles involved in the perceived action (muscle specificity). There are two main classes of accounts for this phenomenon. One suggests that the motor system mirrors the actions that the observer perceives (the resonance account). The other suggests that the motor system predicts the actions that the observer perceives (the predictive account). To test these accounts, which need not be mutually exclusive, subjects were presented with four versions of three-note piano sequences (sound only, sight only, audiovisual, and audiovisual with the sound lagging behind) while CSE was measured in two hand muscles. Muscle specificity did not interact with modality in the flexor digiti minimi (FDM), but in the first dorsal interosseous (FDI) it was reliably higher while subjects perceived the audiovisual version with the sound lagging behind. Since this is the only version that overtly violates experience-based expectations, the finding supports predictive coding accounts of motor facilitation during action perception.