Speech recognition is rapid, automatic, and remarkably robust. How the brain decodes speech from noisy acoustic input remains unknown. We show that the brain recognizes speech by integrating bottom-up acoustic signals with top-down predictions. Subjects listened to intelligible normal speech and to unintelligible fine-structure speech, which lacked the predictability of the temporal envelope and did not enable access to higher linguistic representations. Their top-down predictions were manipulated using priming. Activation for unintelligible fine-structure speech was confined to primary auditory cortices but propagated into posterior middle temporal areas when fine-structure speech was rendered intelligible by top-down predictions. By contrast, normal speech engaged posterior middle temporal areas irrespective of subjects' predictions. Critically, when speech violated subjects' expectations, activation increases in anterior temporal gyri/sulci signalled a prediction error and the need for new semantic integration. In line with predictive coding, our findings demonstrate that top-down predictions determine whether and how the brain translates bottom-up acoustic input into intelligible speech.