We present a new methodological approach that combines naturally occurring speech harvested from the web with speech data elicited in the laboratory. This proof-of-concept study examines the phenomenon of focus sensitivity in English, in which the interpretation of particular grammatical constructions (e.g., the comparative) is sensitive to the location of prosodic prominence. Machine learning algorithms (support vector machines and linear discriminant analysis) and human perception experiments are used to cross-validate the web-harvested and lab-elicited speech. Results confirm the theoretical predictions for the location of prominence in comparative clauses and demonstrate the advantages of using both web-harvested and lab-elicited speech. The most robust acoustic classifiers include paradigmatic (i.e., un-normalized), non-intonational acoustic measures (duration and relative formant frequencies from single segments). These acoustic cues are also significant predictors of human listeners’ classification, offering new evidence in the debate over whether prominence is mainly encoded by pitch or by other cues, and over the role that utterance normalization plays when examining non-pitch cues such as duration.