In this paper, we explore the role of intonation and visual cues in the perception of statements and questions in two varieties of European Portuguese: the standard variety (SEP) and the insular variety of Ponta Delgada, in the Azores (PtD). Both varieties have previously been shown to convey sentence type contrasts through different uses of intonational means and/or facial gestures, namely eyebrow movements. Forty native speakers (20 from each variety) were exposed to SEP and PtD stimuli in a perception task with three conditions (audio only, video only, and audiovisual). The audiovisual condition included congruent and incongruent (both original and manipulated) stimuli, in which the auditory and visual features either matched or mismatched as potential cues for a specific sentence type. We conclude that both SEP and PtD participants rely more on intonation than on eyebrow movement to identify sentence types, even when exposed to incongruent audiovisual stimuli. Unexpectedly, in the absence of audio information, participants do not interpret eyebrow raising as a question marker, not even when perceiving stimuli from their native variety. When exposed to non-native audiovisual stimuli, both SEP and PtD participants show longer reaction times (RTs), especially for incongruent stimuli. Finally, although we confirm the strength of intonation over visual cues, RTs in the audiovisual condition are significantly shorter than in the audio-only condition, thus pointing to the relevance of visual cues for structural/linguistic marking.