Organizations need to appear legitimate to access resources. Actors therefore often carry out legitimacy work to shape others’ evaluation of something as “desirable, proper or appropriate.” Research on legitimacy work has tended to focus on the cognitive appeal of words. More recently, research has also emerged on the persuasiveness of images, especially for creating emotional appeals. We develop a process model to explain the role of multimodal messages, which combine words and images, in legitimacy work. With this model, we aim to answer: Why do certain combinations of words and images more forcefully evoke emotion, more reliably capture recipients’ attention, and motivate recipients to process those messages and (re)evaluate the legitimacy of an organization, its activities, and/or its industry? We conclude by discussing theoretical extensions and connections to related perspectives such as institutional work and values work.