Although diverse, theories of visual attention generally share the notion that attention is controlled by some combination of three distinct strategies: (1) exogenous cuing from locally contrasting primitive visual features, such as abrupt onsets or color singletons (e.g., L. Itti, C. Koch, & E. Niebur, 1998), (2) endogenous gain modulation of exogenous activations, used to guide attention to task-relevant features (e.g., V. Navalpakkam & L. Itti, 2007; J. Wolfe, 1994, 2007), and (3) endogenous prediction of likely locations of interest, based on task and scene gist (e.g., A. Torralba, A. Oliva, M. Castelhano, & J. Henderson, 2006). However, little work has been done to synthesize these disparate theories. In this work, we propose a unifying conceptualization in which attention is controlled along two dimensions: the degree of task focus and the contextual scale of operation. Previously proposed strategies, and their combinations, can be viewed as instances of this one mechanism. Thus, this theory serves not as a replacement for existing models but as a means of bringing them into a coherent framework. We present an implementation of this theory and demonstrate its applicability to a wide range of attentional phenomena. The model accounts for key results in visual search with synthetic images and makes reasonable predictions for human eye movements in search tasks involving real-world images. In addition, the theory offers an unusual perspective on attention that places a fundamental emphasis on the role of experience and task-related knowledge.