The human capacity to recognize complex visual patterns emerges in a sequence of brain areas known as the ventral stream, beginning with primary visual cortex (V1). We develop a population model for mid-ventral processing, in which non-linear combinations of V1 responses are averaged within receptive fields that grow with eccentricity. To test the model, we generate novel forms of visual metamers — stimuli that differ physically, but look the same. We develop a behavioral protocol that uses metameric stimuli to estimate the receptive field sizes in which the model features are represented. Because receptive field sizes change along the ventral stream, the behavioral results can identify the visual area corresponding to the representation. Measurements in human observers implicate V2, providing a new functional account of this area. The model explains deficits of peripheral vision known as “crowding”, and provides a quantitative framework for assessing the capabilities of everyday vision.
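As a rough illustration of the pooling idea, the following one-dimensional sketch averages a non-linear combination of two response channels within windows whose diameter grows with eccentricity. It is not the model itself: the linear growth rule, the use of a simple product as the non-linearity, and the names `pooling_window`, `pooled_statistics`, and `scaling` are all assumptions made for this example.

```python
def pooling_window(eccentricity, scaling):
    """Diameter of a pooling region.

    Assumption: the diameter grows linearly with eccentricity,
    with `scaling` as the constant of proportionality.
    """
    return scaling * eccentricity


def pooled_statistics(positions, responses_a, responses_b, centers, scaling):
    """Average a product of two response channels inside each pooling window.

    The product stands in for a generic non-linear combination of
    V1-like responses (a hypothetical choice for illustration).
    """
    pooled = []
    for c in centers:
        radius = pooling_window(c, scaling) / 2.0
        products = [
            a * b
            for x, a, b in zip(positions, responses_a, responses_b)
            if abs(x - c) <= radius
        ]
        # An empty window contributes 0.0 rather than raising an error.
        pooled.append(sum(products) / len(products) if products else 0.0)
    return pooled


# Two toy "stimuli" that differ physically (channel A is reversed).
positions = [9, 10, 11]
channel_b = [1, 1, 1]
stimulus_1 = [1, 2, 3]
stimulus_2 = [3, 2, 1]

# Large windows: both stimuli yield the same pooled statistic.
large_1 = pooled_statistics(positions, stimulus_1, channel_b, [10], 0.5)
large_2 = pooled_statistics(positions, stimulus_2, channel_b, [10], 0.5)

# Small windows: the pooled statistics differ, so the stimuli are distinguishable.
small_1 = pooled_statistics(positions, stimulus_1, channel_b, [9, 11], 0.1)
small_2 = pooled_statistics(positions, stimulus_2, channel_b, [9, 11], 0.1)

print(large_1 == large_2, small_1 == small_2)
```

In this toy setting the two stimuli are metamers for the sketch only when the pooling windows are large enough. This mirrors the logic of the behavioral protocol, in which the window scaling at which physically different stimuli become indistinguishable provides an estimate of receptive field size.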