The quality of consumer health information on the World Wide Web is an important issue for medicine, but to date no systematic and comprehensive synthesis of the methods and evidence has been performed. To establish a methodological framework on how quality on the Web is evaluated in practice, to determine the heterogeneity of the results and conclusions, and to compare the methodological rigor of these studies, to determine to what extent the conclusions depend on the methodology used, and to suggest future directions for research. We searched MEDLINE and PREMEDLINE (1966 through September 2001), Science Citation Index (1997 through September 2001), Social Sciences Citation Index (1997 through September 2001), Arts and Humanities Citation Index (1997 through September 2001), LISA (1969 through July 2001), CINAHL (1982 through July 2001), PsychINFO (1988 through September 2001), EMBASE (1988 through June 2001), and SIGLE (1980 through June 2001). We also conducted hand searches, general Internet searches, and a personal bibliographic database search. We included published and unpublished empirical studies in any language in which investigators searched the Web systematically for specific health information, evaluated the quality of Web sites or pages, and reported quantitative results. We screened 7830 citations and retrieved 170 potentially eligible full articles. A total of 79 distinct studies met the inclusion criteria, evaluating 5941 health Web sites and 1329 Web pages, and reporting 408 evaluation results for 86 different quality criteria. Two reviewers independently extracted study characteristics, medical domains, search strategies used, methods and criteria of quality assessment, results (percentage of sites or pages rated as inadequate pertaining to a quality criterion), and quality and rigor of study methods and reporting. Most frequently used quality criteria used include accuracy, completeness, readability, design, disclosures, and references provided. Fifty-five studies (70%) concluded that quality is a problem on the Web, 17 (22%) remained neutral, and 7 studies (9%) came to a positive conclusion. Positive studies scored significantly lower in search (P =.02) and evaluation (P =.04) methods. Due to differences in study methods and rigor, quality criteria, study population, and topic chosen, study results and conclusions on health-related Web sites vary widely. Operational definitions of quality criteria are needed.