One component of clinical information systems is a registry of patients. Registries allow providers to identify gaps in care at the population level. They also support rapid-cycle continuous quality improvement, targeted practice change, and improved outcomes. Most registries are built on membership with an insurer or on other selection criteria. Little, if any, data exist on registries representing demographically heterogeneous populations. Administrative and clinical data for the period 1/1/2000 to 12/30/2003 were examined. In total, 46,082,941 lab reports, 233,292,544 medical records, and 9,351,415 medical record abstracts, representing approximately 2 million unique patients, were searched. The diabetes source population was identified by the presence of any one of the following criteria: ICD-9 code 250 (diabetes) for inpatient, emergency room, or outpatient visits; any hemoglobin A1c result; blood glucose >200 mg/dl; or diabetes medication. A diagnosis of diabetes was verified by trained chart reviewers in a sample of patients. Single indicators and combinations of indicators were examined to determine the optimal identification of these cases. In two separate validation studies, using two or more indicators or an outpatient diagnosis maximized positive predictive value (PPV; 96% and 97%) and sensitivity (99% and 100%) and identified 55,807 individuals. When all patients with a single indicator of outpatient diagnosis (which had the highest single-indicator PPV, 94% and 95%) were included together with those having ≥2 indicators, the final sample size was 65,725. Using two or more indicators or an outpatient diagnosis identifies a sizeable and unselective diabetes database that can be used to track processes and outcomes.
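The case-identification rule described above can be illustrated with a minimal sketch. It is not the study's actual implementation. The class and function names are hypothetical, and counting inpatient, emergency room, and outpatient diagnoses as three separate indicators is an assumption, since the abstract does not say how the indicators were tallied. The helper `ppv_and_sensitivity` uses the standard definitions, PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN), with chart review as the reference standard.

```python
from dataclasses import dataclass, fields
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class PatientIndicators:
    """Hypothetical per-patient flags, one per criterion in the abstract."""
    inpatient_dx: bool = False         # ICD-9 250 on an inpatient visit
    emergency_dx: bool = False         # ICD-9 250 on an emergency room visit
    outpatient_dx: bool = False        # ICD-9 250 on an outpatient visit
    any_hba1c_result: bool = False     # any hemoglobin A1c result on file
    glucose_over_200: bool = False     # blood glucose > 200 mg/dl
    diabetes_medication: bool = False  # any diabetes medication

    def count(self) -> int:
        # Assumption: each flag counts as one separate indicator.
        return sum(bool(getattr(self, f.name)) for f in fields(self))


def in_source_population(p: PatientIndicators) -> bool:
    """Source population: any one of the criteria is present."""
    return p.count() >= 1


def meets_registry_rule(p: PatientIndicators) -> bool:
    """Registry rule: two or more indicators, or an outpatient diagnosis."""
    return p.outpatient_dx or p.count() >= 2


def ppv_and_sensitivity(
    flagged: Sequence[bool], confirmed: Sequence[bool]
) -> Tuple[Optional[float], Optional[float]]:
    """Compare rule output with chart-review results for the same patients."""
    tp = sum(f and c for f, c in zip(flagged, confirmed))
    fp = sum(f and not c for f, c in zip(flagged, confirmed))
    fn = sum(c and not f for f, c in zip(flagged, confirmed))
    ppv = tp / (tp + fp) if (tp + fp) else None
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    return ppv, sensitivity
```

Under these assumptions, a patient with only a hemoglobin A1c result belongs to the source population but not to the registry, whereas a patient with only an outpatient diagnosis qualifies for both.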