Data resource basics

The National Health Information Database (NHID) is a public database on health care utilization, health screening, socio-demographic variables, and mortality for the whole population of South Korea, formed by the National Health Insurance Service (NHIS). The data cover more than 50 million people, and the participation rate in the health screening programs was 74.8% in 2014. The NHID covers the years 2002 to 2014.

People insured under the National Health Insurance (NHI) pay insurance contributions and receive medical services from their health care providers. The NHIS, as the single insurer, pays the costs based on the billing records of health care providers (Figure 1). To govern and carry out these processes, the NHIS built a data warehouse to collect the required information on insurance eligibility, insurance contributions, medical history, and medical institutions. In 2012, the NHIS formed the NHID from medical treatment records, health screening records, and eligibility data held in an existing database system.

Figure 1. The governance of the National Health Insurance of South Korea.

Data collected

The eligibility database includes income-based insurance contributions, demographic variables, and date of death. The national health screening database includes health behaviors and bio-clinical variables. The health care utilization database includes records of inpatient and outpatient usage (diagnosis, length of stay, treatment costs, services received) and prescription records (drug code, days prescribed, daily dosage). The long-term care insurance database includes activities of daily living and service grades. The health care provider database includes the types of institutions, human resources, and equipment. In the NHID, de-identified join keys replace the personal identifiers and are used to interlink these databases.
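The linkage through de-identified join keys described above can be illustrated with a minimal Python sketch. It is not the NHIS implementation: the field names (`join_key`, `income_decile`, `screen_year`, and so on), the key values, and all records are hypothetical and fabricated for illustration, since the actual NHID variable names and layouts are not given here. The sketch only shows the general idea that one person's rows in separate databases can be brought together through a shared key that carries no personal identifier.

```python
# Hypothetical, fabricated records for illustration only; the real NHID
# variable names and table layouts are not specified in this text.
eligibility = [
    {"join_key": "K001", "sex": "F", "birth_year": 1960, "income_decile": 4},
    {"join_key": "K002", "sex": "M", "birth_year": 1975, "income_decile": 8},
]
screening = [
    {"join_key": "K001", "screen_year": 2014, "smoker": False},
]
claims = [
    {"join_key": "K001", "diagnosis_code": "I10", "length_of_stay": 0},
    {"join_key": "K001", "diagnosis_code": "E11", "length_of_stay": 3},
    {"join_key": "K002", "diagnosis_code": "J06", "length_of_stay": 0},
]


def group_by_key(rows, key="join_key"):
    """Index rows by the de-identified key (one person may have many rows)."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[key], []).append(row)
    return grouped


def link_person_records(eligibility, screening, claims):
    """Build one linked record per person listed in the eligibility table."""
    screening_by_key = group_by_key(screening)
    claims_by_key = group_by_key(claims)
    linked = {}
    for person in eligibility:
        key = person["join_key"]
        linked[key] = {
            "eligibility": person,
            # An empty list means the person has no rows in that database.
            "screening": screening_by_key.get(key, []),
            "claims": claims_by_key.get(key, []),
        }
    return linked


linked = link_person_records(eligibility, screening, claims)
for key, record in linked.items():
    print(key, len(record["screening"]), len(record["claims"]))
```

In this toy data the second person has claims but no screening record, which is the kind of gap to expect given that screening participation is below 100%.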
Data resource use

Published papers have covered various diseases and health conditions, such as infectious diseases, cancer, cardiovascular diseases, hypertension, diabetes mellitus, and injuries, as well as risk factors such as smoking, alcohol consumption, and obesity. The impacts of health care and public health policies on health care utilization have also been explored, since the data include the information needed to describe patterns of health care utilization.

Reasons to be cautious

First, information on diagnosis and disease may not be optimal for identifying disease occurrence and prevalence, since the data were collected for medical service claims and reimbursement. However, the NHID also collects prescription data with secondary diagnoses, which can be used to improve the accuracy of the disease information. Second, data linkage with other secondary national data is not widely available due to privacy issues in Korea. Governmental discussions on statutory reform of data linkage using the NHID are under way.

Collaboration and data access

Access to the NHID can be obtained through the Health Insurance Data Service home page (http://nhiss.nhis.or.kr). Ethics approval from the researchers’ institutional review board must be submitted together with a study proposal, which the NHIS review committee reviews before data are provided. Further inquiries on data use can be directed to the corresponding author.

Funding and competing interests

This work was supported by the NHIS in South Korea. The authors declare no competing interests.