The Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (NCI) collects cancer survival and incidence information from population-based cancer registries, encompassing 26% of the US population. Over the past 3 decades, the SEER program database has become an internationally recognized cancer resource, increasingly utilized in the study of cancer epidemiology and outcomes. In this commentary, we will discuss applications and limitations of the SEER public-use database, to help clinicians interpret the many studies that are generated from this database, and to help clinical investigators implement future studies using this valuable national resource.
Created in 1973 from two earlier NCI programs, the End Results Program and the Third National Cancer Survey, the SEER program began by collecting data from the states of Connecticut, Iowa, New Mexico, Utah, and Hawaii and the metropolitan areas of San Francisco/Oakland, Seattle, Detroit, and Atlanta. These are considered the “original 9” SEER registries. Ten predominantly black counties in rural Georgia were added in 1978, and American Indians in Arizona were added in 1980. Three other registries participated for limited periods before 1990: New Orleans, Louisiana (1974–1977); New Jersey (1979–1989); and Puerto Rico (1973–1989). Metropolitan Los Angeles County and four counties in the San Jose/Monterey area were added in 1992 to increase coverage of minority populations, especially individuals of Hispanic origin. In 2001, the SEER program expanded to include Kentucky and the remainder of California, and New Jersey and Louisiana rejoined the program.
The SEER data are broadly representative of the US population, although there are some differences. Demographically, the population covered by the SEER database is more likely to be foreign born than the standard US 2000 population (17.3% vs 11.3%) and is more often urban (88.2% vs 79%). The SEER database also covers a higher proportion of the US Native Hawaiian/Pacific Islander (69.8%), Asian (53.3%), American Indian/Alaska Native (42.2%), and Hispanic (40.4%) populations than of the US white (23.4%) and US black (22.7%) populations. Nonetheless, because of its large size and long follow-up, the SEER program database continues to be studied as a representation of the US cancer population as a whole.
Quality control is an important aspect of the SEER program. Registries are routinely audited for data accuracy, and a Data Quality Profile (DQP) is generated for each SEER registry. Individuals and registries that do well in reliability studies are identified and rewarded. In addition, the NCI SEER program conducts regular education and training programs in coordination with the National Cancer Registrars Association annual meeting. Registrars are tested via Web-based reliability studies, and high-volume facilities are audited to ensure that case ascertainment is complete and timely. As a result of these efforts, the SEER program has become the standard for data quality among cancer registries internationally.
The SEER program regularly publishes national cancer statistics reviews and monographs as a part of its mission. Selected SEER publications can be accessed at http://seer.cancer.gov/publications, and include the Annual Report to the Nation on the Status of Cancer as well as Racial/Ethnic Patterns of Cancer in the United States. In addition to these statistical reviews, the SEER database is a rich resource for independent clinical researchers. With the growing availability of easy-to-use statistical software and greater awareness of the SEER database, the number of SEER-based reports has risen over the past decade (Figure 1), with over 400 peer-reviewed publications issued in 2008 alone.
The SEER database records demographic variables along with information on stage, extent of surgery, pathologic findings, whether radiation therapy was given, and cause of death for patients with cancer (Table 1). A full list of variables can be found online. More detail has been recorded in recent years than in the past. For example, for all years (1973+), stage at diagnosis is grouped into five main categories: in situ, localized, regional, distant, or unstaged. Since 2004, however, most primary cancer sites also have TNM staging data based on the AJCC Cancer Staging Manual, 6th edition.
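For investigators planning an analysis, the staging scheme described above can be illustrated with a minimal Python sketch that tabulates cases across the five summary-stage categories and flags diagnosis years in which TNM data might also be present. The record layout, the field names `year_dx` and `summary_stage`, the function names, and the example cases are all hypothetical and do not reflect the actual SEER file format or variable names; counting missing or unrecognized stage values as unstaged is likewise an assumption made only for illustration.

```python
from collections import Counter

# The five summary-stage categories named in the text (available for 1973+).
SUMMARY_STAGES = ("in situ", "localized", "regional", "distant", "unstaged")

# Per the text, most primary sites carry AJCC 6th edition TNM data from 2004 on.
TNM_START_YEAR = 2004


def may_have_tnm(year_dx):
    """True if the diagnosis year falls in the period with TNM staging data.

    This checks the year only; the text says "most" sites, not all.
    """
    return year_dx >= TNM_START_YEAR


def stage_distribution(records):
    """Return {stage: (count, percent)} over the five summary-stage categories.

    Assumption: a missing or unrecognized stage value is counted as unstaged.
    """
    counts = Counter()
    for rec in records:
        stage = (rec.get("summary_stage") or "").strip().lower()
        counts[stage if stage in SUMMARY_STAGES else "unstaged"] += 1
    total = sum(counts.values())
    return {
        s: (counts[s], round(100 * counts[s] / total, 1) if total else 0.0)
        for s in SUMMARY_STAGES
    }


# Hypothetical example records (not real SEER data).
cases = [
    {"year_dx": 1998, "summary_stage": "Localized"},
    {"year_dx": 2005, "summary_stage": "regional"},
    {"year_dx": 2006, "summary_stage": "Distant"},
    {"year_dx": 2001, "summary_stage": None},
]

dist = stage_distribution(cases)
tnm_era_cases = [c for c in cases if may_have_tnm(c["year_dx"])]
```

Because the summary-stage categories span all years while TNM data begin in 2004, an analysis covering both periods would generally need to rely on the summary-stage variable, or restrict itself to the later years if TNM detail is required.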