Advances in a wide array of scientific technologies have brought data of unprecedented volume and complexity into the oncology research space. These novel big data resources are applied across a variety of contexts—from health services research using data from insurance claims, cancer registries, and electronic health records, to deeper and broader genomic characterizations of disease. Several forms of big data show promise for improving our understanding of racial disparities in breast cancer, and for powering more intelligent and far-reaching interventions to close the racial gap in breast cancer survival. In this article we introduce several major types of big data used in breast cancer disparities research, highlight important findings to date, and discuss how big data may transform breast cancer disparities research in ways that lead to meaningful, lifesaving changes in breast cancer screening and treatment. We also discuss key challenges that may hinder progress in using big data for cancer disparities research and quality improvement.
For more than 4 decades, a survival gap has persisted between black and white women with breast cancer in the United States; age-adjusted mortality rates are 28 per 100,000 among black women and 20 per 100,000 among white women. Much of what we know about the root causes of this disparity, as well as possible solutions, comes from research using different types of data, ranging from high-dimensional genomic data to large population-based data sources and linkages with insurance claims records. We will provide a brief orientation on the research that has defined our understanding of breast cancer disparities to date, as well as promising emerging data sources and methods that may take us further in the quest to close the racial survival gap and provide better cancer care to vulnerable populations.
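To make the size of this gap concrete, the two mortality rates quoted above can be compared on both a relative and an absolute scale. The short Python sketch below uses only the figures given in the text; the function and variable names are illustrative and are not drawn from any cited analysis.

```python
def compare_rates(rate_a, rate_b):
    """Return (rate ratio, rate difference) for two rates on the same scale."""
    return rate_a / rate_b, rate_a - rate_b


# Age-adjusted breast cancer mortality per 100,000 women, as quoted in the text.
black_rate = 28.0
white_rate = 20.0

ratio, difference = compare_rates(black_rate, white_rate)
print(f"Rate ratio: {ratio:.2f}")
print(f"Rate difference: {difference:.0f} per 100,000")
```

On these figures, the rate ratio is 1.40 and the rate difference is 8 per 100,000; that is, age-adjusted mortality is 40% higher among black women than among white women.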
What Is the Meaning of Big Data for Oncology Research?
The term “big data” is used in many different industries. It is colloquial rather than scientific, and its meaning varies with the context in which it is used. A general definition of big data centers on three defining features: size, complexity, and technology. That is, the data are of such enormous quantity and complexity that “manipulation and management present significant logistical challenges” using traditional data computing and technology methods. Big data are also frequently described using 4 to 10 key attributes known as the “Vs” of big data. The four most common Vs are volume, velocity, variety, and veracity. Depending on the field of research, each of these “V” attributes carries a different level of importance, and presents different challenges, to data users. Extensive research and technology developments are now emerging that help data users deal with each of the Vs in their efforts to transform large and complex datasets into valuable and actionable information.
In cancer disparities research, big data resources can be powerful because the size (volume) and breadth (variety) of the data captured mean that, compared with clinical trial data or data from academic centers or consortia, the information may better represent vulnerable patient groups, such as minorities or elderly patients. Similarly, the tools and analytic methods developed for working with these data can model and measure how multiple complex factors interact to reveal specific disparities in care. For the purposes of this discussion, we will highlight the use of certain big data resources and analytic methods for cancer disparities research, including large linked administrative and cancer registry datasets, aggregated data from electronic health records (EHRs), and genomic data. A model for integrating these data sources to fully understand cancer disparities is illustrated in Figure 1.
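As a rough illustration of what "linked" means in practice, the Python sketch below joins registry records to claims records on a shared patient identifier and derives a simple timing measure. It is a minimal sketch under stated assumptions: the field names (`patient_id`, `diagnosis_date`, `service_date`, `category`) and the synthetic records are hypothetical and do not reflect the layout of SEER, Medicare, or any other real data source, and real linkages typically rely on more elaborate matching than a single shared key.

```python
from datetime import date

# Synthetic registry records (hypothetical fields, made-up values).
registry = [
    {"patient_id": "P1", "diagnosis_date": date(2010, 1, 4)},
    {"patient_id": "P2", "diagnosis_date": date(2010, 2, 1)},
    {"patient_id": "P3", "diagnosis_date": date(2010, 3, 1)},
]

# Synthetic claims records; a patient may have many claims or none.
claims = [
    {"patient_id": "P1", "service_date": date(2010, 2, 3), "category": "surgery"},
    {"patient_id": "P1", "service_date": date(2010, 4, 14), "category": "chemotherapy"},
    {"patient_id": "P2", "service_date": date(2010, 2, 15), "category": "surgery"},
]


def days_to_first_claim(registry, claims, category):
    """Link claims to registry records by patient_id and return, per patient,
    the days from diagnosis to the first claim in `category` (None if absent)."""
    # Find the earliest claim date of the requested category for each patient.
    first_claim = {}
    for claim in claims:
        if claim["category"] != category:
            continue
        pid = claim["patient_id"]
        if pid not in first_claim or claim["service_date"] < first_claim[pid]:
            first_claim[pid] = claim["service_date"]

    # Attach that date to every registry record, keeping unmatched patients.
    result = {}
    for record in registry:
        pid = record["patient_id"]
        if pid in first_claim:
            result[pid] = (first_claim[pid] - record["diagnosis_date"]).days
        else:
            result[pid] = None
    return result


surgery_delay = days_to_first_claim(registry, claims, "surgery")
print(surgery_delay)
```

Keeping registry patients who have no matching claim (here recorded as `None`) is a deliberate choice in this sketch, because patients who never receive a treatment are often the ones of greatest interest in disparities analyses.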
Major Contributors to Racial Disparities in Breast Cancer
It is widely acknowledged that both social and biological factors contribute to the survival gap between black women and white women with breast cancer. The epidemiologic basis of racial disparities in breast cancer has been defined using data from the Surveillance, Epidemiology, and End Results (SEER) Program national network of cancer registries, as well as analyses of national cancer incidence and mortality data from the American Cancer Society. We know from these large, longitudinal, national data sources that black women, particularly those under 50 years of age, have a disproportionately large burden (measured as relative frequency) of the biologically aggressive triple-negative breast cancer subtype, as well as a more advanced disease stage at presentation.[6,7] Genomic sequencing data, which enable more precise molecular characterization of breast cancers, have been used in studies such as the Carolina Breast Cancer Study and The Cancer Genome Atlas project, to identify additional biological differences within clinically defined groups, including higher proportions of poor-prognosis molecular subtypes among young black women whose clinical markers indicate hormone receptor (HR)-positive and human epidermal growth factor receptor 2 (HER2)-negative disease. This variation within clinical subtypes may contribute to racial disparities in breast cancer outcome in these HR-positive HER2-negative patients, who would usually be expected to have a favorable prognosis.[8-10]
While unrecognized biological variation likely explains a portion of within-subtype variability in survival outcomes, we also have ample documentation, primarily from large cancer registry datasets linked to administrative claims data, that treatment disparities are prominent in breast cancer care and contribute to differences in disease outcomes. With respect to surgical therapy, black women are less likely to receive any definitive surgery for early-stage disease, and less likely to receive morbidity-sparing sentinel lymph node biopsy when eligible. There is substantial evidence that these disparities are partially explained by the concentration of surgical treatment for black patients within lower-volume and lower-quality hospitals that are less likely to be integrated into research networks.[13-15] Adjuvant breast radiation therapy (RT) is more often delayed among black women. These delays are linked to site of care: access to smaller surgical facilities and/or those with onsite RT available is associated with higher odds of accessing RT, compared with access to large governmental facilities only. Delays in RT are also linked to breast cancer mortality. With respect to adjuvant chemotherapy, initiation of chemotherapy appears to be relatively equal by race. However, black women more often have delays between surgery and chemotherapy, and may discontinue chemotherapy and biologic therapy prematurely; both treatment patterns are demonstrably related to survival decrements.[18-20] Black women with HER2-overexpressing breast cancer receive trastuzumab, a highly effective but costly targeted therapy, at dramatically lower rates than white women. Among eligible patients, black women initiate adjuvant hormonal therapy at lower rates than white women and have more problems with adherence; this disparity appears to be concentrated among younger patients and patients treated with chemotherapy.
With regard to genomic testing, black women are less likely to receive guideline-concordant gene expression profile testing to help predict the benefit from chemotherapy in those with HR-positive disease.[23,24]
1. DeSantis CE, Fedewa SA, Goding Sauer A, et al. Breast cancer statistics, 2015: convergence of incidence rates between black and white women. CA Cancer J Clin. 2016;66:31-42.
2. Ward JS, Barker A. Undefined by data: a survey of big data definitions 2013. May 30, 2017. https://arxiv.org/abs/1309.5821. Accessed July 13, 2017.
3. Kayyali B, Knott D, Van Kuiken S. The big-data revolution in US health care: accelerating value and innovation. McKinsey & Company; 2013. http://www.mckinsey.com/industries/healthcare-systems-and-services/our-insights/the-big-data-revolution-in-us-health-care. Accessed July 13, 2017.
4. De Mauro A, Greco M, Grimaldi M. A formal definition of big data based on its essential features. Library Rev. 2016;65:122-35.
5. Daly B, Olopade OI. A perfect storm: how tumor biology, genomics, and health care delivery patterns collide to create a racial survival disparity in breast cancer and proposed interventions for change. CA Cancer J Clin. 2015;65:221-38.
6. Carey LA, Perou CM, Livasy CA, et al. Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. JAMA. 2006;295:2492-502.
7. Howlader N, Altekruse SF, Li CI, et al. US incidence of breast cancer subtypes defined by joint hormone receptor and HER2 status. J Natl Cancer Inst. 2014;106.
8. O’Brien KM, Cole SR, Tse CK, et al. Intrinsic breast tumor subtypes, race, and long-term survival in the Carolina Breast Cancer Study. Clin Cancer Res. 2010;16:6100-10.
9. Troester MA, Sun X, Allott EH, et al. Racial differences in PAM50 subtypes in the Carolina Breast Cancer Study. J Natl Cancer Inst. 2017 Aug 1. [Epub ahead of print]
10. Huo D, Hu H, Rhie SK, et al. Comparison of breast cancer molecular features and survival by African and European ancestry in The Cancer Genome Atlas. JAMA Oncol. 2017 May 4. [Epub ahead of print]
11. Freedman RA, He Y, Winer EP, et al. Trends in racial and age disparities in definitive local therapy of early-stage breast cancer. J Clin Oncol. 2009;27:713-9.
12. Reeder-Hayes KE, Bainbridge J, Meyer AM, et al. Race and age disparities in receipt of sentinel lymph node biopsy for early-stage breast cancer. Breast Cancer Res Treat. 2011;128:863-71.
13. Carpenter WR, Reeder-Hayes K, Bainbridge J, et al. The role of organizational affiliations and research networks in the diffusion of breast cancer treatment innovation. Med Care. 2011;49:172-9.
14. Meyer AM, Reeder-Hayes KE, Liu H, et al. Differential receipt of sentinel lymph node biopsy within practice-based research networks. Med Care. 2013;51:812-8.
15. Freedman RA, Kouri EM, West DW, et al. Racial/ethnic differences in patients’ selection of surgeons and hospitals for breast cancer surgery. JAMA Oncol. 2015;1:222-30.
16. Wheeler SB, Carpenter WR, Peppercorn J, et al. Structural/organizational characteristics of health services partly explain racial variation in timeliness of radiation therapy among elderly breast cancer patients. Breast Cancer Res Treat. 2012;133:333-45.
17. Hershman DL, Wang X, McBride R, et al. Delay in initiating adjuvant radiotherapy following breast conservation surgery and its impact on survival. Int J Radiat Oncol Biol Phys. 2006;65:1353-60.
18. Fedewa SA, Ward EM, Stewart AK, et al. Delays in adjuvant chemotherapy treatment among patients with breast cancer are more likely in African American and Hispanic populations: a national cohort study 2004-2006. J Clin Oncol. 2010;28:4135-41.
19. Hershman D, McBride R, Jacobson JS, et al. Racial disparities in treatment and survival among women with early-stage breast cancer. J Clin Oncol. 2005;23:6639-46.
20. Freedman RA, Hughes ME, Ottesen RA, et al. Use of adjuvant trastuzumab in women with human epidermal growth factor receptor 2 (HER2)-positive breast cancer by race/ethnicity and education within the National Comprehensive Cancer Network. Cancer. 2013;119:839-46.
21. Reeder-Hayes K, Peacock Hinton S, Meng K, et al. Disparities in use of human epidermal growth hormone receptor 2-targeted therapy for early-stage breast cancer. J Clin Oncol. 2016;34:2003-9.
22. Roberts MC, Wheeler SB, Reeder-Hayes K. Racial/ethnic and socioeconomic disparities in endocrine therapy adherence in breast cancer: a systematic review. Am J Public Health. 2015;105(suppl 3):e4-e15.
23. Roberts MC, Weinberger M, Dusetzina SB, et al. Racial variation in the uptake of Oncotype DX testing for early-stage breast cancer. J Clin Oncol. 2016;34:130-8.
24. Davis BA, Aminawung JA, Abu-Khalaf MM, et al. Racial and ethnic disparities in Oncotype DX test receipt in a statewide population-based study. J Natl Compr Canc Netw. 2017;15:346-54.
25. Meyer AM, Olshan AF, Green L, et al. Big data for population-based cancer research: the integrated cancer information and surveillance system. N C Med J. 2014;75:265-9.
26. Truven Health Analytics; IBM Watson Health. Putting research data into your hands with the MarketScan databases. 2017 [cited May 25, 2017]. http://truvenhealth.com/markets/life-sciences/products/data-tools/marketscan-databases. Accessed July 11, 2017.
27. Kent EE, Malinoff R, Rozjabek HM, et al. Revisiting the Surveillance Epidemiology and End Results Cancer Registry and Medicare Health Outcomes Survey (SEER-MHOS) linked data resource for patient-reported outcomes research in older adults with cancer. J Am Geriatr Soc. 2016;64:186-92.
28. Chubak J, Yu O, Pocobelli G, et al. Administrative data algorithms to identify second breast cancer events following early-stage invasive breast cancer. J Natl Cancer Inst. 2012;104:931-40.
29. Chawla N, Yabroff KR, Mariotto A, et al. Limited validity of diagnosis codes in Medicare claims for identifying cancer metastases and inferring stage. Ann Epidemiol. 2014;24:666-72.
30. Bastien RR, Rodriguez-Lescure A, Ebbert MT, et al. PAM50 breast cancer subtyping by RT-qPCR and concordance with standard clinical molecular markers. BMC Med Genomics. 2012;5:44.
31. Allott EH, Cohen SM, Geradts J, et al. Performance of three-biomarker immunohistochemistry for intrinsic breast cancer subtyping in the AMBER consortium. Cancer Epidemiol Biomarkers Prev. 2016;25:470-8.
32. Beck AH, Sangoi AR, Leung S, et al. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci Transl Med. 2011;3:108ra13.
33. Hoadley KA, Yau C, Wolf DM, et al. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell. 2014;158:929-44.
34. Keenan T, Moy B, Mroz EA, et al. Comparison of the genomic landscape between primary breast cancer in African American versus white women and the association of racial differences with tumor recurrence. J Clin Oncol. 2015;33:3621-7.
35. Alexandrov LB, Nik-Zainal S, Wedge DC, et al. Signatures of mutational processes in human cancer. Nature. 2013;500:415-21.
36. Nik-Zainal S, Alexandrov LB, Wedge DC, et al. Mutational processes molding the genomes of 21 breast cancers. Cell. 2012;149:979-93.
37. Shah A, Stewart AK, Kolacevski A, et al. Building a rapid learning health care system for oncology: why CancerLinQ collects identifiable health information to achieve its vision. J Clin Oncol. 2016;34:756-63.
38. Berger ML, Curtis MD, Smith G, et al. Opportunities and challenges in leveraging electronic health record data in oncology. Future Oncol. 2016;12:1261-74.
39. Penn DC, Chang Y, Meyer AM, et al. Provider-based research networks may improve early access to innovative colon cancer treatment for African Americans treated in the community. Cancer. 2015;121:93-101.
40. Carpenter WR, Meyer AM, Wu Y, et al. Translating research into practice: the role of provider-based research networks in the diffusion of an evidence-based colon cancer treatment innovation. Med Care. 2012;50:737-48.
41. Abernethy AP, Etheredge LM, Ganz PA, et al. Rapid-learning system for cancer care. J Clin Oncol. 2010;28:4268-74.
42. Ross I, Womble P, Ye J, et al. MUSIC: patterns of care in the radiographic staging of men with newly diagnosed low risk prostate cancer. J Urol. 2015;193:1159-62.
43. Hurley P, Dhir A, Gao Y, et al. A statewide intervention improves appropriate imaging in localized prostate cancer. J Urol. 2017;197:1222-8.
44. Wheeler SB, Kuo TM, Goyal RK, et al. Regional variation in colorectal cancer testing and geographic availability of care in a publicly insured population. Health Place. 2014;29:114-23.
45. Wheeler SB, Kuo TM, Meyer AM, et al. Multilevel predictors of colorectal cancer testing modality among publicly and privately insured people turning 50. Prev Med Reports. 2017;6:9-16.
46. Schaal JC, Lightfoot AF, Black KZ, et al. Community-guided focus group analysis to examine cancer disparities. Prog Community Health Partnersh. 2016;10:159-67.