Articles | July 23, 2018

How to More Effectively Leverage Cancer Genomics Databases as Clinical Decision–Making Tools

Oncologist Dr. Vivek Subbiah and computer scientist Dr. Jason Roszik, both from MD Anderson, discuss optimizing “omics” databases to inform cancer patient care.

Cancer Network spoke with oncologist Vivek Subbiah, MD, and mathematician and computer scientist Jason Roszik, PhD, MBA, who highlighted important current genetics/genomics and proteomics databases in the United States, and how they can be improved to better inform real-world treatment of patients with cancer.

Dr. Subbiah is an assistant professor in the Department of Investigational Cancer Therapeutics and associate medical director of the Clinical Center for Targeted Therapy at The University of Texas MD Anderson Cancer Center, where he studies precision oncology strategies including molecularly targeted therapies, immunotherapies, and radiopharmaceuticals for patients with cancer.

Dr. Roszik is an assistant professor in the departments of Genomic Medicine and Melanoma Medical Oncology at MD Anderson Cancer Center. He develops computational algorithms, pipelines, and visualization tools for integrating and analyzing large data sets to identify novel targets and predictive signatures for targeted therapies and immunotherapies. He participated in the international project known as The Cancer Genome Atlas (TCGA).

-Interviewed by Bryant Furlow

Cancer Network: How many public cancer genomics databases exist currently, and how are they used to inform clinical decision making?

Dr. Subbiah: The one database that is indirectly used in clinical decision making is the COSMIC database. Most of the databases provide objective frequencies of aberrations.

Dr. Roszik: Most of the databases that contain genomic information on thousands of tumor samples come from major projects such as TCGA, COSMIC (the Wellcome Sanger Institute’s Catalogue of Somatic Mutations in Cancer), CCLE (Cancer Cell Line Encyclopedia), and AACR Project GENIE (Genomics Evidence Neoplasia Information Exchange). NCI GDC (the National Cancer Institute’s Genomic Data Commons) and NCBI dbGaP also contain many additional large datasets.

Cancer Network: When you say that COSMIC is used indirectly in clinical decision making, what do you mean by that? Are its data accessed by a decision tool or application?

Dr. Subbiah: COSMIC has its own website where genes and mutations can be queried. In addition, decision-support tools that offer mutation analysis often draw on the COSMIC database and report when a variant has been found in COSMIC. COSMIC also contains information about whether a mutation is known or predicted to be pathogenic. This is very useful, if indirectly, in clinical practice.

Cancer Network: You’ve recently argued that the “siloing” of cancer genomics and proteomics databases has hampered real-world implementation of precision oncology. What do you mean by that?

Dr. Subbiah: First of all, I would like to applaud the various stakeholders for getting together and putting up these databases that Dr. Roszik mentioned. One of the major challenges now is that we do not have the ability to track outcomes of these patients with molecularly targeted therapies. This is probably because of the lack of data structure for this. Also, “siloing” prevents us from including additional data types in analyses.

Cancer Network: Do you believe that the clinical relevance of “variants of unknown significance” (“VUS”) could be clarified if data from different databases were pooled for analysis?

Dr. Roszik: Connecting many databases makes it possible to include additional variables in analyses. In addition, there are databases and also scientific publications that contain the same type of data for different patient cohorts. Sample size is often a limiting factor when studying variants, especially in the case of rare cancers, or when only a very small subset of a common cancer is affected by a variant. Increasing the number of samples with genomics information would definitely help identify whether a variant is pathogenic or not.

Dr. Subbiah: Great question. Let’s discuss what constitutes a VUS. If genetic testing results are neither positive nor negative, then they fall into the VUS category. Variant classification and attribution rely heavily on building levels of evidence. As we build evidence, the VUS category can be amended to “pathogenic” or “benign.” As I mentioned in my previous answer, we do not have annotated clinical information on molecularly targeted therapies in these public databases. We can, at best, know the prevalence of these VUSs from these databases, but not the conversion from “unknown” to “known.” The COSMIC database that I mentioned before, from the Sanger Institute in the UK, is one that updates these data.

Cancer Network: Private companies that provide clinical sequencing services consider their databases to be business-proprietary information. Do these companies share their insights into the clinical relevance of VUS? If not, could or should they do so, in your opinion?

Dr. Subbiah: Although the clinical sequencing companies have access to vast databases, those databases are not clinically annotated. The companies will need to work with academic centers and other cancer network centers to clinically annotate their databases and link outcomes to matched therapies. It would be terrific if there were a private-academia-NCI partnership to clinically annotate all the databases with respect to the outcomes of patients on targeted therapy, immunotherapy, or standard-of-care therapy.

Cancer Network: How might data from public, nonproprietary isolated databases be integrated or connected to create larger databases with which the significance of particular gene variants or gene expression signatures, for example, might be clarified? Have any such integration efforts been made?

Dr. Roszik: There are a few projects in development, and some of them already offer analysis tools as well or can be accessed through cBioPortal. For example, the goal of AACR Project GENIE is to provide larger data sets and statistical power to improve clinical decision making. NCI GDC already contains clinical sequencing data from Foundation Medicine Inc., for integrated analyses with TCGA and TARGET [The Therapeutically Applicable Research to Generate Effective Treatments] data. This is a good example [of how] a private company helped to more than double the number of cases in a database.

Cancer Network: How big of a challenge would it be to integrate information from the large public databases? Do these databases utilize the same data fields and categorize or organize data in easily integrated ways?

Dr. Subbiah: It is a huge challenge to integrate information that is already there in public repositories. The challenge mainly arises from the heterogeneity of the databases. Recognizing this, several stakeholders have come together to define basic common elements and [set] standards that allow data pooling for maximum analytical power in the future. The Center for Medical Technology Policy (CMTP) has identified 49 elements as a core set of data elements essential to understanding the clinical utility of molecularly targeted therapies in oncology. Hopefully this can facilitate future precision oncology efforts.

Cancer Network: What about patient privacy? Do the public databases use patient-specific identifiers? And if they do, is there a concern that pooling data from different databases might create a “pseudo-replication” problem with the same tumor sample being counted more than once, creating the illusion of a stronger association between gene variants and tumor behavior?

Dr. Roszik: Public databases are designed to prevent identification of patients. Unfortunately, when using public, de-identified data, it is difficult to avoid duplication. The same sample might be used in multiple studies, or multiple versions of a database might be re-analyzed and pooled with other datasets by various researchers. Or a patient may have genomic data from private companies and also participate in studies at cancer centers. One solution would be for the data sources to work together to prevent duplication while also protecting patient privacy.
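
The duplication problem described above can be illustrated with a minimal Python sketch. It is not a method described by the interviewees. It assumes, purely for illustration, that each de-identified sample can be represented as a set of (chromosome, position, reference allele, alternate allele) tuples, and it flags samples from different sources that carry an identical set of variants. The sample IDs, the `min_variants` threshold, and the function names are hypothetical, and the coordinates are illustrative placeholders.

```python
from collections import defaultdict


def variant_fingerprint(variants):
    """Return an order-independent fingerprint of a sample's variant calls."""
    return frozenset(variants)


def find_possible_duplicates(samples, min_variants=3):
    """Group sample IDs whose variant sets are identical.

    Samples with fewer than `min_variants` calls are skipped, because a
    single shared hotspot mutation is expected in many unrelated patients.
    The threshold is an arbitrary, illustrative choice.
    """
    groups = defaultdict(list)
    for sample_id, variants in samples.items():
        fingerprint = variant_fingerprint(variants)
        if len(fingerprint) >= min_variants:
            groups[fingerprint].append(sample_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical pooled data from two sources ("dbA" and "dbB").
samples = {
    "dbA:S1": [("7", 140453136, "A", "T"), ("17", 7577120, "C", "T"), ("12", 25398284, "C", "A")],
    "dbB:X9": [("12", 25398284, "C", "A"), ("7", 140453136, "A", "T"), ("17", 7577120, "C", "T")],
    "dbA:S2": [("7", 140453136, "A", "T")],
    "dbB:X3": [("7", 140453136, "A", "T")],
}

possible_duplicates = find_possible_duplicates(samples)
print(possible_duplicates)
```

Exact matching of this kind would miss duplicates whose variant calls differ slightly between pipelines, so any flagged pair is a candidate for review rather than a confirmed duplicate.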

Cancer Network: Are there online tools or smart-device applications available that allow clinicians to pull what’s known about a given variant or gene from multiple public databases?

Dr. Subbiah: There are several that have been developed. One that I use, developed at MD Anderson Cancer Center, is called “Personalized Cancer Therapy.” This website compiles the available scientific knowledge on cancer-associated abnormal genes and gene products and their implications for cancer therapy. Another useful one is “My Cancer Genome,” which is a precision cancer medicine knowledge resource for physicians, patients, caregivers, and researchers. My Cancer Genome provides up-to-date information on what mutations make cancers grow, as well as related therapeutic implications, including available clinical trials.

Dr. Roszik: In addition, as I mentioned earlier, cBioPortal provides tools to analyze and visualize data from TCGA and many other projects, and the GDC Data Portal provides tools for integrated analysis of data from TCGA, TARGET, and Foundation Medicine.

Cancer Network: In what other ways can genomic data be better leveraged for patients?

Dr. Roszik: Many new big data sets are being created and published these days. It would be useful to have a common format, for example for variants or gene-expression data, to make integration easier. Furthermore, a few scientific journals already require that datasets be made available upon publication. However, authors often publish them in a way that makes it difficult or impossible to use the data. I think it would be important for all journals to require depositing the raw data in a safe place, for example NCBI dbGaP, and also to publish all the de-identified, processed genomics data in a common, appropriate format, especially if the data generation is paid for by the NIH.
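
To make the idea of a common variant format concrete, here is a minimal Python sketch. It assumes two hypothetical source layouts (the field names for "source A" and "source B" are illustrative, not taken from any particular database) and maps both onto one shared record type, so that the same variant reported by either source compares as equal. The `Variant` class and both converter functions are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A shared, source-independent representation of one variant."""
    gene: str
    chrom: str
    pos: int
    ref: str
    alt: str


def normalize_chrom(chrom):
    """Strip an optional 'chr' prefix so 'chr7' and '7' compare as equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def from_source_a(row):
    """Convert a row from hypothetical source A (one field per attribute)."""
    return Variant(
        gene=row["gene_symbol"],
        chrom=normalize_chrom(row["chromosome"]),
        pos=int(row["position"]),
        ref=row["ref_allele"],
        alt=row["alt_allele"],
    )


def from_source_b(row):
    """Convert a row from hypothetical source B (packed 'locus' and 'change')."""
    chrom, pos = row["locus"].split(":")
    ref, alt = row["change"].split(">")
    return Variant(row["gene"], normalize_chrom(chrom), int(pos), ref, alt)


# The same illustrative variant as each hypothetical source might report it.
row_a = {"gene_symbol": "BRAF", "chromosome": "chr7", "position": "140453136",
         "ref_allele": "A", "alt_allele": "T"}
row_b = {"gene": "BRAF", "locus": "7:140453136", "change": "A>T"}

variant_a = from_source_a(row_a)
variant_b = from_source_b(row_b)
print(variant_a == variant_b)
```

Once every source is converted to the same record type, pooled analyses can count or compare variants without source-specific handling.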

Dr. Subbiah: As I said earlier, it would be terrific if there were a private-academia-NCI partnership to clinically annotate all the databases with respect to patient outcomes. In addition, this registry should also add data from real-life patients, who can also contribute a lot to data generation. Ultimately all stakeholders should form a national registry, not just from academic centers but also from community practices, with input from everyone, including patients. It may be hypothetical or a dream, but anything is possible when all of us come together for a common cause to end cancer.
