Mapping Out Cancer Progression With PiCnIc



Bud Mishra, PhD, MS

Today we are speaking with Bud Mishra, PhD, professor at New York University's Courant Institute of Mathematical Sciences in New York City. Professor Mishra and his colleagues recently developed a program called Pipeline for Cancer Inference, or PiCnIc for short, that analyzes patient tumor sequencing data to generate patient-personalized models of how a tumor may evolve and progress. The description of the program was recently published in the scientific journal PNAS.

-Interviewed by Anna Azvolinsky, PhD


OncoTherapy Network: Can you describe PiCnIc? What individual patient information is needed, and what does the program do with this information?

Dr. Mishra: So, PiCnIc is a pipeline to generate a cancer progression model from a large amount of patient genomic data. The kinds of data we are looking at are things that are available on TCGA [The Cancer Genome Atlas] or that Joe Biden’s Cancer Moonshot program will generate. So let me explain the underlying process.

One way to think about cancer is as a wicked problem. So what is a wicked problem? It’s a problem that is well-defined, but when you solve it, the problem itself changes. We started by thinking about cancer as a viral disease, but then we modified our view and thought about cancer as a disease of the genome. So we thought of cancer as related to a set of oncogenes and tumor suppressor genes, but lately we have evolved into thinking about cancer as a disease of somatic evolution. We went from thinking about cancer as one cell and a monoclonal tumor to thinking of it as polyclonal, heterogeneous, and changing over time. We thought about cancer as having cell-autonomous processes, but now we think of it as something that is driven by the population, and we have to think about many driver mutations that are choreographed within a sea of passenger mutations.

So when I have all of this genomic data, I want to solve the problem of which mutations come at what point and which mutations are the driver ones. But the data science is very difficult because [the cancer] is nonstationary, it has many mutation variables (there are lots and lots of passenger mutations that are not causal), and there are multiple hypotheses that need to be tested. So, to understand this, let’s think of the beginning of cancer.

The first thing a cancer may do is acquire a mutation in an EGFR gene or RAS or something else. This mutation allows it to grow, but as it grows, it runs out of metabolites and cannot really keep growing because the cell cycle checkpoints will [stop the growth]. So the next kind of mutation is one that will let the tumor escape the checkpoints, and that is something like a cyclin-dependent kinase mutation. So then we say, in terms of causality, that the EGFR mutation caused the cyclin-dependent kinase mutation. And then once you have a cyclin-dependent kinase mutation, you have dysplasia, when cells are no longer well-shaped but keep on dividing. The cells keep proliferating, but some are deprived of oxygen, so you have hypoxia, and that would lead to cells that have VEGF mutations, which allow angiogenesis, cells that are able to do a mesenchymal transformation, and cells that can do anaerobic glycolysis through the Warburg effect.

So what we do is describe this process of one mutation causing another mutation. We say the EGFR mutation caused the cyclin-dependent kinase mutation, which caused the VEGF mutation, etc. The goal is to create a picture from the natural experiments that are happening in the tumor using the patients’ genomic data. That is what PiCnIc is doing; it uses the theory of probabilistic causation. And there are two ingredients: one is probability raising and the second is temporal priority. These are very simple ideas and can be described, in terms of both probability and time, using a logic called probabilistic computation tree logic. So that gives an algorithm to take patient data and carry out this analysis to find how the driver mutations are choreographed in the patient data.
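[Editor's illustration] The two ingredients Dr. Mishra names can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the PiCnIc implementation: the patient table is made up, the function names are hypothetical, and temporal priority is approximated by assuming that a mutation seen in more patients tends to occur earlier. Probability raising is checked as P(effect | cause) > P(effect | no cause).

```python
from itertools import permutations

# Hypothetical toy cohort: one row per tumor, 1 = gene is mutated.
# Gene names echo the interview's example; the values are invented.
GENES = ["EGFR", "CDK", "VEGF"]
PATIENTS = [
    (1, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 0), (1, 1, 1),
    (1, 1, 1), (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0),
]


def column(gene):
    """Return the 0/1 mutation status of one gene across all patients."""
    idx = GENES.index(gene)
    return [row[idx] for row in PATIENTS]


def prob(events):
    """Fraction of patients in which the event is present."""
    return sum(events) / len(events)


def cond_prob(effect, cause, cause_value):
    """P(effect = 1 | cause = cause_value), or None if never observed."""
    selected = [e for e, c in zip(effect, cause) if c == cause_value]
    return sum(selected) / len(selected) if selected else None


def candidate_edges():
    """List (cause, effect) pairs that pass both checks."""
    edges = []
    for cause, effect in permutations(GENES, 2):
        c, e = column(cause), column(effect)
        # Assumption: higher frequency stands in for "happened earlier".
        temporal_priority = prob(c) > prob(e)
        with_cause = cond_prob(e, c, 1)
        without_cause = cond_prob(e, c, 0)
        probability_raising = (
            with_cause is not None
            and without_cause is not None
            and with_cause > without_cause
        )
        if temporal_priority and probability_raising:
            edges.append((cause, effect))
    return edges


if __name__ == "__main__":
    for cause, effect in candidate_edges():
        print(f"{cause} -> {effect}")
```

On this toy table the sketch keeps EGFR -> CDK, EGFR -> VEGF, and CDK -> VEGF. The EGFR -> VEGF pair is only an indirect relationship passing through CDK, so a real analysis would need further statistical filtering to separate direct from indirect and spurious relationships; that step is not shown here.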

OncoTherapy Network: What could this type of data be useful for?

Dr. Mishra: So, mainly, we are using it to make a map of cancer. One of my colleagues, Judith Klein-Seetharaman, describes this as a time machine that allows you to go forward and predict what will happen, and go backward to see which earlier events happened and in what order. So one way to use it is to figure out which early mutations are signs of cancer. In the near future, we may be able to do cell-free DNA assays to see if those mutations already exist in the cell-free DNA. And then we can build on top of this, using Fisher kernels and things like that, to translate genotype into phenotype. So I would take primary tumor data of multiple cells and multiple mutations and translate that into time intervals (time to drug resistance, time to metastasis, or survival time), and that would lead to thinking about which ones are more imminent and what therapies to choose. But essentially, it's a mechanism to think about drug discovery and causal structures, where the drug targets are, and so on.

OncoTherapy Network: Is there more work that’s needed for this to be used by clinicians? It sounds like this is still in the research realm.

Dr. Mishra: This still needs a lot of work. One thing is that we will get better and more longitudinal data. For example, we will probably be monitoring patients for circulating tumor cells, and then we will have much better data and more patients, and if we can stratify patients into different subtypes, all of these things will improve the algorithm. Just to orient you, what we are using for PiCnIc is called type-level causality. There is another version, called token-level causality, which is less statistical and applies to individual patients. So the next step would be to go from type-level causality to token-level causality. The idea is to take the causal structures found across all patients and use them to interpret one patient’s breast cancer and the kind of causal structures embedded in that patient’s tumor. But this is not a complete system; there is still work to be done.

OncoTherapy Network: Thank you so much for joining us today, Professor Mishra.

Dr. Mishra: Thank you so much!

