BUFFALO, NY - Former Roswell Park Cancer Institute (RPCI) faculty member J. Craig Venter, PhD, founder and president of Celera Genomics, returned to the Buffalo-based comprehensive cancer center to present the Institute’s Cori Lecture (see box).
Dr. Venter, a member of Roswell Park’s Immunology Department from 1982 to 1984, led the team at Celera in sequencing the human genome. The research that led him to this work, Dr. Venter recalled, actually began during his time at RPCI.
"My wife Claire and I were here in Buffalo," he said, "trying to isolate a single protein. It took a decade to get enough purified protein, and then a tiny bit of sequence before I moved to the National Institutes of Health. There we proceeded to teach ourselves molecular biology over the course of a year as we cloned and sequenced that protein. It seems remarkably pathetic now how long it took us."
Still, at a time when the idea of mapping the entire human genome in a timely, concerted, coordinated fashion would have been labeled Sisyphean folly, Dr. Venter plodded on. He and his colleagues, Dr. Venter said, took a "shotgun" approach to the task at hand, utilizing every available resource in hopes of finding the elusive target.
"The biggest challenge was dealing with the data," he said. "We thought that a trillion calculations per second would be sufficient; it took 20,000 CPU hours to assemble the human genome."
Researchers at Celera collected DNA from 21 donors. Karyotype analysis was performed to verify that each participant had a complete set of chromosomes, and the team established cell lines from the donors. Five individuals were selected for sequencing.
"In the final decision, we chose to use the DNA from three females and two males. It took us 9 months to sequence the genome, and by the time we had finished, we had covered the genome 39 times in these individuals," Dr. Venter said.
On June 26, 2000, at the White House, Dr. Venter announced Celera’s complete assembly of the human genome, and Dr. Francis Collins, director of the NIH National Human Genome Research Institute, announced the Human Genome Project’s completion of its working draft of the human genome.
"Most people do not know that we were still assembling our data the day before we announced that we were done," Dr. Venter said. "Fortunately, the calculations were finished before the announcement." Both groups published their sequence of the human genome in Feb. 2001 (Celera in Science and the Human Genome Project in Nature).
Dr. Venter said that the most surprising aspect of the sequencing project was the lower-than-expected number of human genes: roughly 26,000 to 30,000. "People wanted there to be a large number of genes, in order to account for every unique human trait and condition," he said.
People also wanted differences between humans and other species. "Yet we now know that there are only about 300 genes in the human genome that don’t have a counterpart in the mouse genome," he said. "We believe that basically all the main genomes are the same material, just repackaged slightly."
Dr. Venter believes that the chromosome maps and genomes are essentially the historical record of a species. "I think it’s going to be really a fascinating exercise as each new genome gets sequenced and we can go back through and create these same maps and understand the evolutionary events that actually led to the existence of us and other species."
From a clinical perspective, this research still leaves many questions unanswered, he said. It is still not possible to predict disease development in most people. The next step, sequencing proteins, may offer more information to physicians and their patients. "Having the complete genome was an essential step for proteomics [the study of human proteins] to go forward because when we got these fragments and compared them back with the database, the sequence was not there and we could not interpret the data," Dr. Venter said. "Now we can do that." He said that protein sequencing techniques are being applied initially to breast, colon, pancreatic, and lung cancer. "We think protein sequences are going to be the key link between the genome and understanding biology and medicine," he said.