Completion of the human genome
project and the development
of high-throughput gene
expression analysis have ushered in the
era of "omics," bringing the promise of
"molecular medicine" closer to reality.
Genomics refers to the study of the
human genome (ie, DNA sequences),
and functional genomics is the study
of gene expression (ie, messenger
RNA [mRNA] levels). Because of the
availability of robust technologies for
DNA and mRNA analysis, most translational
research studies of human
cancer have focused on these two information
reservoirs. However, although
these studies can provide
valuable information, adapting their
use to clinical practice has not been
easy due to biologic and technologic
limitations.
Proteomics is the characterization
of biologic processes by quantitative
and qualitative assessment of protein
expression patterns. Unlike genomic
studies, proteomics provides a dynamic
picture of normal and abnormal
cellular physiology. Although mRNA
expression studies also provide dynamic
information, they provide an
indirect measure of protein expression
(as mRNA merely directs protein expression).
Proteins are responsible
for all controlled biologic functions
and are the true determinants of
the malignant phenotype. Because
glycosylation, phosphorylation, cellular
trafficking, and degradation can
affect protein function, important information
may be missed by mRNA
expression studies. Proteomic investigations
can detect these posttranslation
modifications and will provide
additional information that is complementary
to studies of gene expression.
Because virtually all
US Food and Drug Administration-
approved diagnostic/prognostic tests
and cancer therapies are protein-based,
findings from proteomic studies will
be easier to develop into clinically useful
tools than findings from genetic-based
studies. Early proteomic
studies tended to detect changes in
highly abundant proteins, but as technology
has progressed, it has become easier to
detect and quantify changes in less
abundant proteins.
Typically, the cancerous tissue is
the most fertile source of relevant molecular
information. For many human
cancers, however, adequate tumor material
is unobtainable without invasive
procedures, and thus, is not available
for molecular analysis. Molecular
events within cancerous tissue (even
in the case of a solid tumor) may be
reflected by changes in the proteome
of circulating body fluids, and analysis
of these body fluids may provide
important clinical information.
The most appropriate tissue or body
fluid source to study depends on the
type of cancer and on the defined purpose
of the investigation. Traditionally,
the goal of most biomarker-based
proteomic discovery efforts has been to identify
proteins that change as a cause or
consequence of the disease process
and then to develop enzyme-linked
immunosorbent assay-based clinical
tests to directly measure the analytes
in question. Recently, however, efforts
have focused on the development of
proteomic technologies such as mass
spectrometry and protein arrays as the
clinical test platform of choice.[1,2]
Improvements in proteomic and
bioinformatics technologies provide
translational researchers with new opportunities
to develop better methods
of diagnosing and determining prognosis
for human cancer. These new
technologies should improve the
health of cancer patients and people
at risk for developing cancer. This
article reviews these technologic advances
and their potential clinical
applications.
Laser Capture Microdissection
Established cultured cell lines
are commonly used for molecular
studies of human cancer. Although
these studies can provide valuable information
regarding cancer biology,
some of the findings may not be applicable
to patients because the behavior
of cancer cells grown in culture
may not be representative of the in
vivo situation. In fact, studies comparing
protein expression patterns of
prostate and bladder cancer from tumor
tissue and patient-matched cultured
cells have demonstrated profound
differences.[3,4]
One of the major limitations of the
direct study of molecular changes in
clinical specimens has been the difficulty
of separating malignant cells of
interest from stroma, inflammatory
cells, and benign epithelial cells. To
overcome this investigative hurdle,
various microdissection techniques
have been developed for procuring
pure populations of cells from human
tissue sections.
Laser capture microdissection is a
relatively new technique that allows
researchers to visualize a tissue section
via light microscopy and procure the
desired cells by activating a 7.5- to
30-μm diameter infrared laser beam
to "weld" the tissue to a plastic cap. Intact
DNA, RNA, and protein can then
be extracted from the "welded" tissue
and analyzed by conventional methods.[5,6]
Binding properties of proteins
are preserved, and laser capture
microdissection can be used to study
differences in protein-protein interaction
within different tissue types.
For example, prostate-specific antigen
(PSA) recovered from cells
procured via this technique retains the
ability to bind inhibitors such as
alpha-1-antichymotrypsin. Studies utilizing
laser capture microdissection
have demonstrated that PSA exists as
an unbound enzyme in both benign
and malignant prostate epithelium and
that this "free" form of PSA can bind
to alpha-1-antichymotrypsin in either
setting.[7]
Discovery-Based Proteomics
Two-Dimensional Gel Electrophoresis
Traditional proteomic studies have
relied on multiplexed two-dimensional
polyacrylamide gel electrophoresis
(2D-PAGE) to compare protein expression
patterns from different tissues
or cell lines. The first dimension separates
proteins by pH (isoelectric focusing),
and the second dimension, by
molecular weight (SDS-PAGE). Although
2D-PAGE has been available
for several decades, advances in this
technology have dramatically improved
its sensitivity, spot resolution,
and reproducibility. The use of
fluorescent-based dyes (such as SYPRO
Red) has improved the dynamic range
and sensitivity of protein detection,
while the development of immobilized
pH gradients and image analysis software
has improved the reproducibility
of such tests.
The primary use of 2D-PAGE is to
facilitate identification of differentially
expressed proteins or protein isoforms.
Protein identification can be accomplished
by direct sequencing or by
comparing spot patterns to "standard"
gels in which all spots have been
microsequenced. Improvements in
matrix-assisted laser desorption/ionization
time-of-flight (MALDI-TOF)
mass spectrometry and nanoelectrospray
technology along with the availability
of searchable protein sequence
databases have greatly enhanced
protein identification. With current
technology, a spot on a 2D gel containing
several hundred femtomoles of
protein can be identified and fully
characterized.[8-10]
Multiple investigators have successfully
used 2D-PAGE to identify
protein expression changes associated
with a wide variety of human cancers.[3,11-16]
For example, 2D-PAGE
analysis of procured patient-matched
benign and cancerous cells showed
that annexin I was downregulated in
both prostate and esophageal cancers
(Figure 1).[11,17] These findings were
subsequently confirmed by immunohistochemical
studies of large patient
study sets.[18,19] Another study utilizing
2D-PAGE found that Rho G-protein
dissociation inhibitor and glyoxalase
I are overexpressed in invasive
ovarian cancers as compared
to low-malignant-potential ovarian
tumors.[20]
Despite the tremendous utility of
standard 2D-PAGE, this technology
has significant limitations. The primary
limitations are related to sensitivity
of detection and spot separation.
In particular, extremely basic or acidic
proteins as well as low-molecular-weight
proteins are poorly separated
with standard 2D-PAGE protocols.
Therefore, the test can only survey a
fraction of the cellular proteome and
is most useful for analysis of abundant
proteins larger than 10 kd. Because
most secreted proteins fall into this
category, 2D-PAGE is a powerful tool
with which to search for clinically useful
biomarkers.
Differential In-Gel Electrophoresis
Differential in-gel electrophoresis
(DIGE) is an emerging technology[21]
that compares protein expression patterns
by labeling protein samples with
unique fluorescent dyes (ie, Cy2, Cy3,
and Cy5) and then separating them on
a single 2D-PAGE gel. This allows the
simultaneous comparison of two to
three protein samples and provides a
relative quantitative assessment of protein
expression levels. The primary
advantage of DIGE is that it eliminates
intrinsic gel-to-gel variability, which
can compromise comparative studies.
DIGE has been used successfully
to identify the changes in protein expression
associated with esophageal
cancer.[22] In this study, protein lysates
from normal and cancer esophageal
cells were labeled with Cy3
and Cy5 fluorescent dyes, respectively,
and separated by 2D-PAGE. Of more
than 1,000 spots identified in both
samples, 58 were found to be upregulated
more than threefold, and 107
were downregulated more than threefold.
One of the downregulated proteins
was identified by capillary high-performance
liquid chromatography/
tandem mass spectrometry to be
annexin I, and one of the upregulated
proteins was found to be tumor rejection
antigen (gp96).
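The threefold comparison described above can be illustrated with a minimal sketch. It assumes that each spot already has background-corrected, normalized intensities for the Cy3 (normal) and Cy5 (cancer) channels; the `Spot` record, the `classify_spots` helper, and the example intensities are hypothetical and are not taken from the cited study.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    """One matched gel spot (hypothetical record layout)."""
    spot_id: str
    cy3: float  # intensity in the Cy3 channel (normal cells in the study above)
    cy5: float  # intensity in the Cy5 channel (cancer cells in the study above)


def classify_spots(spots, fold=3.0):
    """Return (upregulated, downregulated) spot IDs by Cy5/Cy3 ratio."""
    up, down = [], []
    for s in spots:
        if s.cy3 <= 0 or s.cy5 <= 0:
            continue  # a ratio is undefined when a spot is absent from one channel
        ratio = s.cy5 / s.cy3
        if ratio > fold:
            up.append(s.spot_id)
        elif ratio < 1.0 / fold:
            down.append(s.spot_id)
    return up, down


# Made-up intensities for three spots on one gel.
spots = [Spot("A", 100.0, 450.0), Spot("B", 900.0, 200.0), Spot("C", 300.0, 330.0)]
up, down = classify_spots(spots)
print("up:", up, "down:", down)
```

Because both samples run on the same gel, the ratio is taken spot by spot without any gel-to-gel alignment, which is the practical advantage of DIGE noted above. Dye-specific normalization is omitted from this sketch.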
DIGE is more sensitive than
Coomassie Blue staining but detects
40% fewer spots than SYPRO Ruby
dye. This reduced sensitivity compared
to other fluorescent-based stains
is due to the requirement that only 1%
to 2% of lysine amino acid residues
that form each protein be fluorescently
labeled in order to maintain protein
solubility. Currently, DIGE is only
useful for the analysis of relatively
abundant proteins; however, in the
near future, improvements will likely
lead to enhanced sensitivity and
broader application of this technology.
DIGE is a robust method of making
quantitative comparisons of global
protein expression levels among different
tissue types, and thus a powerful
tool with which to search for diagnostic
and prognostic biomarkers.
Isotope-Coded Affinity Tagging
Isotope-coded affinity tagging
(ICAT) distinguishes two populations
of proteins by labeling each with different
isotope tags: a light reagent
containing eight hydrogen atoms or
a heavy reagent containing eight
deuterium atoms.[23] These tags are
linked to a chemical agent that specifically
binds the thiol group of cysteine
residues in proteins and peptides.
Following labeling, protein mixtures
are subjected to proteolytic cleavage
and fractionated by affinity chromatography.
The relative amount and
identity of each protein are revealed by
mass spectrometry.
Quantitative information, based on
the relative ratio of isotopic molecular
mass peaks that differ by 8 Da (the
mass difference between the light and
heavy reagent), is ascertained by
nanoscale liquid chromatography/
electrospray ionization mass spectrometry.
This technology is particularly
useful in the analysis of membrane and
hydrophobic proteins that can be difficult
to dissolve and separate by 2D-PAGE.[24]
ICAT also facilitates the
analysis of low-molecular-weight proteins
and peptide fragments.
Compared to 2D-PAGE (particularly
with the application of DIGE),
ICAT is less quantitative. Inaccuracies
in relative quantitative assessment can
result from differential fragmentation
of the light and heavy tags, which can
alter elution times and subsequent ionization.
Moreover, standard ICAT
technology does not analyze all proteins,
because first-generation tags
only label proteins with cysteine residues
flanked by appropriately spaced
protease cleavage sites.
Despite these limitations, technologic
improvements may well enhance
the utility of ICAT as a tool for biomarker
discovery. One potentially useful
strategy is to label proteins with
different ICAT reagents and then separate
them with 2D-PAGE.[25] Because
the most advantageous aspect of ICAT
is its independence from gel-based
separation, improvements in ICAT reagents
and labeling protocols that facilitate
uniform and efficient labeling
of all proteins hold the greatest promise
for improving the utility of this
technology.
Clinical Proteomics
Tissue Microarrays
A major obstacle in translating
findings from biomarker discovery
studies into clinical practice is reliable
validation of initial findings in large
clinical data sets. Traditionally, validation
studies have relied on immunohistochemical
analysis of tissue
slides from individual patients and
protein quantification by visual scoring.
Because substantial variation in
staining occurs from tissue slide to tissue
slide (and visual scoring can only
provide semiquantitative information),
data generated by standard
immunohistochemical studies may not be reliable.
Tissue microarrays have been developed
to facilitate high-throughput immunohistochemistry
and reduce experimental
variability.[26]
Tissue microarrays are constructed
by incorporating multiple (0.6 mm
wide × 3-4 mm high) tissue cores onto
a single paraffin block. From this
block, 5-μm sections are cut onto a
glass slide and analyzed by standard
immunohistochemistry. This approach
facilitates the simultaneous analysis of
as many as 1,000 clinical samples including
many different stages and
grades of cancer.
Tissue microarrays have been particularly
useful in validating and determining
the clinical significance of
findings from cDNA microarray studies.
For example, researchers used
cDNA microarrays to discover that
EZH2 was overexpressed in high-grade
prostate cancers.[27] Immunohistochemical
studies utilizing tissue
microarrays confirmed that EZH2 was
commonly overexpressed in high-grade
prostate cancers and that this
finding predicted increased risk for
failure of local therapy.[28]
Researchers have developed techniques,
such as digital image analysis,
to quantitatively assess protein
expression levels. These methods involve
staining tissue sections with standard
immunohistochemical protocols and
then measuring the level of peroxidase
or fluorescent staining with an optical
scanner.[29] Digital image analysis
was recently used to demonstrate that
androgen-receptor protein expression
was 81% higher in black American
men with prostate cancer than in white
American men.[30] Combining digital
image analysis with tissue microarrays
is a powerful strategy for validating the
clinical utility of previously identified
diagnostic and prognostic biomarkers.
Protein Lysate Arrays
Reverse-phase protein arrays (ie, protein
lysate arrays) are another new technology
that not only facilitates clinical biomarker
validation but, importantly, can
quantify changes in cellular
signaling processes from extremely
small cellular samples.[31,32]
This technology involves
arraying protein lysates from several
hundred clinical samples in serial dilutions
on a single nitrocellulose membrane
(Figure 2). For many tissue
samples, laser capture microdissection
is required to procure a pure population
of the cells of interest, which can be
obtained even from a biopsy specimen.
Protein expression levels are measured
with standard antibody staining protocols
and optical scanning.[32]
This technology was used to demonstrate
that progression from benign
prostatic epithelium to invasive prostate
cancer was associated with increased
phosphorylation of AKT, suppression
of apoptosis, and decreased
phosphorylation of ERK. Compared
to tissue microarrays, reverse-phase
arrays are much more sensitive, can
analyze dozens of end points from
only a few thousand cells, are nonsubjective
(thereby providing more
accurate quantitative information), and
are not affected by antigen retrieval
issues.
The main disadvantage of this technology
is the frequent need for laser
capture microdissection, which in
some cases can be technically challenging
and labor intensive, although
newer automated laser capture technology
is eliminating this roadblock.
Work is under way to automate the
entire process so that protein lysate
arrays will likely become a highly useful
tool not only for validation studies,
but also for clinical decision-making.
This technology is especially germane
to patient-tailored therapy when analysis
of cellular signaling pathways and
posttranslational modifications is required.
Protein array technology may
be more adaptable to clinical practice
than are tissue microarray methods,
because it is less subjective and more
reproducible.
Proteomic Pattern Analysis
An emerging body of data suggests
that for most cancers, the assessment
of a pattern of multiple biomarkers
provides more robust diagnostic and
prognostic information than the measurement
of a single biomarker. Advances
in proteomic technologies
have made it possible to rapidly assess
complex protein expression patterns
in a large number of clinical
samples. MALDI-TOF and surface-enhanced
laser desorption ionization
time-of-flight (SELDI-TOF) mass
spectrometry are new technologies
that can profile low-molecular-weight
proteins.[33-36]
- SELDI-TOF: This proprietary technology utilizes a ProteinChip System and ProteinChip Reader (Ciphergen Biosystems, Fremont, Calif) to facilitate protein capture, purification, and analysis on a single platform (Figure 3). It provides crude but rapid protein purification and signal amplification and is a potentially valuable cancer biomarker screening tool because it rapidly generates a reproducible low-molecular-weight protein fingerprint from a minuscule sample (ie, 1 μL). SELDI-TOF mass spectrometry can accomplish high-throughput protein expression profiling from human tissue and body fluids. It has been shown to identify protein signatures from nipple aspirates that discriminate women with breast cancer from healthy women.[37] It has also been used to analyze protein expression patterns from pure populations of human cells procured by laser capture microdissection. With SELDI-TOF, unique protein fingerprints characteristic of benign prostatic epithelium, high-grade prostatic intraepithelial neoplasia, and prostate cancer have been identified.[38]
- Pattern Recognition Algorithms: Because of its ability to rapidly analyze a large number of samples, SELDI-TOF is particularly well suited to generate informative proteomic patterns from serum. Visual analysis only detects gross changes in protein expression, but bioinformatics tools detect subtle differences in patterns of protein expression. Importantly, because of the huge dimensionality of the data, advanced pattern recognition algorithms are required to find the hidden, nonapparent signatures in a background of noise and chaos. Bioinformatics tools that utilize artificial intelligence-based pattern recognition algorithms can facilitate analysis of complex data sets. An analytic bioinformatics tool recently developed to analyze SELDI-TOF data streams, Proteome Quest beta version 1.0 (Correlogic Systems Inc, Bethesda, Md), combines elements of genetic algorithms and self-organized cluster analysis. Spectra composed of 15,200 mass/charge ratio (m/z) values on the x-axis, with the corresponding amplitudes on the y-axis, are generated by SELDI-TOF and imported into the genetic algorithm as an ASCII file. The genetic algorithm functions in a manner similar to natural selection, determining the subset of amplitudes at defined m/z values that best separates a "training" data set into predetermined groups. In other words, the genetic algorithm randomly analyzes multiple pattern combinations until one that discriminates the two groups of interest is found. This pattern is then recombined ("mated") with additional data. Nondiscriminatory patterns are discarded, and discriminatory ones further refined. Once this fitness test has been successfully applied to all of the "training" data, the resultant set of y-axis-defined amplitudes that fully discriminate the training set is determined. Spectra are then generated by SELDI-TOF from a set of "blinded" samples. These data are compared for their similarity to the previously defined patterns generated with the "training" set. A decision is then made that classifies the unknown samples either into one of the previously defined groups or into an "unclassified" group. As more data are input, existing clusters are refined and new clusters formed. In this way, the genetic algorithm "learns" by experience and, in theory, will become more accurate over time.
- Clinical Utility: Artificial intelligence-based pattern recognition of serum proteomic profiles has been applied to the detection of ovarian and prostate cancer. Using this approach, a diagnostic algorithm was generated that yielded an overall positive predictive value of 94% for the diagnosis of ovarian cancer, and all 18 women with stage I ovarian cancer were correctly classified by the algorithm.[39] Although these preliminary studies have generated highly promising data and demonstrated the feasibility of a new diagnostic paradigm, the introduction of serum proteomic pattern diagnostics into clinical practice will be hindered by machine-to-machine, day-to-day, and platform-to-platform variations, which may limit the ability to generate reproducible data streams. This problem is compounded by other factors: human disease arises from a heterogeneous population, the disease process itself is multifactorial and heterogeneous, and clinics vary in their sample collection methodology. A major limitation for clinical implementation may be the mass spectrometer platforms themselves. The use of high-end mass spectrometers is now being explored as a possible solution to the problems of reproducibility. The QSTAR Pulsar LC/MS/MS System (Applied Biosystems Inc, Foster City, Calif) is a high-performance hybrid quadrupole time-of-flight mass spectrometer that can analyze protein samples applied to Ciphergen's ProteinChip Arrays. The QSTAR has higher resolution and can generate far more data points than the ProteinChip Biology System II (PBS II) instrument, and most importantly, the increase in mass accuracy reduces machine-to-machine differences in mass drift. Moreover, because the source is uncoupled from the mass analyzer, this type of machine generates much truer time of flight and far less laser-induced fragmentation than a linear instrument (which means fewer confounding peaks that are not related to the disease but artificially induced by the process itself).
Unlike the PBS II, the QSTAR can accomplish direct tandem mass spectrometry protein identification. Because of these differences, it is likely that proteomic patterns generated from the QSTAR will be more robust than those generated by the PBS II, and indeed, this was found to be true. In a recent report, patterns were discovered that correctly classified 100% of ovarian cancers, including all stage I cases, and 63 of 66 cases of nonmalignant disease.[40] Based on these results, the investigators have extended this paradigm to more advanced high-resolution instrumentation for upcoming National Cancer Institute/Center for Cancer Research-based clinical trials of ovarian cancer detection. This concept is not limited to just one type of cancer. Researchers found an algorithm for prostate cancer that yielded a positive predictive value of 41%. The genetic algorithm correctly identified 36 of 38 men with prostate cancer (ie, 95% sensitivity) and 177 of 228 men with benign biopsies (ie, 78% specificity). Among men with total PSA levels between 4.0 and 10.0 ng/mL, 97 of 137 (71%) were correctly classified as having benign prostates. Thus, if serum proteomic analysis had been used to determine the need for prostate biopsy, 71% of "unnecessary" biopsies could have been prevented, whereas only 5% of cancers would have been missed. Importantly, the genetic algorithm "correctly" classified all the men with prostate cancer. In addition, serum samples from seven men with moderate-grade, organ-confined prostate cancer were obtained prior to radical prostatectomy and 6 weeks postoperatively.[41]
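The genetic-algorithm search described under Pattern Recognition Algorithms can be illustrated with a toy sketch. It is not the Proteome Quest implementation: the fitness test here is a simple nearest-centroid classifier standing in for the self-organized cluster analysis, the spectra are synthetic (50 m/z bins rather than 15,200), and every function name and parameter is hypothetical.

```python
import random


def centroid(rows, feats):
    """Mean amplitude of each selected m/z feature across a group of spectra."""
    return [sum(r[f] for r in rows) / len(rows) for f in feats]


def fitness(feats, group_a, group_b):
    """Fraction of training spectra lying nearer their own group's centroid."""
    ca, cb = centroid(group_a, feats), centroid(group_b, feats)

    def dist2(row, c):
        return sum((row[f] - v) ** 2 for f, v in zip(feats, c))

    correct = sum(dist2(r, ca) < dist2(r, cb) for r in group_a)
    correct += sum(dist2(r, cb) < dist2(r, ca) for r in group_b)
    return correct / (len(group_a) + len(group_b))


def evolve(group_a, group_b, n_feats=3, pop_size=30, generations=40, seed=0):
    """Search for a small set of m/z indices that separates the two groups."""
    rng = random.Random(seed)
    n_mz = len(group_a[0])
    pop = [rng.sample(range(n_mz), n_feats) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda f: fitness(f, group_a, group_b), reverse=True)
        survivors = pop[: pop_size // 2]  # nondiscriminatory patterns are discarded
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            pool = sorted(set(p1) | set(p2))  # recombine ("mate") two patterns
            child = rng.sample(pool, n_feats)
            if rng.random() < 0.2:  # occasional random mutation
                new = rng.randrange(n_mz)
                if new not in child:
                    child[rng.randrange(n_feats)] = new
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lambda f: fitness(f, group_a, group_b))
    return sorted(best), fitness(best, group_a, group_b)


def make_spectrum(rng, shifted):
    """Synthetic spectrum; 'shifted' spectra have three elevated peaks."""
    amps = [rng.gauss(1.0, 0.2) for _ in range(50)]
    if shifted:
        for idx in (5, 17, 42):
            amps[idx] += 1.5
    return amps


data_rng = random.Random(1)
cancer = [make_spectrum(data_rng, True) for _ in range(20)]
control = [make_spectrum(data_rng, False) for _ in range(20)]
features, accuracy = evolve(cancer, control)
print("selected m/z indices:", features, "training accuracy:", accuracy)
```

In practice the selected pattern would then be applied to "blinded" spectra that played no part in training, as described above; training accuracy alone overstates performance.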
Another analytic strategy utilizes a
decision-tree algorithm that relies on
binomial decisions based on heights
of a predefined set of specific protein
peaks. Using this approach in a
blinded test set of 60 men (30 with
prostate cancer and 30 with benign
prostates) yielded a sensitivity of 83%
and a specificity of 97%.[42]
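The accuracy figures quoted in this section (sensitivity, specificity, and positive predictive value) all derive from a two-by-two table of counts. As a minimal sketch, the calculation below uses the prostate cancer counts reported above for the genetic algorithm (36 of 38 cancers detected, 177 of 228 benign biopsies correctly classified); the function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Summary statistics from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # cancers correctly flagged
        "specificity": tn / (tn + fp),  # benign cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are truly cancer
    }


# 36 of 38 cancers detected; 177 of 228 benign cases correct, so 51 false positives.
m = diagnostic_metrics(tp=36, fn=2, tn=177, fp=228 - 177)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

The positive predictive value depends on how common cancer is in the tested group, so the value obtained in a study cohort will not necessarily carry over to a screening population with a different prevalence.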
Conclusions and Future Directions
Proteomics is a multifaceted discipline
that spans everything from biomarker discovery
to clinical diagnostics. Technologic
advances have greatly improved
2D-PAGE technologies as well
as non-gel-based protein-profiling
strategies and have facilitated the discovery
of protein changes associated
with malignant transformation and
progression. The development of tissue
and protein arrays has provided
high-throughput quantitative tools
with which to validate the clinical utility
of biomarkers identified through
discovery-based studies. Protein
microarray technology is envisioned
as a means of profiling the changing
state of cellular circuitry (before,
during, and after therapy) to monitor
multiple protein phosphorylation
events at once. This type of technology
could have immediate impact and
utility in the new era of targeted molecular
medicine.
Advances in mass spectrometry
have made it possible to rapidly generate
complex proteomic profiles from
serum, and powerful bioinformatics
tools have been developed to analyze
these extremely complex data sets. The
technology has advanced to the point
that proteomic studies will likely have
a major impact on how cancer patients
are diagnosed and treated. In the short term,
proteomics will enhance the discovery
of highly predictive biomarkers
to help clinicians diagnose cancer
while it is still at a curable stage and
determine the most appropriate
therapy for any given patient.
Ultimately, however, the proteomic
analysis itself may become the diagnostic
or prognostic test. This approach
will likely provide the most
accurate diagnostic and prognostic
information for a given patient.
Proteomic technologies have advanced
to the point of making molecular
diagnostics and tailored therapies
possible. It is now incumbent upon
clinicians and translational scientists
to make the promise of "molecular
medicine" a reality.
