Combining Artificial Neural Networks and Transrectal Ultrasound in the Diagnosis of Prostate Cancer
Traditional (gray-scale) transrectal
ultrasound (TRUS) is the
most widely used and possibly
the most important imaging modality
in the diagnosis of prostate cancer.
Virtually all urologists, whether working
in a tertiary care center or in community
practice, have immediate
access to an ultrasound unit. Arguably
the most important step in the
treatment of prostate cancer lies in its
early diagnosis. More than 1 million
TRUS-guided prostate needle biopsies
are performed annually in the
United States, resulting in the detection
of 200,000 new cases per year.
Refinements in treatment modalities
for prostate cancer including
anatomic nerve-sparing radical prostatectomy,
conformal radiation, and brachytherapy
have improved patient outcomes.
Unfortunately, the urologist's ability
to diagnose prostate cancer has not
kept pace with therapeutic advances.
Currently, many men are facing the
need for prostate biopsy with the better-
than-average likelihood that the
result will be inconclusive.
In routine urologic practice, clinicians
must decide whether to perform
a prostate biopsy on the basis of a few parameters. Prostate-specific antigen
(PSA) levels, the results of a digital
rectal exam (DRE), and the patient's
age are the parameters most often
employed. The clinician's prior experience,
a compilation of personal results
and "rules of thumb" learned
during training, may influence the decision
to perform a biopsy. Unfortunately,
personal predictions are often
subject to inherent biases, weakened
by an inability to memorize the complete
Predictive modeling tools are available
to assist the clinician in the decision-
making process. The most widely
known of these is the Partin nomogram,
which can be employed by the
physician during pretherapy discussions
about prostate cancer with the
patient to predict final pathologic stage
of disease. This paper will focus
on the tools available to the clinician
to assist in the prediction of the prostate
needle biopsy outcome. We will
examine the use of "machine learning"
models (artificial intelligence) in
the form of artificial neural networks
(ANNs) using prebiopsy variables.
Early Diagnosis Dilemmas
Early detection of prostate cancer
relies on the judgment of the physician
coupled with the application of
common clinical variables. The decision
to perform an ultrasound-guided
prostate biopsy rests on the assessment
of appropriate clinical data, including
the results of a DRE, PSA
level, patient's age, and ultrasound
findings. Unfortunately, when assessed
individually, these variables
have limited efficacy in terms of accurately
guiding the physician and
patient in the decision to undergo biopsy.
For example, PSA is associated
with high false-negative and false-positive
rates, 20%-40% and 21%, respectively
(positive predictive value:
~32%). Similarly, DRE and overall
clinical judgment are associated
with low positive predictive values of
21% and 33%, respectively. That said,
an estimated 25% of men undergoing
prostate biopsy with a PSA level between
4 and 10 ng/mL are found to be
Predicting the outcome of prostate
needle biopsy based on a few clinical
variables results in an increased risk
of unnecessary biopsies. Additional
prebiopsy markers are currently being
evaluated, including PSA density,
percent-free PSA, transition-zone
PSA, PSA velocity, presence of prostatic
intraepithelial neoplasia (PIN)
and atypical acinar proliferation. The
addition of multiple new markers,
while potentially improving the prediction
of biopsy outcomes, makes it
considerably more difficult for the
practicing urologist to accurately assess
the vast array of clinical data and
apply appropriate judgment. Mathematical
models and ANNs have been
developed to assist the physician in
assessing the risk of positive biopsy
based on multiple parameters.
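The false-positive rates and predictive values quoted above all derive from simple counts of true and false test results. As a minimal sketch, assuming a hypothetical biopsy series (the counts below are invented for illustration and are not taken from any study cited here, and the function name `diagnostic_metrics` is our own), these measures could be computed as follows:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard test-performance measures from confusion-matrix counts.

    tp: test positive, biopsy positive    fp: test positive, biopsy negative
    fn: test negative, biopsy positive    tn: test negative, biopsy negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of cancers the test flags
        "specificity": tn / (tn + fp),  # share of cancer-free men it clears
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts for 200 men; NOT data from any cited series.
metrics = diagnostic_metrics(tp=30, fp=70, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these invented counts, the positive predictive value is 30/100 = 0.30: seven of every ten men with a positive test would undergo a biopsy that finds no cancer, which is the "unnecessary biopsy" problem described above.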
Artificial Neural Networks
ANNs are named after the natural
mammalian neuron arrangement, in
which neurons as specialized entities
are interconnected, receiving signals
propagated throughout the system. By
receiving weighted signals from
specialized systems, it is felt that the
mammalian nervous system has the
capacity to learn.
A typical (feed-forward) ANN has
an input layer, at least one hidden
layer, and an output layer (Figure 1).
In the case of predicting prostate biopsy
outcome, the input units, for example,
would be PSA, DRE, age, and
percentage of free PSA. The dataset
is randomly split into a training set
(data used to teach the neural network)
and a validation set (data put
aside to test the accuracy of the ANN
after training). These four inputs (PSA,
free PSA, DRE, and age) are then
"fed forward" into the hidden layer,
where their value is weighted to produce
the desired outcome: in this
case, a positive or negative biopsy
The important point is that the outcome
for the training data is known,
and therefore, the neural net can be
sequentially trained to achieve the perfect
answer every time the training data
are loaded into the system. The ability
of the neural net to produce the correct
answer where the outcome is theoretically
unknown is then tested using
the validation set. Inputs from the validation
set are fed forward, and the
ANN result is recorded. This result is
then compared to the known outcome
from the validation set, and the two
results are compared in a receiver-operator
characteristic (ROC) curve.
The area under the ROC curve is used
as a measure of accuracy, with a value
of 1.0 representing perfection and a
value of 0.5, a 50% likelihood that the
model will respond correctly.
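The steps described above (a random split into training and validation sets, a feed-forward pass through one hidden layer, and scoring by the area under the ROC curve) can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in any of the studies discussed: the weights, the input scaling, and the example labels and scores are all hypothetical, and the training step that would adjust the weights is omitted. The area under the curve is computed here as the fraction of (positive, negative) case pairs in which the positive case receives the higher score, a standard equivalent formulation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, output_weights):
    """Feed inputs through one hidden layer to a single output in (0, 1).

    Every weight list ends with a bias term.
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(ws[:-1], inputs)) + ws[-1])
        for ws in hidden_weights
    ]
    z = sum(w * h for w, h in zip(output_weights[:-1], hidden))
    return sigmoid(z + output_weights[-1])


def split(records, validation_fraction=0.3, seed=0):
    """Randomly split records into (training, validation) lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical weights: 4 inputs (PSA, free PSA, DRE, age, each assumed
# pre-scaled to 0..1), 2 hidden units, 1 output. Training would tune these.
hidden_weights = [[0.8, -0.5, 0.6, 0.3, -0.4], [-0.2, 0.7, 0.1, -0.6, 0.2]]
output_weights = [1.2, -0.9, 0.1]
probability = forward([0.5, 0.2, 1.0, 0.6], hidden_weights, output_weights)

# Random split of ten hypothetical patient records (here just identifiers).
training, validation = split(list(range(10)))

# Hypothetical validation outcomes (1 = positive biopsy) and network outputs.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 8 of 9 pairs ranked correctly
```

In this toy validation set, one positive case (score 0.4) is ranked below one negative case (score 0.6), so the area under the curve is 8/9 rather than 1.0.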
Standard statistical techniques (eg,
logistic regression) rely on a linear relationship
between the input variables and the
outcome (in logistic regression, the log-odds of the outcome). In biologic systems, such linearity
often does not exist, and the
ANN has the benefit of being able to
capture complex nonlinear relationships
by virtue of its architectural
arrangement of neurons and their
ability to weight the forward signal in
a nonlinear fashion. The ANN model
is, therefore, felt to have a potential
advantage in terms of predictive
Validated Predictive Models
Currently, six validated predictive
models have been published using prebiopsy
parameters to predict prostate
biopsy outcome (Table 1). Five of the
six are ANN models, and one is based
on logistic regression.
Snow et al
In one of the first applications of ANNs to urologic oncology, Snow et al developed a model to predict biopsy outcome using data from 1,789 patients who were undergoing prostate cancer screening. This model used the input variables of age, PSA, PSA velocity, and TRUS findings. The ANN model by Snow et al demonstrated a sensitivity of 0.7 and a specificity of 0.92, but unfortunately, these investigators did not report the model's accuracy as a ROC curve. The study was based on a retrospective screening population and validated against an independent patient cohort.
Since the early ANN model of Snow et al, three groups have subsequently attempted to develop paradigms that predict prostate biopsy outcome in men with a low PSA level (2-4 ng/mL). Two of these have used the ANN approach.
Babaian et al
Babaian et al developed an ANN model using PSA, creatine kinase, prostatic acid phosphatase, and age to predict the likelihood of a positive biopsy. This ANN was reported to have an ROC accuracy of 0.74 with appropriate validation. In this select cohort of patients (PSA: 2-4 ng/mL), it appears that this model would prevent almost 50% of unnecessary biopsies at a sensitivity of 92%. The authors found ROC accuracies of 0.74 and 0.75, respectively, for PSA density and PSA transition zone density. Thus, PSA density appears to have an accuracy equivalent to that of the ANN model used in the cohort of patients examined by Babaian et al.
Djavan et al
For patients with PSA levels ranging from 2.5 to 4 ng/mL, Djavan et al similarly reported using an ANN based on PSA transition zone density, free PSA, PSA density, and prostate volume for 272 patients. In this cohort, patients underwent sextant biopsies with two transition zone biopsies. The patient population comprised men in the European Prostate Cancer Detection Study, who had been referred to a urologist with lower urinary tract symptoms or for early detection of prostate cancer. It is unclear what proportion of men in this PSA range had abnormal DREs. The overall positive biopsy rate was 24%, and the ANN produced a validated ROC accuracy of 0.876. This model was compared to logistic regression models constructed using only 66% of the original data. The accuracy of the logistic regression model was 0.85.
Both the Babaian and the Djavan models were constructed to evaluate patients with relatively low serum PSA values. Unfortunately, the application of these models to men in the United States presenting to a urologist for evaluation for prostate cancer may be difficult, as the majority of those with a PSA between 2.5 and 4 ng/mL have an abnormal DRE.
Eastham et al
Eastham et al recently examined a similar patient cohort of men with PSA < 4 ng/mL and an abnormal DRE, and developed a logistic regression model to predict positive prostate biopsy.[10] In evaluating a diverse patient population using race, PSA, and age, Eastham et al reported an ROC accuracy of 0.75. In this distinct patient subset (PSA < 4 ng/mL), the three validated models cited above have accuracies ranging from 0.75 to 0.875. Although these results represent substantial improvements in accuracy over the use of single serum tests (eg, PSA), neither ANN model is available on the World Wide Web.
Predicting a Positive Biopsy in High-Risk Patients
The broader question of predicting a positive prostate biopsy in men presenting with either a PSA > 4 ng/mL or an abnormal DRE has been approached using ANN models. In the same report cited above, Djavan et al developed an ANN from a screening population of 974 men in the European Prostate Cancer Detection Study. All men underwent sextant biopsy with two additional transition zone biopsies. If the first biopsy was negative for prostate cancer, a second identical biopsy was performed. The data from which the Djavan ANN model was developed therefore represent an extremely select group of patients: the ability of this model to predict outcome is limited to men with a PSA of 4 to 10 ng/mL who underwent repeat biopsy if the first biopsy was inconclusive. Within this paradigm, the ROC accuracy of the ANN model was 0.91 compared to the accuracy of a logistic regression model of 0.90. Although this represents a robust model, caution should be exercised in applying this result to other contemporary series. First, the model applies only to men who underwent repeat sextant biopsy; men in the United States usually do not automatically undergo an immediate repeat biopsy unless worrisome histologic markers are identified (eg, PIN).
Moreover, current opinion favors 8- to 10-core biopsies with attention to the lateral-most aspect of the prostate. Second, models and nomograms used to predict outcome usually report a range of accuracies representing ROC results from ANN "cross-validation." Cross-validation refers to the splitting of the dataset several times into test sets and training sets. This allows assessment of the overall performance of the model. The performance of the model is then reported as a range, with average ROC accuracy noted. It is unclear from the Djavan ANN model whether cross-validation was performed.
ANN vs Logistic Regression Methods
Recently, Porter et al published the results of their predictive models, based on both ANN and logistic regression methods, from a racially diverse prospective series of 319 patients. All patients underwent a 10-core prostate needle biopsy with attention to the lateral aspect of the gland. The patient population represented men referred to the urologist with either an abnormal DRE or an elevated PSA (> 4 ng/mL; range: 0.8-367 ng/mL). Five-way cross-validation was performed, and the mean ROC accuracies of the ANN and logistic regression models were reported as 0.77 (range: 0.71-0.83) and 0.76 (range: 0.71-0.81), respectively. Although Djavan et al studied a larger patient population, the work by Porter et al may represent a more common clinical scenario. Thus, ANNs appear to be equivalent to their logistic regression counterparts in predicting prostate biopsy outcome, but no ANNs are currently available on the World Wide Web. In their report, however, Porter et al noted that they have scheduled the inclusion of an ANN on the Web at prostatecalculator.org. If ANNs are to be clinically useful, it is necessary for them to be easily accessible in either handheld computer versions or on the Web.
Advantages and Limitations of ANNs
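The cross-validation procedure described in the preceding section (splitting the dataset several times into training and test sets, then reporting the mean ROC accuracy with its range) can be illustrated with a short sketch. The cohort size of 319 and the five-way split follow the description of the Porter series; everything else is an assumption made for illustration, including the function names `k_fold_indices` and `summarize` and the per-fold accuracies, which are invented and are not results from any study.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists so each record is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def summarize(fold_aucs):
    """Report the mean ROC accuracy across folds together with its range."""
    return {"mean": mean(fold_aucs), "min": min(fold_aucs), "max": max(fold_aucs)}


# Five-way split of a 319-patient cohort.
splits = list(k_fold_indices(319, 5))

# In practice, a model would be trained on each `train` list and its ROC
# accuracy measured on the matching `test` list. The values below are
# hypothetical placeholders, NOT reported results.
hypothetical_fold_aucs = [0.70, 0.74, 0.78, 0.80, 0.73]
summary = summarize(hypothetical_fold_aucs)
print(f"mean {summary['mean']:.2f} (range: {summary['min']:.2f}-{summary['max']:.2f})")
```

Because every patient appears in exactly one test fold, the range across folds gives some indication of how much a single reported accuracy might vary with the particular split chosen.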
In general, biologic systems are neither binary nor linear. Clinicians are faced with a constellation of parameters, many of which are not related to each other in a straightforward fashion. Traditional statistical methods cope with this variance by assigning cut-off points (for example, a PSA > 10 ng/mL). This system has led to risk-group analysis that is easily memorized and simply applied. Artificial intelligence, in the form of machine learning or ANNs, has an advantage over traditional statistical methods in that the relationships between variables need not be linear. The ANN can theoretically learn and therefore weight the input variables so that the most efficient predictive model is acquired.
The ANNs listed in Table 1 are not perfect in predicting the outcome of prostate biopsy (nor, as a result, in predicting a diagnosis of prostate cancer). Their reported accuracies range from 0.75 to 0.91, and most are limited in their application to a distinct subset of patients. Nevertheless, it can be argued that a predictive accuracy of even 0.75 may be preferable to no model at all. Most studies demonstrate the superior accuracy of predictive models over human judgment. Although the outcome with respect to biopsy may be binary (ie, the patient will have either a positive or negative result), counseling patients with regard to invasive testing can be assisted by relatively accurate predictive models.
Predictive models, however, need to be improved. Improving the predictive accuracy of these models requires more extensive collection of data, which increases the sample size, and the application of more sophisticated modeling techniques. Increasing the sample size and, therefore, the data available to train the ANN, refers not only to the number of patients included in the database but also to the number and quality of the pretest parameters.
Therefore, prospectively collected data with an emphasis on the collection of multiple variables is essential to the creation of accurate, clinically applicable predictive models. Wide-ranging clinical variables as well as serum markers (biomarkers) will likely enhance predictive accuracy. It is hoped that the addition of multiple variables, both clinical and serum based, will enhance the accuracy of models developed to predict the outcome of prostate biopsy. These models need to be able to prevent unnecessary biopsies while identifying all men with the disease. It could be argued, therefore, that models designed to predict the outcome of biopsy should be more accurate than models designed to predict final pathologic stage after surgery. After all, the fate of a man with organ-confined disease on one side of the prostate vs the same man with organ-confined disease on both sides of the gland is likely to be considerably different from that of a man who harbors a clinically significant cancer and fails to undergo biopsy on the "recommendation" of a predictive model.
Conclusions
In summary, the role of ANNs in providing valuable predictive models to be used in conjunction with TRUS appears promising. In the few studies that have compared ANNs to traditional logistic regression, both ANN and logistic regression appear to function equivalently when predicting the outcome of a biopsy. With the introduction of more complex prebiopsy variables, ANNs appear to be in a commanding position for use in predictive models. Easy and immediate physician access to these models will be imperative if their full potential in predicting outcomes is to be realized.
References
2. Kattan MW: Nomograms. Introduction. Semin Urol Oncol 20:79-81, 2002.
3. Partin AW, Kattan MW, Subong EN, et al: Combination of prostate specific antigen, clinical stage, and Gleason score to predict pathologic stage of localized prostate cancer. A multi-institutional update. JAMA 277:1445-1451, 1997.
4. Catalona WJ, Richie JP, Ahmann FR, et al: Comparison of digital rectal exam and serum prostate specific antigen in early diagnosis of prostate cancer: Results of a multicenter clinical trial of 6,630 men. J Urol 151:1283-1290, 1994.
5. Catalona WJ, Smith DS, Ratliff TL, et al: Detection of organ-confined prostate cancer is increased through prostate specific antigen based screening. JAMA 270:948-954, 1993.
6. Schwarzer G, Schumacher M: Artificial neural networks for diagnosis and prognosis in prostate cancer. Semin Urol Oncol 20:89-95, 2002.
7. Snow PB, Smith DS, Catalona WJ: Artificial neural networks in the diagnosis and prognosis of prostate cancer: A pilot study. J Urol 152:1923-1926, 1994.
8. Babaian RJ, Fritsche H, Ayala A, et al: Performance of a neural network in detecting prostate cancer in the prostatic specific antigen range of 2.5 to 4.0 ng/mL. Urology 56:1000-1006, 2000.
9. Djavan B, Remzi M, Zlotta A, et al: Novel artificial neural network for early detection of prostate cancer. J Clin Oncol 20:921-929, 2002.
10. Eastham JA, May R, Robertson JL, et al: Development of a nomogram that predicts the probability of a positive biopsy in men with an abnormal digital rectal exam and a prostate-specific antigen between 0 and 4 ng/mL. Urology 54:709-713, 1999.
11. Gore JL, Shariat SF, Miles BJ, et al: Optimal combinations of systematic sextant and laterally directed biopsies for the detection of prostate cancer. J Urol 165:1560-1561, 2001.
12. Porter CR, O’Donnell C, Crawford ED, et al: Predicting the outcome of prostate biopsy in a racially diverse population: A prospective study. Urology 60:831-835, 2002.
13. Dawes RM, Faust D, Meehl PE: Clinical versus actuarial judgment. Science 243:1668-1674, 1989.
14. Kattan MW: Editorial. Statistical models, artificial neural networks, and the sophism, "I am a patient, not a statistic." J Clin Oncol 20:885-887, 2002.