The Statistical Evaluation of Medical Tests for Classification and Prediction
MS Pepe. Oxford Statistical Science Series, Oxford University Press. 2003.

last updated 23 March 2011
Book Description 
This book describes statistical concepts and techniques for evaluating medical diagnostic tests and biomarkers for detecting disease. More generally, the techniques apply to the statistical classification problem of predicting a dichotomous outcome. The measures of test accuracy described include sensitivity, specificity, predictive values, diagnostic likelihood ratios, and the receiver operating characteristic (ROC) curve, which is commonly used for continuous and ordinal-valued tests. Statistical procedures are presented for estimating and comparing these measures. Regression frameworks are presented for assessing factors that influence test accuracy and for comparing tests while adjusting for such factors.
This book presents many worked examples using real data and should be of interest to practicing statisticians and quantitative researchers involved in developing tests for classification or prediction in medicine.
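As a rough illustration of the accuracy measures named in the description, the sketch below computes them from the four cells of a 2x2 table and builds an empirical ROC curve for a continuous test. It is not taken from the book or its programs: the function names (`binary_accuracy`, `empirical_roc`, `auc_trapezoid`), the counts, and the scores are all made up for illustration, and the code assumes that larger scores indicate disease and that no cell total is zero.

```python
def binary_accuracy(tp, fp, fn, tn):
    """Accuracy measures for a binary test from 2x2 counts (illustrative helper).

    tp/fn: diseased subjects testing positive/negative.
    fp/tn: non-diseased subjects testing positive/negative.
    Assumes every denominator below is nonzero.
    """
    diseased = tp + fn
    healthy = fp + tn
    tpf = tp / diseased  # true positive fraction = sensitivity
    fpf = fp / healthy   # false positive fraction = 1 - specificity
    return {
        "sensitivity": tpf,
        "specificity": tn / healthy,
        "ppv": tp / (tp + fp),             # positive predictive value
        "npv": tn / (tn + fn),             # negative predictive value
        "dlr_pos": tpf / fpf,              # positive diagnostic likelihood ratio
        "dlr_neg": (fn / diseased) / (tn / healthy),  # negative DLR
    }


def empirical_roc(scores_diseased, scores_healthy):
    """Empirical ROC points (FPF, TPF), calling a test positive when score >= c."""
    thresholds = sorted(set(scores_diseased) | set(scores_healthy), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every observed score
    for c in thresholds:
        tpf = sum(y >= c for y in scores_diseased) / len(scores_diseased)
        fpf = sum(y >= c for y in scores_healthy) / len(scores_healthy)
        points.append((fpf, tpf))
    return points


def auc_trapezoid(points):
    """Area under the ROC points by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up counts and scores, for illustration only.
measures = binary_accuracy(tp=80, fp=30, fn=20, tn=170)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")

roc_points = empirical_roc([3, 4, 5], [1, 2, 4])
print("AUC:", round(auc_trapezoid(roc_points), 3))
```

Note that the predictive values computed this way depend on the disease prevalence in the sample, so they are only meaningful when the 2x2 table comes from a cohort-type design rather than a case-control sample.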
Table of Contents 
1. Introduction
2. Measures of Accuracy for Binary Tests
3. Comparing Binary Tests and Regression Analysis
4. The Receiver Operating Characteristic Curve
5. Estimating the ROC Curve
6. Covariate Effects on Continuous and Ordinal Tests
7. Incomplete Data and Imperfect Reference Tests
8. Study Design and Hypothesis Testing
9. More Topics and Conclusions
References/Bibliography
Index
Datasets 
Study                                Reference                                       Stata file      ASCII file
CASS                                 Leisenring et al. (2000); Weiner et al. (1979)  est1.dta        est1.csv, est1_desc.txt
Pancreatic Ca biomarkers             Wieand et al. (1989)                            wiedat2b.dta    wiedat2b.csv, wiedat2b_desc.txt
Ultrasound for hepatic mets          Tosteson and Begg (1988)                        tostbegg2.dta   tostbegg2.csv, tostbegg2_desc.txt
CARET PSA                            Etzioni et al. (1999)                           psa2b.dta       psa2b.csv, psa2b_desc.txt
Gene expression array                Pepe et al. (2003)                              orchratio2.dta  orchratio2.csv, orchratio2_desc.txt
Norton neonatal audiology            Norton et al. (2000)                            nnhs.dta        nnhs.csv, nnhs_desc.txt
Leisenring neonatal audiology        Leisenring et al. (1997)                        lplaudio_b.dta  lplaudio_b.csv, lplaudio_b_desc.txt
Prostate Ca - St. Louis              Smith et al. (1997)                             psa_dre_v2.dta  psa_dre_v2.csv, psa_dre_v2_desc.txt
Stover audiology                     Stover et al. (1996)                            dp2.dta         dp2.csv, dp2_desc.txt
Scintigraphy study                   Muller et al. (1989)                            mlt1.dta        mlt1.csv, mlt1_desc.txt
59 Pap screen studies                Fahey et al. (1995)                             fim.dta         fim.csv, fim_desc.txt
Prenatal screen data (hypothetical)                                                  hpns.dta        hpns.csv, hpns_desc.txt

Etzioni R, Pepe M, Longton G, Hu C, Goodman G (1999). Incorporating the time dimension in receiver operating characteristic curves: A case study of prostate cancer. Medical Decision Making 19:242-51.
Fahey MT, Irwig LM, Macaskill P (1995). Meta-analysis of Pap test accuracy. American Journal of Epidemiology 141:680-9.
Leisenring W, Alonzo T, Pepe MS (2000). Comparisons of predictive values of binary medical diagnostic tests for paired designs. Biometrics 56:345-51.
Leisenring W, Pepe MS, Longton G (1997). A marginal regression modelling framework for evaluating medical diagnostic tests. Statistics in Medicine 16:1263-81.
Muller C, Wasserman HJ, Erlank P, Klopper JF, Morkel HR, Ellmann A (1989). Optimisation of density and contrast yielded by multiformat photographic images used for scintigraphy. Physics in Medicine and Biology 34:473-81.
Norton SJ, Gorga MP, Widen JE, Folsom RC, Sininger Y, Cone-Wesson B, Vohr BR, Mascher K, Fletcher K (2000). Identification of neonatal hearing impairment: Evaluation of transient evoked otoacoustic emission, distortion product otoacoustic emission, and auditory brain stem response test performance. Ear and Hearing 21:508-28.
Pepe MS, Longton G, Anderson G, Schummer M (2003). Selecting differentially expressed genes from microarray experiments. Biometrics (in press).
Smith DS, Bullock AD, Catalona WJ (1997). Racial differences in operating characteristics of prostate cancer screening tests. The Journal of Urology 158:1861-66.
Stover L, Gorga MP, Neely T (1996). Towards optimizing the clinical utility of distortion product otoacoustic emission measurements. Journal of the Acoustical Society of America 100:956-967.
Tosteson AN, Begg CB (1988). A general regression methodology for ROC curve estimation. Medical Decision Making 8:204-15.
Weiner DA, Ryan TJ, McCabe CH, Kennedy JW, Schloss M, Tristani F, Chaitman BR, Fisher LD (1979). Exercise stress testing. Correlations among history of angina, ST-segment response and prevalence of coronary-artery disease in the Coronary Artery Surgery Study (CASS). New England Journal of Medicine 301(5):230-5.
Wieand S, Gail MH, James BR, James KL (1989). A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika 76:585-92.
Programs 
Stata version 7 or higher is required for most programs; version 8 or 9 is required for some, as updates and additions become available.
Examples 
Book Errata 