Selecting and Combining Biomarkers (or Tests)

This SAS macro selects and combines multiple continuous biomarker candidates to predict a binary outcome, e.g., diseased versus non-diseased. It conducts the following analyses in one package:

1) Select individual markers using partial area under the ROC curve, specificity or sensitivity, or both.
2) Combine the selected markers using one of three options: forward logistic regression, boosting logistic regression (Real AdaBoost), or boosting tree (Discrete AdaBoost).
3) Cross-validate throughout. Cross-validation is built in from the first step, i.e., at the biomarker selection stage.

Model building stops when the cross-validation total classification error starts to increase. The analyst also has the option of giving sensitivity and specificity equal weight in the classification error calculation. This is useful when the case:control ratio is not 1 and the analyst does not want the dominant group to drive model selection.
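To illustrate why equal weighting matters when cases and controls are unbalanced, here is a minimal Python sketch. It is not taken from the SAS macro. The function name `classification_errors`, the 0/1 label coding, and the toy data are all assumptions made for illustration. The sketch compares the total classification error with an error that weights sensitivity and specificity equally.

```python
def classification_errors(y_true, y_pred):
    """Return (total_error, equal_weight_error) for labels coded 0 = control, 1 = case.

    Hypothetical helper for illustration; not part of the SAS macro.
    """
    case_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    control_preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not case_preds or not control_preds:
        raise ValueError("need at least one case and one control")

    # Sensitivity: fraction of cases called positive.
    sensitivity = sum(1 for p in case_preds if p == 1) / len(case_preds)
    # Specificity: fraction of controls called negative.
    specificity = sum(1 for p in control_preds if p == 0) / len(control_preds)

    # Total error: every subject counts once, so the larger group dominates.
    n = len(case_preds) + len(control_preds)
    misclassified = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    total_error = misclassified / n

    # Equal-weight error: sensitivity and specificity each contribute one half.
    equal_weight_error = 1 - 0.5 * (sensitivity + specificity)
    return total_error, equal_weight_error


# Toy data (made up): 2 cases, 8 controls, and a classifier that calls
# everyone a control.
y_true = [1, 1] + [0] * 8
y_pred = [0] * 10
total_error, equal_weight_error = classification_errors(y_true, y_pred)
print(total_error, equal_weight_error)
```

In this toy example the classifier misses every case, yet its total error is only 2 in 10 because controls dominate. The equal-weight error is one half, which reflects that the classifier is no better than chance at separating the two groups.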

Boosting is included because of its reputation for resistance to overfitting, a desirable feature for high-dimensional data analysis. However, we have found that boosting can still overfit the data. That is why cross-validation is important in model selection and assessment.
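The stopping rule mentioned earlier (stop when cross-validation error starts to increase) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the macro's code. The function name `stopping_step` is hypothetical, the error values are invented, and "starts to increase" is read here as the first strict increase from one step to the next.

```python
def stopping_step(cv_errors):
    """Return the index of the last step before CV error first increases.

    cv_errors[i] is the cross-validation classification error after step i
    (for example, after adding the i-th marker or boosting iteration).
    Hypothetical helper for illustration; not part of the SAS macro.
    """
    if not cv_errors:
        raise ValueError("cv_errors must not be empty")
    for i in range(1, len(cv_errors)):
        if cv_errors[i] > cv_errors[i - 1]:
            # Error went up at step i, so keep the model from step i - 1.
            return i - 1
    # Error never increased: keep the final model.
    return len(cv_errors) - 1


# Invented cross-validation errors for five successive model-building steps.
cv_errors = [0.30, 0.24, 0.21, 0.23, 0.20]
chosen = stopping_step(cv_errors)
print(chosen, cv_errors[chosen])
```

Under this reading the procedure stops at the first increase and does not look ahead, so a later, lower error (the last value in the invented list) is never reached. Whether the macro behaves exactly this way should be checked against the SAS code.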

A description of the procedure can be found in chapter 18 of the book Informatics in Proteomics, Srivastava S (Ed.), Marcel Dekker Inc., New York, 2005. A copy of the chapter, "Statistical design and analytical strategies for discovery of disease specific protein patterns," can be found here. The SAS code is here.
Fred Hutchinson Cancer Research Center
©2013 Fred Hutchinson Cancer Research Center, a 501(c)(3) nonprofit organization.