Randolph Program
Biostatistics and Biomathematics Program, in the Division of Public Health Sciences
Quantitative methods for the analysis of biomedical data
Mathematical and statistical methods for high-dimensional, functional and otherwise "non-standard" data including: protein mass spectrometry, image-based data, microbiome and genetic array data, various spectroscopies and longitudinal data. Projects and collaborations are directed toward studies for the discovery and/or validation of molecular markers of disease.
Some active projects
-
TACOMA (Tissue Array Co-Occurrence Matrix Analysis): This project aims to provide a reliable, interpretable and accurate open-source algorithm for quantifying immunohistochemically stained tissue images. Motivated by an interest in biomarker validation in population-based studies, it provides an automated method (based on pathologists' expertise) for scoring the thousands of "histospots" that can be mounted on tissue microarray (TMA) slides in a relatively high-throughput manner. The algorithm is based on textural and contextual information summarized by a gray-level co-occurrence matrix formed for each image. This is joint work with Drs. Pei Wang, Beatrice Knudsen (Cedars-Sinai Medical Center) and Michael Linden (Univ. of Minnesota); Dr. Donghui Yan is the first author for these methods. A paper on TACOMA will appear in the Annals of Applied Statistics. See the manuscript (on arXiv.org).
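For readers unfamiliar with the gray-level co-occurrence matrix (GLCM), the following is a minimal NumPy sketch of how such a matrix can be tabulated for a single image. It is not the TACOMA implementation: the function name `glcm`, its parameters, and the choice of a single pixel offset are assumptions made for illustration, and the image is assumed to be already quantized to integer gray levels 0..levels-1.

```python
import numpy as np

def glcm(image, levels, offset=(0, 1), symmetric=False):
    """Count co-occurring gray-level pairs at one pixel offset (d_row, d_col).

    Hypothetical helper for illustration; `image` must hold integers
    in the range 0..levels-1.
    """
    img = np.asarray(image)
    d_row, d_col = offset
    rows, cols = img.shape
    # Restrict to reference pixels whose offset neighbour is inside the image.
    r0, r1 = max(0, -d_row), min(rows, rows - d_row)
    c0, c1 = max(0, -d_col), min(cols, cols - d_col)
    ref = img[r0:r1, c0:c1].ravel()
    nbr = img[r0 + d_row:r1 + d_row, c0 + d_col:c1 + d_col].ravel()
    counts = np.zeros((levels, levels), dtype=np.int64)
    # Entry (i, j) counts how often level j occurs at the offset from level i.
    np.add.at(counts, (ref, nbr), 1)
    if symmetric:
        counts = counts + counts.T
    return counts

# Small 4-level example image; offset (0, 1) pairs each pixel with its right neighbour.
example = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [0, 2, 2, 2],
                    [2, 2, 3, 3]])
example_glcm = glcm(example, levels=4)
```

Normalizing the counts to sum to one turns the matrix into an empirical joint distribution of neighbouring gray levels, which is the kind of texture summary a scoring method can then use as features.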
-
PEER (Partially Empirical Eigenvectors for Regression): These projects investigate the role of the generalized singular value decomposition (GSVD) in ill-posed or unstable (high-dimension/low sample size) regression problems. When the observations are curves or images, they have local structure that may be accounted for in the estimation process with an informed choice of penalty operator, L. Classical least-squares regression is completely determined by the singular vectors of the data/design matrix, X (the collection of observed vectors or "predictors"), whereas PEER estimates are determined by the joint eigenproperties of X and L. This provides an analytically rigorous and tractable way of imposing spatial structure on the estimate. This is joint work with Drs. Jaroslaw Harezlak (Indiana Univ.) and Ziding Feng. To appear in the Electronic Journal of Statistics. See the manuscript (on arXiv.org).
These methods are motivated by problems of protein characterization using Raman and magnetic resonance spectroscopy (MRS). They apply in many settings, including logistic regression, penalized discriminant analysis and penalized partial least squares. A parallel project with Dr. J. Harezlak applies a PEER-based logistic regression model to MRS data for the classification of HIV patients.
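A minimal sketch of penalized regression with a penalty operator L, assuming the familiar generalized ridge (Tikhonov) form: minimize ||y - Xb||^2 + alpha ||Lb||^2. This is not the PEER software. It solves the problem directly as an augmented least-squares system rather than through the GSVD, and the names `penalized_estimate`, `second_difference` and `alpha` are hypothetical choices made for illustration.

```python
import numpy as np

def penalized_estimate(X, y, L, alpha):
    """Minimize ||y - X b||^2 + alpha * ||L b||^2 over b.

    Illustrative only: stacks sqrt(alpha) * L under X and solves one
    ordinary least-squares problem instead of computing a GSVD.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    L = np.asarray(L, dtype=float)
    A = np.vstack([X, np.sqrt(alpha) * L])
    rhs = np.concatenate([y, np.zeros(L.shape[0])])
    beta, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return beta

def second_difference(p):
    """(p - 2) x p second-difference operator, one example of a
    penalty that favours smooth coefficient curves."""
    return np.diff(np.eye(p), n=2, axis=0)
```

With L equal to the identity this reduces to ordinary ridge regression; a difference operator such as `second_difference(p)` instead penalizes roughness, which is one simple way to encode the local structure of curve-valued predictors.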
-
Sahale MS: This project for the analysis of protein mass spectrometry is aimed at elucidating properties (challenges and/or advantages) of the various methods of protein and/or peptide quantification, e.g., spectral counting or ion abundance measures. It began with Dr. T. Milac's development of software for protein quantification in LC-MS/MS data. This Java software (SahaleJ) provides computational methods to quantify peptide and protein abundance, and an R package (SahaleR) provides some basic statistical methods for comparative "shotgun" proteomics experiments. We have recently applied these tools to illustrate some advantages and challenges in the quantification of LC-MS/MS data by ion intensities versus spectral counting. The project is a joint effort with Dr. Pei Wang. These tools are also being used for analysis of data in an ongoing R21 project (co-PI with Dr. Rich Gardner at the Univ. of Washington) for mass spectrometry studies on the ubiquitination of proteins. A paper on this work will appear in the journal Statistics and its Interface. See the manuscript (on arXiv.org).
-
The efficient use of longitudinal CD4 counts and viral load measures in survival analysis: This project shows that functional principal component (FPC) analysis provides an effective statistical approach for exploiting the patterns in CD4 count and viral load trajectories for a cohort of female Kenyan sex workers. The FPC scores obtained for each woman by this method serve as informative summary statistics for the CD4-count and viral-load trajectories. In contrast to more classical summaries of these data, this approach extracts enough information from the CD4 count profiles to reveal a (known) significant association between CD4 counts and survival. This project is led by Dr. Sarah Holte and is joint with Drs. J. Baeten, J. Ding, J. Tien and J. Overbaugh. A paper on this work will appear in Statistics in Medicine. See the preprint (local access only).
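A minimal sketch of how FPC scores can be computed, under a strong simplifying assumption: every trajectory is observed on the same equally spaced time grid. Real longitudinal CD4 and viral load measurements are typically sparse and irregular and would need smoothing or a dedicated FPCA method first, so this is not the analysis used in the project. The function name `fpca_scores` and its return values are hypothetical.

```python
import numpy as np

def fpca_scores(curves, n_components=2):
    """Principal component scores for curves sampled on a common grid.

    `curves` has one row per subject and one column per time point.
    Illustrative only: quadrature weights for the grid are ignored.
    """
    Y = np.asarray(curves, dtype=float)
    mean_curve = Y.mean(axis=0)
    centered = Y - mean_curve
    # Rows of Vt are orthonormal directions of variation (discretized
    # eigenfunctions), ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]
    # Each subject's score is the projection of its centered curve
    # onto each component.
    scores = centered @ components.T
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return mean_curve, components, scores, explained
```

The rows of `scores` give a low-dimensional summary of each subject's trajectory, which could then enter a survival model as covariates.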
Support for current and past projects is acknowledged:
- Quantitative Methods for Spectral and Image Data in Proteomics Research (Randolph, PI), R01-CA126205.
- Discovery of substrates of ubiquitin-protein ligases---the development of integrated experimental and computational methods (T. Randolph & R. Gardner, co-PI), R21-RR025787.
- Biomarkers in High-Dimensional Data From Genomics and Proteomics Technologies (T. Randolph, PI), K25-GM067211.
- Statistical Methods for Integrative Analysis of Genomics and Proteomics Data (P. Wang, PI; co-investigator), R01-GM082802.
- Early Detection Research Network: Data Management and Coordinating Center (Z. Feng, PI; co-investigator), U01-CA086368.
- Statistical Methodologies (R. Prentice, PI; co-investigator on project led by Z. Feng), P01-CA053996.
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
©2012 Fred Hutchinson Cancer Research Center, a 501(c)(3) nonprofit organization.