Statistical Applications in the "Omic" Sciences (George Michailidis, organizer)


Kerby Shedden (Department of Statistics, University of Michigan)
Analysis of Small Molecule Subcellular Transport from Images

Friday 2:00-2:30, San Rafael

Abstract:

I will describe recent work on analyzing fluorescence-based microscopic images of small molecules in cells. Aims include qualitative assessment of whether a molecule is indeed fluorescent, whether it permeates into cells, and whether it accumulates in specific organelles, compartments, or membranes. A number of practical problems arise due to the low resolution of the images, and to imaging artifacts relating to poor focus, cell motion, background, and other factors. I will describe work on identifying compounds that localize to mitochondria based on textural features in the images. I will also describe how we estimated kinetic rate constants for diffusion across the plasma and mitochondrial membranes by interpreting the image data in the context of a mathematical transport model.



Rebeka Jornsten (Department of Statistics, Rutgers University)
Clustering of miRNA and mRNA Expression via Rate-distortion Based Model Selection

Friday 2:30-3:00, San Rafael

Abstract:

We introduce a novel approach to model selection in the analysis of mRNA and miRNA expression data. By reformulating selection in terms of rate-distortion theory, we can simultaneously select genes that are differentially expressed and identify which conditions are discriminating. This also simplifies the simultaneous selection in model-based clustering by making it entirely parallel across clusters. The goal is to allocate model complexity to each cluster of mRNA/miRNA such that the trade-off between goodness-of-fit and model complexity is equally balanced across all clusters. In the simplest case, MSE is our distortion measure. Model complexity, or rate, is a function of the cluster size and of the number of conditions for which the mRNA and miRNA are differentially expressed. Other rate criteria can also be derived using predictive densities. For each cluster, a rate-distortion curve is traced by computing the MSE for models of different complexity. If we use, e.g., L2-boosting, these curves are continuous; in subset selection the curves are linear interpolations between subset models. It has long been known that the optimal rate allocation corresponds to operating at points of equal slope on the rate-distortion curves, for all data subsets k. Any other allocation will lead to an increase in overall distortion (Ortega et al., 1998). Fixing an operating slope, we pick the cluster models at this slope of the rate-distortion curves. If no point on the k-th curve satisfies this slope constraint, the null model is automatically selected for cluster k. Several clusters may thus form a joint null-model cluster. Finally, the overall fit of the mixture model is evaluated using BIC, performing a line search over operating slopes.
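
The equal-slope allocation described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each cluster's rate-distortion curve is given as a short list of (rate, distortion) points, with the null model as the first point at rate 0, and it uses the standard Lagrangian form of the equal-slope condition: for a fixed slope parameter lam, each cluster independently picks the point minimizing distortion + lam * rate. The names select_at_slope, allocate, line_search and toy_criterion, and all numbers, are hypothetical; toy_criterion is only a stand-in for the BIC of the fitted mixture model, which the abstract does not spell out.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (rate, distortion) of one candidate model


def select_at_slope(curve: Sequence[Point], lam: float) -> int:
    """Index of the point on one curve minimizing distortion + lam * rate.

    Ties go to the earliest point, so the null model (index 0) wins
    whenever no richer model lowers the cost.
    """
    costs = [d + lam * r for r, d in curve]
    return min(range(len(curve)), key=costs.__getitem__)


def allocate(curves: Sequence[Sequence[Point]], lam: float) -> List[int]:
    """Pick one model per cluster at a common slope; clusters are independent."""
    return [select_at_slope(curve, lam) for curve in curves]


def line_search(
    curves: Sequence[Sequence[Point]],
    lams: Sequence[float],
    criterion: Callable[[List[int]], float],
) -> Tuple[float, List[int]]:
    """Try each slope and keep the allocation with the lowest global criterion."""
    best = min(lams, key=lambda lam: criterion(allocate(curves, lam)))
    return best, allocate(curves, best)


# Hypothetical curves for two clusters: (rate, MSE), null model first.
curves = [
    [(0, 10.0), (1, 4.0), (2, 3.5)],  # extra complexity pays off
    [(0, 5.0), (1, 4.8), (2, 4.7)],   # nearly flat: null model is adequate
]


def toy_criterion(choice: List[int]) -> float:
    """Stand-in for a global criterion (assumption): total MSE + 0.5 * total rate."""
    rate = sum(curves[k][i][0] for k, i in enumerate(choice))
    dist = sum(curves[k][i][1] for k, i in enumerate(choice))
    return dist + 0.5 * rate


best_lam, best_choice = line_search(curves, [0.05, 1.0, 20.0], toy_criterion)
```

With these made-up curves, a moderate slope gives the first cluster a one-term model and assigns the second cluster to the null model, while a very large slope sends every cluster to the null model.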

We apply our method to the analysis of developmental miRNA/mRNA expression. Two cell-lines (one experimentally confirmed to be 'pre-programmed' to become neurons, the other to become glia) are observed at 0, 1 and 3 hours after a growth factor is added to the medium. To determine which miRNA-mRNA pairs differ between the cell-lines, and at what time points, we fit a multi-level mixture model to the data. The first level of the hierarchy models the time-course, allowing for a sign-flip to account for negative association between miRNA-mRNA pairs (repressor vs activator); the second level models the cell-line/time interactions. A total of 5 clusters are selected, corresponding to diverging, converging, and static cell-line differences. Biological validation of the diverging miRNAs, believed to determine the fate of the cell population, is now underway.


George Michailidis (Department of Statistics, University of Michigan)
Modeling and Analysis of Quantitative Proteomics Data obtained from iTRAQ Experiments

Friday 3:00-3:30, San Rafael

Abstract:

In this talk we discuss the statistical challenges posed by protein expression data obtained from iTRAQ experiments. iTRAQ reagents allow relative quantitation of proteins obtained from four-plex experiments, through isobaric labeling of complex protein mixtures. A random effects model is developed that takes into consideration the nature of the data and provides a framework for rigorously inferring differential protein expression. The methodology is illustrated on a time-course data set containing information about the epithelial-mesenchymal transition process and on one obtained from cholera experiments.