Genomics (Chiara Sabatti, organizer)


Sunduz Keles (University of Wisconsin)
Mixture Modeling for Genome-wide Localization of Transcription Factors

Thursday 10:30-11:00, Fountain II

Abstract:

Chromatin immunoprecipitation followed by DNA microarray analysis (ChIP-chip methodology) is an efficient way of mapping genome-wide protein-DNA interactions. Data from tiling arrays encompass DNA-protein interaction measurements on thousands or millions of short oligonucleotides across a whole chromosome or genome. We propose a new likelihood-based method for analyzing ChIP-chip data. This method is motivated by the widely used two-component multinomial mixture model for the regulatory motif detection problem and utilizes a hierarchical mixture model of binding intensities while incorporating the apparent spatial structure of the data. Individual probes within a genomic region are allowed to have different localization rates, accommodating different binding affinities. Furthermore, the fixed window size assumption, which is commonly used when computing a test statistic for these types of spatial data, is relaxed by imposing a distribution on the window size. Simulations investigating the operating characteristics of the proposed method and applications involving general transcription factors from Drosophila will be presented.
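
As a rough illustration of the two-component mixture idea the abstract builds on, the sketch below computes the posterior probability that a single probe belongs to a "bound" component rather than a "background" component. It is not the hierarchical model proposed in the talk: the Gaussian components, the function names, and all parameter values are assumptions made only for this example, and the spatial structure and random window size are left out.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_bound(x, pi_bound, mean_bg, sd_bg, mean_bound, sd_bound):
    """Posterior probability that intensity x comes from the bound component.

    Assumes a two-component Gaussian mixture (an illustrative choice,
    not the model from the abstract) with mixing weight pi_bound.
    """
    bound = pi_bound * normal_pdf(x, mean_bound, sd_bound)
    background = (1.0 - pi_bound) * normal_pdf(x, mean_bg, sd_bg)
    return bound / (bound + background)


# Hypothetical log-ratio intensities for five neighbouring probes.
probes = [-0.3, 0.4, 1.1, 2.6, 3.2]

# Hypothetical parameters: 10% of probes bound, unit variances.
posteriors = [posterior_bound(x, 0.1, 0.0, 1.0, 3.0, 1.0) for x in probes]

for x, p in zip(probes, posteriors):
    print(f"intensity {x:5.2f} -> P(bound) = {p:.3f}")
```

In a full analysis the mixture parameters would be estimated from the data (for example by EM) rather than fixed by hand as they are here.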



Jon McAuliffe (University of Pennsylvania)
An Infinite-state Generalized Hidden Markov Phylogeny for Multi-species Regulatory Module Detection

Thursday 11:00-11:30, Fountain II

Abstract:

The pattern of gene expression in a cell is partly determined by proteins called transcription factors (TFs), which bind short stretches of DNA in the vicinity of genes. These transcription factor binding sites are more difficult to detect with statistical methods than genes, because they are much smaller and exhibit fewer strong biological constraints. Recently, two features of binding sites have been exploited to improve detection: they tend to be conserved across related species, and they tend to appear close together, in what are called cis-regulatory modules.

I will describe a statistical model of aligned DNA sequences, from multiple species, containing shared modules. The model incorporates the tendency of certain TFs' binding sites to appear adjacently in a module. It accounts for evolutionary conservation of binding sites. It also requires no information about the number of TFs acting on the sequences, or what patterns their binding sites follow: these questions are answered by the inference procedure, as part of the model. If something is known in advance about TFs and their sites, as is usually the case, the model makes use of that information in a natural way, while still allowing for new, unknown TFs. The core of the inference involves sampling posterior path trajectories in an infinite-state generalized hidden Markov model; I will explain what this means and how it is done. Results on real multiple alignments, some large, will be presented.


Susan Service (UCLA)
Dense SNP Genotyping on Chromosome 22 in 200 Persons from Each of 12 Populations

Thursday 11:30-12:00, Fountain II

Abstract:

We assessed the distribution and extent of linkage disequilibrium on chromosome 22 in samples of 200 persons from each of eleven population isolates and in an outbred Caucasian sample, using 2486 SNP markers spaced at a density of approximately one marker every 13.8 kb. The spatial pattern of disequilibrium is remarkably consistent across all populations and is correlated with the regions of high and low recombination observed when comparing genetic maps to physical maps. The mean level of disequilibrium, however, differs among populations, with several isolates showing substantially higher levels of disequilibrium than outbred Caucasians. This difference is particularly marked in regions of high recombination (low disequilibrium). These results confirm that population isolates may offer advantages for association mapping.
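
The abstract does not say which disequilibrium measure was used. As a minimal sketch of how pairwise linkage disequilibrium can be quantified, the example below computes the common r² statistic for two biallelic SNPs, assuming phased haplotypes with alleles coded 0/1. The function name and the toy data are hypothetical and are not taken from the study.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) pairs, one per chromosome, where a and b
    are the 0/1 alleles carried at the first and second SNP.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


# Toy data: alleles always travel together (complete disequilibrium).
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
# Toy data: all four haplotypes equally frequent (no disequilibrium).
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r_squared(linked))
print(ld_r_squared(unlinked))
```

With unphased genotype data, as is typical for SNP genotyping, haplotype frequencies would first have to be estimated, which this sketch does not attempt.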