Multi-protein Complex Data Clustering for Detecting Protein Interactions and Functional Organizations
Chris Ding, (Lawrence Berkeley National Laboratory), chqding@lbl.gov,
Xiaofeng He, (Lawrence Berkeley National Laboratory), xhe@lbl.gov,
Richard Meraz, (Lawrence Berkeley National Laboratory), RFMeraz@lbl.gov, and
Steve Holbrook, (Lawrence Berkeley National Laboratory), SRHolbrook@lbl.gov
Abstract
Protein interaction networks offer a useful perspective for understanding cellular processes. Recent experiments employing high-throughput mass spectrometric characterization have produced large datasets of physiologically relevant multi-protein complexes. We present a unified representation of such datasets based on an underlying bipartite graph model, which represents an advance over existing models of the network. This representation automatically generates both the protein-protein interaction network and the protein complex-protein complex association network. It allows weighting of connections between proteins that appear together in more than one complex, and it addresses the higher level of organization that emerges when the network is viewed as a set of protein complexes that share components. The representation also allows a rigorous spectral graph clustering algorithm to be applied to determine relevant protein modules in the networks. Statistically significant annotations of clusters in the protein-protein and complex-complex networks, using concepts from the Gene Ontology, suggest that this method is also useful for detecting uncharacterized components of protein complexes and uncharacterized relationships between protein complexes.
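As an illustration of how a single bipartite protein-complex membership table can yield both derived networks, the following minimal sketch may help. It is not the authors' implementation: the toy complexes (C1 to C3), the proteins (P1 to P5), the function names, and the choice of co-membership counts as edge weights are all assumptions made here for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (an assumption, not from the paper):
# each complex maps to the set of proteins it contains.
complexes = {
    "C1": {"P1", "P2", "P3"},
    "C2": {"P2", "P3", "P4"},
    "C3": {"P4", "P5"},
}

def protein_network(complexes):
    """Weight each protein pair by the number of complexes containing both."""
    weights = Counter()
    for members in complexes.values():
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return dict(weights)

def complex_network(complexes):
    """Weight each complex pair by the number of proteins they share."""
    weights = {}
    for c1, c2 in combinations(sorted(complexes), 2):
        shared = len(complexes[c1] & complexes[c2])
        if shared:
            weights[(c1, c2)] = shared
    return weights

pp = protein_network(complexes)   # protein-protein network
cc = complex_network(complexes)   # complex-complex network
```

In this toy example, proteins P2 and P3 co-occur in two complexes and so receive a heavier edge than pairs that co-occur once, while complexes C1 and C3 share no protein and are left unconnected. Under the assumed weighting, the two networks correspond to the off-diagonal entries of the products of the membership matrix with its transpose.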