George Mason University
AES/CCS/SCS/Statistics Colloquium Series
Seminar Announcement


Clustering, Subspace Clustering and Clustering Ensembles

Carlotta Domeniconi

Department of Information and Software Engineering
George Mason University


Innovation Hall, Room 136, Fairfax Campus
George Mason University, 4400 University Drive, Fairfax, VA 22030

Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
Date: April 14, 2006



ABSTRACT

Clustering suffers from the curse of dimensionality, and similarity functions that use all input features with equal relevance may not be effective. In the first part of the talk we introduce an algorithm that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features. Our approach avoids the risk of information loss encountered in global dimensionality reduction techniques, and does not assume any data distribution model. Our method associates with each cluster a weight vector whose values capture the relevance of features within that cluster. We experimentally demonstrate the gain in performance our method achieves with respect to competing methods. In particular, we apply our technique to clustering of documents, where cluster-dependent keywords are also identified via the continuous term-weighting mechanism.
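
As a rough illustration of per-cluster feature weighting, the following is a minimal sketch and not necessarily the algorithm presented in the talk. It assumes a k-means-style loop in which each cluster keeps its own weight vector, and it assumes an exponential weighting rule in which features with small within-cluster spread receive larger weights. The function name `weighted_kmeans` and the bandwidth parameter `h` are hypothetical.

```python
import numpy as np

def weighted_kmeans(X, k, h=1.0, n_iter=20, seed=0):
    """k-means-style clustering with one feature-weight vector per cluster.

    Assumption (illustrative only): weights decay exponentially with the
    average squared spread of a cluster along each feature.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialise centroids from k distinct data points, weights uniformly.
    centroids = X[rng.choice(n, size=k, replace=False)].astype(float)
    weights = np.full((k, d), 1.0 / d)
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Per-feature squared differences, shape (n, k, d).
        sq = (X[:, None, :] - centroids[None, :, :]) ** 2
        # Each cluster measures distance with its own feature weights.
        dist = (sq * weights[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the previous centroid and weights
            centroids[j] = members.mean(axis=0)
            # Average squared spread of the cluster along each feature.
            spread = ((members - centroids[j]) ** 2).mean(axis=0)
            # Subtracting the minimum avoids underflow; it cancels on normalisation.
            w = np.exp(-(spread - spread.min()) / h)
            weights[j] = w / w.sum()
    return labels, centroids, weights
```

Each row of the returned `weights` array sums to one, so a row can be read as the relative relevance of the features for that cluster.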

In the second part of the talk we investigate the sensitivity of subspace clustering to input parameters, and propose a clustering ensemble approach to solve this problem. Cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out emergent spurious structures that arise due to the various biases to which each participating algorithm is tuned. Experimental results show that our ensemble techniques are capable of producing a partition that is as good as or better than the best individual clustering.
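
The idea of consensus across multiple clusterings can be sketched with a co-association matrix, a common ensemble device that is used here purely for illustration and may differ from the techniques presented in the talk. The sketch assumes that two points belong together when they share a cluster in more than a given fraction of the runs; the function names and the `threshold` parameter are hypothetical.

```python
import numpy as np

def co_association(labelings):
    """Fraction of runs in which each pair of points shares a cluster."""
    L = np.asarray(labelings)  # shape (n_runs, n_points)
    return (L[:, :, None] == L[:, None, :]).mean(axis=0)

def consensus_labels(labelings, threshold=0.5):
    """Group points linked by co-association above `threshold`.

    Consensus clusters are the connected components of the thresholded
    co-association graph (an assumption made for this sketch).
    """
    C = co_association(labelings)
    n = C.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to a component
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            # Unlabelled points that co-occur with i often enough.
            for j in np.flatnonzero((C[i] > threshold) & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

# Three runs over four points; cluster ids need not match across runs.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
print(consensus_labels(runs))  # [0 0 1 1]
```

Because the matrix only records whether two points share a cluster, the arbitrary numbering of clusters in each run does not matter, and a disagreement in a single run (the third one above) is outvoted by the others.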