Clustering suffers from the curse of dimensionality, and similarity functions that use all input features with equal relevance may not be effective. In the first part of the talk, we introduce an algorithm that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features. Our approach avoids the loss of information risked by global dimensionality reduction techniques, and does not assume any data distribution model. Our method associates with each cluster a weight vector whose values capture the relevance of features within that cluster. We experimentally demonstrate the gain in performance our method achieves with respect to competing methods. In particular, we apply our technique to the clustering of documents, where cluster-dependent keywords are also identified via the continuous term-weighting mechanism.
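To make the idea of cluster-specific feature weights concrete, the following is a minimal sketch and not the algorithm presented in the talk. It assumes a k-means-style loop in which each cluster's weights decay exponentially with the per-feature spread of its members; that weighting rule, the function name `weighted_subspace_kmeans`, and the bandwidth parameter `h` are all illustrative assumptions.

```python
import numpy as np

def weighted_subspace_kmeans(X, k, h=1.0, n_iter=20, seed=0):
    """Illustrative k-means variant with one feature-weight vector per cluster.

    Assumption: weights follow exp(-spread / h), normalized to sum to 1,
    so features along which a cluster is tight receive larger weights.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centroids = X[rng.choice(n, size=k, replace=False)].astype(float)
    weights = np.full((k, d), 1.0 / d)  # start with equal relevance

    for _ in range(n_iter):
        # Weighted squared distance from every point to every centroid: (n, k)
        diff2 = (X[:, None, :] - centroids[None, :, :]) ** 2
        dist = np.einsum("nkd,kd->nk", diff2, weights)
        labels = dist.argmin(axis=1)

        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the previous centroid and weights
            centroids[j] = members.mean(axis=0)
            # Average squared deviation of the members along each feature
            spread = ((members - centroids[j]) ** 2).mean(axis=0)
            # Subtract the minimum for numerical stability before exponentiating
            w = np.exp(-(spread - spread.min()) / h)
            weights[j] = w / w.sum()

    return labels, centroids, weights
```

In this sketch, row `j` of the returned `weights` array plays the role of the weight vector for cluster `j`. For document data, the largest entries in a row would point to the terms most characteristic of that cluster.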
In the second part of the talk, we investigate the sensitivity of subspace clustering to input parameters, and propose a clustering ensemble approach to address this problem. Cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out spurious structures that arise from the various biases to which each participating algorithm is tuned. Experimental results show that our ensemble techniques are capable of producing a partition that is as good as or better than the best individual clustering.
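One common way to build a consensus from several clusterings is a co-association matrix, sketched below. This is a generic illustration under stated assumptions, not necessarily the ensemble technique proposed in the talk: the function names, the `threshold` parameter, and the choice to link points that co-occur in more than half of the runs are all hypothetical.

```python
import numpy as np

def co_association(labelings):
    """Fraction of runs in which each pair of points shares a cluster."""
    labelings = np.asarray(labelings)
    m, n = labelings.shape
    co = np.zeros((n, n))
    for labels in labelings:
        co += labels[:, None] == labels[None, :]
    return co / m

def consensus_partition(labelings, threshold=0.5):
    """Group points linked by co-association above `threshold`.

    Assumption: the consensus clusters are the connected components of the
    graph whose edges join pairs that co-occur in more than `threshold`
    of the input clusterings.
    """
    co = co_association(labelings)
    n = co.shape[0]
    consensus = -np.ones(n, dtype=int)  # -1 marks unassigned points
    next_label = 0
    for start in range(n):
        if consensus[start] != -1:
            continue
        consensus[start] = next_label
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            linked = np.flatnonzero((co[i] > threshold) & (consensus == -1))
            for j in linked:
                consensus[j] = next_label
                stack.append(j)
        next_label += 1
    return consensus

# Three runs over six points; label names differ across runs, and the
# third run places point 2 in a different cluster than the others do.
runs = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 0, 1, 2, 2],
]
print(consensus_partition(runs))  # [0 0 1 1 2 2]
```

Because only pairwise co-membership is counted, the sketch is insensitive to how each run names its clusters, and a grouping supported by a single run (points 0, 1, and 2 together in the third run) is averaged out.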