Hyunsook Lee (Department of Statistics, The Pennsylvania State University)
Detecting Outliers in Multivariate Massive Data by Convex Hull Peeling with Applications
Friday 11:10-11:30, San Marino
Abstract:
Detecting outliers is an important problem in data mining. However,
outliers are not well defined in massive multivariate data. In particular,
few outlier detection methods exist that do not assume a multivariate normal
distribution. We found that convex hull peeling asymptotically describes
the multivariate data distribution and can be applied to
outlier detection in a nonparametric sense.
In this presentation, we propose some algorithms to detect outliers in massive
data sets. No assumptions are imposed except convexity of the data distribution,
and no covariance matrix is used. Additionally, we will show
a modified algorithm for multivariate streaming data.
These algorithms are illustrated with
Monte Carlo simulations and data from the Sloan Digital Sky Survey database.
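
For readers unfamiliar with convex hull peeling, the following is a minimal sketch of the generic idea in two dimensions, not the algorithms proposed in the talk. It repeatedly removes the vertices of the convex hull and records the layer on which each point was removed; points on the outermost layers are then treated as outlier candidates. The function names (convex_hull, peel_depths, outer_layer_candidates) and the cutoff parameter max_layer are hypothetical and chosen only for illustration, and the hull routine is the standard Andrew monotone chain.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain. Returns hull vertices in counter-clockwise order.

    Points lying strictly inside a hull edge (collinear) are not returned
    as vertices, so they are peeled on a later layer.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def peel_depths(points):
    """Map each distinct point to its peeling layer (1 = outermost hull)."""
    remaining = set(points)  # duplicates collapse to a single point
    depth = {}
    layer = 0
    while remaining:
        layer += 1
        hull = convex_hull(list(remaining))
        for p in hull:
            depth[p] = layer
        remaining -= set(hull)
    return depth


def outer_layer_candidates(points, max_layer=1):
    """Hypothetical rule: flag points on the first max_layer layers as candidates."""
    depth = peel_depths(points)
    return {p for p, d in depth.items() if d <= max_layer}


if __name__ == "__main__":
    # Two nested squares around a central point.
    pts = [(0, 0), (4, 0), (4, 4), (0, 4),
           (1, 1), (3, 1), (3, 3), (1, 3),
           (2, 2)]
    print(peel_depths(pts))
    print(outer_layer_candidates(pts, max_layer=1))
```

The sketch uses no covariance matrix and no distributional model, only the geometry of the point cloud. How many layers to flag, and how to handle higher dimensions or streaming input, are left open here.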