Single-pass, Low-storage Methods for Massive Streaming Datasets with Applications to Multivariate Density Estimation
James P. McDermott (Bristol-Myers Squibb), james.mcdermott@bms.com, and
Dennis K.J. Lin (Pennsylvania State University), lin@chao.smeal.psu.edu
Abstract
We propose a single-pass, low-storage sequential method for multivariate density estimation on massive streaming datasets via convex hull peeling. The new method is shown to reduce the computation time of the existing convex hull peeling algorithm from O(n^2) to O(n). Further, the proposed method requires very little storage compared with the existing method. We demonstrate the accuracy and reduced computation time of the proposed method by comparing it with the existing convex hull peeling method through simulation studies and a real-life example.
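
For readers unfamiliar with convex hull peeling, the following is a minimal sketch of the conventional batch procedure in two dimensions: compute the convex hull of the data, assign its vertices the current depth, remove them, and repeat on what remains. It is an illustration only and is not the sequential method proposed here. The function names (cross, convex_hull, peel_depths) are hypothetical, and the hull routine is a standard monotone chain construction chosen for brevity. In this sketch, points lying on a hull edge without being a vertex are assigned to a deeper layer, which is one of several possible conventions.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices of 2-D points (monotone chain, strict turns only)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a strict left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def peel_depths(points):
    """Assign each point its peeling depth (1 = outermost hull)."""
    remaining = set(points)
    depth = {}
    level = 1
    while remaining:
        hull = convex_hull(list(remaining))
        for p in hull:
            depth[p] = level
        remaining.difference_update(hull)  # peel the current layer
        level += 1
    return depth


# Two nested squares around a centre point give three layers.
pts = [(0, 0), (4, 0), (4, 4), (0, 4),
       (1, 1), (3, 1), (3, 3), (1, 3),
       (2, 2)]
depths = peel_depths(pts)
```

Each pass recomputes a hull over all remaining points and the full dataset must be held in memory, which is the cost that a single-pass, low-storage approach aims to avoid.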