Statistical Methods Mining, Two Sample Data Analysis,
Comparison Distributions, and Quantile Limit Theorems


Emanuel Parzen
Texas A&M University


ABSTRACT

Predicted to be a multi-billion dollar industry by 2000, the emerging field of "data mining" is a blend of statistics, artificial intelligence, and database research; it seeks to extract information from, and identify models for, possibly massive data sets. I propose the name "statistical methods mining" for the process of applying the virtual encyclopaedia of statistical knowledge with the help of frameworks and maps of that knowledge. In this paper I propose a map (coordinate system) of statistical methods whose aim is to give a view of specific methods without requiring the reader to learn their details. I use this framework to motivate new methods for modeling two samples and testing their homogeneity, based on comparison distribution functions, quantile limit theorems, and comparison density functions. Theory is developed for both uncensored and censored data.