The Center for Computational Statistics



Introduction

The Center for Computational Statistics is an interdisciplinary research center focused on the interface between computing science and statistics. The Center was established as an official unit of George Mason University in July 1986, within the School of Information Technology and Engineering; it was, in fact, the earliest of the major research centers to be established in the School. The intellectual premise of the Center grows from the conviction that the computing revolution begun in the 1980s must have a profound effect on the development of statistical methodology and techniques. Indeed, statistical science is a fundamentally computationally oriented discipline and is thus inevitably affected by changes in computational resources. The premise of the Center, however, is that the effect is not merely evolutionary but revolutionary. How this premise has been translated into a research protocol is outlined below.

Center Research Initiatives

If the premise of the Center is based on the impact of the computing revolution on statistical science, then the Center's primary research initiative is correspondingly ambitious. The computer revolution of the 1980s has several major features, including dramatic increases in computational speed, greatly increased memory capabilities, an essentially unprecedented capability for graphics and visualization, many orders of magnitude improvement in the capabilities for electronic data acquisition and transfer and, most importantly, the placement of powerful computing resources in the hands of individuals. This last feature is particularly important because of its role in stimulating computationally based experiments. The implication for the interface between statistics and computing is profound. The statistician is faced with data sets which are far more massive in both size and dimensionality than he or she would have dared to conceive 15 years ago. The very scale of these new data sets implies a lack of homogeneity in the data and a violation of other model assumptions, which renders obsolete much of the time-honored methodology developed in traditional statistics over the last century. This is, indeed, an exciting time for those statisticians and computer scientists who face up to these intellectual challenges. The Center for Computational Statistics has as its major intellectual initiative the development of fundamentally new methodologies for dealing with massive, high-dimensional, non-homogeneous data sets. These methodologies are conceived to fall into two broad categories: 1) visualization tools for graphically understanding the structure of such massive data sets and 2) analytical tools which recognize the non-traditional characteristics of such data sets and which focus on the computationally intensive methods that such characteristics imply.
These two categories have come to be jointly called Computational Statistics and have, largely because of the intellectual successes based in the Center and elsewhere, become recognized as a distinct research area within the world statistical community.

Computationally intensive methods require extensive computing power, while visualization techniques require high-performance graphics capabilities. Because of these basic facts, a substantial focus has been placed on the acquisition of appropriate computational resources within the Center. The Center's major high-performance computing resources are the Intel Paragon XP/S A4 and the Intel iPSC/2 d4/VX concurrent computers. The parallel nature of these machines has made it possible to develop a second major initiative, Stochastic Methods for Parallel Computing. The Center's work in this area is also unusual, since the majority of computer scientists working on parallel computing paradigms do not have a background in probabilistic and statistical methods. Hence, although there is a fundamental stochastic character to computation on a parallel or distributed computer, traditional computer scientists often account for this in a comparatively naive way. The Center's thrust in stochastic methods is at the cutting edge of approaches to parallel computing. Particular projects involve automatic parallelization and load balancing on distributed-memory concurrent machines, stochastic decomposition of algorithms in order to enhance both speedup and fault tolerance, and stochastic load-balancing protocols for networks of workstations.

The interface between computing science and statistics is multifaceted. While the above initiatives indicate major intellectual undertakings, many collateral initiatives have been undertaken as well. These flow partly from the presence of considerable computational resources, which make certain undertakings feasible, and partly from the intellectual ferment caused by the cross-disciplinary orientation. Briefly, some of the intellectual initiatives involve: 1) stochastic modeling of neural networks, 2) statistical methods for transient signal processing, 3) computer performance evaluation, 4) geographic information systems and spatial data analysis, 5) computationally based polling and survey design and analysis, and 6) stereoscopic rendering and visualization.



Interdisciplinary Character

The Center for Computational Statistics was conceived at the outset as an interdisciplinary research activity. Originally, computer science and statistics were the two traditional disciplines from which the Center was to have emerged. The Center's activities in graphical representation of data quickly involved aspects of three-dimensional rendering and color. For example, what is the best way to render graphical displays of statistical data in terms of both color palette and background rendering? What is the impact of perspective in rendering a three-dimensional data display, particularly when the data display is not in any sense a natural scene? These and related issues quickly led the Center personnel to an involvement with faculty members in the Visual Information Technologies Program. Similarly, a common interest in coping with large-scale statistical databases and visualization leads to natural applications in the area of geographic information systems. Such interests led to a very productive collaboration between Center faculty and members of the Transportation Center in the development of their joint Tysons Corner Mapping Project.

Other natural scholarly interactions have arisen in the biological sciences, political science and public policy, sociology, electrical engineering, and mathematics, to name just a few areas. Perhaps one of the most important interactions has been in the generic area of computational sciences and informatics. A major theme of the activity in the Center for Computational Statistics is the idea that advanced computational resources can not only act as vehicles for translating existing methods to a computing framework, but also stimulate the creation of completely new methodology which would not be possible without the computing framework. The University's Institute for Computational Sciences and Informatics has been built around the same theme, and the Center has been, at least partially, a successful model for this larger-scale enterprise. The core curriculum of the Ph.D. in Computational Sciences and Informatics is designed around this same theme. One of the major tracks, Computational Statistics, embodies our Center's philosophy and, in fact, even the Institute's model for computational facilities is an evolution of the Center's model. The Center is a proactive advocate of interdisciplinary work and has to its credit literally dozens of examples of interdisciplinary involvement.



Impact of the Center on and beyond the GMU Campus

The Center has a positive impact in several important ways: 1) as an intellectual stimulus, 2) as an interdisciplinary forum, and 3) as a resource to the GMU campus and broader academic community. As indicated earlier, there is a professional acceptance of the phrase "computational statistics" as an area of research. The Center for Computational Statistics has been a primary source of stimulus for the concept that there are methods and techniques in statistics which would not be possible outside the framework of modern computing. This is the essence of computational statistics. The usage of this phrase has become widespread in the last ten years. There are now several journals which incorporate this language, including Computational Statistics and Data Analysis, Computational Statistics, and the Journal of Computational and Graphical Statistics. The Japanese have established a Society for Computational Statistics, and the American Statistical Association has established in its official literature a research category called "computational statistics." As witness to the impact of the Center in the intellectual ferment associated with development in this area, faculty members affiliated with the Center have served as the editor of the ASA's Statistical Computing and Graphics Newsletter (D. Carr), as the co-program chairs for methodology for the Winter Simulation Conference (D. Gantz and D. Miller), as the vice-president of the national Intel Supercomputer Users Group (E. Wegman), as the editor of the most widely circulated statistics journal in the world, the Journal of the American Statistical Association (E. Wegman), and most recently as President-elect of the IASC (E. Wegman).

The Center and the University were the official hosts of the 1988 Symposium on the Interface of Computing Science and Statistics. The Interface Symposia are an old, distinguished series of meetings; 1993 was the 25th jubilee meeting. Prior to 1987, the Interface Symposia were loosely organized and in danger of succumbing to their own success. The symposia were becoming large enough that the then-current management infrastructure was inadequate. In August 1987, the Interface Foundation of North America, Inc., a non-profit educational Virginia corporation, was founded by personnel from the Center. The Center remains the administrative headquarters of the Interface Foundation. The Interface Foundation is one of three co-sponsors of the Journal of Computational and Graphical Statistics, so the Center is likely to continue to play a very high-profile intellectual role in the national and international arena for the foreseeable future.

The Center for Computational Statistics has been a resource for many scholars on the GMU campus. One reason for this has been the excellence of the computing facilities. The specific equipment is listed in a companion brochure. As a matter of interest, however, it is worth mentioning that the Center's Intel concurrent computers are the fastest number-crunching machines on campus, while the Center's Silicon Graphics Onyx RE2, Crimson VGXT, and Indigo Elan are the highest-performance graphics workstations on campus. The Center's computing facilities contain more than 17.5 gigabytes of hard disk storage capacity. The Center's microcomputer software collection is very substantial, containing more than 100 applications. The Center's Intel concurrent computers, the Silicon Graphics workstations, and the Hewlett-Packard file, print and compute servers are all on Masonet and are available to authorized faculty and students in the university community. These resources have been used extensively, particularly by graduate students from other disciplines.



Center Faculty

As with most Centers on the George Mason campus, faculty normally have an academic appointment in some home department. Because of the interdisciplinary nature of the Center, its affiliates come from a wide variety of academic backgrounds. Summarized below are the George Mason University faculty who are associated with the Center. Only those who maintain an active research interaction are included. There are a number of others who might be described more accurately as interested supporters. Because of the nature of the Center's activities, there are several researchers with whom we collaborate actively and who are professionally affiliated with other institutions. These researchers have been formally appointed as Corresponding Faculty. This appointment entails the expectation of serious interaction with faculty and students in the Center. Indeed, three members who were originally appointed as Corresponding Faculty have become regular faculty members: Dr. Daniel Carr, Dr. Muhammad Habib, and Dr. James Gentle.

Regular Research Faculty in the Center


A. Richard Bolstein


Associate Professor of Applied Statistics
Polling, survey sampling, functional analytic methods in statistics

Daniel B. Carr


Professor of Applied Statistics
Statistical graphics for data analysis, life and physical science applications of statistics

Jim X. Chen


Assistant Professor of Computer Science
Computer graphics, physically-based modeling, real-time simulation, distributed interactive simulation, scientific visualization, information retrieval, artificial intelligence, and multimedia

Thomas M. Dietz


Associate Professor of Sociology
Survey sampling, statistical methods in sociology

Donald T. Gantz


Professor of Applied Statistics
Statistical computing, data analysis, simulation, geographic information systems

James Gentle


University Professor of Computational Statistics
Statistical computing, statistical software, applied statistics, simulation

Irwin Greenberg


Professor of Operations Research and Engineering
Reliability, quality control and quality assurance, statistical methods in operations research

Gregory A. Guagnano


Assistant Professor of Sociology
Statistical methods in sociology, survey research and survey design

Muhammad K. Habib


Associate Professor of Applied Statistics
Stochastic models of neural networks, statistical communication theory

John J. Miller


Associate Professor of Applied Statistics
Multivariate statistical methods, regression analysis, signal processing, applied statistics

Stephen Nash


Associate Professor of Operations Research and Engineering
Numerical linear algebra, optimization, parallel computing

Ariela Sofer


Associate Professor of Operations Research and Engineering
Optimization, numerical methods, software reliability, parallel computing

Arun Sood


Professor of Computer Science
Image processing, parallel computing

Mark Spikell


Professor of Curriculum and Instruction
Mathematical education, quantitative literacy in secondary education

Daniele C. Struppa


Professor of Mathematics
Real and complex analysis, applied mathematics, computational mathematics

Clifton Sutton


Associate Professor of Applied Statistics
Geometric probability, computationally intensive methods of data analysis

David F. Walnut


Assistant Professor of Mathematics
Wavelet methods, functional analysis

Edward J. Wegman


The Bernard J. Dunn Professor of Information Technology and Applied Statistics
Computational statistics, nonparametric functional inference, parallel computing

Adalbert F. X. Wilhelm


Visiting Professor of Computational Statistics
Statistical graphics, missing values, statistical software

Corresponding Research Faculty in the Center


Linda J. Davis


TRW, Inc.
Regression analysis, exploratory data analysis, computer performance evaluation

Hung Tri Le


Fannie Mae
Transient signal analysis, stochastic processes, nonparametric function estimation

Carey E. Priebe


The Johns Hopkins University
Computational statistics, nonparametric density estimation, adaptive methods, pattern recognition, medical applications

Enders Anthony Robinson


Columbia University
Time series analysis, exploration geophysics, geophysical engineering

George Rogers


Naval Surface Warfare Center, Dahlgren
Computational statistics, pattern recognition, applications to military and health sciences, physics and statistical applications to physics

David W. Scott


Rice University, Department of Statistics
Statistical graphics, nonparametric function estimation, biometry, medical applications

Nozer Singpurwalla


George Washington University, Department of Operations Research
Bayesian statistics, reliability and quality control, software reliability

A. E. R. Woodcock


Synectics Corporation, Chief Scientist and Vice President
Nonlinear dynamical systems, chaos, military applications, biological applications


Research and Scholarship

The major product of any research center is, of course, the creation and exploitation of new knowledge. One measure of productivity is the publication record, but it must be emphasized that this is only one measure. The Center has established a Technical Report Series. As a companion to the Technical Report Series, there is a Technical Seminar Series which publishes course materials from the Center's technical short courses. Yet another productivity measure is the production of graduate students. We currently have more than 20 students who have been formally admitted to candidacy in the Ph.D. programs. Dr. Hung Tri Le, one of the original seven awardees of the Ph.D. in Information Technology, was a student affiliated with the Center and, since then, the Center has produced more than fourteen doctorates. Another measure is the level of research funding. The Center has contracts or grants from the National Science Foundation, the U.S. Navy, the Army Research Office, the Department of Commerce, the National Security Agency, the Environmental Protection Agency, and the National Agricultural Statistics Service. Equipment grants have been received from the Air Force Office of Scientific Research, the Army Research Office, the National Science Foundation, IBM, Intel Scientific Computers, and Apple Computer. The average annual level of funding in the Center has been approximately $700,000, most of which is competitively awarded basic research funding. We particularly emphasize the basic research nature of the Center's activities because we believe that it represents a special achievement of the Center.
The Center exhibits an excellent, perhaps unique, balance between two roles. The first is adherence to the fundamental intellectual tenets of a traditional university, focusing on the creation of new knowledge through basic research. The second is the entrepreneurial spirit of the contemporary, involved university, which responds to the local and regional community by providing research and development products that effect the transfer of technology to the broader commercial, industrial and governmental communities. The focus on the former, of course, is critical to building the reputation of George Mason as a nationally ranked research university, while the latter is a necessary element in building the regional financial and political support for George Mason. As examples of the latter, personnel in the Center have produced several short courses, four books and several software packages over the last few years. The software packages include Mason Hypergraphics (a PC-based and a Silicon Graphics-based data analysis package for multidimensional data), ExplorN (a statistical visualization package for the Silicon Graphics machines), and Mason Ridge (a high-interaction structural inference package developed on the Silicon Graphics workstation). One other example is the Tysons Corner map, a commercial product that was produced by the Transportation Center in conjunction with the Center for Computational Statistics. The final camera-ready drafting of this map was done on equipment owned by the Center for Computational Statistics.

Most recently, faculty members of the Center have received two major research awards. Professor Daniel Carr is principal investigator of a cooperative agreement with the Environmental Protection Agency. This award carries a value of $2.4 million. Professor Edward J. Wegman won an award from the U.S. Department of Defense for the purchase of an Intel Paragon parallel supercomputer. The overall value of this contract is $1.15 million.



For further information, contact

Professor Edward J. Wegman, Director
Center for Computational Statistics
MS 4A7, 157 Science-Technology Building 2
George Mason University
Fairfax, VA 22030

Ph: (703) 993-1680
FAX: (703) 993-1700

Email: compstat@galaxy.gmu.edu