Distributed Data, Parallel Computing, and Computational Strategies II
(Tim Hesterberg, chair)


Tin Ho (Statistics and Learning Research, Bell Labs, Lucent Technologies)
Interactive Pattern Discovery with Mirage

Friday 8:30-8:50, San Marino

Abstract:

Mirage is a graphical tool for open-ended pattern discovery that combines human and machine capabilities for correlating observed or simulated data from multiple perspectives and at different depths of analysis. This includes exploratory analysis of raw data as arrays of numbers, text, images, spectra, sound waveforms, numerical features extracted from those, and class structures inferred from such features.

Mirage was developed to address the practical needs in studying the rich context surrounding the core classification tasks in many real-world learning problems. Through highly flexible visual displays and intuitive exploratory operations, it enables domain experts to exercise their judgement at various stages in pattern analysis, and assists analysts in obtaining insights into the data geometry and making critical methodological choices.

I will show the tool's applications in analyzing photonics simulations, evaluating the performance of telecommunication systems, and astronomy. I will also discuss extensions that connect Mirage to external data archives, remote analysis code, custom displays, and dynamic data streams.


J. Patist (Free University of Amsterdam)
W. Kowalczyk (Free University of Amsterdam)
E. Marchiori (Free University of Amsterdam)
Efficient Maintenance of Gaussian Mixture Models over Data Streams

Friday 8:50-9:10, San Marino

Abstract:

This paper addresses the issue of model maintenance for data streams under block evolution with a restricted window. At each time step, a window that consists of a pre-specified number of the most recently collected blocks of data is updated by deleting the oldest block and inserting a new one. We introduce a method for maintaining a model consisting of a collection of Gaussian densities over such a window. The method constructs a local Gaussian mixture on each block of the window and iteratively selects and merges pairs of components. We propose two merging strategies and experimentally investigate the accuracy, speed, and memory requirements of the corresponding algorithms. Results of numerous experiments with several artificially generated data sets demonstrate the superiority of our approach over the classical Expectation Maximization method applied to the data of the entire window.
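
The block-wise maintenance idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes one-dimensional data, summarizes each block by a single Gaussian instead of a local mixture, and uses a moment-preserving merge with a hypothetical selection cost (the increase in weighted variance). The names `Component`, `summarize_block`, `merge_cost`, `reduce_mixture`, and `WindowModel` are illustrative only.

```python
from collections import deque
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Component:
    weight: float  # number of points the component represents
    mean: float
    var: float


def summarize_block(block):
    """Simplification: summarize a block by one Gaussian (not a local mixture)."""
    n = len(block)
    mean = sum(block) / n
    var = sum((x - mean) ** 2 for x in block) / n
    return [Component(float(n), mean, var)]


def merge(a, b):
    """Merge two components while preserving total weight, mean, and variance."""
    w = a.weight + b.weight
    mean = (a.weight * a.mean + b.weight * b.mean) / w
    var = (a.weight * (a.var + (a.mean - mean) ** 2)
           + b.weight * (b.var + (b.mean - mean) ** 2)) / w
    return Component(w, mean, var)


def merge_cost(a, b):
    """Assumed criterion: increase in weighted variance caused by merging."""
    return a.weight * b.weight / (a.weight + b.weight) * (a.mean - b.mean) ** 2


def reduce_mixture(components, max_components):
    """Greedily merge the cheapest pair until max_components (>= 1) remain."""
    comps = list(components)
    while len(comps) > max_components:
        i, j = min(combinations(range(len(comps)), 2),
                   key=lambda p: merge_cost(comps[p[0]], comps[p[1]]))
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps


class WindowModel:
    """Keeps per-block summaries for the most recent num_blocks blocks."""

    def __init__(self, num_blocks, max_components):
        # deque(maxlen=...) drops the oldest block when a new one arrives
        self.blocks = deque(maxlen=num_blocks)
        self.max_components = max_components

    def add_block(self, block):
        self.blocks.append(summarize_block(block))

    def mixture(self):
        all_comps = [c for blk in self.blocks for c in blk]
        return reduce_mixture(all_comps, self.max_components)
```

Because only the per-block summaries are stored, deleting the oldest block needs no access to its raw data; the window model is rebuilt from the remaining summaries.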



Urmi Ghosh-Dastidar (Mathematics Department, New York City College of Technology, CUNY)
Development of a New Hybrid Optimization Process by Perturbing Parameters Preferentially

Friday 9:10-9:30, San Marino

Abstract:

A global optimization technique based on parameter sensitivity analysis is developed by incorporating tabu strategies into a fast simulated annealing (FSA) process. While tabu search exploits its memory extensively, FSA relies only on the information obtained in the previous iteration. The proposed method learns from past history, searches the space by perturbing parameters once in each iteration, and perturbs important parameters more frequently than less significant ones to avoid unnecessary computations.
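
A rough Python sketch of how such a hybrid might be organized follows. It is not the author's method: the finite-difference sensitivity estimate, the choice of one parameter per iteration with probability proportional to its sensitivity, the Cauchy-distributed step with an inverse-linear temperature schedule, and the tabu rule (a rejected move, identified by parameter index and direction, is forbidden for a few iterations) are all assumptions made for illustration. The names `sensitivities` and `hybrid_anneal` are hypothetical.

```python
import math
import random


def sensitivities(f, x, h=1e-3):
    """Assumed sensitivity measure: |f(x + h*e_i) - f(x)| for each parameter i."""
    base = f(x)
    result = []
    for i in range(len(x)):
        y = list(x)
        y[i] += h
        result.append(abs(f(y) - base) + 1e-12)  # keep every weight positive
    return result


def hybrid_anneal(f, x0, iters=2000, t0=1.0, tenure=5, seed=0):
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, fbest = list(x), fx
    weights = sensitivities(f, x)  # estimated once at the starting point
    tabu_until = {}                # move -> last iteration at which it is tabu
    for k in range(iters):
        t = t0 / (1 + k)           # inverse-linear cooling schedule
        # Perturb one parameter, favoring the more sensitive ones.
        i = rng.choices(range(len(x)), weights=weights)[0]
        step = t * math.tan(math.pi * (rng.random() - 0.5))  # Cauchy step
        move = (i, step > 0)
        if tabu_until.get(move, -1) >= k:
            continue               # recently rejected move: skip it
        y = list(x)
        y[i] += step
        fy = f(y)
        # Metropolis acceptance: always accept improvements.
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = list(x), fx
        else:
            tabu_until[move] = k + tenure  # memory of the failed move
    return best, fbest
```

For example, with `f = lambda v: 100 * v[0] ** 2 + v[1] ** 2` and start point `[1.0, 1.0]`, the first parameter receives a much larger sensitivity weight and is therefore perturbed far more often than the second.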



Oznur Yasar (Department of Mathematics and Statistics, Memorial University of Newfoundland)
Cargi Diner (Department of Mathematics and Statistics, Memorial University of Newfoundland)
Gerhard Weber (Institute of Applied Mathematics, METU (Ankara, Turkey))
Discrete Tomography: How Optimization Methods and Coding Theory Can Help

Friday 9:30-9:50, San Marino

Abstract:

Optimization theory is a key technology for inverse reconstruction problems, with applications in science, technology, and economics. Discrete tomography is a modern research field that deals with finite data arising in areas such as VLSI chip design and medical imaging. This paper focuses on the use of modern optimization methods to approximately solve the NP-hard reconstruction problem of discrete tomography. The new approaches we introduce are based on modeling and algorithms from coding theory.
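
To make the reconstruction problem concrete, here is a small Python sketch of its simplest special case: recovering a binary image from only its row and column sums. This two-projection case is solvable in polynomial time by a greedy rule and is shown only as an illustration; it is not the NP-hard setting the abstract refers to, and it does not use the authors' optimization or coding-theory methods. The function names `projections` and `reconstruct` are hypothetical.

```python
def projections(image):
    """Row and column sums of a binary image given as a list of rows."""
    rows = [sum(r) for r in image]
    cols = [sum(c) for c in zip(*image)]
    return rows, cols


def reconstruct(row_sums, col_sums):
    """Greedy reconstruction from two projections.

    For each row, ones are placed in the columns with the largest
    remaining column sums. Returns None if the sums are inconsistent.
    """
    remaining = list(col_sums)
    n_cols = len(remaining)
    image = []
    for r in row_sums:
        order = sorted(range(n_cols), key=lambda j: remaining[j], reverse=True)
        chosen = order[:r]
        if len(chosen) < r or any(remaining[j] == 0 for j in chosen):
            return None
        row = [0] * n_cols
        for j in chosen:
            row[j] = 1
            remaining[j] -= 1
        image.append(row)
    # Every column sum must be used up exactly.
    return image if not any(remaining) else None
```

The result is generally not unique: several different images can share the same projections, which is one reason additional directions or prior knowledge are used in practice.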