High Dimensional Clustering Using
Parallel Coordinates and the Grand Tour

By

Edward J. Wegman


and

Qiang Luo

A PostScript version of the text of High Dimensional Clustering Using Parallel Coordinates and the Grand Tour is available. The figures are available as GIF files below.

    Legends for Figures
  1. Figure 1a. Scatterplot matrix of three clusters in four dimensions.
  2. Figure 1b. Parallel coordinate plot corresponding to the scatterplot matrix in Figure 1a. Note that a separation along any axis or between axes indicates a cluster. Distinctive slopes of the line segments between pairs of axes also separate clusters.
  3. Figure 2. The scatterplot matrix of 3848 observations on 5 variables from a synthetic dataset describing the geometric features of pollen grains. The level sets appear to be elliptical in all five dimensions, suggesting a five-dimensional ellipsoidal shape. One might be tempted to guess multivariate Gaussianity.
  4. Figure 3. The fully saturated parallel coordinate plot of the same 3848 observations in five-space. The hyperbolic envelope tends to confirm the conclusions about a five-dimensional ellipsoidal level set. However, little can be seen from either Figure 2 or Figure 3 about the internal structure of these data.
  5. Figure 4. The desaturated parallel coordinate plot of the 3848 observations, this time plotted on a black background. Notice the internal structure and the x-ray-like appearance of this density plot.
  6. Figure 5. An intermediate parallel coordinate plot pruned to remove observations away from the internal structure. The plot is rescaled to fill the same scale as in Figure 4.
  7. Figure 6. The final pruned parallel coordinate plot with all observations removed except those corresponding to the internal structure. The plot is again rescaled. The five gaps on axes two and three are suggestive of six clusters.
  8. Figure 7. The result of a grand tour rotation of the data in Figure 6. The rotation confirms that these are six clusters completely separable in at least three of the five dimensions.
  9. Figure 8. The result of plotting the data isolated in the parallel coordinate display back into the scatterplot matrix. It is now apparent that the six clusters form the letters E U R E K A. The six letters are made up of 99 of the 3848 points in the original dataset, less than 2.7% of the total observations.
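
The density-plot idea behind the desaturated display in Figure 4 can be illustrated with a small sketch. The legends do not describe how the authors' software desaturates lines, so the code below is an assumption: it stands in for desaturation by counting how many observation polylines pass through each raster cell, so that brightness can later be drawn in proportion to the count. The function name `parallel_coordinate_density`, its parameters, and the three-row toy dataset are all hypothetical.

```python
def parallel_coordinate_density(rows, height=50, steps_per_gap=20):
    """Count polyline crossings per raster cell (hypothetical helper).

    rows: list of equal-length numeric tuples, one per observation.
    Returns a height x width grid of integer counts.
    """
    n_axes = len(rows[0])
    # Per-axis minimum and maximum, used to rescale each variable to [0, 1].
    lows = [min(r[j] for r in rows) for j in range(n_axes)]
    highs = [max(r[j] for r in rows) for j in range(n_axes)]
    width = (n_axes - 1) * steps_per_gap + 1
    grid = [[0] * width for _ in range(height)]
    for r in rows:
        scaled = [
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.5
            for j in range(n_axes)
        ]
        for x in range(width):
            # j is the axis to the left of column x; t is the offset into the gap.
            j, t = divmod(x, steps_per_gap)
            if j == n_axes - 1:
                y = scaled[j]  # last column sits exactly on the final axis
            else:
                # Linear interpolation along the segment between axes j and j + 1.
                y = scaled[j] + (scaled[j + 1] - scaled[j]) * t / steps_per_gap
            cell = min(height - 1, int(y * height))
            grid[cell][x] += 1
    return grid


# Toy data (assumed, not from the paper): two nearby points and one far point.
rows = [(0.0, 0.0, 1.0), (0.1, 0.1, 0.9), (1.0, 1.0, 0.0)]
grid = parallel_coordinate_density(rows, height=10, steps_per_gap=4)
```

Each observation contributes exactly one count to every column, so heavily overplotted regions accumulate large counts while sparse regions stay near zero. Mapping counts to brightness on a black background gives a display in the spirit of Figure 4, and thresholding the counts is one possible way to prune toward dense internal structure as in Figures 5 and 6.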