High Dimensional Clustering Using
Parallel Coordinates
and the Grand Tour
By
Edward J. Wegman
and
Qiang Luo
A PostScript version of the text of High Dimensional
Clustering Using Parallel Coordinates and the Grand Tour is available.
The figures are available as GIF files below.
Legends for Figures
- Figure 1a. Scatterplot matrix of three
clusters in four dimensions.
- Figure 1b. Parallel coordinate plot
corresponding to the scatterplot matrix in Figure 1a. Note that a separation
along any axis or between axes indicates a cluster. Distinctive slopes of the
line segments between pairs of axes also separate clusters.
- Figure 2. The scatterplot matrix of 3848
observations on 5 variables from a synthetic dataset about the geometric
features of pollen grains. The level sets appear to be elliptical in all five
dimensions, suggesting a five-dimensional ellipsoidal shape. One might be
tempted to guess multivariate Gaussianity.
- Figure 3. The fully saturated parallel
coordinate plot of the same 3848 observations in five-space. The hyperbolic
envelope tends to confirm the conclusion that the level sets are
five-dimensional ellipsoids. However, little can be seen from either Figure 2
or Figure 3 about the internal structure of these data.
- Figure 4. The desaturated parallel coordinate
plot of the 3848 observations, this time plotted on a black background. Notice
the internal structure and the x-ray-like appearance of this density plot.
- Figure 5. An intermediate parallel coordinate
plot pruned to remove observations away from the internal structure. The plot
is rescaled to fill the same scale as in Figure 4.
- Figure 6. The final pruned parallel coordinate
plot with all observations removed except those corresponding to the internal
structure. The plot is again rescaled. The five gaps on axes two and three
are suggestive of six clusters.
- Figure 7. The result of a grand tour rotation
of the data in Figure 6. The rotation confirms that these are six clusters
completely separable in at least three of the five dimensions.
- Figure 8. The result of plotting the data
isolated in the parallel coordinate display back into the scatterplot matrix.
It is now apparent that the six clusters form the letters E U R E K A. The six
letters are made up of 99 points of the 3848 in the original data set, less
than 2.7% of the total observations.
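
The desaturation idea behind Figures 3 and 4 can be illustrated with a small sketch. It is not the authors' implementation. It assumes that each observation is drawn as a polyline across min-max scaled axes, that the plot is rasterized into a coarse grid, and that a cell's brightness grows with the number of polylines crossing it until it saturates. The function names (scale_columns, line_density, brightness) and the parameters height, steps, and alpha are hypothetical, chosen only for this illustration.

```python
import random


def scale_columns(data):
    """Min-max scale each variable to [0, 1] so all axes share one vertical scale."""
    cols = list(zip(*data))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.5
         for v, lo, hi in zip(row, lows, highs)]
        for row in data
    ]


def line_density(data, height=50, steps=10):
    """Count how many polylines cross each raster cell.

    Columns run left to right across the axes (`steps` columns per gap between
    adjacent axes, plus one column for the last axis). Rows index the shared
    vertical scale, with row 0 holding the smallest values.
    """
    scaled = scale_columns(data)
    n_axes = len(scaled[0])
    width = (n_axes - 1) * steps + 1
    grid = [[0] * width for _ in range(height)]
    for row in scaled:
        for col in range(width):
            axis, offset = divmod(col, steps)
            if axis == n_axes - 1:
                y = row[axis]  # last axis: nothing to interpolate toward
            else:
                t = offset / steps  # position between axis and axis + 1
                y = (1 - t) * row[axis] + t * row[axis + 1]
            r = min(int(y * height), height - 1)
            grid[r][col] += 1
    return grid


def brightness(grid, alpha=0.05):
    """Map counts to brightness in [0, 1]; a small alpha desaturates the plot."""
    return [[min(1.0, alpha * count) for count in row] for row in grid]


if __name__ == "__main__":
    # Assumed synthetic data: a diffuse Gaussian cloud plus a small tight cluster.
    random.seed(0)
    cloud = [[random.gauss(0, 1) for _ in range(5)] for _ in range(1000)]
    cluster = [[random.gauss(1.5, 0.01) for _ in range(5)] for _ in range(50)]
    counts = line_density(cloud + cluster)
    print("largest cell count:", max(max(row) for row in counts))
```

With alpha near 1, almost every occupied cell is fully bright, which corresponds to the saturated plot; with a small alpha, only cells crossed by many polylines stand out, which is what lets dense internal structure show through the surrounding cloud.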