Hadley Wickham (Iowa State University)
Doina Caragea (Iowa State University)
Di Cook (Iowa State University)
Exploring High-dimensional Classification Boundaries
Friday 4:30-5:00, Fountain I
Abstract:
Given training data from d groups in a p-dimensional space (the design
space), a classification algorithm (classifier) predicts which group
new data belongs to. Generally the input to these algorithms is
high-dimensional, and the boundaries between groups are also
high-dimensional and may be curvilinear or multi-faceted. This paper
discusses methods for understanding the division of space between the
groups, and provides an implementation in an R package,
explore, which links R to GGobi.
If the classifier is mathematically tractable we can extract the
boundaries directly; if the classifier provides posterior
probabilities we can use these to find uncertain points which lie on
boundaries; otherwise we can treat the classifier as a black box and
use a k-nearest neighbours technique to remove non-boundary points.
These techniques allow us to work with any classifier, and we
demonstrate them on LDA, QDA, SVM, tree and neural network classifiers.
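The two data-driven strategies above (posterior uncertainty, and treating the classifier as a black box with nearest neighbours) can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the explore package's implementation: the function names, the "near a boundary" criterion (a small gap between the two largest posterior probabilities), the default threshold of 0.1, k = 5 and the toy linear classifier `classify` are all hypothetical choices made for the example.

```python
import numpy as np


def uncertain_points(X, posteriors, threshold=0.1):
    """Return rows of X whose two largest posterior probabilities differ by
    less than `threshold` (a hypothetical criterion for 'near a boundary')."""
    X = np.asarray(X)
    top2 = np.sort(np.asarray(posteriors, dtype=float), axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]  # gap between best and runner-up class
    return X[margin < threshold]


def knn_boundary_points(X, labels, k=5):
    """Black-box approach: given the classifier's predicted labels for a
    sample X, keep only points with at least one differently labelled point
    among their k nearest neighbours; every other point is treated as a
    non-boundary point and removed."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances, shape (n, n).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbours
    mixed = (labels[nn] != labels[:, None]).any(axis=1)
    return X[mixed]


# Demo: a random sample filling a 3-D design space.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))


def classify(Z):
    # Hypothetical black-box classifier whose true boundary is x + y + z = 0.
    return (Z.sum(axis=1) > 0).astype(int)


boundary = knn_boundary_points(X, classify(X), k=5)
print(len(boundary), "of", len(X), "points kept near the boundary")
```

With a real classifier, `classify` would be replaced by the fitted model's prediction function and `X` by a dense sample of the design space; the surviving points can then be viewed alongside the data in GGobi. The full n-by-n distance matrix keeps the sketch short but is only practical for a few thousand points.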