George Mason University
AES/CCS/SCS/Statistics Colloquium Series
Seminar Announcement


Degrees of Boosting -- A Study of Loss Functions for Classification and Class Probability Estimation

Andreas Buja

The Wharton School, Univ. of Pennsylvania

Joint work with Yi Shen (Wharton) and Werner Stuetzle (University of Washington)

Location: SUB II, Room 4
Time: 10:30 a.m. Refreshments, 10:45 a.m. Colloquium Talk
Date: April 22, 2005



ABSTRACT

We examine a vast class of loss functions for binary classification and class probability estimation. The elements of this class are known in subjective probability as "proper scoring rules". They comprise all common loss functions such as log-loss, squared error loss, boosting loss derived from the exponential loss, and cost-weighted misclassification losses.

We show that proper scoring rules can be interpreted as mixtures of cost-weighted misclassification losses. This interpretation gives immediate practical insight into loss functions: high mass of the mixing measure points to the class probabilities where the proper scoring rule strives for greatest accuracy. For example, log-loss emphasizes probabilities near zero and one, but boosting loss even more so.
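The mixture interpretation can be checked numerically. The sketch below is an illustration under stated assumptions, not the speakers' code: it assumes the cost-weighted misclassification loss at cost c charges (1 - c) for a missed class-1 case and c for a false alarm, with an estimate q classified as class 1 when q > c, and it assumes the mixing weights 1/(c(1-c)) for log-loss and the constant 2 for squared error loss. The function names are hypothetical.

```python
import math

def mixture_partial_losses(q, weight, n=20_000):
    """Midpoint-rule approximation of the two partial losses obtained by
    mixing cost-weighted misclassification losses over costs c in (0, 1).

    q      : estimated probability of class 1
    weight : mixing weight function omega(c) (an assumption of this sketch)
    Returns (loss when y = 1, loss when y = 0).
    """
    h = 1.0 / n
    loss_y1 = 0.0
    loss_y0 = 0.0
    for i in range(n):
        c = (i + 0.5) * h          # midpoint of the i-th cost cell
        w = weight(c) * h
        if q <= c:
            # estimate classified as class 0: costs (1 - c) if truly class 1
            loss_y1 += (1.0 - c) * w
        else:
            # estimate classified as class 1: costs c if truly class 0
            loss_y0 += c * w
    return loss_y1, loss_y0

def log_loss_weight(c):
    # assumed mixing weight for log-loss; mass piles up near 0 and 1
    return 1.0 / (c * (1.0 - c))

def squared_error_weight(c):
    # assumed mixing weight for squared error loss; uniform over costs
    return 2.0

q = 0.3
print(mixture_partial_losses(q, log_loss_weight), (-math.log(q), -math.log(1 - q)))
print(mixture_partial_losses(q, squared_error_weight), ((1 - q) ** 2, q ** 2))
```

Each printed pair of tuples should agree up to discretization error, which illustrates how the shape of the mixing weight determines where on the probability scale a loss concentrates its effort.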

All proper scoring rules permit Fisher scoring algorithms, with versions for stagewise fitting in the manner of boosting. The implementation with iteratively reweighted least squares (IRLS) uses weights that derive from the mixing weights of the mixture decomposition. It follows that a recent algorithm by Hand and Vinciotti (2003) minimizes certain proper scoring rules.
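As a rough illustration of how IRLS weights might derive from the mixing weights, the sketch below assumes a logistic link q = 1/(1 + exp(-F)) and a mixing weight function omega; neither choice is stated in the abstract. Under these assumptions the expected second derivative of the loss in F, evaluated at the true probability, is omega(q) (q(1-q))^2, which reduces to the familiar logistic regression weight q(1-q) for log-loss. The function names are hypothetical, and the finite-difference check uses squared error loss.

```python
import math

def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))

def irls_weight(q, omega):
    """Fisher-scoring weight under a logistic link (assumption of this sketch):
    mixing weight omega(q) times the squared link derivative q * (1 - q)."""
    return omega(q) * (q * (1.0 - q)) ** 2

def expected_squared_error(f, eta):
    """Expected squared error loss at score f when the true P(y = 1) is eta."""
    q = sigmoid(f)
    return eta * (1.0 - q) ** 2 + (1.0 - eta) * q ** 2

def second_derivative(func, x, h=1e-3):
    """Central finite-difference approximation of func''(x)."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)

eta = 0.3
f_true = math.log(eta / (1.0 - eta))   # score at which q equals eta

# Numerical curvature of the expected loss versus the closed-form weight
numeric = second_derivative(lambda f: expected_squared_error(f, eta), f_true)
closed_form = irls_weight(eta, lambda c: 2.0)
print(numeric, closed_form)

# With the assumed log-loss mixing weight the weight reduces to q * (1 - q)
print(irls_weight(eta, lambda c: 1.0 / (c * (1.0 - c))), eta * (1.0 - eta))
```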

The mixture decomposition of proper scoring rules carries over to information measures such as entropy and the Gini index. Because these measures underlie tree-based classification, every proper scoring rule translates to a new tree splitting criterion.
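A small sketch of the link between scoring rules and information measures, under the assumption that the information measure of a proper scoring rule is its minimum expected loss, attained when the estimate equals the true class-1 probability eta. With that assumption, log-loss yields the entropy (in nats) and squared error loss yields half the binary Gini index 2 eta (1 - eta). The helper names are hypothetical.

```python
import math

def information_measure(eta, loss_y1, loss_y0):
    """Expected loss at the optimal estimate q = eta (assumed definition)."""
    return eta * loss_y1(eta) + (1.0 - eta) * loss_y0(eta)

# Partial losses of log-loss and squared error loss as functions of q
def log_loss_y1(q):
    return -math.log(q)

def log_loss_y0(q):
    return -math.log(1.0 - q)

def squared_error_y1(q):
    return (1.0 - q) ** 2

def squared_error_y0(q):
    return q ** 2

def entropy(eta):
    return -eta * math.log(eta) - (1.0 - eta) * math.log(1.0 - eta)

def gini(eta):
    return 2.0 * eta * (1.0 - eta)

eta = 0.2
print(information_measure(eta, log_loss_y1, log_loss_y0), entropy(eta))
print(information_measure(eta, squared_error_y1, squared_error_y0), 0.5 * gini(eta))
```

Plugging the partial losses of any other proper scoring rule into `information_measure` gives a candidate node impurity, which is the sense in which each rule suggests a splitting criterion.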

The mixture decomposition of proper scoring rules also carries over to Bregman distances and can be used to derive inequality bounds for asymptotic theory.