We examine a broad class of loss functions for binary classification and class probability estimation. The elements of this class are known in subjective probability as ``proper scoring rules''. They comprise all common loss functions: log-loss, squared error loss, boosting loss (derived from the exponential loss), and cost-weighted misclassification losses.
We show that proper scoring rules can be interpreted as mixtures of cost-weighted misclassification losses. This interpretation gives immediate practical insight into loss functions: high mass of the mixing measure points to the class probabilities where the proper scoring rule strives for greatest accuracy. For example, log-loss emphasizes probabilities near zero and one, and boosting loss does so even more strongly.
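The mixture interpretation can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the convention that class 1 is predicted when the estimated probability $q$ exceeds a cost threshold $c$, that a missed class 1 costs $1-c$ and a false alarm costs $c$, and that the mixing weights are $\omega(c) = 1/(c(1-c))$ for log-loss and $\omega(c) = 2$ for squared error. All function names are illustrative.

```python
import math

def cost_weighted_loss(y, q, c):
    """Cost-weighted misclassification loss at cost threshold c.

    Assumed convention: predict class 1 when q > c; a missed class 1
    costs (1 - c) and a false alarm on class 0 costs c.
    """
    if y == 1:
        return (1.0 - c) if q <= c else 0.0
    return c if q > c else 0.0

def mixture_loss(y, q, weight, n=100_000):
    """Midpoint-rule approximation of the integral over c in (0, 1) of
    cost_weighted_loss(y, q, c) * weight(c)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        c = (i + 0.5) * h
        total += cost_weighted_loss(y, q, c) * weight(c)
    return total * h

def log_loss_weight(c):
    # Mass piles up near c = 0 and c = 1.
    return 1.0 / (c * (1.0 - c))

def squared_error_weight(c):
    # Uniform mass over all thresholds.
    return 2.0

if __name__ == "__main__":
    q = 0.3
    # Each pair should agree up to quadrature error.
    print(mixture_loss(1, q, log_loss_weight), -math.log(q))
    print(mixture_loss(0, q, log_loss_weight), -math.log(1.0 - q))
    print(mixture_loss(1, q, squared_error_weight), (1.0 - q) ** 2)
    print(mixture_loss(0, q, squared_error_weight), q ** 2)
```

Under the same convention, a weight proportional to $(c(1-c))^{-3/2}$ would correspond to boosting loss; it is left out of the sketch because its endpoint singularity makes the simple midpoint rule converge slowly.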
All proper scoring rules permit Fisher scoring algorithms, with versions for stagewise fitting \`a la boosting. The implementation with iteratively reweighted least squares (IRLS) uses weights that derive from the mixing weights of the mixture decomposition. It follows that a recent algorithm by Hand and Vinciotti (2003) minimizes certain proper scoring rules.
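A sketch of where such weights come from, in notation that is our assumption rather than fixed by the text: write the loss for response $y \in \{0,1\}$ and estimate $q$ as $L(y|q)$, let $\omega$ denote the mixing weight, and suppose $q = F(\eta)$ with linear predictor $\eta = x^T \beta$. If the partial losses are mixtures with weight $\omega$, then
\[
  \frac{\partial}{\partial q} L(y|q) = -(y - q)\,\omega(q),
  \qquad
  \frac{\partial}{\partial \eta} L(y|q) = -(y - q)\,\omega(q)\,F'(\eta).
\]
Differentiating once more and taking the expectation under $y \sim \mathrm{Bernoulli}(q)$ removes the term containing $y - q$, which leaves the expected second derivative
\[
  \mathrm{E}\!\left[ \frac{\partial^2}{\partial \eta^2} L(y|q) \right] = \omega(q)\,F'(\eta)^2 .
\]
A Fisher scoring step is therefore a weighted least squares fit with
\[
  w = \omega(q)\,F'(\eta)^2,
  \qquad
  z = \eta + \frac{y - q}{F'(\eta)} .
\]
For the logistic link, $F'(\eta) = q(1-q)$; with the log-loss weight $\omega(q) = 1/(q(1-q))$ this gives $w = q(1-q)$, the familiar weights of logistic regression.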
The mixture decomposition of proper scoring rules carries over to information measures such as entropy and the Gini index. Because these measures underlie tree-based classification, every proper scoring rule translates to a new tree splitting criterion.
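As an illustration of how a loss can induce a splitting criterion, the sketch below takes the information measure of a loss to be its minimal expected value at class-1 probability $q$, namely $q\,L_1(1-q) + (1-q)\,L_0(q)$, where $L_1$ and $L_0$ are the partial losses for the two classes (notation assumed here). Up to constant factors, this yields entropy for log-loss, the Gini index for squared error, and $2\sqrt{q(1-q)}$ for boosting loss. The function names and the one-dimensional threshold search are illustrative assumptions, not a prescribed algorithm.

```python
import math

def entropy_impurity(q):
    """Information measure induced by log-loss (Shannon entropy, nats)."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log(q) - (1.0 - q) * math.log(1.0 - q)

def gini_impurity(q):
    """Information measure induced by squared error loss (Gini index)."""
    return q * (1.0 - q)

def boosting_impurity(q):
    """Information measure induced by boosting loss."""
    return 2.0 * math.sqrt(q * (1.0 - q))

def split_impurity(left, right, impurity):
    """Size-weighted average impurity of two non-empty child nodes."""
    n = len(left) + len(right)
    q_left = sum(left) / len(left)
    q_right = sum(right) / len(right)
    return (len(left) * impurity(q_left) + len(right) * impurity(q_right)) / n

def best_threshold(x, y, impurity):
    """Scan midpoints between distinct sorted x values and return the
    (threshold, impurity) pair with the lowest weighted child impurity."""
    values = sorted(set(x))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = split_impurity(left, right, impurity)
        if best is None or score < best[1]:
            best = (t, score)
    return best

if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [0, 0, 1, 0, 1, 1]
    for impurity in (entropy_impurity, gini_impurity, boosting_impurity):
        print(impurity.__name__, best_threshold(x, y, impurity))
```

Swapping the `impurity` argument is the only change needed to move from one criterion to another, which is the sense in which each proper scoring rule supplies its own splitting rule.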
The mixture decomposition of proper scoring rules also carries over to Bregman distances and can be used to derive inequality bounds for asymptotic theory.
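To make the Bregman connection concrete, here is a sketch in assumed notation: let $p$ be the true class-1 probability, $q$ the estimate, $\omega$ the mixing weight, and $B(p|q)$ the excess expected loss incurred by using $q$ instead of $p$. For a cost-weighted misclassification loss with threshold $c$, this excess is $|p - c|$ when $c$ lies between $p$ and $q$ and zero otherwise, so mixing over $c$ gives
\[
  B(p|q) = \int_{\min(p,q)}^{\max(p,q)} |c - p|\,\omega(c)\,dc .
\]
With $\omega(c) = 2$ this evaluates to $(p - q)^2$, and with $\omega(c) = 1/(c(1-c))$ it evaluates to the Kullback-Leibler divergence
\[
  p \log\frac{p}{q} + (1-p) \log\frac{1-p}{1-q} .
\]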