Statistical Aspects of Ranking Search Results (Daryl Pregibon, organizer)


Junghoo Cho (UCLA)
Controlled Randomization of Search Results: Its Implication and Benefit


Saturday 10:30-11:00, Fountain I

Abstract:

In-degree, PageRank, number of visits and other measures of Web page popularity significantly in- fluence the ranking of search results by modern search engines. The assumption is that popularity is closely correlated with quality, a more elusive concept that is difficult to measure directly. Unfortunately, the correlation between popularity and quality is very weak for newly-created pages that have yet to receive many visits and/or inlinks. Worse, since discovery of new content is largely done by querying search engines, and because users usually focus their attention on the top few results, newly-created but high-quality pages are effectively "shut out", and it can take a very long time before they become popular. We propose a simple and elegant solution to this problem: the introduction of a controlled amount of randomness into search result ranking methods. Doing so offers new pages a chance to prove their worth, although clearly using too much randomness will degrade result quality and annul any benefits achieved. Hence there is a tradeoff between exploration to estimate the quality of new pages and exploitation of pages already known to be of high quality. We study this tradeoff both analytically and via simulation, in the context of an economic objective function based on aggregate result quality amortized over time. We show that a modest amount of randomness leads to improved search results.



Guy Lebanon (Purdue University)
Conditional Models on the Ranking Poset- A Unifying Theme for Classification and Ranking


Saturday 11:00-11:30, Fountain I

Abstract:

Classification and ranking have long been considered different problems in machine learning and statistics. We present a class of statistical models whose event space lies on the ranking poset, a combinatorial structure that includes classification and ranking, as a unifying framework for both problems. Popular models for classification such as error correcting output codes and discrete versions of boosting and logistic regression as well as ranking models such as the Mallows model are shown to be special cases in this framework. In addition to drawing a surprising connection between the above algorithms, the framework naturally suggests new algorithms for ranking and classification. We present one such algorithm in the context of meta-search and some ideas for future research in this direction.



D. Sivakumar (Google, Inc.)
Comparing and Aggregating Partial Rankings


Saturday 11:30-12:00, Fountain I

Abstract:

Motivated by several applications, we consider the problem of defining distance measures between "top-k rankings" and "rankings with ties". We propose several distance measures between such structures, some of which are metrics while others are not. For each of the latter category, we show that they are "almost" a metric in two seemingly unrelated senses (which we will also show to be equivalent). Based on these, we define two distance measures to be equivalent if they are bounded above and below by constant multiples of each other. We thereby identify large and robust equivalence classes of distance metrics for the two types of ranking structures we study. Our results imply highly-efficient constant-factor approximation algorithms for the corresponding "rank aggregation" problems.