Statistical Aspects of Ranking Search Results (Daryl Pregibon, organizer)
Junghoo Cho (UCLA)
Controlled Randomization of Search Results: Its Implication and Benefit
Saturday 10:30-11:00, Fountain I
Abstract:
In-degree, PageRank, number of visits and other
measures of Web page popularity significantly in-
fluence the ranking of search results by modern
search engines. The assumption is that popularity
is closely correlated with quality, a more elusive
concept that is difficult to measure directly.
Unfortunately, the correlation between popularity
and quality is very weak for newly-created pages
that have yet to receive many visits and/or inlinks.
Worse, since discovery of new content is
largely done by querying search engines, and because
users usually focus their attention on the top
few results, newly-created but high-quality pages
are effectively "shut out", and it can take a very
long time before they become popular.
We propose a simple and elegant solution to
this problem: the introduction of a controlled
amount of randomness into search result ranking
methods. Doing so offers new pages a chance
to prove their worth, although clearly using too
much randomness will degrade result quality and
annul any benefits achieved. Hence there is a
tradeoff between exploration to estimate the quality
of new pages and exploitation of pages already
known to be of high quality. We study this tradeoff
both analytically and via simulation, in the context
of an economic objective function based on
aggregate result quality amortized over time. We
show that a modest amount of randomness leads
to improved search results.
Guy Lebanon (Purdue University)
Conditional Models on the Ranking Poset- A Unifying Theme for Classification and Ranking
Saturday 11:00-11:30, Fountain I
Abstract:
Classification and ranking have long been considered different problems
in machine learning and statistics. We present a class of statistical
models whose event space lies on the ranking poset, a combinatorial
structure that includes classification and ranking, as a unifying
framework for both problems. Popular models for classification such as
error correcting output codes and discrete versions of boosting and
logistic regression as well as ranking models such as the Mallows model
are shown to be special cases in this framework. In addition to drawing
a surprising connection between the above algorithms, the framework
naturally suggests new algorithms for ranking and classification. We
present one such algorithm in the context of meta-search and some ideas
for future research in this direction.
D. Sivakumar (Google, Inc.)
Comparing and Aggregating Partial Rankings
Saturday 11:30-12:00, Fountain I
Abstract:
Motivated by several applications, we consider the problem of
defining distance measures between "top-k rankings" and "rankings
with ties". We propose several distance measures between such
structures, some of which are metrics while others are not. For
each of the latter category, we show that they are "almost" a metric
in two seemingly unrelated senses (which we will also show to be
equivalent). Based on these, we define two distance measures to be
equivalent if they are bounded above and below by constant
multiples of each other. We thereby identify large and robust
equivalence classes of distance metrics for the two types of
ranking structures we study. Our results imply highly-efficient
constant-factor approximation algorithms for the corresponding
"rank aggregation" problems.