Abraham Flaxman (Department of Mathematical Sciences Carnegie-Mellon University)
Online Convex Optimization in the Bandit Setting: Gradient Descent without a Gradient
Thursday 3:00-3:20, San Marino
Abstract:
We study a general online convex optimization problem. We have a
convex set S and an unknown sequence of cost functions c_1,c_2,...,
and in each round, we choose a feasible point x_t in S, and learn the
cost c_t(x_t). If the function c_t is also revealed after each round
then, as Zinkevich has shown, an online gradient descent algorithm is
effective even if the functions are chosen adversarially. That is,
after n rounds, the total cost incurred will be O(sqrt(n)) more than
the cost of the best single feasible point chosen with the benefit of
hindsight, min_{x in S} c_1(x)+c_2(x)+...c_n(x). We present this
result and then extend it to the "bandit" setting where, in each
period, only the cost c_t(x_t) is revealed, and bound the expected
regret as O(n^{3/4}).
Our approach uses a simple approximation of the gradient that is
computed from evaluating c_t at a single (random) point. We show that
this biased estimate is sufficient to approximate gradient descent on
the sequence of functions. In other words, it is possible to use
gradient descent in the online setting without seeing anything more
than the value of the functions at a single point.