Shie Mannor (Technion)

Abstract:

We are interested in finding optimal control policies in competitive dynamic environments. The model comprises a single decision maker who faces a competitive dynamic environment with multiple interacting agents. The decision maker does not assume any strategic structure for the other participating agents. The agent wishes to take advantage of the other agents' deviations from adversarial (min-max) play, and in that case to obtain an average reward higher than the worst-case performance level (the "value" of the associated zero-sum game). If the other agents do play adversarially, the agent sets out to guarantee that his average reward is asymptotically no worse than the worst-case performance level.
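As a toy illustration (not part of the talk itself), the worst-case performance level of a single-stage zero-sum matrix game can be computed by the standard linear-programming formulation, e.g. with `scipy.optimize.linprog`; the payoff matrix below is a made-up example:

```python
import numpy as np
from scipy.optimize import linprog

def game_value(A):
    """Value of the zero-sum matrix game with payoff matrix A
    (row player maximizes), via the standard LP formulation."""
    A = np.asarray(A, dtype=float)
    # Shift payoffs to be strictly positive; the value shifts by the same amount.
    shift = 1.0 - A.min()
    B = A + shift
    m, n = B.shape
    # With y = x / v (x the row player's mixed strategy, v > 0 the value),
    # maximizing v is equivalent to: minimize sum(y) s.t. B^T y >= 1, y >= 0,
    # and then v = 1 / sum(y).
    res = linprog(c=np.ones(m), A_ub=-B.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m)
    return 1.0 / res.fun - shift

# Matching pennies has value 0: an adversarial opponent holds the
# row player to that level, but any other opponent can be exploited.
print(game_value([[1, -1], [-1, 1]]))
```

The worst-case guarantee discussed in the abstract is exactly this value, generalized from matrix games to the average-reward stochastic-game setting.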

We propose two solution concepts for this problem, based on prediction with expert advice and on approachability theory for stochastic games. We show how these solutions can be attained when the environment is known. We will also briefly discuss the case where the environment (transition probabilities and rewards) is unknown and a learning scheme is required. A unified learning scheme that attains the solution concepts against any strategy of the other agents will be outlined. In the process of developing this scheme, a multi-criteria reinforcement learning algorithm will be sketched.
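To give a flavor of the prediction-with-expert-advice ingredient, here is a minimal sketch of the exponentially weighted average forecaster (Hedge); the loss sequence and step size below are illustrative assumptions, not the talk's construction:

```python
import numpy as np

def hedge(losses, eta):
    """Exponentially weighted average forecaster (Hedge).
    losses: T x K array of expert losses in [0, 1].
    Returns the learner's cumulative expected loss."""
    T, K = losses.shape
    w = np.ones(K)
    total = 0.0
    for t in range(T):
        p = w / w.sum()                # play the normalized weights
        total += p @ losses[t]         # expected loss this round
        w *= np.exp(-eta * losses[t])  # penalize experts by their loss
    return total

rng = np.random.default_rng(0)
L = rng.random((1000, 5))              # synthetic losses for 5 experts
T, K = L.shape
eta = np.sqrt(2 * np.log(K) / T)       # standard tuning of the step size
regret = hedge(L, eta) - L.sum(axis=0).min()
# Regret against the best expert in hindsight grows only as O(sqrt(T log K)),
# i.e. the per-round gap vanishes asymptotically.
```

Applied to the setting of the abstract, such no-regret guarantees are what let the agent track the better of "exploit the opponents' deviations" and "play the safe min-max strategy" without knowing in advance which regime holds.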