### Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor


- Author: Hansen, Thomas Dueholm; Miltersen, Peter Bro; Zwick, Uri
- Source: ACM Digital Library
- Content type: Text
- Publisher: Association for Computing Machinery (ACM)
- File format: PDF
- Copyright year: ©2013
- Language: English
- Subject domain (in DDC): Computer science, information & general works; Data processing & computer science
- Subject keywords: Markov decision processes; Policy iteration; Strategy iteration; Strongly polynomial algorithms; Turn-based stochastic games
- ISSN: 0004-5411
- e-ISSN: 1557-735X
- Age range: 18 to 22 years; above 22 years
- Educational use: Research
- Education level: UG and PG
- Learning resource type: Article
- Publisher date: 2013-02-01
- Publisher place: New York
- Journal: Journal of the ACM (JACM)
- Volume: 60
- Issue: 1
- Page count: 16
- Starting page: 1
- Ending page: 16

#### Abstract

Ye [2011] recently showed that the simplex method with Dantzig's pivoting rule, as well as Howard's policy iteration algorithm, solves discounted Markov decision processes (MDPs) with a constant discount factor in strongly polynomial time. More precisely, Ye showed that both algorithms terminate after at most $O\left(\frac{mn}{1-\gamma}\log\frac{n}{1-\gamma}\right)$ iterations, where $n$ is the number of states, $m$ is the total number of actions in the MDP, and $0 < \gamma < 1$ is the discount factor. We improve Ye's analysis in two respects. First, we improve the bound given by Ye and show that Howard's policy iteration algorithm actually terminates after at most $O\left(\frac{m}{1-\gamma}\log\frac{n}{1-\gamma}\right)$ iterations. Second, and more importantly, we show that the same bound applies to the number of iterations performed by the strategy iteration (or strategy improvement) algorithm. This algorithm generalizes Howard's policy iteration and is used for solving 2-player turn-based stochastic games with discounted zero-sum rewards. This provides the first strongly polynomial algorithm for solving these games, resolving a long-standing open problem. Combined with other recent results, this provides a complete characterization of the complexity of the standard strategy iteration algorithm for 2-player turn-based stochastic games: it is strongly polynomial for a fixed discount factor, and exponential otherwise.
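Specializing the abstract's iteration bounds to a fixed discount factor shows where the strongly polynomial claim comes from. Once $\gamma$ is a constant, $1/(1-\gamma)$ is also a constant. The improved bound then depends only on $m$ and $n$, and it is a factor of $n$ smaller than Ye's bound. This is only arithmetic on the stated bounds, not part of the paper's proof:

$$
\gamma \text{ fixed} \;\Longrightarrow\; \frac{1}{1-\gamma} = O(1)
\;\Longrightarrow\;
O\!\left(\frac{m}{1-\gamma}\log\frac{n}{1-\gamma}\right)
= O\!\left(m\left(\log n + \log\frac{1}{1-\gamma}\right)\right)
= O(m\log n),
$$

$$
\text{whereas}\quad O\!\left(\frac{mn}{1-\gamma}\log\frac{n}{1-\gamma}\right) = O(mn\log n).
$$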

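To make the strategy iteration algorithm concrete, here is a minimal sketch of one common formulation. It is not code from the paper, and every name in it (`strategy_iteration`, `min_best_response`, `evaluate`, `q_value`, `solve_linear`, `tol`) is hypothetical. The sketch assumes a game given explicitly: each state has an owner (`MAX` or `MIN`) and a list of actions, and each action is a `(reward, transition_probabilities)` pair. In each round, MIN plays an optimal counter-strategy to MAX's current strategy, computed here by policy iteration. MAX then switches every state that has a strictly improving action. Floating-point arithmetic with a tolerance is an assumption of this sketch; exact rational arithmetic would avoid it.

```python
MAX, MIN = "max", "min"


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def evaluate(actions, policy, gamma):
    """Values of a fixed joint strategy: solve (I - gamma * P) v = r."""
    n = len(actions)
    a = [[(1.0 if i == j else 0.0) - gamma * actions[i][policy[i]][1][j]
          for j in range(n)] for i in range(n)]
    b = [actions[i][policy[i]][0] for i in range(n)]
    return solve_linear(a, b)


def q_value(action, values, gamma):
    """One-step lookahead: reward + gamma * sum_j p_j * v_j."""
    reward, probs = action
    return reward + gamma * sum(p * v for p, v in zip(probs, values))


def min_best_response(actions, owner, policy, gamma, tol):
    """Howard-style policy iteration for MIN, with MAX's choices held fixed."""
    policy = list(policy)
    while True:
        values = evaluate(actions, policy, gamma)
        changed = False
        for s, acts in enumerate(actions):
            if owner[s] != MIN:
                continue
            qs = [q_value(act, values, gamma) for act in acts]
            best = min(range(len(acts)), key=qs.__getitem__)
            if qs[best] < values[s] - tol:  # strictly improving for MIN
                policy[s] = best
                changed = True
        if not changed:
            return policy, values


def strategy_iteration(actions, owner, gamma, tol=1e-9):
    """MIN best-responds, then MAX switches all improvable states at once."""
    policy = [0] * len(actions)  # start from action 0 in every state
    iterations = 0
    while True:
        policy, values = min_best_response(actions, owner, policy, gamma, tol)
        iterations += 1
        improved = False
        for s, acts in enumerate(actions):
            if owner[s] != MAX:
                continue
            qs = [q_value(act, values, gamma) for act in acts]
            best = max(range(len(acts)), key=qs.__getitem__)
            if qs[best] > values[s] + tol:  # strictly improving for MAX
                policy[s] = best
                improved = True
        if not improved:
            return policy, values, iterations


# Hypothetical 3-state game with gamma = 0.5; each action is (reward, transition probabilities).
actions = [
    [(0.0, [0, 0, 1]), (1.0, [0, 1, 0])],  # state 0 (MAX): quit, or move to state 1
    [(3.0, [0, 0, 1]), (0.0, [1, 0, 0])],  # state 1 (MIN): pay 3 and quit, or return to 0
    [(0.0, [0, 0, 1])],                    # state 2: absorbing, reward 0
]
owner = [MAX, MIN, MAX]
policy, values, iterations = strategy_iteration(actions, owner, gamma=0.5)
print(policy, [round(v, 4) for v in values], iterations)  # [1, 1, 0] [1.3333, 0.6667, 0.0] 2
```

If every state is owned by `MAX`, the MIN step only evaluates the current strategy. The loop then reduces to Howard's policy iteration for a discounted MDP, which mirrors the abstract's description of strategy iteration as a generalization of Howard's algorithm.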
