Access Restriction Subscribed

Author Hansen, Thomas Dueholm ♦ Miltersen, Peter Bro ♦ Zwick, Uri
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2013
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Markov decision processes ♦ Policy iteration ♦ Strategy iteration ♦ Strongly polynomial algorithms ♦ Turn-based stochastic games
Abstract Ye [2011] showed recently that the simplex method with Dantzig’s pivoting rule, as well as Howard’s policy iteration algorithm, solve discounted Markov decision processes (MDPs), with a constant discount factor, in strongly polynomial time. More precisely, Ye showed that both algorithms terminate after at most $O\left(\frac{mn}{1-\gamma}\log\frac{n}{1-\gamma}\right)$ iterations, where $n$ is the number of states, $m$ is the total number of actions in the MDP, and $0 < \gamma < 1$ is the discount factor. We improve Ye’s analysis in two respects. First, we improve the bound given by Ye and show that Howard’s policy iteration algorithm actually terminates after at most $O\left(\frac{m}{1-\gamma}\log\frac{n}{1-\gamma}\right)$ iterations. Second, and more importantly, we show that the same bound applies to the number of iterations performed by the strategy iteration (or strategy improvement) algorithm, a generalization of Howard’s policy iteration algorithm used for solving 2-player turn-based stochastic games with discounted zero-sum rewards. This provides the first strongly polynomial algorithm for solving these games, solving a long-standing open problem. Combined with other recent results, this provides a complete characterization of the complexity of the standard strategy iteration algorithm for 2-player turn-based stochastic games: it is strongly polynomial for a fixed discount factor, and exponential otherwise.
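Illustration The abstract refers to Howard’s policy iteration without describing it. The following is a minimal sketch of the textbook form of that algorithm for a discounted MDP, not code from the article. The MDP encoding (a list of states, each a list of `(reward, transition_probabilities)` actions), the function names, the tolerance `tol`, and the two-state example are all assumptions made for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def evaluate(mdp, policy, gamma):
    """Value of a fixed policy: solve (I - gamma * P_pi) v = r_pi."""
    n = len(mdp)
    A = [[(1.0 if i == j else 0.0) - gamma * mdp[i][policy[i]][1][j]
          for j in range(n)] for i in range(n)]
    b = [mdp[i][policy[i]][0] for i in range(n)]
    return solve_linear(A, b)


def howard_policy_iteration(mdp, gamma, tol=1e-12):
    """Switch every state with an improving action at once, until none is left."""
    n = len(mdp)
    policy = [0] * n          # arbitrary initial policy (assumption)
    iterations = 0
    while True:
        v = evaluate(mdp, policy, gamma)
        new_policy = policy[:]
        changed = False
        for s in range(n):
            def q(a):
                reward, probs = mdp[s][a]
                return reward + gamma * sum(probs[t] * v[t] for t in range(n))
            best = max(range(len(mdp[s])), key=q)
            if q(best) > q(policy[s]) + tol:
                new_policy[s] = best
                changed = True
        if not changed:
            return policy, v, iterations
        policy = new_policy
        iterations += 1


# Hypothetical two-state MDP: each action is (reward, transition probabilities).
example_mdp = [
    [(0.0, [1.0, 0.0]), (1.0, [0.0, 1.0])],   # state 0: stay, or move to state 1
    [(0.0, [0.0, 1.0]), (2.0, [0.0, 1.0])],   # state 1: two self-loops
]
policy, values, iterations = howard_policy_iteration(example_mdp, gamma=0.5)
```

In this sketch an iteration is one round of evaluation followed by simultaneous switches, which is the quantity the bounds in the abstract count. The strategy iteration algorithm for 2-player games is not shown here.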
ISSN 00045411
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2013-02-01
Publisher Place New York
e-ISSN 1557735X
Journal Journal of the ACM (JACM)
Volume Number 60
Issue Number 1
Page Count 16
Starting Page 1
Ending Page 16