Access Restriction Subscribed

Author Ngo Anh Vien ♦ TaeChoong Chung
Source IEEE Xplore Digital Library
Content type Text
Publisher Institute of Electrical and Electronics Engineers, Inc. (IEEE)
File Format PDF
Copyright Year ©2008
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Special computer methods
Subject Keyword Function approximation ♦ Computational modeling ♦ Stochastic processes ♦ Artificial intelligence ♦ Laboratories ♦ Convergence ♦ Dynamic programming ♦ Costs ♦ Approximation algorithms ♦ Optimization methods
Abstract This paper proposes a simulation-based algorithm for optimizing the average reward in a parameterized continuous-time, finite-state semi-Markov decision process (SMDP). Our contributions are twofold: First, we compute the approximate gradient of the average reward with respect to the parameters of SMDPs controlled by parameterized stochastic policies; a stochastic gradient ascent method is then used to adjust the parameters in order to optimize the average reward. Second, we present a simulation-based algorithm (GSMDP) to estimate this approximate gradient of the average reward, using only a single sample path of the underlying Markov chain. We prove the almost sure convergence of this estimate to the true gradient of the average reward as the number of iterations goes to infinity.
Description Author affiliation: Artificial Intell. Lab., Kyung Hee Univ., Yongin (Ngo Anh Vien; TaeChoong Chung)
ISBN 9780769534404
ISSN 1082-3409
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research ♦ Reading
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2008-11-03
Publisher Place USA
Rights Holder Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Size 326.47 kB
Page Count 8
Starting Page 11
Ending Page 18
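
The abstract above describes a simulation-based, single-sample-path estimator of the average-reward gradient combined with stochastic gradient ascent on the policy parameters. The following is a minimal illustrative sketch of that general approach, not the paper's GSMDP algorithm: it uses a likelihood-ratio (score-function) gradient estimate with a discounted eligibility trace on a small synthetic SMDP. All names and quantities (softmax_policy, P, R, tau, the step size, the trace discount beta) are assumptions introduced here for illustration.

```python
# Hedged sketch: estimate the average-reward gradient from one simulated
# sample path of a small SMDP, then take a stochastic gradient ascent step.
# This is an illustrative likelihood-ratio estimator, not the paper's GSMDP.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2

# Illustrative SMDP: transitions P[s, a, s'], rewards R[s, a],
# and mean holding times tau[s, a] (continuous-time semi-Markov structure).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
tau = rng.uniform(0.5, 2.0, size=(n_states, n_actions))

def softmax_policy(theta, s):
    """Action probabilities of a parameterized stochastic policy."""
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs)
    return p / p.sum()

def grad_log_policy(theta, s, a):
    """Score function: gradient of log pi_theta(a | s) for the softmax policy."""
    g = np.zeros_like(theta)
    g[s] = -softmax_policy(theta, s)
    g[s, a] += 1.0
    return g

def estimate_gradient(theta, horizon=5000, beta=0.95):
    """Single-sample-path gradient estimate (biased by the trace discount beta)."""
    s = 0
    trace = np.zeros_like(theta)
    grad = np.zeros_like(theta)
    total_reward, total_time = 0.0, 0.0
    for t in range(1, horizon + 1):
        a = rng.choice(n_actions, p=softmax_policy(theta, s))
        r, dt = R[s, a], tau[s, a]
        trace = beta * trace + grad_log_policy(theta, s, a)
        grad += (r * trace - grad) / t          # running average of r * trace
        total_reward += r
        total_time += dt
        s = rng.choice(n_states, p=P[s, a])
    return grad, total_reward / total_time      # gradient estimate, reward rate

theta = np.zeros((n_states, n_actions))
for _ in range(50):
    g, avg_reward = estimate_gradient(theta)
    theta += 0.5 * g                            # stochastic gradient ascent step
print("estimated average reward per unit time:", round(avg_reward, 3))
```

The eligibility-trace discount trades bias for variance in the single-path estimate; the paper's result concerns almost sure convergence of its own estimator to the true gradient, which this sketch does not reproduce.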