Abstract
We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic and distributed policy iteration. Our discussion of policy eva...
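For reference, the approximate schemes surveyed above build on classical (exact) policy iteration. The sketch below shows that exact version on a tiny finite MDP with discounted costs. It is a minimal illustration under stated assumptions, not the paper's method. The transition encoding `P[s][a]`, the discount factor `alpha`, the tolerances `tol` and `eps`, the helper names, and the two-state example are all hypothetical. Policy evaluation iterates the policy's Bellman equation instead of solving the linear system directly. No approximation or simulation is used.

```python
from typing import List, Tuple

# Hypothetical MDP encoding (an assumption for illustration):
# P[s][a] lists (probability, next_state, cost) transitions for action a in state s.
Transitions = List[Tuple[float, int, float]]
MDP = List[List[Transitions]]


def q_value(P: MDP, J: List[float], s: int, a: int, alpha: float) -> float:
    """Expected one-stage cost plus discounted cost-to-go of taking action a in state s."""
    return sum(p * (c + alpha * J[s2]) for p, s2, c in P[s][a])


def evaluate_policy(P: MDP, policy: List[int], alpha: float, tol: float = 1e-10) -> List[float]:
    """Policy evaluation: apply the policy's Bellman equation until iterates change by < tol."""
    J = [0.0] * len(P)
    while True:
        J_new = [q_value(P, J, s, policy[s], alpha) for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(J_new, J)) < tol:
            return J_new
        J = J_new


def improve_policy(P: MDP, J: List[float], policy: List[int], alpha: float,
                   eps: float = 1e-9) -> List[int]:
    """Policy improvement: switch to a greedy action only if it is better by more than eps."""
    new_policy = []
    for s in range(len(P)):
        best = min(range(len(P[s])), key=lambda a: q_value(P, J, s, a, alpha))
        if q_value(P, J, s, best, alpha) < q_value(P, J, s, policy[s], alpha) - eps:
            new_policy.append(best)
        else:
            new_policy.append(policy[s])  # keep current action on near-ties
    return new_policy


def policy_iteration(P: MDP, alpha: float) -> Tuple[List[int], List[float]]:
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0] * len(P)
    while True:
        J = evaluate_policy(P, policy, alpha)
        new_policy = improve_policy(P, J, policy, alpha)
        if new_policy == policy:
            return policy, J
        policy = new_policy


# Hypothetical two-state example: in state 0, action 1 pays 0.5 to reach the cost-free state 1.
EXAMPLE_MDP: MDP = [
    [[(1.0, 0, 2.0)], [(1.0, 1, 0.5)]],  # state 0: stay (cost 2) or move (cost 0.5)
    [[(1.0, 1, 0.0)], [(1.0, 0, 1.0)]],  # state 1: stay (cost 0) or move back (cost 1)
]
policy, J = policy_iteration(EXAMPLE_MDP, alpha=0.9)
print(policy, J)  # [1, 0] [0.5, 0.0]
```

The `eps` margin in the improvement step keeps the current action on near-ties. Without it, floating-point error in the evaluated costs could make the loop switch back and forth between equally good policies.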
Publication date
March 13, 2011 (the date the paper was first posted online on the China Journal Net platform; this is not the paper's publication date)