Title:
Efficient PAC-learning for episodic tasks with acyclic state spaces and the optimal node visitation problem in acyclic stochastic digraphs.

dc.contributor.advisor Reveliotis, Spyros
dc.contributor.author Bountourelis, Theologos en_US
dc.contributor.committeeMember Ayhan, Hayriye
dc.contributor.committeeMember Goldsman, David
dc.contributor.committeeMember Shamma, Jeff
dc.contributor.committeeMember Zwart, Bert
dc.contributor.department Industrial and Systems Engineering en_US
dc.date.accessioned 2009-06-08T19:02:56Z
dc.date.available 2009-06-08T19:02:56Z
dc.date.issued 2008-12-19 en_US
dc.description.abstract The first part of this research program concerns the development of customized and easily implementable Probably Approximately Correct (PAC)-learning algorithms for episodic tasks over acyclic state spaces. The defining characteristic of our algorithms is that they explicitly take into consideration the acyclic structure of the underlying state space and the episodic nature of the considered learning task. The first of these two attributes enables a very straightforward and efficient resolution of the "exploration vs. exploitation" dilemma, while the second provides a natural regenerating mechanism that is instrumental in the dynamics of our algorithms. Some additional characteristics that distinguish our algorithms from those developed in the past literature are (i) their direct nature, which eliminates the need for a complete specification of the underlying MDP model and reduces their execution to a very simple computation, and (ii) the unique emphasis that they place on the efficient implementation of the sampling process that is defined by their PAC property. More specifically, the aforementioned PAC-learning algorithms complete their learning task by implementing a systematic episodic sampling schedule on the underlying acyclic state space. This sampling schedule, combined with the stochastic nature of the transitions taking place, defines the need for efficient routing policies that will help the algorithms complete their exploration program while minimizing, in expectation, the number of executed episodes. The design of an optimal policy that will satisfy a specified pattern of arc visitation requirements in an acyclic stochastic graph, while minimizing the expected number of required episodes, is a challenging problem, even under the assumption that all the branching probabilities involved are known a priori.
Hence, the sampling process that takes place in the proposed PAC-learning algorithms gives rise to a novel and very interesting stochastic control/scheduling problem, which is characterized as the problem of Optimal Node Visitation (ONV) in acyclic stochastic digraphs. The second part of the work presented herein seeks the systematic modelling and analysis of the ONV problem. The last part of this research program explores the computational merits obtained by heuristic implementations that result from integrating the ONV problem developments into the PAC-learning algorithms developed in the first part of this work. We study, through numerical experimentation, the relative performance of these heuristic implementations in comparison to (i) the initial version of the PAC-learning algorithms, presented in the first part of the research program, and (ii) standard Q-learning algorithm variations provided in the reinforcement learning (RL) literature. The work presented in this last part reinforces and confirms the driving assumption of this research, i.e., that one can design customized RL algorithms of enhanced performance if the underlying problem structure is taken into account. en_US
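The sampling problem described in the abstract can be illustrated with a small sketch. It is not the author's algorithm or ONV solution: it assumes a generic two-sided Hoeffding bound to set per-node visit targets, and it routes episodes with a hypothetical greedy rule that, before each episode, picks actions maximizing the expected number of still-deficient nodes hit. The names `hoeffding_samples` and `run_schedule`, and the four-node example graph, are illustrative assumptions. Because the graph is acyclic, each episode visits a node at most once, which is what makes the per-episode backward pass valid.

```python
import math
import random


def hoeffding_samples(eps, delta):
    # Assumed generic two-sided Hoeffding bound for outcomes in [0, 1]: with
    # this many i.i.d. samples the empirical mean is within eps of the true
    # mean with probability at least 1 - delta. Not the thesis's own bound.
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def run_schedule(actions, req, seed=0, max_episodes=100_000):
    """Hypothetical greedy routing to meet node-visit targets.

    actions[v] is a list of actions at node v; each action is a dict
    {child: probability}. Nodes are numbered in topological order, node 0
    is the source, and a node with no actions is terminal. req[v] is the
    number of visits node v must receive. Returns the episodes used.
    """
    rng = random.Random(seed)
    remaining = list(req)
    n = len(actions)

    def score(action, value):
        # Expected value of the random successor reached by this action.
        return sum(p * value[c] for c, p in action.items())

    for episode in range(1, max_episodes + 1):
        # Backward pass over the acyclic graph: value[v] is the expected number
        # of still-deficient nodes hit from v onward under the greedy choices.
        value = [0.0] * n
        for v in reversed(range(n)):
            best = max((score(a, value) for a in actions[v]), default=0.0)
            value[v] = (1.0 if remaining[v] > 0 else 0.0) + best

        # Simulate one episode from the source. Acyclicity means no node is
        # revisited, so values computed above stay valid during the episode.
        v = 0
        while True:
            remaining[v] = max(0, remaining[v] - 1)
            if not actions[v]:
                break
            a = max(actions[v], key=lambda act: score(act, value))
            children, probs = zip(*a.items())
            v = rng.choices(children, weights=probs)[0]

        if all(r == 0 for r in remaining):
            return episode
    raise RuntimeError("visit targets not met within max_episodes")


# Usage: a source with two noisy "steering" actions toward nodes 1 and 2.
m = hoeffding_samples(0.2, 0.1)
actions = [
    [{1: 0.8, 2: 0.2}, {1: 0.2, 2: 0.8}],  # source
    [{3: 1.0}],
    [{3: 1.0}],
    [],  # terminal
]
episodes = run_schedule(actions, [0, m, m, 0], seed=1)
print(m, episodes)
```

Since every episode in this example passes through exactly one of nodes 1 and 2, any policy needs at least 2m episodes; the stochastic branching is what pushes the count above that floor, and reducing that excess is the kind of objective the ONV problem formalizes.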
dc.description.degree Ph.D. en_US
dc.identifier.uri http://hdl.handle.net/1853/28144
dc.publisher Georgia Institute of Technology en_US
dc.subject Computational complexity en_US
dc.subject Stochastic control en_US
dc.subject Approximate dynamic programming en_US
dc.subject Dynamic programming en_US
dc.subject PAC learning en_US
dc.subject Scheduling en_US
dc.subject Fluid relaxation en_US
dc.subject Reinforcement learning en_US
dc.subject.lcsh Machine learning
dc.subject.lcsh Stochastic control theory
dc.subject.lcsh Algorithms
dc.title Efficient PAC-learning for episodic tasks with acyclic state spaces and the optimal node visitation problem in acyclic stochastic digraphs. en_US
dc.type Text
dc.type.genre Dissertation
dspace.entity.type Publication
local.contributor.advisor Reveliotis, Spyros
local.contributor.corporatename H. Milton Stewart School of Industrial and Systems Engineering
local.contributor.corporatename College of Engineering
relation.isAdvisorOfPublication cf1d2a57-c911-4bcd-a21c-13f21e17430b
relation.isOrgUnitOfPublication 29ad75f0-242d-49a7-9b3d-0ac88893323c
relation.isOrgUnitOfPublication 7c022d60-21d5-497c-b552-95e489a06569
Files
Original bundle
Name: theologos_bountourelis_200905_phd.pdf
Size: 1.17 MB
Format: Adobe Portable Document Format