Online Reinforcement Learning for a Class of Partially Unknown Continuous‐time Nonlinear Systems via Value Iteration
Approximate dynamic programming (ADP), Continuous‐time systems, Integral reinforcement learning (IRL), Online learning, Value iteration
Daniel Felix Ritchie School of Engineering and Computer Science, Electrical and Computer Engineering
In this paper, a modified value iteration–based approximate dynamic programming (ADP) method is proposed for a class of affine nonlinear continuous‐time systems with partially unknown dynamics. The value iteration algorithm is established in an online fashion, and a convergence proof is given. To attenuate the effect of the unknown part of the system dynamics, an integral reinforcement learning (IRL) scheme is employed. A notable feature of the proposed ADP method is its single‐network structure for estimating both the value functions and the control policies: rather than the conventional actor/critic structure, the iteration process requires only the critic neural network (NN) to be identified. A least‐squares scheme is then derived for updating the NN weights. Finally, a linear system and a nonlinear system are tested to evaluate the performance of the proposed online value iteration algorithm; both examples demonstrate its feasibility and effectiveness.
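To make the described scheme concrete, the following is a minimal sketch of IRL-based value iteration with a critic-only least-squares weight update, illustrated on a scalar linear-quadratic problem rather than the paper's examples. The system (dx/dt = a·x + b·u with cost q·x² + R·u²), the quadratic critic basis V(x) = w·x², and all parameter values are assumptions made for illustration, not taken from the paper.

```python
# Hedged sketch (assumed scalar LQR example, not the paper's benchmark):
# value iteration via the IRL Bellman equation
#   V_{k+1}(x(t)) = ∫_t^{t+T} r(x, u_k) dτ + V_k(x(t+T)),
# with only a critic weight w identified by least squares.
import numpy as np

a, b, q, R = -1.0, 1.0, 1.0, 1.0   # system and cost parameters (illustrative)
T, dt = 0.5, 1e-3                  # IRL interval length and Euler step

def rollout(x0, w):
    """Simulate one IRL interval under the greedy policy u = -(b/R)*w*x
    induced by the critic V(x) = w*x^2; return the integrated cost
    over [0, T] and the terminal state x(T)."""
    x, cost = x0, 0.0
    for _ in range(int(T / dt)):
        u = -(b / R) * w * x              # policy from current critic
        cost += (q * x**2 + R * u**2) * dt
        x += (a * x + b * u) * dt         # forward Euler integration
    return cost, x

w = 0.0                                   # critic weight, V(x) = w*x^2
for _ in range(30):                       # value-iteration sweeps
    X0 = np.array([0.5, 1.0, 1.5, 2.0])   # sampled initial states
    phi, target = [], []
    for x0 in X0:
        cost, xT = rollout(x0, w)
        phi.append(x0**2)                 # critic basis at interval start
        target.append(cost + w * xT**2)   # IRL Bellman target under V_k
    # least-squares critic update: w = argmin ||phi*w - target||^2
    w = np.linalg.lstsq(np.array(phi)[:, None],
                        np.array(target), rcond=None)[0][0]

# For this scalar problem the Riccati solution is p = sqrt(2) - 1 ≈ 0.4142,
# so w should converge near that value.
print(w)
```

Note how the drift term a·x is never used by the learner directly; it enters only through the measured trajectory and integrated cost, which is what lets the IRL formulation handle partially unknown dynamics.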
Su, Hanguang, et al. “Online Reinforcement Learning for a Class of Partially Unknown Continuous‐Time Nonlinear Systems via Value Iteration.” Optimal Control Applications & Methods, vol. 39, no. 2, 2018, pp. 1011–1028. doi: 10.1002/oca.2391.