Approximate value iteration with complex returns by bounding
US10839302B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 22, 2016 |
| Grant date | Nov 17, 2020 |
| Priority date | — |
| Expiry date | Sep 7, 2039 |
Classification
- Technology area (CPC Y)Emerging Cross-Sectional Technologies
- CPC primaryY02B10/30
- WIPO fieldControl
- WIPO sectorInstruments
Abstract
A control system and method for controlling a system, which employs a data set representing a plurality of states and associated trajectories of an environment of the system; and which iteratively determines an estimate of an optimal control policy for the system. The iterative process performs the substeps, until convergence, of estimating a long term value for operation at a respective state of the environment over a series of predicted future environmental states; using a complex return of the data set to determine a bound to improve the estimated long term value; and producing an updated estimate of an optimal control policy dependent on the improved estimate of the long term value. The control system may produce an output signal to control the system directly, or output the optimized control policy. The system preferably is a reinforcement learning system which continually improves.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.