Patent · US Active

Approximate value iteration with complex returns by bounding

US10839302B2 · kind B2 · utility

Cited by: 5
References: 373
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 22, 2016
Grant date: Nov 17, 2020
Priority date
Expiry date: Sep 7, 2039

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y02B10/30
  • WIPO field: Control
  • WIPO sector: Instruments

Abstract

A control system and method for controlling a system, which employs a data set representing a plurality of states and associated trajectories of an environment of the system; and which iteratively determines an estimate of an optimal control policy for the system. The iterative process performs the substeps, until convergence, of estimating a long term value for operation at a respective state of the environment over a series of predicted future environmental states; using a complex return of the data set to determine a bound to improve the estimated long term value; and producing an updated estimate of an optimal control policy dependent on the improved estimate of the long term value. The control system may produce an output signal to control the system directly, or output the optimized control policy. The system preferably is a reinforcement learning system which continually improves.
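The iterative process in the abstract can be illustrated with a small tabular sketch. This is not the patented implementation; it is one possible reading, under stated assumptions: the "complex return" is taken to be the largest n-step return along a recorded trajectory, that value is used as a lower bound on the ordinary one-step target, and the "fit" is a plain per-(state, action) average. All names (`bounded_target`, `bounded_value_iteration`, `max_n`) and the toy data set are hypothetical.

```python
from collections import defaultdict

def state_value(q, state, actions):
    """Greedy value max_a Q(state, a); a terminal state (None) is worth 0."""
    if state is None:
        return 0.0
    return max(q[(state, a)] for a in actions)

def bounded_target(q, traj, t, actions, gamma, max_n):
    """Largest n-step return from step t (n = 1..max_n).

    Assumption: n = 1 is the usual one-step target, and longer returns can
    only raise it, so the result acts as a lower bound on the value estimate.
    """
    best = float("-inf")
    discounted_rewards = 0.0
    for n in range(1, max_n + 1):
        if t + n - 1 >= len(traj):
            break  # trajectory data runs out
        _, _, reward, next_state = traj[t + n - 1]
        discounted_rewards += gamma ** (n - 1) * reward
        n_step = discounted_rewards + gamma ** n * state_value(q, next_state, actions)
        best = max(best, n_step)
        if next_state is None:
            break  # episode ended, no longer returns exist
    return best

def bounded_value_iteration(trajectories, actions, gamma=0.9, max_n=3,
                            tol=1e-8, max_iters=1000):
    """Iterate until the value estimates stop changing, then read off a policy."""
    q = defaultdict(float)
    for _ in range(max_iters):
        targets = defaultdict(list)
        for traj in trajectories:
            for t, (s, a, _, _) in enumerate(traj):
                targets[(s, a)].append(
                    bounded_target(q, traj, t, actions, gamma, max_n))
        # Tabular stand-in for a function approximator: average the targets.
        new_q = defaultdict(float, {k: sum(v) / len(v) for k, v in targets.items()})
        delta = max(abs(new_q[k] - q[k]) for k in targets)
        q = new_q
        if delta < tol:
            break
    states = {s for traj in trajectories for (s, _, _, _) in traj}
    # Unseen (state, action) pairs default to 0 in this toy setting.
    policy = {s: max(actions, key=lambda a: q[(s, a)]) for s in states}
    return q, policy

# Toy data set: each step is (state, action, reward, next_state).
trajectories = [
    [(0, "right", 0.0, 1), (1, "right", 0.0, 2), (2, "right", 1.0, None)],
    [(0, "left", 0.0, 0), (0, "right", 0.0, 1)],
]
q, policy = bounded_value_iteration(trajectories, actions=("left", "right"))
```

On this toy chain the multi-step returns carry the terminal reward back to state 0 in the first sweep, where a one-step target alone would need several sweeps. A real system would replace the averaging step with a learned function approximator and would have to handle off-policy data, which this sketch ignores.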

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.