Patent · US Active

Approximate value iteration with complex returns by bounding

US10839302B2 · kind B2 · utility

Cited by	5

References	373

Claims	20

Family size	0

Assignee

The Research Foundation for The State University of New York · US

Inventors

Robert J. Wright · North Palm Beach, US
Lei Yu · Huangguan, CN
Steven Loscalzo · Vienna, US

Key dates

Filing date	Nov 22, 2016
Grant date	Nov 17, 2020
Priority date	—
Expiry date	Sep 7, 2039

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y02B10/30
WIPO field	Control
WIPO sector	Instruments

Abstract

A control system and method for controlling a system employ a data set representing a plurality of states and associated trajectories of an environment of the system, and iteratively determine an estimate of an optimal control policy for the system. Until convergence, each iteration performs the substeps of: estimating a long-term value for operation at a respective state of the environment over a series of predicted future environmental states; using a complex return of the data set to determine a bound used to improve the estimated long-term value; and producing an updated estimate of the optimal control policy dependent on the improved estimate of the long-term value. The control system may produce an output signal to control the system directly, or may output the optimized control policy. The system is preferably a reinforcement learning system that continually improves.
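
The iteration described in the abstract can be illustrated with a small tabular sketch. This is not the patented method. It is one possible reading, assuming that the bound is a lower bound obtained by taking the maximum over the n-step returns along each recorded trajectory, and that values are stored in a table rather than a function approximator. The names n_step_returns and bounded_value_iteration, the parameters gamma and max_n, and the toy data are all hypothetical.

```python
from typing import List, Tuple

# One recorded step: (state, action, reward, next_state). Hypothetical layout.
Transition = Tuple[int, int, float, int]


def n_step_returns(traj: List[Transition], t: int, q: List[List[float]],
                   gamma: float, max_n: int) -> List[float]:
    """n-step returns from step t, each bootstrapped from the current q table."""
    returns = []
    discounted_sum = 0.0
    for n in range(1, max_n + 1):
        if t + n - 1 >= len(traj):
            break  # trajectory ended; no longer returns are available
        _, _, reward, next_state = traj[t + n - 1]
        discounted_sum += (gamma ** (n - 1)) * reward
        returns.append(discounted_sum + (gamma ** n) * max(q[next_state]))
    return returns


def bounded_value_iteration(trajs: List[List[Transition]], n_states: int,
                            n_actions: int, gamma: float = 0.9,
                            max_n: int = 5, tol: float = 1e-8,
                            max_iters: int = 1000):
    """Iterate value estimates until they stop changing (or max_iters is hit).

    Assumption: the 1-step target is bounded from below by the largest
    n-step return seen on the same trajectory.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iters):
        sums = [[0.0] * n_actions for _ in range(n_states)]
        counts = [[0] * n_actions for _ in range(n_states)]
        for traj in trajs:
            for t, (state, action, _, _) in enumerate(traj):
                rets = n_step_returns(traj, t, q, gamma, max_n)
                target = max(rets)  # rets[0] is the usual 1-step target
                sums[state][action] += target
                counts[state][action] += 1
        # Average the targets per (state, action); keep old value if unseen.
        new_q = [[sums[s][a] / counts[s][a] if counts[s][a] else q[s][a]
                  for a in range(n_actions)] for s in range(n_states)]
        delta = max(abs(new_q[s][a] - q[s][a])
                    for s in range(n_states) for a in range(n_actions))
        q = new_q
        if delta < tol:
            break
    # Greedy policy with respect to the final value estimates.
    policy = [max(range(n_actions), key=lambda a: q[s][a])
              for s in range(n_states)]
    return q, policy


# Toy data: a 3-state chain where action 0 moves right and state 2 is terminal.
toy_trajs = [[(0, 0, 0.0, 1), (1, 0, 1.0, 2)]]
q_table, greedy_policy = bounded_value_iteration(toy_trajs, n_states=3, n_actions=2)
```

In this sketch a terminal state is simply one with no recorded outgoing transitions, so its value stays at the initial zero. The max_iters cap is there because this simplified update is not claimed to converge in general.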

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.