Patent · US Active

System and method for policy optimization using quasi-Newton trust region method

US11650551B2 · kind B2 · utility

Cited by	1

References	0

Claims	27

Family size	0

Assignee

Mitsubishi Electric Research Laboratories, Inc. · US

Inventors

Devesh Jha · Cambridge, US
Arvind Raghunathan · Medford, US
Diego Romeres · Cambridge, US

Key dates

Filing date	Oct 4, 2019
Grant date	May 16, 2023
Priority date	—
Expiry date	Nov 15, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/006
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A computer-implemented learning method for optimizing a control policy that controls a system is provided. The method includes receiving states of the system being operated for a specific task; initializing the control policy as a function approximator including neural networks; collecting state transition and reward data using a current control policy; estimating an advantage function and a state visitation frequency based on the current control policy; updating the current control policy by quasi-Newton trust region policy optimization, using a second-order approximation of the objective function and a second-order approximation of the KL-divergence constraint on the permissible change in the policy; and determining an optimal control policy, for controlling the system, based on the average reward accumulated using the updated current control policy.
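
Illustrative sketch (not taken from the patent text): the update step named in the abstract can be pictured as a quasi-Newton step whose size is limited by a quadratic model of the KL divergence. The Python below is a minimal sketch under stated assumptions. It assumes a BFGS formula for the second-order approximation of the objective, a matrix `F` standing in for the second-order approximation of the KL-divergence constraint, and a simple rescaling of the step in place of an exact trust region subproblem solve. The names `bfgs_update`, `kl_limited_step`, `F`, and `delta` are hypothetical, and the abstract does not state which quasi-Newton formula the claimed method uses.

```python
import numpy as np


def bfgs_update(B, s, y, eps=1e-10):
    """Standard BFGS update of a Hessian approximation B.

    s is the parameter step and y is the change in the gradient of the
    loss (the negated objective) over that step. Assumption: BFGS is
    used here only as one familiar quasi-Newton formula.
    """
    sy = float(s @ y)
    if sy <= eps:
        # Curvature condition fails; skip the update so B stays positive definite.
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(y, y) / sy


def kl_limited_step(g, B, F, delta):
    """Quasi-Newton step p = -B^{-1} g, shrunk so that 0.5 * p^T F p <= delta.

    g is the loss gradient, B a positive definite Hessian approximation,
    F a hypothetical quadratic model of the KL divergence, and delta the
    permitted KL change. Rescaling is a simplification of a full
    trust region subproblem solve.
    """
    p = -np.linalg.solve(B, g)
    kl = 0.5 * float(p @ F @ p)
    if kl > delta:
        p = p * np.sqrt(delta / kl)
    return p
```

In a training loop under these assumptions, one would compute `p = kl_limited_step(g, B, F, delta)`, apply `theta + p`, re-estimate the gradient from newly collected transitions, and refresh `B` with `bfgs_update(B, p, g_new - g)`.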

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.