Patent · US Active

System and method for policy optimization using quasi-Newton trust region method

US11650551B2 · kind B2 · utility

Cited by: 1
References: 0
Claims: 27
Family size: 0

Assignee

Inventors

Key dates

Filing date: Oct 4, 2019
Grant date: May 16, 2023
Priority date
Expiry date: Nov 15, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N3/006
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A computer-implemented learning method for optimizing a control policy controlling a system is provided. The method includes receiving states of the system being operated for a specific task, initializing the control policy as a function approximator including neural networks, collecting state transition and reward data using a current control policy, estimating an advantage function and a state visitation frequency based on the current control policy, updating the current control policy by quasi-Newton trust region policy optimization, using a second-order approximation of the objective function and a second-order approximation of the KL-divergence constraint on the permissible change in the policy, and determining an optimal control policy, for controlling the system, based on the average reward accumulated using the updated current control policy.
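
The policy update step named in the abstract can be illustrated with a small numerical sketch. This is not the patented method: it is a toy example under stated assumptions, written in Python with NumPy. A concave quadratic stands in for the policy objective, a fixed matrix `F` stands in for the second-order approximation of the KL-divergence, a standard BFGS inverse-Hessian update supplies the quasi-Newton curvature estimate, and the step is simply rescaled to respect the trust region. All names (`bfgs_inverse_update`, `trust_region_step`, `run_toy_example`, `A`, `target`, `F`, `delta`) are hypothetical and do not come from the patent.

```python
import numpy as np


def bfgs_inverse_update(H, s, y, eps=1e-10):
    """Standard BFGS update of an inverse-Hessian approximation H.

    s is the parameter change; y is the change in the gradient of the
    loss being minimized (here, the negated objective).
    """
    sy = float(s @ y)
    if sy <= eps:
        # Curvature condition fails: skip the update so H stays positive definite.
        return H
    rho = 1.0 / sy
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)


def trust_region_step(grad, H, F, delta):
    """Quasi-Newton ascent step, rescaled so that 0.5 * d^T F d <= delta."""
    d = H @ grad
    kl = 0.5 * float(d @ F @ d)
    if kl > delta:
        d = d * np.sqrt(delta / kl)
    return d


def run_toy_example(delta=0.01, iters=60):
    # Hypothetical stand-ins: a concave quadratic "objective" J and a fixed
    # matrix F playing the role of the KL-divergence Hessian.
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    target = np.array([1.0, -2.0])
    F = np.eye(2)

    def J(theta):
        diff = theta - target
        return -0.5 * float(diff @ A @ diff)

    def grad_J(theta):
        return -A @ (theta - target)

    theta = np.zeros(2)
    H = np.eye(2)  # initial inverse-Hessian estimate
    j_values, kl_values = [J(theta)], []
    for _ in range(iters):
        g = grad_J(theta)
        d = trust_region_step(g, H, F, delta)
        # Backtracking: halve the step until the objective does not get worse.
        for _attempt in range(10):
            if J(theta + d) >= J(theta):
                break
            d = 0.5 * d
        else:
            d = np.zeros_like(d)  # no acceptable step found
        new_theta = theta + d
        # Gradient change of the loss -J, as expected by the BFGS update.
        y = g - grad_J(new_theta)
        H = bfgs_inverse_update(H, d, y)
        kl_values.append(0.5 * float(d @ F @ d))
        theta = new_theta
        j_values.append(J(theta))
    return theta, j_values, kl_values
```

Calling `run_toy_example()` returns the final parameters, the objective value after each accepted step, and the quadratic KL estimate of each step, so one can check that no step exceeds `delta`. In a real policy-optimization setting the gradient and the KL curvature would be estimated from collected state transition and reward data, and the constrained subproblem would typically be solved more carefully than by rescaling.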

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.