Patent · US Active

Methods and systems for support policy learning

US11605026B2 · kind B2 · utility

Cited by	0
References	5
Claims	20
Family size	0

Assignee

Huawei Technologies Co., Ltd. · CN

Inventors

Daniel Mark Graves · Edmonton, CA
Jun Jin · Shanghai, CN
Jun Luo · Chongqing, CN

Key dates

Filing date	May 15, 2020
Grant date	Mar 14, 2023
Priority date	—
Expiry date	May 27, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/084
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods and systems are described for support policy learning in an agent of a robot. A general value function (GVF) is learned for a main policy, where the GVF represents future performance of the agent executing the main policy for a given state of the environment. A master policy selects an action based on the predicted accumulated success value received from the GVF. When the predicted accumulated success value is an acceptable value, the action selected by the master policy is execution of the main policy. When the predicted accumulated success value is not an acceptable value, the action selected by the master policy causes a support policy to be learned. The support policy generates a support action to be performed, which causes the robot to transition to a new state where the predicted accumulated success value has an acceptable value.
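
The selection logic in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: the one-dimensional state, the `toy_gvf` formula, the 0.5 acceptability threshold, and the names `MasterPolicy`, `toy_main_policy`, `toy_support_policy`, and `run_episode`. Learning of the GVF and of the support policy is omitted; a fixed hand-written support policy stands in for a learned one.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = float   # assumption: a 1-D state, e.g. offset from a target position
Action = float  # assumption: a 1-D displacement applied to the state


def toy_gvf(state: State) -> float:
    """Hypothetical stand-in for a learned GVF: predicted accumulated
    success of the main policy, highest when the state is near 0."""
    return max(0.0, 1.0 - abs(state))


def toy_main_policy(state: State) -> Action:
    """Hypothetical main policy: hold position."""
    return 0.0


def toy_support_policy(state: State) -> Action:
    """Hypothetical fixed support policy (not learned here): step toward
    the region where the GVF prediction is acceptable."""
    return -math.copysign(0.5, state)


@dataclass
class MasterPolicy:
    gvf: Callable[[State], float]
    main_policy: Callable[[State], Action]
    support_policy: Callable[[State], Action]
    threshold: float = 0.5  # assumed definition of an "acceptable" value

    def select(self, state: State) -> Tuple[str, Action]:
        """Run the main policy when the predicted success is acceptable,
        otherwise fall back to the support policy."""
        if self.gvf(state) >= self.threshold:
            return "main", self.main_policy(state)
        return "support", self.support_policy(state)


def run_episode(start: State, steps: int) -> List[str]:
    """Apply the master policy for a few steps and record which policy ran."""
    master = MasterPolicy(toy_gvf, toy_main_policy, toy_support_policy)
    state, trace = start, []
    for _ in range(steps):
        chosen, action = master.select(state)
        trace.append(chosen)
        state += action  # assumed toy dynamics: the action shifts the state
    return trace


if __name__ == "__main__":
    print(run_episode(1.2, 3))
```

Starting far from the target, the sketch hands control to the support policy until the state reaches a region where the predicted success value passes the threshold, after which the main policy runs.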

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.