Patent · US Active

Systems and methods for learning reusable options to transfer knowledge between tasks

US11511413B2 · kind B2 · utility

Cited by	0

References	0

Claims	19

Family size	0

Assignee

Huawei Technologies Co., Ltd. · CN

Inventors

Borislav MAVRIN · York, CA
Daniel Mark Graves · Edmonton, CA

Key dates

Filing date	Jun 12, 2020
Grant date	Nov 29, 2022
Priority date	—
Expiry date	May 26, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G05B2219/40499
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A robot includes a reinforcement learning (RL) agent that is configured to learn a policy that maximizes the cumulative reward of a task and to determine one or more features that are minimally correlated with each other. The features are then used as pseudo-rewards, called feature rewards, where each feature reward corresponds to an option policy, or skill, that the RL agent learns in order to maximize that feature reward. In an example, the RL agent is configured to select the most relevant features from which to learn respective option policies. For each of the selected features, the RL agent is configured to learn the respective option policy that maximizes the respective feature reward. Using the learned option policies, the RL agent is configured to learn a new (second) policy for a new (second) task, where the new policy can choose from any of the learned option policies or the actions available to the RL agent.
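
As a rough illustration of the idea in the abstract, the sketch below picks features that are minimally correlated with each other and then exposes one option per selected feature alongside the primitive actions. It is not the patented method: the greedy selection rule, the use of Pearson correlation, the choice of starting feature, and every name in it (`pearson`, `select_decorrelated`, the feature names, the action names) are assumptions made for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_decorrelated(features, k):
    """Greedily pick k feature names with low mutual correlation.

    Assumption for this sketch: start from the first feature, then repeatedly
    add the candidate whose largest absolute correlation with the features
    already selected is smallest.
    """
    names = list(features)
    selected = [names[0]]
    while len(selected) < min(k, len(names)):
        candidates = [name for name in names if name not in selected]
        best = min(
            candidates,
            key=lambda c: max(abs(pearson(features[c], features[s])) for s in selected),
        )
        selected.append(best)
    return selected


# Hypothetical feature traces recorded while the agent acts on the first task.
features = {
    "speed": [1, 2, 3, 4, 5],
    "double_speed": [2, 4, 6, 8, 10],  # redundant: perfectly correlated with speed
    "wobble": [1, -1, 0, -1, 1],       # uncorrelated with speed
}

selected = select_decorrelated(features, 2)

# Each selected feature would serve as the pseudo-reward ("feature reward") of one option.
options = ["option:" + name for name in selected]

# On the second task, the new policy may choose a primitive action or a learned option.
primitive_actions = ["left", "right"]
second_task_choices = primitive_actions + options
```

Here the redundant `double_speed` feature is skipped in favor of `wobble`, so `selected` ends up as `["speed", "wobble"]`. Training each option policy against its feature reward is omitted; any standard RL algorithm could fill that role in a fuller example.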

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.