Patent · US Active

Systems and methods for learning reusable options to transfer knowledge between tasks

US11511413B2 · kind B2 · utility

0Cited by
0References
19Claims
0Family size

Assignee

Inventors

Key dates

Filing dateJun 12, 2020
Grant dateNov 29, 2022
Priority date
Expiry dateMay 26, 2041

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG05B2219/40499
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A robot that includes an RL agent that is configured to learn a policy to maximize the cumulative reward of a task, to determine one or more features that are minimally correlated with each other. The features are then used as pseudo-rewards, called feature rewards, where each feature reward corresponds to an option policy, or skill, the RL agent learns to maximize. In an example, the RL agent is configured to select the most relevant features to learn respective option policies from. The RL agent is configured to, for each of the selected features, learn the respective option policy that maximizes the respective feature reward. Using the learned option policies, the RL agent is configured to learn a new (second) policy for a new (second) task that can choose from any of the learned option policies or actions available to the RL agent.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.