Task prioritized experience replay algorithm for reinforcement learning
US12277194B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 29, 2020 |
| Grant date | Apr 15, 2025 |
| Priority date | — |
| Expiry date | Jun 26, 2043 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06N20/00
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A task prioritized experience replay (TaPER) algorithm enables simultaneous learning of multiple RL tasks off policy. The algorithm can prioritize samples that were part of fixed length episodes that led to the achievement of tasks. This enables the agent to quickly learn task policies by bootstrapping over its early successes. Finally, TaPER can improve performance on all tasks simultaneously, which is a desirable characteristic for multi-task RL. Unlike conventional ER algorithms that are applied to single RL task learning settings or that require rewards to be binary or abundant, or are provided as a parameterized specification of goals, TaPER poses no such restrictions and supports arbitrary reward and task specifications.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.