Patent · US Active

Task prioritized experience replay algorithm for reinforcement learning

US12277194B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 29, 2020
Grant date: Apr 15, 2025
Priority date
Expiry date: Jun 26, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N20/00
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A task prioritized experience replay (TaPER) algorithm enables simultaneous off-policy learning of multiple reinforcement learning (RL) tasks. The algorithm can prioritize samples that were part of fixed-length episodes that led to the achievement of tasks. This enables the agent to quickly learn task policies by bootstrapping over its early successes. TaPER can also improve performance on all tasks simultaneously, which is a desirable characteristic for multi-task RL. Unlike conventional experience replay (ER) algorithms that are applied to single-task RL settings, that require rewards to be binary or abundant, or that require goals to be provided as a parameterized specification, TaPER imposes no such restrictions and supports arbitrary reward and task specifications.
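
The prioritization idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not state how priorities are computed or how samples are drawn, so this sketch assumes a simple two-level weighting in which every transition from an episode that achieved at least one task gets a higher sampling weight. The names `Transition`, `TaskPrioritizedBuffer`, `base_priority`, and `success_priority` are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    state: object
    action: object
    next_state: object
    rewards: dict  # task name -> reward for this step (arbitrary values allowed)


class TaskPrioritizedBuffer:
    """Illustrative replay buffer; the two-level weighting is an assumption."""

    def __init__(self, base_priority=0.1, success_priority=1.0, seed=None):
        self.transitions = []
        self.priorities = []
        self.base_priority = base_priority        # assumed weight for other episodes
        self.success_priority = success_priority  # assumed weight for task-achieving episodes
        self.rng = random.Random(seed)

    def add_episode(self, episode, achieved_tasks):
        # Every transition of an episode that achieved at least one task
        # receives the higher weight; all others receive the base weight.
        priority = self.success_priority if achieved_tasks else self.base_priority
        for transition in episode:
            self.transitions.append(transition)
            self.priorities.append(priority)

    def sample(self, batch_size):
        # Draw with replacement, with probability proportional to priority.
        return self.rng.choices(self.transitions, weights=self.priorities, k=batch_size)


if __name__ == "__main__":
    buf = TaskPrioritizedBuffer(seed=0)
    success = [Transition("ok", 0, "ok", {"reach": 1.0}) for _ in range(5)]
    failure = [Transition("miss", 0, "miss", {"reach": 0.0}) for _ in range(5)]
    buf.add_episode(success, achieved_tasks={"reach"})
    buf.add_episode(failure, achieved_tasks=set())
    batch = buf.sample(8)
    print(sum(t.state == "ok" for t in batch), "of", len(batch), "from successful episodes")
```

Because the non-zero base weight keeps unsuccessful episodes in play, the sketch still samples from the whole buffer while favoring early successes. The single shared batch is only one possible way to serve several tasks at once.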

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.