Distributed training using actor-critic reinforcement learning with off-policy correction factors
US11593646B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Feb 5, 2019 |
| Grant date | Feb 28, 2023 |
| Priority date | — |
| Expiry date | Oct 18, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/044
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor-critic reinforcement learning technique.
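The abstract does not spell out the off-policy correction factors named in the title. One widely published form of such a correction is the truncated importance weight, which a learner can use when a trajectory was generated by an actor whose policy lags behind the learner's current policy. The Python sketch below illustrates that idea under stated assumptions and is not taken from the patent claims: the function name `corrected_value_targets`, its parameters, and the truncation levels `rho_bar` and `c_bar` are all hypothetical.

```python
import math


def corrected_value_targets(behaviour_logp, target_logp, rewards, values,
                            bootstrap_value, gamma=0.99,
                            rho_bar=1.0, c_bar=1.0):
    """Illustrative off-policy corrected value targets for one trajectory.

    Assumption (not from the patent text): correction factors are
    importance ratios pi(a|x) / mu(a|x) truncated at rho_bar and c_bar,
    where mu is the actor (behaviour) policy and pi is the learner policy.
    """
    n = len(rewards)
    # Importance ratio of learner policy to actor policy at each step.
    ratios = [math.exp(t - b) for t, b in zip(target_logp, behaviour_logp)]
    rhos = [min(rho_bar, r) for r in ratios]  # weights on the TD error
    cs = [min(c_bar, r) for r in ratios]      # weights on the backward trace
    next_values = list(values[1:]) + [bootstrap_value]

    targets = [0.0] * n
    carry = 0.0  # corrected target minus value estimate at the next step
    for s in reversed(range(n)):
        td_error = rhos[s] * (rewards[s] + gamma * next_values[s] - values[s])
        carry = td_error + gamma * cs[s] * carry
        targets[s] = values[s] + carry
    return targets, rhos


# Usage: when actor and learner policies agree, every ratio is 1 and the
# targets reduce to ordinary discounted returns.
on_policy_targets, _ = corrected_value_targets(
    behaviour_logp=[-1.0, -1.0, -1.0],
    target_logp=[-1.0, -1.0, -1.0],
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    bootstrap_value=0.0,
    gamma=1.0,
)
```

In this sketch the truncation bounds how much any single step can be up-weighted, which limits the variance of the learner's update at the cost of some bias when the two policies differ.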
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.