Patent · US Active

Distributed training using actor-critic reinforcement learning with off-policy correction factors

US11593646B2 · kind B2 · utility

Cited by	1
References	2
Claims	23
Family size	0

Assignee

DeepMind Technologies Limited · GB

Inventors

Hubert Josef Soyer · London, GB
Lasse Espeholt · Amsterdam, NL
Karen Simonyan · London, GB
Yotam Doron · London, GB
Vlad Firoiu · London, GB
Volodymyr Mnih · Toronto, CA
Koray Kavukcuoglu · London, GB
Remi Munos · London, GB
Thomas Ward · Boston, US
Timothy James Alexander Harley · London, GB
Iain Robert Dunning · New York, US

Key dates

Filing date	Feb 5, 2019
Grant date	Feb 28, 2023
Priority date	—
Expiry date	Oct 18, 2039

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/044
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor-critic reinforcement learning technique.
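
Illustrative sketch (not taken from the patent text): the abstract does not say how the off-policy correction factors named in the title are computed. One common choice, assumed here, is a truncated importance sampling ratio between the learner policy and the actor policy that generated the trajectory, applied to temporal-difference errors when forming value targets. All function names, parameters, and default values below are hypothetical.

```python
import math


def truncated_importance_weights(learner_log_probs, actor_log_probs, clip=1.0):
    """Per-step min(clip, pi_learner(a|s) / pi_actor(a|s)).

    Assumption: both policies report log-probabilities of the actions
    that the actor actually took.
    """
    return [
        min(clip, math.exp(lp - ap))
        for lp, ap in zip(learner_log_probs, actor_log_probs)
    ]


def corrected_value_targets(rewards, values, bootstrap_value,
                            learner_log_probs, actor_log_probs,
                            discount=0.99, rho_clip=1.0, c_clip=1.0):
    """Off-policy corrected value targets for one actor trajectory.

    rewards[t] and values[t] belong to step t; bootstrap_value is the
    learner's value estimate for the state after the last step.
    """
    rhos = truncated_importance_weights(learner_log_probs, actor_log_probs, rho_clip)
    cs = truncated_importance_weights(learner_log_probs, actor_log_probs, c_clip)
    next_values = list(values[1:]) + [bootstrap_value]

    targets = [0.0] * len(rewards)
    acc = 0.0  # running corrected sum of future TD errors
    for t in reversed(range(len(rewards))):
        # TD error scaled by the truncated correction factor for step t.
        delta = rhos[t] * (rewards[t] + discount * next_values[t] - values[t])
        acc = delta + discount * cs[t] * acc
        targets[t] = values[t] + acc
    return targets
```

When the learner and actor policies agree, every ratio equals 1 and the targets reduce to ordinary n-step returns; the truncation only matters once the actor's policy lags behind the learner's.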

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.