Patent · US Revoked

Training action selection neural networks using apprenticeship

US11468321B2 · kind B2 · utility

0 Cited by

4 References

20 Claims

0 Family size

Assignee

DeepMind Technologies Limited · GB

Inventors

Olivier Claude Pietquin · Lille, FR
Martin Riedmiller · Balgheim, DE
Wang Fumin · London, GB
Bilal Piot · London, GB
Mel Vecerik · London, GB
Todd Hester · London, GB
Thomas Rothoerl · London, GB
Thomas Lampe · London, GB
Nicolas Manfred Otto Heess · London, GB
Jonathan Karl Scholz · London, GB

Key dates

Filing date	Jun 28, 2018
Grant date	Oct 11, 2022
Priority date	—
Expiry date	Jul 1, 2039

Classification

Technology area (CPC —)	General

Abstract

An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
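The replay buffer described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The names Transition and DemoReplayBuffer are hypothetical, and three details are assumptions the abstract does not state: demonstration tuples are kept for the whole run, agent tuples are evicted oldest-first once a capacity is reached, and minibatches are drawn uniformly over both sets. The actor and critic networks are omitted; the sketch only shows how an off-policy update could draw tuples from both sources.

```python
import random
from collections import deque
from typing import NamedTuple, Sequence


class Transition(NamedTuple):
    """One (state, action, reward, new state) tuple, as listed in the abstract."""
    state: Sequence[float]
    action: Sequence[float]
    reward: float
    next_state: Sequence[float]


class DemoReplayBuffer:
    """Hypothetical buffer holding demonstration and agent tuples together."""

    def __init__(self, demonstrations, agent_capacity, seed=None):
        # Assumption: demonstration tuples are never evicted.
        self.demos = list(demonstrations)
        # Assumption: agent tuples are dropped oldest-first when full.
        self.agent = deque(maxlen=agent_capacity)
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store a tuple produced by the system's own operation."""
        self.agent.append(transition)

    def __len__(self):
        return len(self.demos) + len(self.agent)

    def sample(self, batch_size):
        """Draw a minibatch uniformly over demonstration and agent tuples."""
        pool = self.demos + list(self.agent)
        return self.rng.sample(pool, min(batch_size, len(pool)))


# Usage: one demonstration tuple plus agent tuples in a small buffer.
demos = [Transition((0.0,), (0.5,), 1.0, (0.1,))]
buffer = DemoReplayBuffer(demos, agent_capacity=2, seed=0)
for step in range(3):
    buffer.add(Transition((float(step),), (0.1,), 0.0, (float(step + 1),)))
batch = buffer.sample(2)  # may contain demonstration and agent tuples
```

In this sketch the third agent tuple pushes out the first, while the demonstration tuple stays available for sampling. A real system might weight the two sources differently; uniform sampling is used here only to keep the example short.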

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.