Patent · US Active

Selecting actions to be performed by a reinforcement learning agent using tree search

US10867242B2 · kind B2 · utility

Cited by	3

References	2

Claims	21

Family size	0

Assignee

DeepMind Technologies Limited · GB

Inventors

Thore Graepel · Cambridge, GB
Shih-Chieh Huang · Taichung, TW
David Silver · Hitchin, GB
Arthur Clement Guez · London, GB
Laurent Sifre · Paris, FR
Ilya Sutskever · San Francisco, US
Christopher Maddison · Toronto, CA

Key dates

Filing date	Sep 29, 2016
Grant date	Dec 15, 2020
Priority date	—
Expiry date	Jun 21, 2038

Classification

Technology area (CPC G)	Physics
CPC primary	G16H50/20
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods, systems and apparatus, including computer programs encoded on computer storage media, for training a value neural network that is configured to receive an observation characterizing a state of an environment being interacted with by an agent and to process the observation in accordance with parameters of the value neural network to generate a value score. One of the systems performs operations that include training a supervised learning policy neural network; initializing initial values of parameters of a reinforcement learning policy neural network having a same architecture as the supervised learning policy network to the trained values of the parameters of the supervised learning policy neural network; training the reinforcement learning policy neural network on second training data; and training the value neural network to generate a value score for the state of the environment that represents a predicted long-term reward resulting from the environment being in the state.
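The four training stages named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: the five-state chain environment, the tabular "networks" (one row of logits per state), the hypothetical expert data, the REINFORCE update, the learning rates, and all function names.

```python
import copy
import math
import random

N_STATES, N_ACTIONS = 5, 2  # toy chain (assumed): state 0 loses (-1), state 4 wins (+1)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def new_policy():
    # Stand-in "architecture": one row of action logits per state.
    return [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def policy_gradient_step(policy, state, action, scale, lr):
    # Ascend scale * log pi(action | state) with respect to the logits of `state`.
    probs = softmax(policy[state])
    for a in range(N_ACTIONS):
        grad = (1.0 if a == action else 0.0) - probs[a]
        policy[state][a] += lr * scale * grad

def rollout(policy, rng, start=2, max_steps=50):
    # Sample one episode; return the (state, action) pairs and the final reward.
    state, visited = start, []
    for _ in range(max_steps):
        if state in (0, N_STATES - 1):
            break
        action = rng.choices(range(N_ACTIONS), weights=softmax(policy[state]))[0]
        visited.append((state, action))
        state += 1 if action == 1 else -1
    reward = 1.0 if state == N_STATES - 1 else (-1.0 if state == 0 else 0.0)
    return visited, reward

def train_pipeline(seed=0):
    rng = random.Random(seed)
    # 1. Supervised policy: imitate hypothetical expert data (expert always moves right).
    sl_policy = new_policy()
    expert_data = [(s, 1) for s in (1, 2, 3)] * 20
    for state, action in expert_data:
        policy_gradient_step(sl_policy, state, action, scale=1.0, lr=0.1)
    # 2. RL policy: same architecture, initialized to the trained SL parameters.
    rl_policy = copy.deepcopy(sl_policy)
    # 3. Train the RL policy on its own rollouts (REINFORCE, chosen here for brevity).
    for _ in range(200):
        visited, reward = rollout(rl_policy, rng)
        for state, action in visited:
            policy_gradient_step(rl_policy, state, action, scale=reward, lr=0.1)
    # 4. Value function: regress each visited state toward the episode outcome.
    value = [0.0] * N_STATES
    for _ in range(500):
        visited, reward = rollout(rl_policy, rng)
        for state, _ in visited:
            value[state] += 0.05 * (reward - value[state])
    return sl_policy, rl_policy, value
```

Calling `train_pipeline()` returns the supervised policy, the reinforcement learning policy, and the per-state value estimates. The deep copy in step 2 mirrors the abstract's requirement that the reinforcement learning policy starts from the trained supervised parameters while the supervised policy itself stays unchanged.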

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.