Patent · US Active

Selecting actions from large discrete action sets using reinforcement learning

US10885432B1 · kind B1 · utility

Cited by	12
References	0
Claims	18
Family size	0

Assignee

DeepMind Technologies Limited · GB

Inventors

Gabriel Dulac-Arnold · Sautereau, FR
Richard Andrew Evans · London, GB
Benjamin Kenneth Coppin · Cottenham, GB

Key dates

Filing date	Dec 16, 2016
Grant date	Jan 5, 2021
Priority date	—
Expiry date	Nov 6, 2039

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/092
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting actions from large discrete action sets. One of the methods includes receiving a particular observation representing a particular state of an environment; and selecting an action from a discrete set of actions to be performed by an agent interacting with the environment, comprising: processing the particular observation using an actor policy network to generate an ideal point; determining, from the points that represent actions in the set, the k nearest points to the ideal point; for each nearest point of the k nearest points: processing the nearest point and the particular observation using a Q network to generate a respective Q value for the action represented by the nearest point; and selecting the action to be performed by the agent from the k actions represented by the k nearest points based on the Q values.
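
The selection procedure in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `select_action`, `actor`, `q_network`, and `action_points` are hypothetical, Euclidean distance with a brute-force search is an assumed choice for finding the nearest points, and taking the highest Q value is one assumed reading of "based on the Q values".

```python
import numpy as np


def select_action(observation, action_points, actor, q_network, k):
    """Return the index (row of action_points) of the selected action.

    observation   : 1-D array describing the current state
    action_points : (n_actions, d) array, one point per discrete action
    actor         : callable, observation -> ideal point of shape (d,)
    q_network     : callable, (observation, point) -> scalar Q value
    k             : number of nearest candidate actions to evaluate
    """
    # Step 1: the actor policy network maps the observation to an ideal point.
    ideal_point = np.asarray(actor(observation), dtype=float)

    # Step 2: find the k points closest to the ideal point.
    # Assumption: Euclidean distance, computed by brute force.
    distances = np.linalg.norm(action_points - ideal_point, axis=1)
    k = min(k, len(action_points))
    nearest = np.argpartition(distances, k - 1)[:k]

    # Step 3: score each candidate with the Q network.
    q_values = np.array(
        [q_network(observation, action_points[i]) for i in nearest]
    )

    # Step 4: pick the candidate with the highest Q value (assumed rule).
    return int(nearest[np.argmax(q_values)])


# Toy stand-ins for the two networks: fixed random linear maps, not trained models.
rng = np.random.default_rng(0)
obs_dim, act_dim, n_actions = 4, 3, 1000
points = rng.normal(size=(n_actions, act_dim))
W_actor = rng.normal(size=(act_dim, obs_dim))
w_q = rng.normal(size=obs_dim + act_dim)


def toy_actor(obs):
    return np.tanh(W_actor @ obs)


def toy_q(obs, point):
    return float(w_q @ np.concatenate([obs, point]))


chosen = select_action(rng.normal(size=obs_dim), points, toy_actor, toy_q, k=10)
```

Only k Q-network evaluations are needed per step instead of one per action in the set, which is the practical point of routing the choice through an ideal point first. For very large action sets, the brute-force distance computation would likely be replaced by an approximate nearest-neighbor index.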

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.