Patent · US Active

Selecting actions from large discrete action sets using reinforcement learning

US10885432B1 · kind B1 · utility

Cited by: 12
References: 0
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 16, 2016
Grant date: Jan 5, 2021
Priority date
Expiry date: Nov 6, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N3/092
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting actions from large discrete action sets. One of the methods includes receiving a particular observation representing a particular state of an environment; and selecting an action from a discrete set of actions to be performed by an agent interacting with the environment, comprising: processing the particular observation using an actor policy network to generate an ideal point; determining, from the points that represent actions in the set, the k nearest points to the ideal point; for each nearest point of the k nearest points: processing the nearest point and the particular observation using a Q network to generate a respective Q value for the action represented by the nearest point; and selecting the action to be performed by the agent from the k actions represented by the k nearest points based on the Q values.
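
The selection procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `select_action`, `actor`, and `q_network` are hypothetical, both networks are replaced by plain callables, the nearest-neighbor search is a brute-force Euclidean scan, and picking the highest Q value is only one assumed way of selecting "based on the Q values".

```python
import numpy as np


def select_action(observation, action_points, actor, q_network, k):
    """Return the index of the chosen action in `action_points`.

    action_points: array of shape (n_actions, d), one point per discrete action.
    actor:         callable mapping an observation to an ideal point of shape (d,).
    q_network:     callable mapping (observation, point) to a scalar Q value.
    All three are stand-ins assumed for this illustration.
    """
    # Step 1: the actor policy maps the observation to an ideal point.
    ideal_point = np.asarray(actor(observation), dtype=float)

    # Step 2: find the k action points nearest to the ideal point
    # (brute-force Euclidean distance; a large action set would likely
    # need an approximate nearest-neighbor index instead).
    dists = np.linalg.norm(action_points - ideal_point, axis=1)
    k = min(k, len(action_points))
    nearest = np.argpartition(dists, k - 1)[:k]

    # Step 3: score each candidate with the Q function.
    q_values = np.array(
        [q_network(observation, action_points[i]) for i in nearest]
    )

    # Step 4: select among the k candidates (assumption: greedy argmax).
    return int(nearest[np.argmax(q_values)])


# Toy usage: five one-dimensional actions, a fixed ideal point of 2.2,
# and a Q function that simply prefers larger action coordinates.
points = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
toy_actor = lambda obs: np.array([2.2])
toy_q = lambda obs, point: float(point[0])

print(select_action(None, points, toy_actor, toy_q, k=3))  # 3
```

With `k=1` the sketch reduces to picking the single nearest action (index 2 in the toy data); a larger `k` lets the Q values override the actor's raw proposal, here choosing index 3 instead.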

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.