Selecting actions from large discrete action sets using reinforcement learning
US10885432B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | Dec 16, 2016 |
| Grant date | Jan 5, 2021 |
| Priority date | — |
| Expiry date | Nov 6, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/092
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting actions from large discrete action sets. One of the methods includes receiving a particular observation representing a particular state of an environment; and selecting an action from a discrete set of actions to be performed by an agent interacting with the environment, comprising: processing the particular observation using an actor policy network to generate an ideal point; determining, from the points that represent actions in the set, the k nearest points to the ideal point; for each nearest point of the k nearest points: processing the nearest point and the particular observation using a Q network to generate a respective Q value for the action represented by the nearest point; and selecting the action to be performed by the agent from the k actions represented by the k nearest points based on the Q values.
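The selection procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: `toy_actor` and `toy_q` are hypothetical stand-ins for the trained actor policy network and Q network, Euclidean distance is an assumed metric for "nearest", and picking the highest Q value is one assumed reading of selecting "based on the Q values".

```python
import heapq
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def k_nearest(ideal: Vector, action_points: Sequence[Vector], k: int) -> List[int]:
    """Return indices of the k action points closest to `ideal`.

    Euclidean distance is an assumption; the abstract does not fix a metric.
    """
    return heapq.nsmallest(
        k,
        range(len(action_points)),
        key=lambda i: math.dist(ideal, action_points[i]),
    )


def select_action(
    observation: Vector,
    action_points: Sequence[Vector],
    actor: Callable[[Vector], Vector],
    q_value: Callable[[Vector, Vector], float],
    k: int,
) -> int:
    """Return the index of the chosen action: actor, then k-NN, then Q values."""
    # Step 1: the actor maps the observation to an ideal point.
    ideal = actor(observation)
    # Step 2: find the k action points nearest to the ideal point.
    candidates = k_nearest(ideal, action_points, k)
    # Steps 3 and 4: score each candidate with the Q function and
    # keep the highest-scoring one (assumed selection rule).
    return max(candidates, key=lambda i: q_value(observation, action_points[i]))


# Hypothetical stand-ins; a real system would use trained neural networks.
def toy_actor(obs: Vector) -> List[float]:
    return [obs[0] + obs[1], obs[0] - obs[1]]


def toy_q(obs: Vector, point: Vector) -> float:
    # Toy scoring rule: prefers action points with a large first coordinate.
    return point[0]


# Each action in the discrete set is represented by a point.
ACTIONS = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [5.0, 5.0]]

chosen = select_action([1.0, 0.0], ACTIONS, toy_actor, toy_q, k=2)
```

Only k of the actions are scored by the Q function, so the cost of evaluating Q grows with k rather than with the size of the action set. In this toy setup, `k=1` simply returns the action nearest the ideal point, while larger k lets the Q values override the actor's suggestion.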
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.