Distributional reinforcement learning
US10860920B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Jul 10, 2019 |
| Grant date | Dec 8, 2020 |
| Priority date | — |
| Expiry date | Jul 10, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/084
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. A current observation characterizing a current state of the environment is received. For each action in a set of multiple actions that can be performed by the agent to interact with the environment, a probability distribution is determined over possible Q returns for the action-current observation pair. For each action, a measure of central tendency of the possible Q returns with respect to the probability distribution for the action-current observation pair is determined. An action to be performed by the agent in response to the current observation is selected using the measures of central tendency.
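The action-selection step in the abstract can be illustrated with a minimal sketch. It assumes that each action's return distribution is categorical over a fixed, ascending list of return values ("atoms"), as in categorical distributional RL. It also assumes that the agent acts greedily, picking the action whose measure of central tendency is largest. The abstract states neither of these. The function names and the `statistic` parameter are hypothetical, and the mean and median are offered only as two examples of a measure of central tendency:

```python
import math
from typing import List, Sequence


def central_tendency(row: Sequence[float], support: Sequence[float],
                     statistic: str = "mean") -> float:
    """Measure of central tendency of one categorical return distribution.

    Assumption (not from the patent): `row[i]` is the probability of return
    value `support[i]`, and `support` is sorted in ascending order.
    """
    if len(row) != len(support):
        raise ValueError("row and support must have the same length")
    if not math.isclose(sum(row), 1.0, abs_tol=1e-6):
        raise ValueError("probabilities must sum to 1")
    if statistic == "mean":
        # Expected return: sum_i p_i * z_i
        return sum(p * z for p, z in zip(row, support))
    if statistic == "median":
        # Smallest atom whose cumulative probability reaches 0.5
        cumulative = 0.0
        for p, z in zip(row, support):
            cumulative += p
            if cumulative >= 0.5 - 1e-9:
                return z
        return support[-1]
    raise ValueError(f"unknown statistic: {statistic}")


def select_action(probs: Sequence[Sequence[float]], support: Sequence[float],
                  statistic: str = "mean") -> int:
    """Greedy selection (an assumed rule): index of the action whose
    distribution has the largest measure of central tendency.
    Ties go to the lowest action index."""
    scores: List[float] = [central_tendency(row, support, statistic) for row in probs]
    return max(range(len(scores)), key=scores.__getitem__)
```

The choice of statistic can change the chosen action. For example, with `support = [0.0, 1.0, 10.0]`, action 0 returns 1.0 with certainty, while action 1 returns 0.0 with probability 0.6 and 10.0 with probability 0.4. The mean prefers action 1 (4.0 versus 1.0), but the median prefers action 0 (1.0 versus 0.0).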
Source: USPTO / EPO open patent data.