Patent · US Active

Distributional reinforcement learning

US10860920B2 · kind B2 · utility

Cited by: 6
References: 0
Claims: 22
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 10, 2019
Grant date: Dec 8, 2020
Priority date
Expiry date: Jul 10, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N3/084
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. A current observation characterizing a current state of the environment is received. For each action in a set of multiple actions that can be performed by the agent to interact with the environment, a probability distribution is determined over possible Q returns for the action-current observation pair. For each action, a measure of central tendency of the possible Q returns with respect to the probability distribution for the action-current observation pair is determined. An action to be performed by the agent in response to the current observation is selected using the measures of central tendency.
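
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It rests on several assumptions that do not come from the record: each action's distribution is categorical over a shared, fixed set of return values (called `atoms` here), the measure of central tendency is the mean, and the action is chosen greedily. The function names and the example numbers are hypothetical. In practice the per-action probabilities would be produced by a learned model rather than written by hand.

```python
from typing import Sequence


def expected_return(atoms: Sequence[float], probs: Sequence[float]) -> float:
    """Mean of a categorical distribution over possible Q returns.

    Assumption: the mean is used as the measure of central tendency.
    """
    if len(atoms) != len(probs):
        raise ValueError("atoms and probs must have the same length")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalize in case the probabilities do not sum exactly to 1.
    return sum(z * p for z, p in zip(atoms, probs)) / total


def select_action(
    atoms: Sequence[float], action_probs: Sequence[Sequence[float]]
) -> int:
    """Return the index of the action with the highest mean return.

    Assumption: greedy selection; an exploration policy could be used instead.
    """
    means = [expected_return(atoms, probs) for probs in action_probs]
    return max(range(len(means)), key=means.__getitem__)


# Hypothetical example: three possible returns and three actions.
atoms = [-1.0, 0.0, 1.0]
action_probs = [
    [0.2, 0.6, 0.2],  # symmetric around 0
    [0.1, 0.2, 0.7],  # mass concentrated on the highest return
    [0.5, 0.0, 0.5],  # high variance, same mean as the first action
]
chosen = select_action(atoms, action_probs)
print("selected action index:", chosen)
```

The first and third actions in this example have the same mean but different spreads, so a selector based only on the mean treats them alike even though the full distributions differ.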

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.