Patent · US Active

Distributional reinforcement learning

US10860920B2 · kind B2 · utility

Cited by	6
References	0
Claims	22
Family size	0

Assignee

DeepMind Technologies Limited · GB

Inventors

Marc Gendron-Bellemare · London, GB
William Clinton Dabney · London, GB

Key dates

Filing date	Jul 10, 2019
Grant date	Dec 8, 2020
Priority date	—
Expiry date	Jul 10, 2039

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/084
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent interacting with an environment. A current observation characterizing a current state of the environment is received. For each action in a set of multiple actions that can be performed by the agent to interact with the environment, a probability distribution is determined over possible Q returns for the action-current observation pair. For each action, a measure of central tendency of the possible Q returns with respect to the probability distributions for the action-current observation pair is determined. An action to be performed by the agent in response to the current observation is selected using the measures of central tendency.
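The selection procedure in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes each action's distribution is discrete over a shared, fixed set of possible returns (called `atoms` here), that the measure of central tendency is the mean, and that the agent picks the action with the highest mean. The function names, action labels, and numbers are all hypothetical.

```python
from typing import Dict, Sequence


def expected_return(probs: Sequence[float], atoms: Sequence[float]) -> float:
    """Mean of a discrete return distribution: sum of p_i * z_i.

    Assumption: probs[i] is the probability of the return value atoms[i].
    """
    if len(probs) != len(atoms):
        raise ValueError("probs and atoms must have the same length")
    return sum(p * z for p, z in zip(probs, atoms))


def select_action(
    distributions: Dict[str, Sequence[float]], atoms: Sequence[float]
) -> str:
    """Pick the action whose return distribution has the highest mean.

    Assumption: greedy selection. The abstract only says the action is
    selected "using" the measures of central tendency.
    """
    means = {
        action: expected_return(probs, atoms)
        for action, probs in distributions.items()
    }
    return max(means, key=means.get)


# Hypothetical example: three possible returns, two candidate actions.
atoms = [-1.0, 0.0, 1.0]
distributions = {
    "left": [0.5, 0.25, 0.25],   # mean = -0.25
    "right": [0.25, 0.25, 0.5],  # mean = 0.25
}
chosen = select_action(distributions, atoms)
print(chosen)
```

In a full system the per-action probabilities would typically come from a learned model evaluated on the current observation. Here they are hard-coded so the selection step can be read in isolation.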

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.