Patent · US Active

Modulating agent behavior to optimize learning progress

US12061964B2 · kind B2 · utility

Cited by	1

References	2

Claims	20

Family size	0

Assignee

DeepMind Technologies Limited · GB

Inventors

Tom Schaul · London, GB
Diana Luiza Borsa · London, GB
Fengning Ding · London, GB
David Szepesvari · London, GB
Georg Ostrovski · London, GB
Simon Osindero · London, GB
William Clinton Dabney · London, GB

Key dates

Filing date	Sep 25, 2020
Grant date	Aug 13, 2024
Priority date	—
Expiry date	Jun 15, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G06N7/01
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes sampling a behavior modulation in accordance with a current probability distribution; for each of one or more time steps: processing an input comprising an observation characterizing a current state of the environment at the time step using an action selection neural network to generate a respective action score for each action in a set of possible actions that can be performed by the agent; modifying the action scores using the sampled behavior modulation; and selecting the action to be performed by the agent at the time step based on the modified action scores; determining a fitness measure corresponding to the sampled behavior modulation; and updating the current probability distribution over the set of possible behavior modulations using the fitness measure corresponding to the behavior modulation.
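
The loop described in the abstract can be sketched roughly as follows. This is an illustrative toy only, not the claimed implementation. The abstract does not say what the behavior modulations are, how fitness is measured, or how the distribution is updated, so every such choice below is an assumption: the modulations are softmax temperatures, the fitness is the mean reward over an episode, and the distribution is a softmax over running mean fitness mixed with uniform exploration. The names TEMPERATURES, action_scores, toy_reward, ModulationDistribution and run are hypothetical, and action_scores is a fixed stand-in for the action selection neural network.

```python
import math
import random

# Assumption: the set of possible behavior modulations is a small set of
# softmax temperatures. The abstract does not specify this.
TEMPERATURES = [0.1, 1.0, 10.0]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def action_scores(observation):
    """Hypothetical stand-in for the action selection neural network."""
    return [2.0, 0.0]  # fixed scores: action 0 is preferred


def toy_reward(action):
    """Hypothetical toy environment: only action 0 is rewarded."""
    return 1.0 if action == 0 else 0.0


class ModulationDistribution:
    """Probability distribution over modulations (assumed update rule)."""

    def __init__(self, num_modulations, sharpness=0.1, explore=0.3):
        self.means = [0.0] * num_modulations   # running mean fitness
        self.counts = [0] * num_modulations    # times each was sampled
        self.sharpness = sharpness
        self.explore = explore

    def probabilities(self):
        # Softmax over mean fitness, mixed with a uniform distribution so
        # that every modulation keeps being tried.
        greedy = softmax([m / self.sharpness for m in self.means])
        uniform = 1.0 / len(self.means)
        return [(1.0 - self.explore) * g + self.explore * uniform
                for g in greedy]

    def sample(self, rng):
        indices = range(len(self.means))
        return rng.choices(indices, weights=self.probabilities(), k=1)[0]

    def update(self, index, fitness):
        # Incremental running mean of the fitness seen for this modulation.
        self.counts[index] += 1
        self.means[index] += (fitness - self.means[index]) / self.counts[index]


def run(num_episodes=300, steps_per_episode=20, seed=0):
    rng = random.Random(seed)
    dist = ModulationDistribution(len(TEMPERATURES))
    for _ in range(num_episodes):
        # Sample a behavior modulation from the current distribution.
        index = dist.sample(rng)
        temperature = TEMPERATURES[index]
        total_reward = 0.0
        for step in range(steps_per_episode):
            scores = action_scores(observation=step)
            # Modify the action scores using the sampled modulation.
            modified = [s / temperature for s in scores]
            # Select the action based on the modified scores.
            action = rng.choices(range(len(modified)),
                                 weights=softmax(modified), k=1)[0]
            total_reward += toy_reward(action)
        # Fitness measure for the sampled modulation (assumed: mean reward).
        fitness = total_reward / steps_per_episode
        # Update the distribution using that fitness measure.
        dist.update(index, fitness)
    return dist.probabilities()


if __name__ == "__main__":
    for temperature, p in zip(TEMPERATURES, run()):
        print(f"temperature={temperature}: probability={p:.3f}")
```

In this toy setting the lowest temperature picks the rewarded action most reliably, so the distribution should shift probability toward it over time. A different modulation family, fitness measure, or update rule would slot into the same loop structure.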

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.