Patent · US Active

Modulating agent behavior to optimize learning progress

US12061964B2 · kind B2 · utility

1Cited by
2References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateSep 25, 2020
Grant dateAug 13, 2024
Priority date
Expiry dateJun 15, 2043

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06N7/01
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes sampling a behavior modulation in accordance with a current probability distribution; for each of one or more time steps: processing an input comprising an observation characterizing a current state of the environment at the time step using an action selection neural network to generate a respective action score for each action in a set of possible actions that can be performed by the agent; modifying the action scores using the sampled behavior modulation; and selecting the action to be performed by the agent at the time step based on the modified action scores; determining a fitness measure corresponding to the sampled behavior modulation; and updating the current probability distribution over the set of possible behavior modulations using the fitness measure corresponding to the behavior modulation.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.