Modulating agent behavior to optimize learning progress
US12061964B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 25, 2020 |
| Grant date | Aug 13, 2024 |
| Priority date | — |
| Expiry date | Jun 15, 2043 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06N7/01
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes sampling a behavior modulation in accordance with a current probability distribution; for each of one or more time steps: processing an input comprising an observation characterizing a current state of the environment at the time step using an action selection neural network to generate a respective action score for each action in a set of possible actions that can be performed by the agent; modifying the action scores using the sampled behavior modulation; and selecting the action to be performed by the agent at the time step based on the modified action scores; determining a fitness measure corresponding to the sampled behavior modulation; and updating the current probability distribution over the set of possible behavior modulations using the fitness measure corresponding to the behavior modulation.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.