Action selection by reinforcement learning and numerical optimization
US11551165B1 · kind B1 · utility
Assignee
Inventor
Key dates
| Filing date | Apr 5, 2022 |
| Grant date | Jan 10, 2023 |
| Priority date | — |
| Expiry date | Apr 5, 2042 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/092
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent interacting with an environment. In one aspect, a method comprises, at each of one or more time steps: generating a respective action score for each action in a set of possible actions, wherein the set of possible actions comprises: (i) a plurality of atomistic actions, and (ii) one or more optimization actions, wherein each optimization action is associated with a respective objective function that measures performance of the agent on a corresponding auxiliary task; selecting an action from the set of possible actions in accordance with the action scores, wherein the selected action is an optimization action; in response to selecting the optimization action, performing a numerical optimization to identify a sequence of one or more atomistic actions that are predicted to optimize the objective function.
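The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything concrete in it is an assumption made for illustration: the grid-world state, the four atomistic moves, the hard-coded action scores (standing in for whatever model generates them), greedy selection as one way to select "in accordance with the action scores", and exhaustive search over short action sequences as a stand-in for the numerical optimization.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Tuple[int, int]  # hypothetical grid position (x, y)

# Hypothetical atomistic actions: single grid moves.
ATOMISTIC_ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0),
}


def step(state: State, action: str) -> State:
    """Assumed deterministic environment model for one atomistic action."""
    dx, dy = ATOMISTIC_ACTIONS[action]
    return (state[0] + dx, state[1] + dy)


@dataclass(frozen=True)
class OptimizationAction:
    """An optimization action with its auxiliary-task objective (higher is better)."""
    name: str
    objective: Callable[[State], float]
    horizon: int = 3  # assumed fixed sequence length


def optimize(state: State, opt: OptimizationAction) -> List[str]:
    """Exhaustive search over atomistic sequences (illustrative stand-in
    for the numerical optimization named in the abstract)."""
    best_seq: List[str] = []
    best_val = float("-inf")
    for seq in itertools.product(ATOMISTIC_ACTIONS, repeat=opt.horizon):
        s = state
        for a in seq:
            s = step(s, a)
        val = opt.objective(s)
        if val > best_val:
            best_seq, best_val = list(seq), val
    return best_seq


def act(state: State, scores: Dict[str, float],
        opt_actions: Dict[str, OptimizationAction]) -> List[str]:
    """Pick the highest-scoring action; expand it if it is an optimization action."""
    chosen = max(scores, key=scores.get)  # greedy selection (an assumption)
    if chosen in opt_actions:
        return optimize(state, opt_actions[chosen])
    return [chosen]


# Hypothetical auxiliary task: reach the grid cell (2, 1).
GOAL: State = (2, 1)
reach_goal = OptimizationAction(
    name="reach_goal",
    objective=lambda s: -(abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1])),
)

# Placeholder scores over atomistic and optimization actions together.
scores = {"up": 0.1, "down": 0.0, "left": 0.2, "right": 0.3, "reach_goal": 0.9}
plan = act((0, 0), scores, {"reach_goal": reach_goal})
```

Here `plan` is a sequence of three atomistic moves that ends at the goal cell, because `reach_goal` has the highest score and is expanded by the search. If an atomistic action had the highest score instead, `act` would return that single action unchanged.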
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.