Reinforcement learning for concurrent actions
US11580378B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Nov 12, 2018 |
| Grant date | Feb 14, 2023 |
| Priority date | — |
| Expiry date | Jun 2, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N20/10
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A computer-implemented method comprises instantiating a policy function approximator. The policy function approximator is configured to calculate a plurality of estimated action probabilities in dependence on a given state of the environment. Each of the plurality of estimated action probabilities corresponds to a respective one of a plurality of discrete actions performable by the reinforcement learning agent within the environment. An initial plurality of estimated action probabilities is calculated in dependence on a first state of the environment. Two or more of the plurality of discrete actions are concurrently performed within the environment when the environment is in the first state. In response to the concurrent performance, a reward value is received. In response to the received reward value being greater than a baseline reward value, the policy function approximator is updated, such that it is configured to calculate an updated plurality of estimated action probabilities.
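The abstract describes a single learning step: compute per-action probabilities for a state, perform two or more actions at once, and update the approximator only when the reward beats a baseline. The minimal Python sketch below illustrates that step under assumptions the abstract does not state. The policy function approximator is assumed to be a linear model with one independent sigmoid (Bernoulli) probability per discrete action. Concurrent actions are sampled from those probabilities and topped up to at least two. The update is assumed to be a REINFORCE-style gradient step scaled by (reward minus baseline). The class and method names (`ConcurrentPolicy`, `action_probabilities`, `select_concurrent`, `update`) are hypothetical, and this is not the patented implementation.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class ConcurrentPolicy:
    """Hypothetical linear policy function approximator.

    Assumption: each discrete action gets its own independent Bernoulli
    probability (sigmoid of a linear score), so several actions can be
    selected in the same state.
    """

    def __init__(self, n_features: int, n_actions: int, lr: float = 0.1) -> None:
        self.weights = [[0.0] * n_features for _ in range(n_actions)]
        self.bias = [0.0] * n_actions
        self.lr = lr

    def action_probabilities(self, state: Sequence[float]) -> List[float]:
        # One estimated probability per discrete action for the given state.
        return [
            sigmoid(sum(w * s for w, s in zip(row, state)) + b)
            for row, b in zip(self.weights, self.bias)
        ]

    def select_concurrent(
        self, probs: Sequence[float], rng: random.Random, min_actions: int = 2
    ) -> List[int]:
        # Sample each action independently; if fewer than min_actions were
        # drawn, add the most probable remaining actions (an assumption made
        # so that "two or more" actions are always performed).
        chosen = {i for i, p in enumerate(probs) if rng.random() < p}
        for i in sorted(range(len(probs)), key=lambda j: -probs[j]):
            if len(chosen) >= min_actions:
                break
            chosen.add(i)
        return sorted(chosen)

    def update(
        self,
        state: Sequence[float],
        probs: Sequence[float],
        performed: Sequence[int],
        reward: float,
        baseline: float,
    ) -> bool:
        # Update only when the reward exceeds the baseline reward value.
        if reward <= baseline:
            return False
        advantage = reward - baseline
        performed_set = set(performed)
        for i in range(len(self.bias)):
            # Gradient of the Bernoulli log-likelihood w.r.t. logit i is
            # (a_i - p_i); treating the top-up as a sample is an approximation.
            a_i = 1.0 if i in performed_set else 0.0
            g = advantage * (a_i - probs[i])
            self.bias[i] += self.lr * g
            for k, s in enumerate(state):
                self.weights[i][k] += self.lr * g * s
        return True
```

With zero initial weights every action starts at probability 0.5. After a reward above the baseline, the performed actions become more likely in that state and the others less likely. When the reward does not exceed the baseline, `update` leaves the parameters unchanged, mirroring the condition stated in the abstract.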
Source: USPTO / EPO open patent data.