Patent · US Active

Reinforcement learning for concurrent actions

US11580378B2 · kind B2 · utility

1Cited by
0References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateNov 12, 2018
Grant dateFeb 14, 2023
Priority date
Expiry dateJun 2, 2040

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06N20/10
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A computer-implemented method comprises instantiating a policy function approximator. The policy function approximator is configured to calculate a plurality of estimated action probabilities in dependence on a given state of the environment. Each of the plurality of estimated action probabilities corresponds to a respective one of a plurality of discrete actions performable by the reinforcement learning agent within the environment. An initial plurality of estimated action probabilities in dependence on a first state of the environment are calculated. Two or more of the plurality of discrete actions are concurrently performed within the environment when the environment is in the first state. In response to the concurrent performance, a reward value is received. In response to the received reward value being greater than a baseline reward value, the policy function approximator is updated, such that it is configured to calculate an updated plurality of estimated action probabilities.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.