Configuring a system which interacts with an environment
US11402808B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 10, 2020 |
| Grant date | Aug 2, 2022 |
| Priority date | — |
| Expiry date | Feb 26, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G05B2219/39289
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system is described for configuring another system, e.g., a robotics system. The other system interacts with an environment according to a deterministic policy by repeatedly obtaining, from a sensor, sensor data indicative of a state of the environment, determining a current action, and providing, to an actuator, actuator data causing the actuator to effect the current action in the environment. To configure the other system, the system optimizes a loss function based on an accumulated reward distribution with respect to a set of parameters of the policy. The accumulated reward distribution includes an action probability of an action of a previous interaction log being performed according to the current set of parameters. The action probability is approximated using a probability distribution defined by an action selected by the deterministic policy according to the current set of parameters.
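The approximation described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The linear policy form, the Gaussian kernel with fixed width `sigma`, the self-normalised importance weighting, and all names (`policy`, `log_action_probability`, `weighted_return_estimate`, the log dictionary keys) are assumptions made here for illustration. A deterministic policy assigns no probability density to actions it does not select, so the sketch centres a Gaussian on the action the policy selects and evaluates the logged action under it.

```python
import math


def policy(theta, state):
    """Deterministic policy (hypothetical linear form): one action per state."""
    return theta[0] * state + theta[1]


def log_action_probability(theta, state, logged_action, sigma=0.5):
    """Log-density of a logged action under a Gaussian centred on the action
    that the deterministic policy selects for the given parameters.

    The Gaussian shape and the width sigma are assumptions of this sketch.
    """
    z = (logged_action - policy(theta, state)) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def weighted_return_estimate(theta, logs, sigma=0.5):
    """Self-normalised, importance-weighted estimate of the accumulated reward
    for parameters `theta`, computed from previous interaction logs.

    Each log is a dict with:
      'theta' : parameters in use when the log was recorded
      'steps' : list of (state, action) pairs
      'ret'   : accumulated reward of that log
    """
    weights = []
    for log in logs:
        log_w = 0.0
        for state, action in log["steps"]:
            # Ratio of approximated action probabilities: current vs. logged.
            log_w += log_action_probability(theta, state, action, sigma)
            log_w -= log_action_probability(log["theta"], state, action, sigma)
        weights.append(math.exp(log_w))
    total = sum(weights)
    return sum(w * log["ret"] for w, log in zip(weights, logs)) / total


def loss(theta, logs, sigma=0.5):
    """Loss to minimise with respect to theta: negative estimated return."""
    return -weighted_return_estimate(theta, logs, sigma)


# Two hypothetical logs recorded under different parameter sets.
example_logs = [
    {"theta": (1.0, 0.0), "steps": [(1.0, 1.0)], "ret": 2.0},
    {"theta": (2.0, 0.0), "steps": [(1.0, 2.0)], "ret": 4.0},
]
estimate = weighted_return_estimate((2.0, 0.0), example_logs)
```

In this sketch, logs whose actions lie close to what the current parameters would select receive larger weights, so `estimate` is pulled toward the return of the second log. A gradient-based or gradient-free optimiser could then minimise `loss` over `theta`. The choice of optimiser is left open here.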
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.