Reinforcement learning using obfuscated environment models
US11144847B1 · kind B1 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Apr 15, 2021 |
| Grant date | Oct 12, 2021 |
| Priority date | — |
| Expiry date | Apr 15, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/092
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection system used to select actions to be performed by an agent interacting with a target environment to perform a task in the target environment. In one aspect, a method comprises: obtaining a target environment model of the target environment; modifying the target environment model of the target environment to generate an obfuscated environment model of an obfuscated environment that represents an obfuscation of the target environment; obtaining, from each of a plurality of users, one or more obfuscated environment trajectories that represent interaction of the user with the obfuscated environment through the corresponding obfuscated environment simulation; mapping each of the obfuscated environment trajectories to a corresponding target environment trajectory; and training the action selection system on the target environment trajectories.
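The abstract lists the method's steps but does not fix a particular obfuscation scheme or training algorithm. The Python sketch below walks through the pipeline under explicitly hypothetical choices. The target environment is a toy six-state chain. Obfuscation is a secret random relabeling (permutation) of states and actions. The "users" are random-acting stand-ins for people. The action selection system is trained with offline tabular Q-learning. None of these choices, and none of the names (`target_step`, `ObfuscatedModel`, `collect_user_trajectory`, `train_q`), come from the patent. It illustrates the steps and is not the claimed implementation.

```python
import random

# Hypothetical toy target environment: a 1-D chain of states 0..5.
# Action 0 moves left, action 1 moves right; reaching state 5 ends the
# episode with reward 1. None of this comes from the patent.
N_STATES = 6
N_ACTIONS = 2
GOAL = N_STATES - 1


def target_step(state, action):
    """Target environment model: deterministic transition and reward."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


class ObfuscatedModel:
    """Obfuscated environment model built by relabeling the target model.

    Assumption: obfuscation = secret random permutations of state and
    action labels, so users never see the true meaning of either.
    """

    def __init__(self, rng):
        self.state_map = list(range(N_STATES))    # target -> obfuscated
        self.action_map = list(range(N_ACTIONS))  # target -> obfuscated
        rng.shuffle(self.state_map)
        rng.shuffle(self.action_map)
        self.state_inv = {o: t for t, o in enumerate(self.state_map)}
        self.action_inv = {o: t for t, o in enumerate(self.action_map)}

    def reset(self):
        return self.state_map[0]

    def step(self, obf_state, obf_action):
        # Simulate the obfuscated environment by delegating to the hidden
        # target model and relabeling the result.
        s, reward, done = target_step(self.state_inv[obf_state],
                                      self.action_inv[obf_action])
        return self.state_map[s], reward, done

    def to_target(self, trajectory):
        """Map an obfuscated trajectory back to a target trajectory."""
        return [(self.state_inv[s], self.action_inv[a], r,
                 self.state_inv[s2], done)
                for s, a, r, s2, done in trajectory]


def collect_user_trajectory(model, rng, max_steps=100):
    """Stand-in for a human user: acts uniformly at random (assumption)."""
    trajectory, s = [], model.reset()
    for _ in range(max_steps):
        a = rng.randrange(N_ACTIONS)
        s2, r, done = model.step(s, a)
        trajectory.append((s, a, r, s2, done))
        if done:
            break
        s = s2
    return trajectory


def train_q(transitions, gamma=0.9, lr=0.5, sweeps=200):
    """Offline tabular Q-learning, one possible action selection system."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s, a, r, s2, done in transitions:
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += lr * (target - q[s][a])
    return q


rng = random.Random(0)
model = ObfuscatedModel(rng)
target_data = []
for _user in range(20):
    obf_traj = collect_user_trajectory(model, rng)
    target_data.extend(model.to_target(obf_traj))

q = train_q(target_data)
policy = [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]
print("greedy target-environment actions:", policy)
```

Relabeling by a permutation is about the simplest obfuscation that keeps the mapping step exact. Because the permutation is invertible, every obfuscated transition maps back to a valid target transition that the learner can use directly. An obfuscation that is not invertible would need a different mapping procedure, which this sketch does not attempt.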
Source: USPTO / EPO open patent data.