Unsupervised reinforcement learning method and apparatus based on Wasserstein distance
US11823062B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Mar 21, 2023 |
| Grant date | Nov 21, 2023 |
| Priority date | — |
| Expiry date | Mar 21, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N20/00
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
The present disclosure discloses an unsupervised reinforcement learning method and apparatus based on the Wasserstein distance. The method includes: obtaining the state distribution of a trajectory generated under the guidance of the agent's current policy; calculating the Wasserstein distance between this state distribution and the state distribution of a trajectory generated by another, historical policy; calculating a pseudo reward for the agent based on that Wasserstein distance; replacing the reward fed back from the environment in a target reinforcement learning framework with the pseudo reward; and thereby guiding the agent's current policy to maintain a large distance from the historical policy. The method uses the Wasserstein distance to encourage an algorithm in an unsupervised reinforcement learning framework to learn diverse policies and skills through training.
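As an illustration only, the distance and reward steps in the abstract can be sketched in Python. The abstract does not specify the Wasserstein estimator or how the pseudo reward is derived from the distance, so this sketch rests on three assumptions: (1) it uses the 1-Wasserstein distance W1 between empirical state distributions with uniform weights; (2) for multi-dimensional states it substitutes the sliced Wasserstein distance, an approximation that averages the exact 1-D W1 over random projection directions, instead of solving the full optimal-transport problem; (3) the pseudo reward is simply the scaled distance, written into every transition of the current trajectory. The names `wasserstein_1d`, `sliced_wasserstein`, `relabel_with_pseudo_reward`, and the transition dictionary layout are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

State = Sequence[float]


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Exact W1 between two equal-size 1-D empirical distributions with
    uniform weights: the optimal coupling matches sorted samples pairwise."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes equal, non-zero sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def sliced_wasserstein(states_a: Sequence[State], states_b: Sequence[State],
                       n_projections: int = 64, seed: int = 0) -> float:
    """Sliced W1: average the 1-D distance over random unit directions.
    Assumption: a swapped-in approximation, not necessarily the patent's estimator."""
    if not states_a or not states_b:
        raise ValueError("need at least one state per trajectory")
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    dim = len(states_a[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random Gaussian direction and normalise it to unit length.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in d)) or 1.0
        d = [v / norm for v in d]
        # Project every state onto the direction, then compare in 1-D.
        proj_a = [sum(s_i * d_i for s_i, d_i in zip(s, d)) for s in states_a]
        proj_b = [sum(s_i * d_i for s_i, d_i in zip(s, d)) for s in states_b]
        total += wasserstein_1d(proj_a, proj_b)
    return total / n_projections


def relabel_with_pseudo_reward(transitions: List[Dict],
                               historical_states: Sequence[State],
                               scale: float = 1.0) -> List[Dict]:
    """Hypothetical reward replacement: every environment reward in the
    trajectory is overwritten with scale * distance to the historical policy."""
    current_states = [t["state"] for t in transitions]
    pseudo = scale * sliced_wasserstein(current_states, historical_states)
    return [{**t, "reward": pseudo} for t in transitions]


# Usage: a trajectory from the current policy vs. states from a historical policy.
current = [{"state": (float(i), 0.0), "reward": 0.0} for i in range(5)]
historical = [(float(i) + 3.0, 0.0) for i in range(5)]
relabelled = relabel_with_pseudo_reward(current, historical)
print(relabelled[0]["reward"])  # positive: the two state distributions differ
```

Fixing the seed makes the estimate deterministic across calls. Averaging over more projections reduces the variance of the sliced estimate, while a full optimal-transport solver would give the exact multi-dimensional distance at a higher computational cost.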
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.