Patent · US Active

Unsupervised reinforcement learning method and apparatus based on Wasserstein distance

US11823062B1 · kind B1 · utility

0 Cited by
0 References
14 Claims
0 Family size

Assignee

Inventors

Key dates

Filing date: Mar 21, 2023
Grant date: Nov 21, 2023
Priority date
Expiry date: Mar 21, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N20/00
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The present disclosure discloses an unsupervised reinforcement learning method and apparatus based on Wasserstein distance. The method includes: obtaining a state distribution in a trajectory obtained under the guidance of a current policy of an agent; calculating a Wasserstein distance between that state distribution and the state distribution in a trajectory obtained with another, historical policy; calculating a pseudo reward for the agent based on the Wasserstein distance; replacing the reward fed back from the environment in a target reinforcement learning framework with the pseudo reward; and thereby guiding the current policy of the agent to keep a large distance from the historical policy. By using the Wasserstein distance, the method encourages an algorithm in an unsupervised reinforcement learning framework to obtain diverse policies and skills through training.
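The core idea of the abstract (score the current policy by how far its state distribution lies from a historical policy's, and use that score as the reward) can be illustrated with a minimal sketch. It assumes scalar (1-D) states and equal-sized state samples, in which case the Wasserstein-1 distance between two empirical distributions is exactly the mean absolute difference of the sorted samples. The function names `wasserstein_1d` and `pseudo_reward`, the choice of the *nearest* historical policy when several exist, and the zero reward when no history exists are all hypothetical choices made for illustration, not details taken from the patent.

```python
from typing import Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Exact Wasserstein-1 distance between two 1-D empirical distributions
    with the same number of samples: sort both samples and average the
    absolute differences of matched order statistics."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty samples of equal size")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs_sorted, ys_sorted)) / len(xs_sorted)


def pseudo_reward(
    current_states: Sequence[float],
    historical_state_sets: Sequence[Sequence[float]],
) -> float:
    """Hypothetical pseudo reward: Wasserstein distance from the current
    policy's visited states to the closest historical policy's states.
    A larger value means the current policy behaves more differently.
    Returns 0.0 when there is no historical policy (an assumption)."""
    if not historical_state_sets:
        return 0.0
    return min(wasserstein_1d(current_states, h) for h in historical_state_sets)


if __name__ == "__main__":
    current = [3.0, 4.0, 5.0]
    history = [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]
    # Distances are 3.0 and 7.0; the nearest historical policy gives 3.0.
    print(pseudo_reward(current, history))
```

In a full agent, this scalar would stand in for the environment reward inside the chosen reinforcement learning framework; how it is spread across individual timesteps, and how multi-dimensional states are handled (for example with an optimal-transport solver or a dual estimate), is not specified in the abstract.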

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.