Patent · US Active

Unsupervised reinforcement learning method and apparatus based on Wasserstein distance

US11823062B1 · kind B1 · utility

0 Cited by

0 References

14 Claims

0 Family size

Assignee

TSINGHUA UNIVERSITY · CN

Inventors

Xiangyang Ji · Longbeilingcun, CN
Shuncheng He · Beijing, CN
Yuhang Jiang · Hubei, CN

Key dates

Filing date	Mar 21, 2023
Grant date	Nov 21, 2023
Priority date	—
Expiry date	Mar 21, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G06N20/00
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

The present disclosure discloses an unsupervised reinforcement learning method and apparatus based on Wasserstein distance. The method includes: obtaining a state distribution in a trajectory generated under the guidance of the agent's current policy; calculating a Wasserstein distance between this state distribution and the state distribution in a trajectory generated by another, historical policy; calculating a pseudo reward for the agent based on the Wasserstein distance; replacing the reward fed back from the environment in a target reinforcement learning framework with the pseudo reward; and guiding the agent's current policy to keep a large distance from the other historical policy. The method uses Wasserstein distance to encourage an algorithm in an unsupervised reinforcement learning framework to learn diverse policies and skills through training.
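The abstract does not state how the Wasserstein distance is estimated or how the pseudo reward is assigned to individual time steps. The sketch below illustrates the general idea under assumptions that are not taken from the patent: states are real-valued vectors, both trajectories contain the same number of states, the distance is approximated with the sliced Wasserstein-1 distance (a swapped-in estimator that averages exact 1-D distances over random projections), and the whole trajectory's distance is paid out as a reward at the final step. The function names `wasserstein_1d`, `sliced_wasserstein` and `pseudo_rewards` are hypothetical.

```python
import math
import random
from typing import List, Sequence

State = Sequence[float]  # assumption: low-dimensional, real-valued state vectors


def wasserstein_1d(a: List[float], b: List[float]) -> float:
    """Exact W1 between two equal-size 1-D empirical distributions.

    With uniform weights and equal sample counts, the optimal coupling pairs
    the i-th smallest value of `a` with the i-th smallest value of `b`.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sample sets of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def sliced_wasserstein(xs: List[State], ys: List[State],
                       n_projections: int = 64, seed: int = 0) -> float:
    """Monte Carlo sliced Wasserstein-1 distance between two state sample sets.

    Hypothetical estimator choice; the patent may use a different one.
    """
    if not xs or not ys:
        raise ValueError("state sample sets must be non-empty")
    dim = len(xs[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        # Random unit direction, uniform on the sphere (normalized Gaussian).
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        # Project both sample sets onto the direction and compare in 1-D.
        px = [sum(s_i * d_i for s_i, d_i in zip(s, d)) for s in xs]
        py = [sum(s_i * d_i for s_i, d_i in zip(s, d)) for s in ys]
        total += wasserstein_1d(px, py)
    return total / n_projections


def pseudo_rewards(current_states: List[State],
                   historical_states: List[State]) -> List[float]:
    """Distance-based pseudo rewards that stand in for environment rewards.

    Hypothetical credit assignment: the trajectory-level distance to the
    historical policy's states is paid at the final step; earlier steps get 0.
    """
    distance = sliced_wasserstein(current_states, historical_states)
    rewards = [0.0] * len(current_states)
    rewards[-1] = distance
    return rewards
```

In a training loop, the list returned by `pseudo_rewards` would replace the environment's rewards when returns or advantages are computed, so a policy whose visited states move farther from those of the historical policy earns a larger return. A dense per-step scheme, or a learned critic estimating the Kantorovich dual, would be equally plausible alternatives; the sparse final-step payout is chosen here only to keep the sketch short.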

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.