Patent · US Active

Control policies for collective robot learning

US11188821B1 · kind B1 · utility

Cited by	27

References	9

Claims	20

Family size	0

Assignee

X Development LLC · US

Inventors

Mrinal Kalakrishnan · Palo Alto, US
Ali Hamid Yahya Valdovinos · Palo Alto, US
Adrian Li · San Francisco, US
Yevgen Chebotar · Los Angeles, US
Sergey Levine · Redmond, US

Key dates

Filing date	Sep 15, 2017
Grant date	Nov 30, 2021
Priority date	—
Expiry date	Jul 1, 2040

Classification

Technology area (CPC G)	Physics
CPC primary	G05B2219/39298
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a global policy neural network. One of the methods includes, for each of multiple local workers: initializing an instance of a robotic task, generating a trajectory of state-action pairs by selecting actions to be performed by a robotic agent while performing the instance of the robotic task, optimizing a local policy controller on the trajectory, generating an optimized trajectory using the optimized local controller, and storing the optimized trajectory in a replay memory associated with the local worker. The method also includes, for each of multiple global workers: sampling an optimized trajectory from one of one or more replay memories associated with the global worker, and training a replica of the global policy neural network maintained by the global worker on the sampled optimized trajectory to determine delta values for the parameters of the global policy neural network.
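
The data flow described in the abstract can be illustrated with a toy sketch. This is not the patented implementation; everything in it is an assumption made for illustration. The "robotic task" is a one-dimensional point that should reach zero, the local policy controller is a single feedback gain tuned by simple hill climbing, and the global policy "network" is a linear model with two parameters trained by supervised regression on the stored actions. All function names (rollout, optimize_local_controller, local_worker_step, global_worker_deltas, train) are hypothetical, and summing the delta values into the global parameters is also an assumption, since the abstract only states that delta values are determined.

```python
import random

def rollout(gain, x0, steps=10, noise=0.0, rng=None):
    """Return a trajectory of (state, action) pairs under u = -gain * x."""
    x, trajectory = x0, []
    for _ in range(steps):
        u = -gain * x
        if rng is not None and noise > 0.0:
            u += rng.gauss(0.0, noise)  # exploration noise on the action
        trajectory.append((x, u))
        x = x + u  # toy dynamics (assumption): the action shifts the state
    return trajectory

def cost(trajectory):
    """Quadratic cost on states and actions (assumed task objective)."""
    return sum(x * x + 0.1 * u * u for x, u in trajectory)

def optimize_local_controller(gain, trajectory, rng, iters=50, step=0.1):
    """Hill-climb the gain from the trajectory's start state (toy optimizer)."""
    x0 = trajectory[0][0]
    best = cost(rollout(gain, x0))
    for _ in range(iters):
        candidate = gain + rng.uniform(-step, step)
        c = cost(rollout(candidate, x0))
        if c < best:  # keep only perturbations that lower the cost
            gain, best = candidate, c
    return gain

def local_worker_step(replay, rng):
    """One local-worker pass: initialize, roll out, optimize, store."""
    x0 = rng.uniform(-1.0, 1.0)                    # instance of the task
    gain = 0.2                                     # initial local controller
    trajectory = rollout(gain, x0, noise=0.05, rng=rng)
    gain = optimize_local_controller(gain, trajectory, rng)
    replay.append(rollout(gain, x0))               # optimized trajectory

def global_worker_deltas(params, replays, rng, lr=0.1):
    """Sample one optimized trajectory and return parameter deltas."""
    w, b = params                                  # this worker's replica
    trajectory = rng.choice(rng.choice(replays))   # pick a memory, then a trajectory
    n = len(trajectory)
    # Gradient of the mean squared error between policy and stored actions.
    grad_w = sum(2.0 * (w * x + b - u) * x for x, u in trajectory) / n
    grad_b = sum(2.0 * (w * x + b - u) for x, u in trajectory) / n
    return (-lr * grad_w, -lr * grad_b)

def train(num_local=4, num_global=2, rounds=200, seed=0):
    rng = random.Random(seed)
    replays = [[] for _ in range(num_local)]       # one replay memory per local worker
    params = [0.0, 0.0]                            # global policy parameters (w, b)
    for _ in range(rounds):
        for replay in replays:
            local_worker_step(replay, rng)
        snapshot = tuple(params)                   # replicas start from the same values
        for g in range(num_global):
            associated = replays[g::num_global]    # memories assigned to worker g
            dw, db = global_worker_deltas(snapshot, associated, rng)
            params[0] += dw                        # applying deltas is an assumption
            params[1] += db
    return params
```

Calling train() returns the two global parameters. In this toy setting the learned weight should come out negative, because the optimized local controllers push the state toward zero and the global model imitates their actions. The workers run sequentially here for readability; a real system of this kind would presumably run them in parallel.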

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.