Patent · US Active

Apparatus, method and recording medium for controlling system using temporal difference error

US11573537B2 · kind B2 · utility

Cited by	0

References	2

Claims	12

Family size	0

Assignees

Inventors

Tomotake Sasaki · Kawasaki, JP
Eiji Uchibe · Kunigami, JP
Kenji Doya · Kyoto, JP
Hirokazu Anai · Kawasaki, JP
Hitoshi Yanami · Kawasaki, JP
Hidenao Iwane · Kawasaki, JP

Key dates

Filing date	Sep 13, 2018
Grant date	Feb 7, 2023
Priority date	—
Expiry date	Dec 9, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G06N20/00
WIPO field	Control
WIPO sector	Instruments

Abstract

A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including: calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating, based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating a gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
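
The three steps named in the abstract (perturb the feedback coefficient, compute a TD error, update the coefficient from a gradient estimate) can be illustrated with a minimal scalar sketch. It is not the claimed procedure. Everything specific here is an assumption made for illustration: the system x' = a*x + b*u, the cost q*x^2 + r*u^2, the discount factor gamma, the step size, and all function names. The closed-form value coefficient stands in for the "estimated state-value function", and the gradient estimate is simply the TD error divided by the perturbation, which in this scalar case is a positively scaled version of the true gradient.

```python
import random


def value_coeff(a, b, q, r, f, gamma):
    """Return p with V_f(x) = p * x**2 for the policy u = f * x (scalar case)."""
    c = a + b * f  # closed-loop coefficient
    if gamma * c * c >= 1.0:
        raise ValueError("discounted cost diverges for this gain")
    return (q + r * f * f) / (1.0 - gamma * c * c)


def td_error(a, b, q, r, f, eps, x, gamma):
    """One-step TD error when the gain f is perturbed to f + eps."""
    p = value_coeff(a, b, q, r, f, gamma)  # assumption: stand-in for the estimated V_f
    u = (f + eps) * x                      # input from the perturbed feedback gain
    cost = q * x * x + r * u * u           # quadratic immediate cost
    x_next = a * x + b * u                 # linear difference equation
    return cost + gamma * p * x_next ** 2 - p * x * x


def learn_gain(a=1.0, b=1.0, q=1.0, r=1.0, gamma=0.9,
               f=-0.2, eps=1e-3, step=0.1, iters=200, seed=0):
    """Hypothetical update loop: descend along TD error / perturbation."""
    rng = random.Random(seed)
    for _ in range(iters):
        x = rng.uniform(0.5, 2.0)                 # sampled state of the controlled object
        e = eps if rng.random() < 0.5 else -eps   # random sign reduces the O(eps) bias
        # To first order in e this equals (1 - gamma*(a + b*f)**2) * dp/df,
        # so it has the same sign as the gradient of V_f with respect to f.
        grad = td_error(a, b, q, r, f, e, x, gamma) / (e * x * x)
        f -= step * grad                          # update the feedback coefficient
    return f


def riccati_gain(a=1.0, b=1.0, q=1.0, r=1.0, gamma=0.9, iters=1000):
    """Reference gain from iterating the discounted scalar Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + gamma * a * a * p - (gamma * a * b * p) ** 2 / (r + gamma * b * b * p)
    return -gamma * a * b * p / (r + gamma * b * b * p)


if __name__ == "__main__":
    print("learned gain:", learn_gain())
    print("Riccati gain:", riccati_gain())
```

With the default parameters the learned gain should land close to the Riccati reference gain. In the matrix case each component of the feedback coefficient matrix would be perturbed in turn, and the value function would have to be estimated from data rather than taken in closed form.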

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.