Patent · US Active

System and methods for intrinsic reward reinforcement learning

US11521056B2 · kind B2 · utility

Cited by	1
References	0
Claims	32
Family size	0

Inventor

Graham Fyffe · Los Angeles, US

Key dates

Filing date	Jun 14, 2017
Grant date	Dec 6, 2022
Priority date	—
Expiry date	Jul 26, 2040

Classification

Technology area (CPC G)	Physics
CPC primary	G06F2111/10
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A learning agent is disclosed that receives data in sequence from one or more sequential data sources; generates a model modelling sequences of data and actions; and selects an action maximizing the expected future value of a reward function, wherein the reward function depends at least partly on at least one of: a measure of the change in complexity of the model, or a measure of the complexity of the change in the model. The measure of the change in complexity of the model may be based on, for example, the change in description length of the first part of a two-part code describing one or more sequences of received data and actions, the change in description length of a statistical distribution modelling, the description length of the change in the first part of the two-part code, or the description length of the change in the statistical distribution modelling.
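
Illustrative sketch (not part of the patent record): the reward described in the abstract can be pictured as the change in the description length of the agent's model after it absorbs a new observation. The Python below is a minimal toy under stated assumptions, not the claimed method. The count-table model, the `log2(count + 1)` encoding cost, the names `model_description_length`, `update`, `intrinsic_reward` and `select_action`, and the one-step greedy choice standing in for "expected future value" are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def model_description_length(counts):
    """Bits needed to describe the model under an assumed toy encoding.

    Assumption: the model is a table of (action, observation) counts and
    each count costs log2(count + 1) bits. This stands in for the first
    part of a two-part code; the abstract does not specify an encoding.
    """
    return sum(math.log2(c + 1) for c in counts.values())


def update(counts, action, observation):
    """Return a new model that has absorbed one (action, observation) pair."""
    new_counts = Counter(counts)
    new_counts[(action, observation)] += 1
    return new_counts


def intrinsic_reward(counts, action, observation):
    """Change in model description length caused by one update."""
    before = model_description_length(counts)
    after = model_description_length(update(counts, action, observation))
    return after - before


def select_action(counts, actions, predict):
    """Pick the action whose predicted outcome changes the model most.

    Assumption: `predict` maps an action to a single predicted observation,
    and a one-step greedy choice replaces the expected future value.
    """
    return max(actions, key=lambda a: intrinsic_reward(counts, a, predict(a)))


if __name__ == "__main__":
    model = Counter({("left", "x"): 3})
    predicted = {"left": "x", "right": "y"}
    # The unexplored action changes the model more than the familiar one.
    print(select_action(model, ["left", "right"], predicted.get))
```

In this toy, a familiar (action, observation) pair adds only a fraction of a bit to the model, while a pair never seen before adds a full bit, so the greedy choice favors unexplored actions. A fuller treatment would estimate expected future reward over action sequences instead of a single step.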

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.