Patent · US Active

System and methods for intrinsic reward reinforcement learning

US11521056B2 · kind B2 · utility

Cited by: 1
References: 0
Claims: 32
Family size: 0

Inventor

Key dates

Filing date: Jun 14, 2017
Grant date: Dec 6, 2022
Priority date
Expiry date: Jul 26, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F2111/10
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A learning agent is disclosed that receives data in sequence from one or more sequential data sources; generates a model modelling sequences of data and actions; and selects an action maximizing the expected future value of a reward function, wherein the reward function depends at least partly on at least one of: a measure of the change in complexity of the model, or a measure of the complexity of the change in the model. The measure of the change in complexity of the model may be based on, for example, the change in description length of the first part of a two-part code describing one or more sequences of received data and actions, the change in description length of a statistical distribution modelling, the description length of the change in the first part of the two-part code, or the description length of the change in the statistical distribution modelling.
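As an illustration only, and not the patented method itself, the idea of rewarding a change in model complexity can be sketched in a few lines of Python. Everything here is an assumption introduced for the example: the model is a plain table of symbol counts, its description length (the first part of a two-part code) is approximated by the common MDL estimate of (k - 1)/2 * log2(n) bits for k distinct symbols and n samples, and the names `model_description_length`, `intrinsic_reward`, and `select_action` are hypothetical. Action selection is a greedy one-step lookahead rather than a maximization of expected future value.

```python
import math
from collections import Counter


def model_description_length(counts: Counter) -> float:
    """Approximate bits needed to describe the model (first part of a two-part code).

    Assumption: a categorical model with k observed symbols and n samples
    costs about (k - 1) / 2 * log2(n) bits.
    """
    n = sum(counts.values())
    k = len(counts)
    if n <= 1 or k <= 1:
        return 0.0
    return 0.5 * (k - 1) * math.log2(n)


def intrinsic_reward(counts: Counter, symbol) -> float:
    """Change in model description length if `symbol` were observed next."""
    before = model_description_length(counts)
    updated = counts.copy()  # leave the caller's model untouched
    updated[symbol] += 1
    return model_description_length(updated) - before


def select_action(counts: Counter, predictions: dict):
    """Greedy one-step choice: pick the action whose predicted observation
    yields the largest change in model complexity.

    `predictions` maps each action to the symbol it is assumed to produce
    (a hypothetical stand-in for a learned sequence model).
    """
    return max(predictions, key=lambda a: intrinsic_reward(counts, predictions[a]))


if __name__ == "__main__":
    model = Counter({"a": 3, "b": 1})
    predictions = {"stay": "a", "explore": "c"}
    print(select_action(model, predictions))
```

Under these assumptions, an action predicted to reveal a previously unseen symbol enlarges the model more than one that repeats a familiar symbol, so the sketch favors the former.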

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.