Patent · US Active

Thompson strategy based online reinforcement learning system for action selection

US7707131B2 · kind B2 · utility

Cited by	29

References	67

Claims	17

Family size	0

Assignee

Microsoft Corporation · US

Inventors

David M. Chickering · Bellevue, US
Timothy S. Paek · Mercer Island, US
Eric J. Horvitz · Kirkland, US

Key dates

Filing date	Jun 29, 2005
Grant date	Apr 27, 2010
Priority date	—
Expiry date	Apr 19, 2027

Classification

Technology area (CPC G)	Physics
CPC primary	G06N20/00
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A system and method for online reinforcement learning is provided. In particular, a method for performing the explore-vs.-exploit tradeoff is provided. Although the method is heuristic, it can be applied in a principled manner while simultaneously learning the parameters and/or structure of the model (e.g., Bayesian network model). The system includes a model which receives an input (e.g., from a user) and provides a probability distribution associated with uncertainty regarding parameters of the model to a decision engine. The decision engine can determine whether to exploit the information known to it or to explore to obtain additional information based, at least in part, upon the explore-vs.-exploit tradeoff (e.g., Thompson strategy). A reinforcement learning component can obtain additional information (e.g., feedback from a user) and update parameter(s) and/or the structure of the model. The system can be employed in scenarios in which an influence diagram is used to make repeated decisions and maximization of long-term expected utility is desired.
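
The Thompson strategy named in the abstract can be illustrated with a much simpler model than the Bayesian network or influence diagram the patent describes. The sketch below is not the patented system. It assumes a set of independent actions with binary feedback, each with a Beta distribution over its unknown success rate. The class name BetaBernoulliThompson, the helper simulate, and the uniform Beta(1, 1) prior are all hypothetical choices made for illustration. On each step the agent draws one parameter sample per action from its current distribution and takes the action whose sample is largest, so an action is chosen about as often as the model currently believes it is the best one. The feedback then updates that action's distribution.

```python
import random


class BetaBernoulliThompson:
    """Thompson strategy for actions with binary (success/failure) feedback.

    Illustrative assumption: each action has an independent Beta posterior
    over its success rate, starting from a uniform Beta(1, 1) prior.
    """

    def __init__(self, n_actions, rng=None):
        self.successes = [1] * n_actions  # Beta alpha parameter per action
        self.failures = [1] * n_actions   # Beta beta parameter per action
        self.rng = rng or random.Random()

    def select_action(self):
        # Draw one plausible success rate per action from its posterior,
        # then act greedily with respect to the sampled rates.
        samples = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, reward):
        # Reinforcement step: fold the observed feedback into the posterior.
        if reward:
            self.successes[action] += 1
        else:
            self.failures[action] += 1


def simulate(true_rates, steps, seed=0):
    """Run the agent against hypothetical fixed success rates."""
    rng = random.Random(seed)
    agent = BetaBernoulliThompson(len(true_rates), rng)
    counts = [0] * len(true_rates)
    for _ in range(steps):
        action = agent.select_action()
        reward = rng.random() < true_rates[action]
        agent.update(action, reward)
        counts[action] += 1
    return agent, counts


if __name__ == "__main__":
    agent, counts = simulate([0.1, 0.9], steps=2000)
    print("times each action was chosen:", counts)
```

No separate exploration rate is needed in this sketch: while the distributions are wide, sampled values vary a lot and weaker-looking actions still get tried. As feedback accumulates the distributions narrow, and the agent settles on the action with the best observed results.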

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.