Patent · US Active

Online machine learning with immediate rewards when real rewards are delayed

US12056584B2 · kind B2 · utility

0 Cited by

9 References

24 Claims

0 Family size

Assignee

International Business Machines Corporation · US

Inventors

Oznur Alkan · Clonsilla, IE
Djallel Bouneffouf · Wappingers Falls, US
Bei Chen · Dublin, IE
Elizabeth Daly · Dublin, IE

Key dates

Filing date	Nov 16, 2020
Grant date	Aug 6, 2024
Priority date	—
Expiry date	Jun 8, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/08
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

An online machine learning model such as an autonomous agent predicts an action. A processor associated with, or running, the online machine learning model observes an environment for an interval of time for a real reward associated with the action. Responsive to determining that the real reward is not received within the interval of time, the processor determines based on a criterion whether to allocate an immediate reward received within the interval of time to the online machine learning model, where the immediate reward is an approximation of the real reward. Responsive to determining that the immediate reward is to be allocated, the processor allocates the immediate reward to the online machine learning model. The online machine learning model further learns or retrains itself based on the immediate reward.
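
The control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `OnlineAgent`, `Observation`, `step`, `observe`, and `criterion` are hypothetical, the running-mean learner stands in for whatever online model is actually used, and learning directly from the real reward when it does arrive in time is an assumption the abstract does not spell out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class OnlineAgent:
    """Toy online learner (assumption): keeps a running mean reward per action."""
    actions: List[str]
    counts: Dict[str, int] = field(default_factory=dict)
    values: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for a in self.actions:
            self.counts.setdefault(a, 0)
            self.values.setdefault(a, 0.0)

    def predict(self) -> str:
        # Greedy choice; ties resolve to the earliest action in the list.
        return max(self.actions, key=lambda a: self.values[a])

    def learn(self, action: str, reward: float) -> None:
        # Incremental mean update: the model retrains itself on each reward.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


@dataclass
class Observation:
    """What was seen in the environment during the observation interval."""
    real_reward: Optional[float]       # None if it did not arrive in time
    immediate_reward: Optional[float]  # approximation of the real reward, if any


def step(
    agent: OnlineAgent,
    observe: Callable[[str], Observation],
    criterion: Callable[[str, float], bool],
) -> Optional[float]:
    """Run one predict/observe/allocate/learn cycle; return the reward used, if any."""
    action = agent.predict()
    obs = observe(action)  # observe the environment for the interval

    if obs.real_reward is not None:
        # Assumption: a real reward that arrives in time is used directly.
        reward = obs.real_reward
    elif obs.immediate_reward is not None and criterion(action, obs.immediate_reward):
        # Real reward is delayed; the criterion approved the immediate reward.
        reward = obs.immediate_reward
    else:
        # Nothing allocated this round; the model is left unchanged.
        return None

    agent.learn(action, reward)
    return reward
```

As a usage example, `step(agent, lambda a: Observation(None, 0.8), lambda a, r: r >= 0.5)` would allocate the immediate reward of 0.8 because the real reward is absent and the threshold criterion accepts it. The threshold itself is only a placeholder; the abstract says a criterion is applied but does not say what it is.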

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.