Partially observed Markov decision process model and its use
US11176473B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 6, 2017 |
| Grant date | Nov 16, 2021 |
| Priority date | — |
| Expiry date | Feb 6, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N20/00
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for selecting an action includes: reading, into a memory, a Partially Observed Markov Decision Process (POMDP) model, the POMDP model having top-k action IDs for each belief state, the top-k action IDs maximizing expected long-term cumulative rewards in each time-step, and k being an integer of two or more; in the execution-time process of the POMDP model, detecting a situation where an action identified by the best action ID among the top-k action IDs for a current belief state cannot be selected due to a constraint; and, in response to detecting the situation, selecting and executing an action identified by the second-best action ID among the top-k action IDs for the current belief state. The top-k action IDs may be top-k alpha vectors, each having an associated action, or identifiers of the top-k actions associated with alpha vectors.
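The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`top_k_action_ids`, `select_action`), the `is_allowed` constraint callback, and the toy alpha vectors are all hypothetical, and the sketch assumes the value of an alpha vector at a belief state is their dot product. The abstract describes falling back to the second-best action; the loop here generalizes that to any rank up to k as an assumption.

```python
import numpy as np

def top_k_action_ids(belief, alpha_vectors, action_ids, k=2):
    """Rank distinct action IDs by alpha-vector value at `belief`; keep the best k."""
    values = alpha_vectors @ belief              # one value per alpha vector
    order = np.argsort(-values, kind="stable")   # indices, highest value first
    ranked = []
    for i in order:
        action = action_ids[i]
        if action not in ranked:                 # several vectors may share an action
            ranked.append(action)
        if len(ranked) == k:
            break
    return ranked

def select_action(ranked_ids, is_allowed):
    """Return the best-ranked action that the constraint permits, else None."""
    for action in ranked_ids:
        if is_allowed(action):
            return action
    return None

# Toy model (assumed values): two hidden states, three alpha vectors.
belief = np.array([0.7, 0.3])
alpha_vectors = np.array([[10.0, 0.0],
                          [6.0, 6.0],
                          [0.0, 10.0]])
action_ids = ["move", "wait", "recharge"]

ranked = top_k_action_ids(belief, alpha_vectors, action_ids, k=2)
# Hypothetical constraint: "move" is currently unavailable.
chosen = select_action(ranked, lambda a: a != "move")
print(ranked, chosen)
```

In this toy setup the best action is blocked by the constraint, so the second-ranked action is chosen without re-solving the model, which is the behavior the abstract describes at execution time.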
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.