Continual reinforcement learning with a multi-task agent
US12154029B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | Feb 5, 2019 |
| Grant date | Nov 26, 2024 |
| Priority date | — |
| Expiry date | May 22, 2042 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/098
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method is described for training an action selection neural network that controls an agent interacting with an environment to perform multiple different tasks. The method includes obtaining a first trajectory of transitions generated while the agent was performing an episode of a first task from the multiple tasks, and training the action selection neural network on the first trajectory to adjust the control policies for the multiple tasks. The training includes, for each transition in the first trajectory: generating a respective policy output for the initial observation in the transition for each task in a subset of tasks that includes the first task and one other task; generating a respective target policy output for each task in the subset using the reward in the transition; and determining an update to the current parameter values based on, for each task in the subset, a gradient of a loss between the policy output and the target policy output for the task.
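The per-transition update described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that do not come from the source: the policy output is taken to be a vector of Q-values, the network is a linear function of the observation concatenated with a one-hot task id, the target is a one-step Q-learning target, the loss is a squared error, and each transition is assumed to carry a reward for every task in the subset. The names `Transition`, `features`, `q_values`, and `train_on_trajectory` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Transition:
    obs: List[float]           # initial observation
    action: int                # action the agent took
    rewards: Dict[int, float]  # assumption: reward evaluated per task id
    next_obs: List[float]      # observation after the action
    done: bool                 # True if the episode ended here


def features(obs, task, num_tasks):
    # Condition the shared parameters on the task with a one-hot task id.
    one_hot = [1.0 if i == task else 0.0 for i in range(num_tasks)]
    return list(obs) + one_hot


def q_values(params, obs, task, num_tasks):
    # Policy output (assumed here to be Q-values): one value per action.
    x = features(obs, task, num_tasks)
    return [sum(w * xi for w, xi in zip(row, x)) for row in params]


def train_on_trajectory(params, trajectory, task_subset, num_tasks,
                        gamma=0.9, lr=0.5):
    for t in trajectory:
        total_grad = [0.0] * len(params[t.action])
        for task in task_subset:
            # Policy output for the initial observation under this task.
            q = q_values(params, t.obs, task, num_tasks)[t.action]
            # Target policy output built from the reward in the transition.
            next_q = q_values(params, t.next_obs, task, num_tasks)
            bootstrap = 0.0 if t.done else max(next_q)
            target = t.rewards[task] + gamma * bootstrap
            # Semi-gradient of 0.5 * (q - target)^2; target held constant.
            td_error = q - target
            x = features(t.obs, task, num_tasks)
            total_grad = [g + td_error * xi for g, xi in zip(total_grad, x)]
        # One update to the shared parameters, summed over the task subset.
        params[t.action] = [w - lr * g
                            for w, g in zip(params[t.action], total_grad)]
    return params


# Two actions, 2-dimensional observations, two tasks: 4 weights per action.
params = [[0.0] * 4 for _ in range(2)]
trajectory = [Transition([1.0, 0.0], 0, {0: 1.0, 1: 0.0}, [0.0, 1.0], True)]
params = train_on_trajectory(params, trajectory, task_subset=[0, 1], num_tasks=2)
print(params[0])
```

In this sketch a single transition collected under task 0 also produces a learning signal for task 1, because both tasks in the subset contribute a gradient to the same shared weights. That is the sense in which one trajectory adjusts the control policies for several tasks.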
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.