Language-model pretraining with gradient-disentangled embedding sharing
US12223269B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | May 18, 2022 |
| Grant date | Feb 11, 2025 |
| Priority date | — |
| Expiry date | Jan 13, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N5/04
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for training a language model comprises (a) receiving vectorized training data as input to a multitask pretraining problem; (b) generating modified vectorized training data based on the vectorized training data, according to an upstream data embedding; (c) emitting pretraining output based on the modified vectorized training data, according to a downstream data embedding equivalent to the upstream data embedding; and (d) adjusting the upstream data embedding and the downstream data embedding by computing, based on the pretraining output, a gradient of the upstream data embedding disentangled from a gradient of the downstream data embedding, thereby advancing the multitask pretraining problem toward a pretrained state.
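Step (d) of the abstract can be illustrated with a toy sketch. It assumes one possible realization that the abstract itself does not spell out: the downstream table is the upstream table held constant (a stop-gradient) plus a trainable residual `delta` initialized to zero, so the two embeddings start out equal while their gradients stay separate. All names here (`downstream_embedding`, `squared_error_grad`, `train_step`, `delta`) and the squared-error losses are hypothetical stand-ins for the actual pretraining objectives.

```python
# Toy sketch of gradient-disentangled embedding sharing.
# Assumption: downstream = stop_gradient(upstream) + delta, with delta starting at zero.
# Embedding tables are plain lists of rows (one row of floats per token id).

def downstream_embedding(upstream, delta):
    """Downstream table: upstream values (treated as constants) plus residual delta."""
    return [[u + d for u, d in zip(u_row, d_row)]
            for u_row, d_row in zip(upstream, delta)]

def squared_error_grad(table, token, target):
    """Gradient of 0.5 * ||table[token] - target||^2 with respect to table[token]."""
    return [v - t for v, t in zip(table[token], target)]

def train_step(upstream, delta, token, up_target, down_target, lr=0.1):
    """One update in which the two losses never touch each other's parameters."""
    # Upstream loss: its gradient updates the upstream table only.
    g_up = squared_error_grad(upstream, token, up_target)

    # Downstream loss: evaluated on upstream + delta, but upstream is held
    # constant here, so the gradient is applied to delta alone.
    down = downstream_embedding(upstream, delta)
    g_down = squared_error_grad(down, token, down_target)

    # Copy the tables so the caller's inputs are not mutated.
    new_up = [row[:] for row in upstream]
    new_delta = [row[:] for row in delta]
    new_up[token] = [u - lr * g for u, g in zip(upstream[token], g_up)]
    new_delta[token] = [d - lr * g for d, g in zip(delta[token], g_down)]
    return new_up, new_delta

upstream = [[1.0, 2.0], [3.0, 4.0]]
delta = [[0.0, 0.0], [0.0, 0.0]]  # zero residual: both tables start out equal
new_up, new_delta = train_step(upstream, delta, token=0,
                               up_target=[0.0, 0.0],
                               down_target=[2.0, 2.0], lr=0.5)
```

In this sketch, changing `down_target` changes only `new_delta`; the upstream update is the same whatever the downstream loss asks for, which is the sense in which the two gradients are kept apart.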
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.