Patent · US Active

Language-model pretraining with gradient-disentangled embedding sharing

US12223269B2 · kind B2 · utility

0 Cited by
0 References
20 Claims
0 Family size

Key dates

Filing date: May 18, 2022
Grant date: Feb 11, 2025
Expiry date: Jan 13, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N5/04
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for training a language model comprises (a) receiving vectorized training data as input to a multitask pretraining problem; (b) generating modified vectorized training data based on the vectorized training data, according to an upstream data embedding; (c) emitting pretraining output based on the modified vectorized training data, according to a downstream data embedding equivalent to the upstream data embedding; and (d) adjusting the upstream data embedding and the downstream data embedding by computing, based on the pretraining output, a gradient of the upstream data embedding disentangled from a gradient of the downstream data embedding, thereby advancing the multitask pretraining problem toward a pretrained state.
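The abstract does not state how the two gradients are kept apart. The minimal Python sketch below assumes one plausible parameterization: the downstream embedding is formed as `stop_grad(upstream) + delta`, where `delta` is a zero-initialized residual. Under this assumption the downstream loss updates only `delta`, while the upstream loss alone updates the shared table. The quadratic losses, the targets `a` and `b`, and every function name are hypothetical stand-ins for the real pretraining objectives. They exist only to make the gradient routing visible and to contrast it with ordinary sharing, where both gradients pull on a single table.

```python
from typing import List, Tuple

Vector = List[float]


def add(u: Vector, v: Vector) -> Vector:
    """Element-wise u + v."""
    return [x + y for x, y in zip(u, v)]


def sub(u: Vector, v: Vector) -> Vector:
    """Element-wise u - v."""
    return [x - y for x, y in zip(u, v)]


def scale(c: float, v: Vector) -> Vector:
    """Scalar multiple c * v."""
    return [c * x for x in v]


def train_shared(a: Vector, b: Vector, steps: int = 200,
                 lr: float = 0.1) -> Tuple[Vector, Vector]:
    """Plain embedding sharing: one table receives both gradients.

    Hypothetical toy losses: L_up = 0.5*||e - a||^2, L_down = 0.5*||e - b||^2.
    """
    e = [0.0] * len(a)
    for _ in range(steps):
        grad_up = sub(e, a)    # dL_up / de
        grad_down = sub(e, b)  # dL_down / de, flows into the same shared table
        e = sub(e, scale(lr, add(grad_up, grad_down)))
    return e, e  # upstream and downstream embeddings are one table


def train_gdes(a: Vector, b: Vector, steps: int = 200,
               lr: float = 0.1) -> Tuple[Vector, Vector]:
    """Disentangled sharing under the assumed form e_down = stop_grad(e_up) + delta."""
    e_up = [0.0] * len(a)
    delta = [0.0] * len(a)  # zero-initialized, so e_down starts equal to e_up
    for _ in range(steps):
        e_down = add(e_up, delta)   # downstream view of the shared table
        grad_up = sub(e_up, a)      # dL_up / d e_up
        grad_down = sub(e_down, b)  # dL_down / d e_down
        # The upstream table is updated only by the upstream gradient.
        e_up = sub(e_up, scale(lr, grad_up))
        # The downstream gradient goes to delta only (the stop-gradient),
        # so it never flows back into e_up.
        delta = sub(delta, scale(lr, grad_down))
    return e_up, add(e_up, delta)
```

With `a = [1.0, 2.0]` and `b = [3.0, -2.0]`, `train_shared` settles near the compromise `[2.0, 0.0]`. By contrast, `train_gdes` lets the upstream table approach `[1.0, 2.0]` while the downstream view approaches `[3.0, -2.0]`, so each objective reaches its own optimum instead of competing over one set of weights.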

Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).