Patent · US Active

Initialization of parameters for machine-learned transformer neural network architectures

US11663488B2 · kind B2 · utility

4Cited by
1References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateFeb 5, 2021
Grant dateMay 30, 2023
Priority date
Expiry dateAug 8, 2041

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06N20/00
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

An online system trains a transformer architecture by an initialization method which allows the transformer architecture to be trained without normalization layers of learning rate warmup, resulting in significant improvements in computational efficiency for transformer architectures. Specifically, an attention block included in an encoder or a decoder of the transformer architecture generates the set of attention representations by applying a key matrix to the input key, a query matrix to the input query, a value matrix to the input value to generate an output, and applying an output matrix to the output to generate the set of attention representations. The initialization method may be performed by scaling the parameters of the value matrix and the output matrix with a factor that is inverse to a number of the set of encoders or a number of the set of decoders.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.