Initialization of parameters for machine-learned transformer neural network architectures
US11663488B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Feb 5, 2021 |
| Grant date | May 30, 2023 |
| Priority date | — |
| Expiry date | Aug 8, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N20/00
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
An online system trains a transformer architecture by an initialization method which allows the transformer architecture to be trained without normalization layers or learning rate warmup, resulting in significant improvements in computational efficiency for transformer architectures. Specifically, an attention block included in an encoder or a decoder of the transformer architecture generates the set of attention representations by applying a key matrix to the input key, a query matrix to the input query, and a value matrix to the input value to generate an output, and then applying an output matrix to the output to generate the set of attention representations. The initialization method may be performed by scaling the parameters of the value matrix and the output matrix with a factor that is inverse to a number of the set of encoders or a number of the set of decoders.
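The scaling step described in the abstract can be illustrated with a minimal sketch. It assumes a Xavier-style Gaussian base initialization and a factor of the form `num_layers ** -exponent`. Neither the base scheme nor the exact exponent is given in the abstract, which only says the factor is inverse to the number of encoders or decoders. The names `xavier_matrix`, `init_attention_block`, `init_stack`, and `exponent` are hypothetical and chosen for illustration only.

```python
import math
import random


def xavier_matrix(rows, cols, rng):
    """Gaussian Xavier-style init (an assumed base scheme, not from the abstract)."""
    std = math.sqrt(2.0 / (rows + cols))
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def scale(matrix, factor):
    """Multiply every entry of a list-of-lists matrix by `factor`."""
    return [[factor * w for w in row] for row in matrix]


def init_attention_block(d_model, num_layers, rng, exponent=1.0):
    """Initialize the four projection matrices of one attention block.

    Only the value and output matrices are scaled, by num_layers ** -exponent.
    `exponent` is a hypothetical knob: the abstract states only that the
    factor is inverse to the number of encoders or decoders.
    """
    factor = num_layers ** (-exponent)
    return {
        "key": xavier_matrix(d_model, d_model, rng),
        "query": xavier_matrix(d_model, d_model, rng),
        "value": scale(xavier_matrix(d_model, d_model, rng), factor),
        "output": scale(xavier_matrix(d_model, d_model, rng), factor),
    }


def init_stack(d_model, num_layers, seed=0):
    """Initialize one attention block per layer of an encoder or decoder stack."""
    rng = random.Random(seed)
    return [init_attention_block(d_model, num_layers, rng) for _ in range(num_layers)]
```

In this sketch the key and query matrices keep their base initialization, so a deeper stack shrinks only the value and output projections. This matches the abstract's description of which parameters are scaled, under the stated assumptions.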
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.