Patent · US Active

Initialization of parameters for machine-learned transformer neural network architectures

US11663488B2 · kind B2 · utility

Cited by	4

References	1

Claims	20

Family size	0

Assignee

The Toronto-Dominion Bank · CA

Inventors

Maksims Volkovs · Toronto, CA
Xiao Shi Huang · Toronto, CA
Juan Felipe Perez Vallejo · Toronto, CA

Key dates

Filing date	Feb 5, 2021
Grant date	May 30, 2023
Priority date	—
Expiry date	Aug 8, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G06N20/00
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

An online system trains a transformer architecture using an initialization method that allows the transformer architecture to be trained without normalization layers or learning rate warmup, resulting in significant improvements in computational efficiency for transformer architectures. Specifically, an attention block included in an encoder or a decoder of the transformer architecture generates a set of attention representations by applying a key matrix to the input key, a query matrix to the input query, and a value matrix to the input value to generate an output, and then applying an output matrix to that output to generate the set of attention representations. The initialization method may be performed by scaling the parameters of the value matrix and the output matrix by a factor that is inversely related to the number of encoders in the set of encoders or the number of decoders in the set of decoders.
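
The abstract describes two mechanisms: an attention block built from key, query, value and output matrices, and an initialization that scales only the value and output matrices by a factor tied to the encoder or decoder count. The plain-Python sketch below illustrates both under stated assumptions, not as the patented method. It assumes single-head, standard scaled dot-product attention with a row-wise softmax (the abstract does not say how the projected query, key and value are combined), a Xavier-style uniform base initialization, and a factor of the form `num_layers ** -power`. The helper names and the `power` parameter (default 1.0, a literal reading of "inverse to a number") are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_block(x_query: Matrix, x_key: Matrix, x_value: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix, w_o: Matrix) -> Matrix:
    # Apply the query, key and value matrices to their respective inputs.
    q = matmul(x_query, w_q)
    k = matmul(x_key, w_k)
    v = matmul(x_value, w_v)
    # ASSUMPTION: standard single-head scaled dot-product attention.
    d_k = len(w_k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    output = matmul(softmax_rows(scores), v)
    # Apply the output matrix to produce the attention representations.
    return matmul(output, w_o)


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # ASSUMPTION: Xavier-style uniform base initialization.
    bound = math.sqrt(6.0 / (rows + cols))
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


def init_attention_params(d_model: int, num_layers: int, power: float = 1.0,
                          seed: int = 0) -> Tuple[Dict[str, Matrix], float]:
    """Hypothetical helper: init W_q, W_k, W_v, W_o, then shrink only W_v and W_o."""
    rng = random.Random(seed)
    # ASSUMPTION: factor = num_layers ** -power; the abstract only says the
    # factor is inverse to the number of encoders or decoders.
    factor = num_layers ** (-power)
    params = {name: init_matrix(d_model, d_model, rng)
              for name in ("w_q", "w_k", "w_v", "w_o")}
    for name in ("w_v", "w_o"):
        params[name] = [[factor * w for w in row] for row in params[name]]
    return params, factor
```

For example, `init_attention_params(d_model=4, num_layers=4)` yields a factor of 0.25. Because the query and key matrices are left untouched, the attention weights do not change; scaling both the value and output matrices by the factor therefore shrinks the block's output by the factor squared (1/16 here) relative to an unscaled initialization drawn with the same seed.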

Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).