Patent · US Active

Transformer-based neural network including a mask attention network

US12260338B2 · kind B2 · utility

0 Cited by

0 References

20 Claims

0 Family size

Assignee

MICROSOFT TECHNOLOGY LICENSING, LLC · US

Inventors

Jian Jiao · Hinsdale, US
Yeyun GONG · Beijing, CN
Nan Duan · Beijing, CN
Ruofei Zhang · Mountain View, US
Ming Zhou · Beijing, CN

Key dates

Filing date	Aug 27, 2020
Grant date	Mar 25, 2025
Priority date	—
Expiry date	Jun 9, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/09
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A transformer-based neural network includes at least one mask attention network (MAN). The MAN computes an original attention data structure that expresses the influence between pairs of data items in a sequence of data items. The MAN then modifies the original attention data structure using mask values from a mask data structure, producing a modified attention data structure. Compared to the original attention data structure, the modified attention data structure better accounts for the influence of neighboring data items in the sequence with respect to a particular data item under consideration. The mask data structure used by the MAN can have static and/or machine-trained mask values. In one implementation, the transformer-based neural network includes at least one MAN in combination with at least one other attention network that does not use a mask data structure, and at least one feed-forward neural network.
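The abstract does not state the exact masking formula. The following is a minimal sketch under a stated assumption: each mask value multiplies the exponentiated scaled dot-product score for a pair of items, and each row is then renormalized. With an all-ones mask this reduces to ordinary softmax attention (the "original" attention structure); with a static band mask only nearby items keep nonzero weight (the "modified" structure). The helper names (`band_mask`, `attention_scores`, `masked_attention_weights`, `mask_attention`) and the band half-width `window` are hypothetical illustrations, not taken from the patent.

```python
import math
from typing import List

Matrix = List[List[float]]


def band_mask(n: int, window: int) -> Matrix:
    """Hypothetical static mask: 1.0 if |i - j| <= window, else 0.0."""
    return [[1.0 if abs(i - j) <= window else 0.0 for j in range(n)]
            for i in range(n)]


def attention_scores(queries: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product scores s[i][j] = q_i . k_j / sqrt(d)."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [[sum(a * b for a, b in zip(q, k)) * scale for k in keys]
            for q in queries]


def masked_attention_weights(scores: Matrix, mask: Matrix) -> Matrix:
    """Weight exp(s[i][j]) by mask[i][j], then renormalize each row.

    Assumption: the mask acts multiplicatively before normalization.
    An all-ones mask reproduces ordinary softmax attention.
    """
    weights = []
    for s_row, m_row in zip(scores, mask):
        peak = max(s_row)  # shifting by the row max cancels in the ratio
        unnorm = [m * math.exp(s - peak) for s, m in zip(s_row, m_row)]
        total = sum(unnorm)  # assumes each row has a nonzero mask value
        weights.append([u / total for u in unnorm])
    return weights


def mask_attention(queries: Matrix, keys: Matrix, values: Matrix,
                   mask: Matrix) -> Matrix:
    """Output row i is the mask-weighted average of the value vectors."""
    weights = masked_attention_weights(attention_scores(queries, keys), mask)
    dim = len(values[0])
    return [[sum(w * v[c] for w, v in zip(w_row, values)) for c in range(dim)]
            for w_row in weights]


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    scores = attention_scores(x, x)
    original = masked_attention_weights(scores, [[1.0] * 4 for _ in range(4)])
    modified = masked_attention_weights(scores, band_mask(4, window=1))
    print(original[0])
    print(modified[0])  # positions 2 and 3 are masked out for item 0
```

A machine-trained variant might replace `band_mask` with learnable values (for example, squashed into [0, 1]); that choice is likewise an assumption. Per the abstract, a complete network would combine such MAN layers with unmasked attention layers and feed-forward networks.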

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.