Patent · US Active

Compression method and platform of pre-training language model based on knowledge distillation

US11341326B2 · kind B2 · utility

Cited by: 1
References: 2
Claims: 6
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 24, 2021
Grant date: May 24, 2022
Priority date
Expiry date: Sep 24, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N3/096
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Provided are a method and a platform for compressing a pre-trained language model based on knowledge distillation. In the method, a universal knowledge distillation strategy based on feature transfer is designed first: during distillation from the teacher model to the student model, the feature map of each student layer is driven toward the corresponding teacher features, with a focus on how well the teacher model's intermediate layers express features on small samples, and these features are used to guide the student model. Next, a self-attention cross knowledge distillation method is constructed. Finally, a linear transfer strategy based on the Bernoulli probability distribution is designed to gradually complete the transfer of feature maps and self-attention distributions from the teacher to the student.
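
The abstract gives no formulas, so the following is only a minimal sketch of how a Bernoulli-gated linear transfer schedule combined with a feature-map loss could look. It is not the patented procedure. The function names, the `slope` and `base` parameters, the clipped linear form of the probability, and the use of mean squared error as the feature-matching loss are all assumptions made for illustration.

```python
import random


def transfer_probability(step, slope, base):
    """Assumed schedule: probability grows linearly with the step, clipped to [0, 1]."""
    return max(0.0, min(1.0, slope * step + base))


def sample_transfer_mask(step, num_layers, slope=0.001, base=0.1, rng=None):
    """Draw one Bernoulli(p) sample per layer; True means that layer is
    distilled at this step. `slope` and `base` are hypothetical hyperparameters."""
    rng = rng or random.Random()
    p = transfer_probability(step, slope, base)
    return [rng.random() < p for _ in range(num_layers)]


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def feature_distillation_loss(student_feats, teacher_feats, mask):
    """Sum the per-layer MSE over the layers selected by the mask.
    MSE is an assumed stand-in for the feature-matching objective."""
    return sum(
        mse(s, t)
        for s, t, selected in zip(student_feats, teacher_feats, mask)
        if selected
    )


if __name__ == "__main__":
    # Toy features for a two-layer student and teacher (made-up numbers).
    student = [[1.0, 2.0], [0.0, 0.0]]
    teacher = [[1.0, 4.0], [1.0, 1.0]]
    mask = sample_transfer_mask(step=500, num_layers=2, rng=random.Random(0))
    print(mask, feature_distillation_loss(student, teacher, mask))
```

Under these assumptions, few layers are selected for transfer early in training, and once the probability reaches 1 every layer is distilled at every step, which is one way to read "gradually complete the knowledge transfer".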

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.