Patent · US Active

Compression method and platform of pre-training language model based on knowledge distillation

US11341326B2 · kind B2 · utility

Cited by	1

References	2

Claims	6

Family size	0

Assignee

ZHEJIANG LAB · CN

Inventors

Hongsheng Wang · Elmhurst, US
Haijun Shan · Hangzhou City, CN
Fei Yang · Fremont, US

Key dates

Filing date	Sep 24, 2021
Grant date	May 24, 2022
Priority date	—
Expiry date	Sep 24, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/096
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Provided are a method and a platform for compressing a pre-trained language model based on knowledge distillation. First, a universal knowledge distillation strategy based on feature migration is designed: as knowledge is distilled from the teacher model to the student model, the feature map of each layer of the student model is driven toward the teacher model's features, with a focus on how well the teacher model's intermediate layers express features for small samples, and these features are used to guide the student model. Next, a knowledge distillation method based on self-attention cross is constructed. Finally, a linear transfer strategy based on a Bernoulli probability distribution is designed to gradually complete the transfer of feature-map and self-attention-distribution knowledge from the teacher to the student.
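The abstract does not give formulas, so the following is only an illustrative sketch of two of the ideas it names: a feature-map distillation loss and a Bernoulli gate whose probability grows linearly during training. Everything in it is an assumption made for illustration, not the patented method: the function names, the mean-squared-error loss, the `p_start`/`p_end` parameters, and the reading of the gate as a per-step switch that decides whether a layer's transfer is applied.

```python
import random


def linear_transfer_probability(step, total_steps, p_start=0.0, p_end=1.0):
    """Hypothetical schedule: move p linearly from p_start to p_end."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps past the end keep p at p_end.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return p_start + (p_end - p_start) * progress


def bernoulli_gate(p, rng):
    """Draw X ~ Bernoulli(p); 1 means the transfer is applied at this step."""
    return 1 if rng.random() < p else 0


def feature_map_loss(student_features, teacher_features):
    """Assumed loss: mean squared error between student and teacher features."""
    if len(student_features) != len(teacher_features):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_features, teacher_features)]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs draw the same gates
    teacher = [0.2, 0.8, 0.5]
    student = [0.1, 0.6, 0.9]
    for step in (0, 50, 100):
        p = linear_transfer_probability(step, total_steps=100)
        gate = bernoulli_gate(p, rng)
        # The loss contributes only when the gate is open.
        loss = gate * feature_map_loss(student, teacher)
        print(step, p, gate, loss)
```

With this schedule the gate never opens at step 0 and always opens once `step` reaches `total_steps`, which is one simple way to make the transfer gradual.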

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.