Compression method and platform of pre-training language model based on knowledge distillation
US11341326B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 24, 2021 |
| Grant date | May 24, 2022 |
| Priority date | — |
| Expiry date | Sep 24, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/096
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Provided are a method and a platform for compressing a pre-training language model based on knowledge distillation. The method has three parts. First, a universal knowledge distillation strategy based on feature migration is designed: while knowledge is distilled from the teacher model to the student model, the feature mapping of each layer of the student model is driven toward the teacher's features, with a focus on the feature expression ability of small samples in the intermediate layers of the teacher model, and these features are used to guide the student model. Second, a knowledge distillation method based on self-attention cross is constructed. Finally, a linear transfer strategy based on the Bernoulli probability distribution is designed to gradually complete the transfer of feature-mapping and self-attention-distribution knowledge from the teacher to the student.
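The abstract does not give formulas, so the following is only a minimal sketch of how a Bernoulli-based linear transfer schedule and a per-layer feature loss might look. It assumes the transfer probability grows linearly with the training step and is clipped to 1, and that the feature mapping loss is a mean squared error; both are illustrative assumptions, not the patent's stated definitions. All names and default values (`transfer_probability`, `sample_transfer_mask`, `feature_mse`, `distillation_loss`, `slope`, `base`) are hypothetical.

```python
import random


def transfer_probability(step, slope=0.001, base=0.1):
    """Assumed schedule: probability rises linearly with the step, clipped to [0, 1]."""
    return max(0.0, min(1.0, slope * step + base))


def sample_transfer_mask(step, num_layers, rng, slope=0.001, base=0.1):
    """Draw one Bernoulli(p) variable per layer; True means 'transfer this layer now'."""
    p = transfer_probability(step, slope, base)
    return [rng.random() < p for _ in range(num_layers)]


def feature_mse(student_features, teacher_features):
    """Mean squared error between two equal-length feature vectors (assumed loss)."""
    if len(student_features) != len(teacher_features):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_features, teacher_features)]
    return sum(squared) / len(squared)


def distillation_loss(student_layers, teacher_layers, mask):
    """Sum the feature loss over the layers selected by the Bernoulli mask."""
    return sum(
        feature_mse(s, t)
        for s, t, selected in zip(student_layers, teacher_layers, mask)
        if selected
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs draw the same masks
    teacher = [[1.0, 2.0], [0.5, 0.5], [3.0, 1.0]]  # toy per-layer features
    student = [[0.0, 2.0], [0.5, 0.0], [2.0, 2.0]]
    for step in (0, 500, 2000):
        mask = sample_transfer_mask(step, len(teacher), rng)
        loss = distillation_loss(student, teacher, mask)
        print(step, transfer_probability(step), mask, loss)
```

Under this assumed schedule, few layers are selected early in training and all layers are selected once the clipped probability reaches 1, which is one way to read "gradually complete the knowledge transfer".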
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.