Embeddings with classes
US11373042B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 3, 2019 |
| Grant date | Jun 28, 2022 |
| Priority date | — |
| Expiry date | Dec 24, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06T1/20
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Described herein are systems and methods for word embeddings that avoid the need to throw out rare words appearing fewer than a certain number of times in a corpus. Embodiments of the present disclosure group words into clusters/classes multiple times, using a different assignment of the vocabulary words to a number of classes each time. Multiple copies of the training corpus are then generated by using each assignment to replace every word with its class. A word embedding generating model is run on the multiple class corpora to generate multiple class embeddings. An estimate of the gold word embedding matrix is then reconstructed from multiple pairs of assignments, class embeddings, and covariances. Test results show the effectiveness of embodiments of the present disclosure.
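The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the random assignment scheme, the function names (`make_assignments`, `to_class_corpus`, `average_reconstruction`), and the toy class vectors are all assumptions made for illustration. In particular, the abstract reconstructs the word embedding matrix from assignments, class embeddings, and covariances; the sketch swaps in plain averaging of class vectors as a much simpler stand-in, and it uses placeholder vectors where a real embedding model would be trained on each class corpus.

```python
import random


def make_assignments(vocab, num_classes, num_assignments, seed=0):
    """Return a list of dicts, each mapping every vocabulary word to a class id.

    Assumption: classes are assigned uniformly at random; the abstract does
    not say how the assignments are chosen.
    """
    rng = random.Random(seed)
    return [
        {word: rng.randrange(num_classes) for word in vocab}
        for _ in range(num_assignments)
    ]


def to_class_corpus(corpus, assignment):
    """Copy a tokenized corpus, replacing each word with its class label."""
    return [[f"C{assignment[tok]}" for tok in sentence] for sentence in corpus]


def average_reconstruction(vocab, assignments, class_embeddings):
    """Naive stand-in for reconstruction: average a word's class vectors.

    This ignores the covariances mentioned in the abstract.
    """
    word_vectors = {}
    for word in vocab:
        vecs = [emb[a[word]] for a, emb in zip(assignments, class_embeddings)]
        dim = len(vecs[0])
        word_vectors[word] = [
            sum(v[d] for v in vecs) / len(vecs) for d in range(dim)
        ]
    return word_vectors


# Toy corpus; "aardvark" plays the role of a rare word.
corpus = [["the", "cat", "sat"], ["the", "aardvark", "sat"]]
vocab = sorted({tok for sentence in corpus for tok in sentence})

assignments = make_assignments(vocab, num_classes=2, num_assignments=3)
class_corpora = [to_class_corpus(corpus, a) for a in assignments]

# Placeholder class vectors (hypothetical). In practice an embedding model
# would be trained on each class corpus to produce these.
class_embeddings = [
    {c: [float(c), float(i)] for c in range(2)} for i in range(3)
]

word_vectors = average_reconstruction(vocab, assignments, class_embeddings)
```

Because every word shares its class with other words in each copy of the corpus, even a word seen once contributes to, and receives, a class vector in every run. That is the intuition behind not discarding rare words.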
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.