Systems and methods for using contrastive pre-training to generate text and code embeddings
US12073299B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 23, 2023 |
| Grant date | Aug 27, 2024 |
| Priority date | — |
| Expiry date | Jan 23, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/088
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Embodiments of the present disclosure may include systems, methods, and computer readable media for generating a vector representation, including receiving a training data set, the training data set including a plurality of paired data samples corresponding to positive example pairs, each positive example pair including a first data unit and a second data unit. Embodiments may also include converting the training data set into at least one first vector of a vector representation. Embodiments may further include accessing one or more negative example pairs to contrast against the positive example pairs. Embodiments may also include converting the one or more negative example pairs into one or more second vectors of the vector representation. Embodiments may further include training an artificial machine learning model to generate additional vectors of the vector representation. Further embodiments may include systems, methods, and media for determining semantic similarity based on one or more vector representations.
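The abstract describes contrasting positive example pairs against negative example pairs and then comparing the resulting vectors for semantic similarity, but it does not name a loss function or a similarity measure. The following is a minimal sketch under stated assumptions, not the patented method: it assumes cosine similarity as the similarity measure, an InfoNCE-style cross-entropy loss, and negatives formed by pairing each first data unit with the second data units of the other pairs in the same batch. The names `cosine_similarity`, `contrastive_loss`, and `temperature` are hypothetical and chosen for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(first_vecs, second_vecs, temperature=0.1):
    """Mean cross-entropy over the rows of the pairwise similarity matrix.

    Assumption (not stated in the abstract): row i treats
    (first_vecs[i], second_vecs[i]) as the positive pair and every
    (first_vecs[i], second_vecs[j]) with j != i as a negative pair.
    """
    n = len(first_vecs)
    total = 0.0
    for i in range(n):
        # Scaled similarities between first unit i and every second unit.
        logits = [
            cosine_similarity(first_vecs[i], second_vecs[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over the row.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability assigned to the positive pair.
        total += log_z - logits[i]
    return total / n


# Toy vectors standing in for encoder outputs of two positive pairs.
first = [[1.0, 0.0], [0.0, 1.0]]
second = [[0.9, 0.1], [0.1, 0.9]]

loss_aligned = contrastive_loss(first, second)
# Reversing the second list mismatches every pair, so the loss rises.
loss_mismatched = contrastive_loss(first, second[::-1])
```

In this sketch the loss is small when each first vector is closest to its own paired second vector and large when the pairs are mismatched. A training loop would adjust the model producing the vectors to reduce that loss. The toy vectors above stand in for model outputs and carry no data from the patent.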
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.