Patent · US Active

Systems and methods for using contrastive pre-training to generate text and code embeddings

US12073299B2 · kind B2 · utility

Cited by: 0
References: 1
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 23, 2023
Grant date: Aug 27, 2024
Priority date
Expiry date: Jan 23, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N3/088
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Embodiments of the present disclosure may include systems, methods, and computer readable media for generating a vector representation, including receiving a training data set, the training data set including a plurality of paired data samples corresponding to positive example pairs, each positive example pair including a first data unit and a second data unit. Embodiments may also include converting the training data set into at least one first vector of a vector representation. Embodiments may further include accessing one or more negative example pairs to contrast against the positive example pairs. Embodiments may also include converting the one or more negative example pairs into one or more second vectors of the vector representation. Embodiments may further include training an artificial machine learning model to generate additional vectors of the vector representation. Further embodiments may include systems, methods, and media for determining semantic similarity based on one or more vector representations.
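
The abstract describes the training setup only in general terms, so the following is an illustrative sketch rather than the claimed method. It assumes a common contrastive objective (an InfoNCE-style loss with in-batch negatives): for a batch of positive pairs, the second data unit of every other pair serves as a negative example for a given first data unit. The function names, the temperature value, and the choice of cosine similarity for the semantic-similarity step are assumptions and do not come from the patent text.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def in_batch_contrastive_loss(first, second, temperature=0.1):
    """InfoNCE-style loss over a batch of positive pairs (assumed objective).

    first[i] and second[i] are embeddings of the two data units of positive
    pair i. For row i, every second[j] with j != i acts as a negative example.
    """
    a = l2_normalize(np.asarray(first, dtype=float))
    b = l2_normalize(np.asarray(second, dtype=float))
    logits = a @ b.T / temperature  # (n, n) scaled cosine similarities
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


def cosine_similarity(u, v):
    """Semantic similarity between two embedding vectors (assumed metric)."""
    u = l2_normalize(np.asarray(u, dtype=float))
    v = l2_normalize(np.asarray(v, dtype=float))
    return float(u @ v)


# Toy batch: four pairs whose embeddings are already perfectly aligned,
# and the same batch with the second units shifted so no pair matches.
aligned_first = np.eye(4)
aligned_second = np.eye(4)
shuffled_second = np.roll(np.eye(4), 1, axis=0)

loss_aligned = in_batch_contrastive_loss(aligned_first, aligned_second)
loss_shuffled = in_batch_contrastive_loss(aligned_first, shuffled_second)
```

In this toy batch the aligned pairs produce a much lower loss than the shuffled ones, which is the signal a model would be trained to minimize. In practice the embeddings would come from a trainable encoder rather than fixed arrays, and the loss would be minimized with a gradient-based optimizer.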

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.