Patent · US Active

Document body vectorization and noise-contrastive training

US11829374B2 · kind B2 · utility

0Cited by
0References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateMar 19, 2021
Grant dateNov 28, 2023
Priority date
Expiry dateMar 19, 2041

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06N20/00
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

Document embedding vectors for each document of a corpus may be generated by combining embedding vectors for document subparts, thereby yielding a final embedding vector for the document. A machine learning model is trained using a query corpus and the document corpus, where the model generates a ranking score for a given (query, document) pair. During training, rankings scores are generated using the model, such that the training dataset is further refined using the generated ranking scores. For example, top documents and a negative document may be determined for a given query and subsequently used as training data. Multiple negative documents may therefore be determined for a given query. A negative document for a given query may be determined from the negative documents using noise-contrastive estimation. Such determined negative documents may be evaluated using a loss function during model training, thereby yielding a more robust model for search processing.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.