Patent · US Active

Systems and methods for generating vector space embeddings from a multi-format document

US11727062B1 · kind B1 · utility

Cited by: 1
References: 1
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 16, 2021
Grant date: Aug 15, 2023
Priority date
Expiry date: Jun 16, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/289
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Embodiments described herein provide a mechanism that encodes a text document into a geometric graph, where the nodes of the graph represent bits of text from the document and the edges of the graph represent relationships among the bits of text as laid out on a page of the document. Each node of the graph is encoded into a vector representation that contains information about the node and its local sub-graph (the node and all edges branching out from it). In this way, the vector representations of the document capture the interrelationships among words, sentences, and paragraphs of the document, instead of merely mapping the text in the document as a string of input tokens to a vector representation.
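
The idea in the abstract can be illustrated with a minimal sketch. It is not the patented method: the names `TextBlock`, `text_features`, `build_edges`, and `embed_nodes` are hypothetical, the distance-based edge rule is an assumed stand-in for "relationships as laid out on a page", and the hashed bag-of-words vector is a toy substitute for a real text encoder. The sketch only shows how a node vector can combine a node's own content with its local sub-graph.

```python
from dataclasses import dataclass
import hashlib
import math


@dataclass(frozen=True)
class TextBlock:
    """A bit of text plus the centre of its position on the page (assumed units)."""
    text: str
    x: float
    y: float


def text_features(text, dim=8):
    """Toy hashed bag-of-words vector; a placeholder for a real text encoder."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


def build_edges(blocks, radius):
    """Assumed layout rule: connect blocks whose centres lie within `radius`."""
    edges = {i: [] for i in range(len(blocks))}
    for i, a in enumerate(blocks):
        for j, b in enumerate(blocks):
            if i != j and math.hypot(a.x - b.x, a.y - b.y) <= radius:
                edges[i].append(j)
    return edges


def embed_nodes(blocks, edges, dim=8):
    """Node vector = own features concatenated with the mean of neighbour features."""
    own = [text_features(b.text, dim) for b in blocks]
    vectors = []
    for i in range(len(blocks)):
        neighbours = edges[i]
        if neighbours:
            mean = [sum(own[j][k] for j in neighbours) / len(neighbours)
                    for k in range(dim)]
        else:
            mean = [0.0] * dim  # isolated node: no local context to add
        vectors.append(own[i] + mean)
    return vectors


# Three text blocks from a hypothetical invoice page.
blocks = [
    TextBlock("Invoice", 10, 10),
    TextBlock("Total due", 10, 50),
    TextBlock("$120.00", 60, 50),
]
edges = build_edges(blocks, radius=55.0)
vectors = embed_nodes(blocks, edges)
```

In this sketch the label "Total due" is linked to both the heading above it and the amount beside it, so its vector carries layout context that a flat token sequence would not. A real system would likely replace the mean with a learned aggregation; that choice is an assumption here, not something the abstract specifies.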

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.