Systems and methods for generating vector space embeddings from a multi-format document
US11727062B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 16, 2021 |
| Grant date | Aug 15, 2023 |
| Priority date | — |
| Expiry date | Jun 16, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/289
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Embodiments described herein provide a mechanism that encodes a text document into a geometric graph, where the nodes of the graph represent bits of text from the document and the edges of the graph represent relationships among the bits of text as laid out on a page of the document. Each node of the graph is encoded into a vector representation that contains information about the node and its local sub-graph (the node and all edges branching out from it). In this way, the vector representations of the document capture the interrelationships among words, sentences, and paragraphs of the document, instead of just mapping the text in the document as a string of input tokens to a vector representation.
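The idea in the abstract can be illustrated with a minimal sketch. This is not the patented method: the abstract does not say how edges are chosen or how nodes are encoded, so everything below is an assumption made for illustration. The names `TextNode`, `build_edges`, `text_features`, and `embed_node` are hypothetical, edges are drawn by a simple distance threshold on page coordinates, and a toy character-hash vector stands in for a learned text encoder. Each node's vector concatenates its own features, the mean features of its neighbors, and the mean page offset to those neighbors, so it reflects the node plus its local sub-graph.

```python
import math
from dataclasses import dataclass


@dataclass
class TextNode:
    """A bit of text and its (assumed) position on the page."""
    text: str
    x: float
    y: float


def build_edges(nodes, radius):
    """Link nodes whose page positions lie within `radius` (hypothetical rule)."""
    edges = {i: [] for i in range(len(nodes))}
    for i, a in enumerate(nodes):
        for j, b in enumerate(nodes):
            if i != j and math.hypot(a.x - b.x, a.y - b.y) <= radius:
                edges[i].append(j)
    return edges


def text_features(text, dim=8):
    """Toy character-hash vector; a stand-in for a learned text encoder."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    return vec


def embed_node(i, nodes, edges, dim=8):
    """Concatenate own features, mean neighbor features, and mean offset."""
    own = text_features(nodes[i].text, dim)
    nbrs = edges[i]
    if not nbrs:
        # Isolated node: no sub-graph information to add.
        return own + [0.0] * dim + [0.0, 0.0]
    nbr_vecs = [text_features(nodes[j].text, dim) for j in nbrs]
    nbr_mean = [sum(v[k] for v in nbr_vecs) / len(nbrs) for k in range(dim)]
    dx = sum(nodes[j].x - nodes[i].x for j in nbrs) / len(nbrs)
    dy = sum(nodes[j].y - nodes[i].y for j in nbrs) / len(nbrs)
    return own + nbr_mean + [dx, dy]


# Three text fragments with made-up page coordinates.
nodes = [
    TextNode("Invoice", 0.0, 0.0),
    TextNode("Total:", 0.0, 10.0),
    TextNode("$42", 5.0, 10.0),
]
edges = build_edges(nodes, radius=6.0)
embeddings = [embed_node(i, nodes, edges) for i in range(len(nodes))]
```

With these coordinates only "Total:" and "$42" are close enough to be linked, so their vectors carry each other's features and relative position, while "Invoice" gets zeros in the neighbor slots. A flat token sequence would not preserve that layout relationship.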
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.