Regularities and trends discovery in a flow of business documents
US10789281B2 · kind B2 · utility
Assignee
Inventor
Key dates
| Filing date | Jun 29, 2017 |
| Grant date | Sep 29, 2020 |
| Priority date | — |
| Expiry date | Apr 11, 2038 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/10
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for encoding documents includes building or otherwise providing a condensed dictionary that contains identifiers for block headers identified in text blocks extracted from a collection of training documents. For at least one test document, a set of text content blocks is identified. For each of the text content blocks in the set, a block header is identified. Each block header in the training and test documents includes a sequence that includes no more than a predetermined maximum number of characters. An encoding of the test document is generated, based on the identifiers of the block headers identified in the test document that are in the condensed dictionary.
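The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how blocks are segmented, how headers are chosen, how the dictionary is condensed, or what form the encoding takes. The sketch assumes that blocks are separated by blank lines, that a header is the first `MAX_HEADER_CHARS` characters of a block, that "condensed" means keeping only headers seen in at least `min_docs` training documents, and that the encoding is a binary vector. All function names, parameters, and sample documents are hypothetical.

```python
from __future__ import annotations

from collections import Counter

MAX_HEADER_CHARS = 10  # assumed value for the "predetermined maximum"


def split_blocks(document: str) -> list[str]:
    """Split a document into text blocks (assumption: blank-line separated)."""
    return [b.strip() for b in document.split("\n\n") if b.strip()]


def block_header(block: str, max_chars: int = MAX_HEADER_CHARS) -> str:
    """Assumed header rule: the first max_chars characters, lower-cased."""
    return block[:max_chars].lower()


def build_condensed_dictionary(
    training_docs: list[str], min_docs: int = 2
) -> dict[str, int]:
    """Map each retained header to an integer identifier.

    Assumed condensing rule: keep headers that occur in at least
    min_docs distinct training documents.
    """
    doc_freq: Counter[str] = Counter()
    for doc in training_docs:
        # A set counts each header once per document.
        doc_freq.update({block_header(b) for b in split_blocks(doc)})
    kept = sorted(h for h, n in doc_freq.items() if n >= min_docs)
    return {header: idx for idx, header in enumerate(kept)}


def encode(document: str, dictionary: dict[str, int]) -> list[int]:
    """Binary vector: 1 where a dictionary header occurs in the document."""
    found = {
        dictionary[h]
        for h in (block_header(b) for b in split_blocks(document))
        if h in dictionary
    }
    return [1 if i in found else 0 for i in range(len(dictionary))]


# Hypothetical sample data, for illustration only.
training = [
    "Invoice No. 123\n\nTotal due: 40 EUR",
    "Invoice No. 456\n\nTotal due: 90 EUR\n\nThank you for your business",
]
dictionary = build_condensed_dictionary(training)
test_doc = "Invoice No. 789\n\nShip to: Berlin"
print(dictionary)                    # {'invoice no': 0, 'total due:': 1}
print(encode(test_doc, dictionary))  # [1, 0]
```

Headers absent from the condensed dictionary (here, the "Ship to" block) contribute nothing to the encoding, which matches the abstract's restriction to headers "that are in the condensed dictionary".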
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.