Analyzing deduplicated data blocks associated with unstructured documents
US11921676B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Date type | Date |
|---|---|
| Filing date | Nov 29, 2021 |
| Grant date | Mar 5, 2024 |
| Priority date | — |
| Expiry date | May 23, 2042 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/30
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Techniques are described relating to unstructured document processing. An associated computer-implemented method includes identifying a plurality of deduplicated data blocks associated with a collection of unstructured documents. The method further includes sorting the plurality of deduplicated data blocks in descending order based upon at least one block frequency metric, selecting a highest sorted unprocessed deduplicated data block, applying text analytics to the selected deduplicated data block, and applying at least one result of the text analytics to any document among the collection of unstructured documents including the selected deduplicated data block. The method is terminated responsive to satisfaction of at least one stopping condition.
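The loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how blocks are formed, which block frequency metric is used, what the text analytics are, or what the stopping condition is. Every choice below is therefore an illustrative assumption: blank-line-separated paragraphs as blocks, SHA-256 content hashing for deduplication, the number of distinct documents containing a block as the frequency metric, a toy capitalized-word extractor standing in for real analytics, and a block budget plus a minimum frequency as stopping conditions. The names `split_into_blocks`, `analyze_collection`, and `toy_keywords` are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List, Set


def split_into_blocks(text: str) -> List[str]:
    """Hypothetical chunking: one block per blank-line-separated paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def block_key(block: str) -> str:
    """Content hash used to deduplicate identical blocks across documents."""
    return hashlib.sha256(block.encode("utf-8")).hexdigest()


def toy_keywords(block: str) -> Set[str]:
    """Stand-in for real text analytics: capitalized words treated as 'entities'."""
    return {w.strip(".,") for w in block.split() if w[:1].isupper()}


def analyze_collection(
    docs: Dict[str, str],
    analytics: Callable[[str], Set[str]],
    max_blocks: int = 100,
    min_frequency: int = 1,
) -> Dict[str, Set[str]]:
    # Identify deduplicated blocks and record which documents contain each one.
    block_text: Dict[str, str] = {}
    block_docs: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for block in split_into_blocks(text):
            key = block_key(block)
            block_text[key] = block
            block_docs[key].add(doc_id)

    # Sort in descending order of the assumed frequency metric
    # (number of distinct documents containing the block); ties keep first-seen order.
    order = sorted(block_docs, key=lambda k: len(block_docs[k]), reverse=True)

    results: Dict[str, Set[str]] = defaultdict(set)
    processed = 0
    # Repeatedly take the highest-sorted unprocessed block.
    for key in order:
        # Assumed stopping conditions: budget exhausted, or the remaining
        # blocks (already in descending order) are below the frequency floor.
        if processed >= max_blocks or len(block_docs[key]) < min_frequency:
            break
        # Run the analytics once per unique block...
        findings = analytics(block_text[key])
        # ...and apply the result to every document that contains the block.
        for doc_id in block_docs[key]:
            results[doc_id] |= findings
        processed += 1
    return dict(results)


if __name__ == "__main__":
    docs = {
        "a.txt": "Confidential. Property of Acme Corp.\n\nQ3 revenue rose.",
        "b.txt": "Confidential. Property of Acme Corp.\n\nHiring plan for Berlin.",
        "c.txt": "Lunch menu update.",
    }
    for doc_id, found in sorted(analyze_collection(docs, toy_keywords, max_blocks=2).items()):
        print(doc_id, sorted(found))
```

In this sketch, the shared "Confidential..." block ranks first because two documents contain it. It is analyzed once, and the result is attached to both `a.txt` and `b.txt`. The budget of two blocks stops processing before `c.txt` is reached. This is the apparent benefit of ordering by frequency: analytics work spent on a widely repeated block is reused across every document that contains it.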
Source: USPTO / EPO open patent data; objective bibliographic data and citation counts.