Patent · US Active

Analyzing deduplicated data blocks associated with unstructured documents

US11921676B2 · kind B2 · utility

0Cited by
1References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateNov 29, 2021
Grant dateMar 5, 2024
Priority date
Expiry dateMay 23, 2042

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06F40/30
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

Techniques are described relating to unstructured document processing. An associated computer-implemented method includes identifying a plurality of deduplicated data blocks associated with a collection of unstructured documents. The method further includes sorting the plurality of deduplicated data blocks in descending order based upon at least one block frequency metric, selecting a highest sorted unprocessed deduplicated data block, applying text analytics to the selected deduplicated data block, and applying at least one result of the text analytics to any document among the collection of unstructured documents including the selected deduplicated data block. The method is terminated responsive to satisfaction of at least one stopping condition.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.