Patent · US Active

Digital organization of printed documents according to extracted semantic information

US10769503B1 · kind B1 · utility

Cited by: 21
References: 0
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 25, 2019
Grant date: Sep 8, 2020
Priority date
Expiry date: Apr 25, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/2528
  • WIPO field: Audio-visual technology
  • WIPO sector: Electrical engineering

Abstract

A method of analyzing and organizing printed documents is performed at a computing system having one or more processors and memory. The method includes receiving one or more printed documents, each including one or more pages. The method includes processing each page of each printed document. The method includes scanning the respective page to obtain an image file. The method also includes determining a document class for the respective page by inputting the image file to one or more trained classifier models, and generating a semantic analyzer pipeline including at least an optical character recognition (OCR)-based semantic analyzer. The method also includes generating a preprocessed output page from the image file and applying the OCR-based semantic analyzer to that preprocessed output page to extract semantic information corresponding to the respective page. The method includes determining a digital organization for the respective printed document based on the extracted semantic information and the document class.
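
The flow described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: every name here (`process_page`, `organize_document`, `PageResult`, and the `toy_*` stand-ins) is hypothetical, the classifier, preprocessing step, and OCR analyzer are passed in as plain callables, and the rule that maps a document to a storage path is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Image = bytes  # stand-in for the image file obtained by scanning a page


@dataclass
class PageResult:
    document_class: str
    semantic_info: Dict[str, str]


def process_page(
    image: Image,
    classify: Callable[[Image], str],
    preprocess: Callable[[Image], Image],
    ocr_extract: Callable[[Image], Dict[str, str]],
) -> PageResult:
    """Classify one scanned page, then run a two-stage analyzer pipeline."""
    document_class = classify(image)           # trained classifier model(s)
    preprocessed = preprocess(image)           # the preprocessed output page
    semantic_info = ocr_extract(preprocessed)  # OCR-based semantic analyzer
    return PageResult(document_class, semantic_info)


def organize_document(results: List[PageResult]) -> str:
    """Choose a storage path from the document class and extracted fields.

    Assumption (not from the abstract): the first page decides the class,
    and later pages only add fields that earlier pages did not supply.
    """
    if not results:
        raise ValueError("a document needs at least one page")
    merged: Dict[str, str] = {}
    for result in results:
        for key, value in result.semantic_info.items():
            merged.setdefault(key, value)
    return f"{results[0].document_class}/{merged.get('date', 'undated')}"


# Toy stand-ins so the sketch runs without a real classifier or OCR engine.
def toy_classify(image: Image) -> str:
    return "invoice" if b"INVOICE" in image else "letter"


def toy_preprocess(image: Image) -> Image:
    return image.strip()  # placeholder for deskewing, denoising, and so on


def toy_ocr(image: Image) -> Dict[str, str]:
    text = image.decode("ascii", errors="ignore")
    if "date=" in text:
        return {"date": text.split("date=")[1][:10]}
    return {}
```

With the toy stand-ins, a two-page document whose first page contains `INVOICE date=2019-04-25` would be filed under a path built from the class and the extracted date. A real system would replace the three callables with trained models and an OCR engine.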

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.