Patent · US Active

Method and apparatus for forming a structured document from unstructured information

US9280525B2 · kind B2 · utility

7Cited by
1References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateSep 6, 2012
Grant dateMar 8, 2016
Priority date
Expiry dateJan 31, 2034

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06F40/284
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

Illustrative embodiments improve upon prior machine learning techniques by introducing an additional classification layer that mimics human visual pattern recognition. Building upon classification passes that extract contextual information, illustrative embodiments look for hints of high-level semantic categorization that manifest as visual artifacts in the document, such as font family, font weight, text color, text justification, white space, or CSS class name. An improved lightweight markup language enables display of machine-categorized tokens on a screen for human correction, thereby providing ground truths for further machine classification.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.