Patent · US Active

Classifying structured documents

US9477756B1 · kind B1 · utility

Cited by: 12
References: 4
Claims: 21
Family size: 0

Assignee

Inventor

Key dates

Filing date: Jan 16, 2012
Grant date: Oct 25, 2016
Priority date
Expiry date: Jan 16, 2032

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/221
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Technologies are described herein for classifying structured documents based on the structure of the document. A structured document is received, and the structural elements are parsed from the document to generate a text string representing the structure of the document instead of the semantic textual content of the document. The text string may be broken into N-grams utilizing a sliding window, and a classifier trained from similar structured documents labeled as belonging to one of a number of document classes is utilized to determine a probability that the document belongs to each of the document classes based on the N-grams.
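
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the structured documents are HTML, that the structure string is a space-separated sequence of tag names, that N-grams are taken over tags rather than characters, and that the classifier is a multinomial naive Bayes model with add-one smoothing. The abstract specifies none of these choices, and every name below (structure_string, ngrams, StructureClassifier, the class labels) is hypothetical.

```python
import math
from collections import Counter, defaultdict
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Records tag names only; text content and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(tag)

    def handle_endtag(self, tag):
        self.tokens.append("/" + tag)


def structure_string(markup):
    """Return a space-separated string of tag names (assumed representation)."""
    collector = _TagCollector()
    collector.feed(markup)
    collector.close()
    return " ".join(collector.tokens)


def ngrams(text, n=3):
    """Slide a window of n tokens across the structure string."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class StructureClassifier:
    """Multinomial naive Bayes over structural N-grams (an assumed model)."""

    def __init__(self, n=3):
        self.n = n
        self.gram_counts = defaultdict(Counter)  # label -> N-gram counts
        self.doc_counts = Counter()              # label -> training documents
        self.vocab = set()

    def train(self, markup, label):
        grams = ngrams(structure_string(markup), self.n)
        self.gram_counts[label].update(grams)
        self.doc_counts[label] += 1
        self.vocab.update(grams)

    def probabilities(self, markup):
        """Return a probability for every known class, summing to 1."""
        grams = ngrams(structure_string(markup), self.n)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.gram_counts[label]
            # Add-one smoothing; the extra 1 reserves mass for unseen N-grams.
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(n_docs / total_docs)
            for gram in grams:
                score += math.log((counts[gram] + 1) / denom)
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long documents.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(weights.values())
        return {k: w / norm for k, w in weights.items()}


# Two toy training documents with hypothetical class labels.
clf = StructureClassifier(n=2)
clf.train("<html><body><table><tr><td>a</td><td>b</td></tr></table></body></html>",
          "table_page")
clf.train("<html><body><ul><li>x</li><li>y</li></ul></body></html>", "list_page")
probs = clf.probabilities(
    "<html><body><table><tr><td>1</td></tr></table></body></html>")
print(probs)
```

Because only tag names reach the classifier, two pages with entirely different wording but the same layout produce the same N-grams, which is the property the abstract relies on.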

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.