Patent · US Active

Systems and methods for extracting information from structured documents

US8090678B1 · kind B1 · utility

Cited by: 9
References: 22
Claims: 27
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 23, 2003
Grant date: Jan 3, 2012
Priority date
Expiry date: Jul 2, 2027

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/35
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Systems and methods for extracting information from structured documents are provided. The systems and methods relate to selecting a centroid document from a group of structured documents and selecting a subset of the group in order to form a cluster of documents about the centroid document. The subset is preferably selected based on the relative similarity between each document of the subset and the centroid document. Systems and methods according to the invention then include marking a data element on the centroid document. The systems and methods also include identifying, on each document of the subset, the data element that corresponds to the marked data element on the centroid document. Finally, data may be extracted from the subset of documents based on the identifying step.
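
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how similarity is measured, how the centroid is chosen, or how corresponding elements are identified. The sketch assumes each document is a mapping from element path to text, uses Jaccard similarity over the path sets, picks the document with the highest total similarity as the centroid, and treats an element as corresponding when it has the same path. All names (`Doc`, `similarity`, `select_centroid`, `form_cluster`, `extract`) and the 0.5 threshold are hypothetical.

```python
from typing import Dict, List

# Assumed representation: element path -> text content of that element.
Doc = Dict[str, str]


def similarity(a: Doc, b: Doc) -> float:
    """Jaccard similarity of the two documents' element-path sets (an assumption)."""
    paths_a, paths_b = set(a), set(b)
    union = paths_a | paths_b
    return len(paths_a & paths_b) / len(union) if union else 1.0


def select_centroid(docs: List[Doc]) -> int:
    """Return the index of the document most similar, in total, to all the others."""
    def total_similarity(i: int) -> float:
        return sum(similarity(docs[i], d) for j, d in enumerate(docs) if j != i)
    return max(range(len(docs)), key=total_similarity)


def form_cluster(docs: List[Doc], centroid: int, threshold: float = 0.5) -> List[int]:
    """Select the subset of documents whose similarity to the centroid meets the threshold."""
    return [
        i for i, d in enumerate(docs)
        if i != centroid and similarity(docs[centroid], d) >= threshold
    ]


def extract(docs: List[Doc], cluster: List[int], marked_path: str) -> Dict[int, str]:
    """For each clustered document, read the element at the path marked on the centroid."""
    return {i: docs[i][marked_path] for i in cluster if marked_path in docs[i]}


# Toy group of structured documents (fabricated purely for illustration).
docs: List[Doc] = [
    {"/body/h1": "Widget A", "/body/price": "$10", "/body/p": "About A"},
    {"/body/h1": "Widget B", "/body/price": "$12", "/body/p": "About B", "/body/footer": "x"},
    {"/body/h1": "Widget C", "/body/price": "$15"},
    {"/body/table/tr": "row", "/body/ul/li": "item"},
]

centroid = select_centroid(docs)          # choose the centroid document
cluster = form_cluster(docs, centroid)    # cluster similar documents about it
marked = "/body/price"                    # data element marked on the centroid
prices = extract(docs, cluster, marked)   # corresponding elements in the cluster
print(centroid, cluster, prices)
```

In this toy data the fourth document shares no paths with the centroid, so it falls outside the cluster and contributes no extracted value. A real system would likely need a more tolerant notion of correspondence than exact path equality.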

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.