Systems and methods for extracting information from structured documents
US8090678B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Jul 23, 2003 |
| Grant date | Jan 3, 2012 |
| Priority date | — |
| Expiry date | Jul 2, 2027 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/35
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Systems and methods for extracting information from structured documents are provided. The systems and methods relate to selecting a centroid document from a group of structured documents and selecting a subset of the group in order to form a cluster of the subset of documents about the centroid document. The subset is preferably selected based on the relative similarity between each document in the subset and the centroid document. Systems and methods according to the invention then include marking a data element on the centroid document. The systems and methods also include identifying, on each document of the subset, the data element that corresponds to the marked data element on the centroid document. Finally, data may be extracted from the subset of documents based on the identifying step.
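The steps in the abstract (pick a centroid, cluster similar documents around it, mark an element on the centroid, locate the corresponding element in each cluster member, extract) can be illustrated with a small sketch. The abstract does not specify how documents are represented, how similarity is measured, or how corresponding elements are found, so everything below is an assumption for illustration only: each document is modeled as a dict from a structural path to text, similarity is Jaccard overlap of path sets, the centroid is the medoid (highest mean similarity to the rest), and correspondence falls back to the closest path by `difflib.SequenceMatcher` ratio over path segments. All function names and the `threshold` parameter are hypothetical, not taken from the patent.

```python
from difflib import SequenceMatcher
from statistics import mean

# Hypothetical representation (assumption): a structured document maps a
# structural path such as "invoice/summary/total" to the text at that node.
Document = dict[str, str]


def similarity(a: Document, b: Document) -> float:
    """Jaccard similarity of the two documents' path sets (an assumed metric)."""
    pa, pb = set(a), set(b)
    if not pa and not pb:
        return 1.0
    return len(pa & pb) / len(pa | pb)


def select_centroid(docs: list[Document]) -> int:
    """Index of the medoid: the document with the highest mean similarity to the others."""
    def score(i: int) -> float:
        others = [similarity(docs[i], d) for j, d in enumerate(docs) if j != i]
        return mean(others) if others else 1.0
    # max() keeps the first index on ties, so the result is deterministic.
    return max(range(len(docs)), key=score)


def form_cluster(docs: list[Document], centroid: int, threshold: float) -> list[int]:
    """Indices of the other documents whose similarity to the centroid meets the threshold."""
    return [i for i, d in enumerate(docs)
            if i != centroid and similarity(d, docs[centroid]) >= threshold]


def corresponding_path(marked: str, doc: Document) -> str:
    """Path in `doc` matching the marked centroid path: exact if present, else closest by segments."""
    if marked in doc:
        return marked
    target = marked.split("/")
    return max(doc, key=lambda p: SequenceMatcher(None, target, p.split("/")).ratio())


def extract(docs: list[Document], centroid: int, marked: str,
            threshold: float = 0.5) -> dict[int, str]:
    """Pull the value corresponding to `marked` from every document in the centroid's cluster."""
    if marked not in docs[centroid]:
        raise KeyError(f"{marked!r} is not an element of the centroid document")
    return {i: docs[i][corresponding_path(marked, docs[i])]
            for i in form_cluster(docs, centroid, threshold)}


if __name__ == "__main__":
    docs = [
        {"invoice/number": "A1", "invoice/date": "2003-07-01", "invoice/total": "10.00"},
        {"invoice/number": "A2", "invoice/date": "2003-07-02", "invoice/total": "20.00"},
        {"invoice/number": "A3", "invoice/date": "2003-07-03", "invoice/summary/total": "30.00"},
        {"letter/greeting": "Dear Sir", "letter/body": "Hello"},
    ]
    c = select_centroid(docs)              # the medoid, shown to a user for marking
    print(extract(docs, c, "invoice/total"))
```

In this toy run the letter is excluded from the cluster because it shares no paths with the centroid, and the third invoice's total is still found even though its path differs, because `invoice/summary/total` is the closest path to the marked `invoice/total`. Marking happens after `select_centroid` returns, mirroring the order in the abstract; a real system would likely use richer structural and textual features than bare paths.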
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.