Extracting information from formatted sources
US7630968B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Feb 16, 2006 |
| Grant date | Dec 8, 2009 |
| Priority date | — |
| Expiry date | Dec 20, 2026 |
Classification
- Technology area (CPC Y)Emerging Cross-Sectional Technologies
- CPC primaryY10S707/99937
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
An extraction manager extracts information from formatted input. The input is annotated with presentation information, and parsed into a set of elements comprising a canonical representation thereof. An information analyzer analyzes the elements in order to glean additional information. An entity extractor determines entities to extract from the input. The entity extractor analyzes elements according to specific entities to be extracted, and creates entity specific observations for analyzed elements. These observations comprise possible values for the relevant entities. A heuristics processor maintains a collection of entity specific heuristics, each comprising a test to help determine the suitability of data as a value for the corresponding entity. The heuristics processor selects heuristics for the entities to be extracted, and tests observations for these entities against the selected heuristics. Responsive to this testing, ordered possible values for entities to extract are determined.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.