System and method for entity extraction from semi-structured text documents
US10489439B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 14, 2016 |
| Grant date | Nov 26, 2019 |
| Priority date | — |
| Expiry date | Jun 4, 2038 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N20/00
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for extracting entities from a text document includes, for at least a section of the text document, providing a first set of entities extracted from that section and clustering at least a subset of the extracted entities in the first set into clusters, based on the locations of the entities in the document. Complete clusters of entities are identified. Patterns for extracting new entities are learned from the complete clusters. New entities are then extracted from incomplete clusters using the learned patterns.
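The four steps in the abstract (cluster by location, identify complete clusters, learn patterns, extract from incomplete clusters) can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`Entity`, `cluster_by_block`, `learn_patterns`, `extract_from_incomplete`, `REQUIRED`) is hypothetical, and several simplifications are assumed: a "location" is taken to be the blank-line-delimited block that contains the entity, a cluster counts as "complete" when it holds one entity of each required kind, and a "pattern" is just the label text that precedes an entity on its line. The first set of entities is supplied by hand, standing in for whatever upstream extractor produced it.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    kind: str   # e.g. "name", "phone", "email" (illustrative kinds)
    text: str
    start: int  # character offset of the entity in the document
    end: int

def cluster_by_block(text, entities):
    """Cluster entities by location: one cluster per blank-line-delimited block."""
    clusters = []
    for block in re.finditer(r"(?:.+\n?)+", text):
        members = [e for e in entities if block.start() <= e.start < block.end()]
        if members:
            clusters.append((block.span(), members))
    return clusters

def is_complete(members, required):
    """Assumed completeness test: every required kind appears in the cluster."""
    return required <= {e.kind for e in members}

def learn_patterns(text, complete_clusters):
    """For each kind, learn a regex from the line prefix (label) preceding it."""
    patterns = {}
    for _span, members in complete_clusters:
        for e in members:
            line_start = text.rfind("\n", 0, e.start) + 1
            prefix = text[line_start:e.start]
            if prefix.strip():  # skip entities that have no label before them
                regex = re.compile("^" + re.escape(prefix) + r"(.+)$", re.MULTILINE)
                patterns.setdefault(e.kind, {})[prefix] = regex
    return patterns

def extract_from_incomplete(text, clusters, patterns, required):
    """Apply learned patterns inside each cluster's span to find missing kinds."""
    found_entities = []
    for (start, end), members in clusters:
        missing = required - {e.kind for e in members}
        for kind in sorted(missing):
            for regex in patterns.get(kind, {}).values():
                m = regex.search(text, start, end)
                if m:
                    found_entities.append(
                        Entity(kind, m.group(1), m.start(1), m.end(1)))
                    break
    return found_entities

REQUIRED = frozenset({"name", "phone", "email"})

text = (
    "Name: Alice Martin\n"
    "Phone: 555-0101\n"
    "Email: alice@example.org\n"
    "\n"
    "Name: Bob Stone\n"
    "Phone: +33 1 23 45 67\n"
    "Email: bob@example.org\n"
)

def found(kind, value):
    """Build an Entity for a value the (assumed) first-pass extractor located."""
    start = text.index(value)
    return Entity(kind, value, start, start + len(value))

# First set of entities: the second record's phone number was missed.
first_set = [
    found("name", "Alice Martin"),
    found("phone", "555-0101"),
    found("email", "alice@example.org"),
    found("name", "Bob Stone"),
    found("email", "bob@example.org"),
]

clusters = cluster_by_block(text, first_set)
complete = [c for c in clusters if is_complete(c[1], REQUIRED)]
patterns = learn_patterns(text, complete)
new_entities = extract_from_incomplete(text, clusters, patterns, REQUIRED)
print(new_entities)
```

In this toy document the first block forms a complete cluster, so the label "Phone: " is learned from it and then used to recover the phone number missing from the second, incomplete cluster. Label prefixes are only one possible pattern type; the abstract does not specify which features the learned patterns use.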
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.