Patent · US Active

Distantly supervised wrapper induction for semi-structured documents

US10977573B1 · kind B1 · utility

Cited by: 6
References: 4
Claims: 21
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 15, 2016
Grant date: Apr 13, 2021
Priority date
Expiry date: Oct 8, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N5/025
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Systems and methods provide distantly supervised wrapper induction for semi-structured documents, including automatically generating and annotating training documents for the wrapper. Training of the wrapper may occur in two phases using the training documents. An example method includes identifying a training set of semi-structured web pages having a subject entity that exists in a knowledge base and, for each training page, identifying target objects, identifying predicates in the knowledge base that connect the subject entity to a target object identified in the training page, and annotating the training page. Annotating a training page includes generating a feature set for a mention of the target object, generating predicate-target object pairs for the mention, and labeling each predicate-target object pair with a corresponding example type and weight. The annotated training pages are used to train the wrapper to extract new subject entities and new facts from the set of semi-structured web pages.
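
The annotation step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The toy knowledge base, the `Mention` and `LabeledPair` types, the `annotate_page` function, the single XPath feature, the "positive"/"negative" example types, and the weights 1.0 and 0.1 are all assumptions made for illustration; the abstract does not specify which features, example types, or weights are used.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base: subject entity -> set of (predicate, object) facts.
KB = {
    "Casablanca": {
        ("director", "Michael Curtiz"),
        ("release_year", "1942"),
        ("starring", "Humphrey Bogart"),
    },
}


@dataclass(frozen=True)
class Mention:
    text: str   # surface string of a candidate target object on the page
    xpath: str  # location of the mention in the page's DOM (assumed feature)


@dataclass(frozen=True)
class LabeledPair:
    predicate: str
    target: str
    features: tuple    # feature set generated for the mention
    example_type: str  # "positive" or "negative" (assumed label scheme)
    weight: float      # assumed training weight


def annotate_page(subject, mentions, kb, neg_weight=0.1):
    """Label predicate-target object pairs for one training page."""
    facts = kb.get(subject, set())
    predicates = sorted({p for p, _ in facts})
    examples = []
    for m in mentions:
        # Predicates in the knowledge base that connect the subject to this object.
        matching = {p for p, o in facts if o == m.text}
        if not matching:
            continue  # the mention is not a known target object for this subject
        features = (("xpath", m.xpath),)
        for p in predicates:
            if p in matching:
                examples.append(LabeledPair(p, m.text, features, "positive", 1.0))
            else:
                examples.append(LabeledPair(p, m.text, features, "negative", neg_weight))
    return examples


# Example page about the subject entity "Casablanca" (made-up mentions).
page_mentions = [
    Mention("Michael Curtiz", "/html/body/div[1]/span"),
    Mention("1942", "/html/body/div[2]/span"),
    Mention("Buy tickets", "/html/body/a"),
]
examples = annotate_page("Casablanca", page_mentions, KB)
```

In this sketch the knowledge base supplies the labels, which is the sense of "distantly supervised" here: no page is annotated by hand. Giving the negative pairs a lower weight is one plausible way to reflect that a fact missing from the knowledge base is weaker evidence than a fact present in it.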

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.