Patent · US Active

Web knowledge extraction for search task simplification

US9020947B2 · kind B2 · utility

Cited by: 4
References: 1
Claims: 19
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 30, 2011
Grant date: Apr 28, 2015
Priority date
Expiry date: Dec 11, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/9535
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Techniques are described for generating structured information from semi-structured web pages, and retrieving the structured knowledge in response to a user query that indicates a query intent. The structured information is automatically extracted offline from semi-structured web pages, through the use of an auto wrapper solution that is noise tolerant, scalable, and automatic. The structured information is stored in a knowledge base, and provided in response to a user search query that indicates a query intent. Extraction of structured information may also include clustering of pages based on their measured similarities. The clusters may be determined based on similar elements in the tag path text data of the pages. A minimum size threshold may be applied to the clusters.
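
The clustering step mentioned in the abstract (grouping pages by similar tag path elements, then applying a minimum cluster size) can be illustrated with a minimal sketch. This is not the patented method. It assumes Jaccard similarity over the set of tag paths that lead to text nodes, a greedy single-pass assignment, and hypothetical names and parameters (`tag_paths`, `cluster_pages`, `threshold`, `min_size`) chosen only for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are not pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TagPathCollector(HTMLParser):
    """Records the root-to-element tag path of every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip():
            self.paths.add("/".join(self.stack))


def tag_paths(html):
    """Return the set of tag paths that lead to text in the page."""
    parser = TagPathCollector()
    parser.feed(html)
    parser.close()
    return frozenset(parser.paths)


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure, not from the patent)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(pages, threshold=0.5, min_size=2):
    """Greedily cluster pages by tag-path similarity, then drop small clusters.

    pages: dict mapping a page id to its HTML source.
    Each page joins the first cluster whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new one.
    """
    clusters = []  # list of (representative_paths, [page_ids])
    for page_id, html in pages.items():
        paths = tag_paths(html)
        for representative, members in clusters:
            if jaccard(paths, representative) >= threshold:
                members.append(page_id)
                break
        else:
            clusters.append((paths, [page_id]))
    # Apply the minimum size threshold to the clusters.
    return [members for _, members in clusters if len(members) >= min_size]


pages = {
    "a": "<html><body><h1>Movie A</h1><div><span>2010</span></div></body></html>",
    "b": "<html><body><h1>Movie B</h1><div><span>2012</span></div></body></html>",
    "c": "<html><body><table><tr><td>x</td></tr></table></body></html>",
}
print(cluster_pages(pages, threshold=0.5, min_size=2))
```

In this sketch, pages "a" and "b" share the same template and land in one cluster, while "c" forms a singleton that the minimum size threshold discards. A production system would likely need a more noise-tolerant similarity measure and a scalable clustering strategy than this single greedy pass.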

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.