Patent · US Active

Specific online resource identification and extraction

US9390166B2 · kind B2 · utility

Cited by: 3
References: 6
Claims: 19
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 31, 2012
Grant date: Jul 12, 2016
Priority date
Expiry date: Oct 13, 2033

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/951
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method of automatically identifying and extracting distributed online resources may include locating in a website a candidate entry list page. The method may also include verifying the candidate entry list page as an entry list page using repeated pattern discovery. The method may also include segmenting the entry list page into a plurality of entry items. The method may also include extracting from the plurality of entry items a plurality of candidate target pages. The method may also include verifying at least some of the candidate target pages as target pages including analyzing a visual structure and presentation of the candidate target pages. The method may also include extracting metadata from the target pages. The method may also include organizing the target pages and/or the metadata in one or more databases.
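As an illustration only, the first steps in the abstract (verifying an entry list page by repeated pattern discovery, segmenting it into entry items, and extracting candidate target pages) might be sketched as follows. This is not the patented implementation. The structural signature (an element's tag plus the tags of its direct children), the `min_repeats` threshold, and all class and function names are assumptions made for this example. The visual-structure verification and metadata extraction steps are not covered.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """Minimal DOM node: tag, attributes, parent, and child elements."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []

    def signature(self):
        # Assumed structural fingerprint: own tag plus direct children's tags.
        return (self.tag, tuple(child.tag for child in self.children))

    def links(self):
        # Collect href values from this subtree in document order.
        found = []
        if self.tag == "a" and self.attrs.get("href"):
            found.append(self.attrs["href"])
        for child in self.children:
            found.extend(child.links())
        return found


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects; text content is ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def find_entry_items(html, min_repeats=3):
    """Return the largest group of sibling elements sharing one signature."""
    builder = TreeBuilder()
    builder.feed(html)
    best = []
    for node in walk(builder.root):
        counts = Counter(child.signature() for child in node.children)
        if not counts:
            continue
        sig, n = counts.most_common(1)[0]
        if n >= min_repeats and n > len(best):
            best = [c for c in node.children if c.signature() == sig]
    return best


def candidate_target_links(html, min_repeats=3):
    """Segment the page into entry items and list the links they contain."""
    items = find_entry_items(html, min_repeats)
    return [link for item in items for link in item.links()]


if __name__ == "__main__":
    page = """
    <div><a href="/home">Home</a><a href="/about">About</a></div>
    <ul>
      <li><a href="/paper/1">Paper one</a><span>2011</span></li>
      <li><a href="/paper/2">Paper two</a><span>2012</span></li>
      <li><a href="/paper/3">Paper three</a><span>2012</span></li>
    </ul>
    """
    print(candidate_target_links(page))
```

A page with no sufficiently repeated sibling structure yields an empty list, which in this sketch stands in for rejecting the candidate entry list page. A purely structural test like this one would also accept long navigation menus, so a real system would presumably need the additional verification the abstract describes.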

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.