Specific online resource identification and extraction
US9390166B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Dec 31, 2012 |
| Grant date | Jul 12, 2016 |
| Priority date | — |
| Expiry date | Oct 13, 2033 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/951
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method of automatically identifying and extracting distributed online resources may include locating in a website a candidate entry list page. The method may also include verifying the candidate entry list page as an entry list page using repeated pattern discovery. The method may also include segmenting the entry list page into a plurality of entry items. The method may also include extracting from the plurality of entry items a plurality of candidate target pages. The method may also include verifying at least some of the candidate target pages as target pages including analyzing a visual structure and presentation of the candidate target pages. The method may also include extracting metadata from the target pages. The method may also include organizing the target pages and/or the metadata in one or more databases.
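To make the "repeated pattern discovery" and "segmenting into entry items" steps more concrete, here is a minimal Python sketch (standard library only). It is an illustrative assumption, not the patented method. It treats a page as an entry list page if some element has at least `min_repeats` children with the same structural signature (tag plus child tags, to a fixed depth). It then treats those children as the entry items and takes the first link in each as a candidate target page. The names `analyze_page`, `find_entry_region`, `signature`, `candidate_targets`, and `min_repeats` are hypothetical. The sketch does not model the visual-structure verification of target pages, metadata extraction, or database storage.

```python
from collections import Counter
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements that never have a closing tag, so the parser must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


@dataclass
class Node:
    tag: str
    attrs: dict
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree from HTML (text content is ignored)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", {})
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs), parent=self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open ancestor with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def signature(node: Node, depth: int = 2) -> str:
    """Structural signature: the tag plus its children's signatures, to a fixed depth."""
    if depth == 0:
        return node.tag
    return f"{node.tag}({','.join(signature(c, depth - 1) for c in node.children)})"


def find_entry_region(root: Node, min_repeats: int = 3) -> Tuple[Optional[Node], List[Node]]:
    """Find the element whose children repeat one signature most often (at least min_repeats)."""
    best_parent, best_items = None, []
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        counts = Counter(signature(c) for c in node.children)
        if not counts:
            continue
        sig, n = counts.most_common(1)[0]
        if n >= min_repeats and n > len(best_items):
            best_parent = node
            best_items = [c for c in node.children if signature(c) == sig]
    return best_parent, best_items


def candidate_targets(items: List[Node]) -> List[str]:
    """Take the first hyperlink (in document order) inside each entry item."""
    links = []
    for item in items:
        stack = [item]
        while stack:
            node = stack.pop()
            href = node.attrs.get("href") if node.tag == "a" else None
            if href:
                links.append(href)
                break
            stack.extend(reversed(node.children))
    return links


def analyze_page(html: str, min_repeats: int = 3):
    """Return (is_entry_list_page, entry_items, candidate_target_urls)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    parent, items = find_entry_region(builder.root, min_repeats)
    return parent is not None, items, candidate_targets(items)


page = """
<html><body>
  <div class="nav"><a href="/">Home</a></div>
  <ul class="courses">
    <li><a href="/c/1">Course 1</a><span>Fall</span></li>
    <li><a href="/c/2">Course 2</a><span>Spring</span></li>
    <li><a href="/c/3">Course 3</a><span>Fall</span></li>
  </ul>
</body></html>
"""
ok, items, targets = analyze_page(page)
print(ok, len(items), targets)  # True 3 ['/c/1', '/c/2', '/c/3']
```

The depth-limited signature ignores text, so entries with different titles still match. The single navigation link is not treated as an entry list because it does not repeat. A real system would need more robust verification, and the abstract itself adds a visual-structure check for target pages.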
Source: USPTO / EPO open patent data (bibliographic data and citation counts).