Patent · US Active

System and method for automated domain-extensible web scraping

US10423675B2 · kind B2 · utility

Cited by: 5
References: 3
Claims: 14
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 30, 2016
Grant date: Sep 24, 2019
Priority date
Expiry date: Apr 13, 2038

Classification

  • Technology area (CPC H): Electricity
  • CPC primary: H04L67/02
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

An automated, extensible scraping script is generated for web scraping across a plurality of domains. Web sites are classified based on common extracted domain data, the data are further clustered based on common navigation structures, and these commonalities are used to automate the generation of scraping code from predefined, reusable code snippets for specific parts of the web sites. Scraping services include a mapper module and a script generator module. Building blocks include a data model updater, a navigation model generator and a navigation model matcher. An administrative module includes domain clustering and configuration file maintenance.
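
The idea in the abstract (group sites that share a navigation structure, then assemble a scraping script from reusable snippets) can be illustrated with a minimal sketch. This is not the patented implementation. The step names, snippet strings, example URLs, and the functions `cluster_by_navigation` and `generate_script` are all hypothetical, and exact-match grouping on the step sequence stands in for whatever clustering and matching the patent actually claims.

```python
from collections import defaultdict

# Hypothetical reusable snippets, one per navigation step type.
# The snippet text is illustrative only; it is not taken from the patent.
SNIPPETS = {
    "search_form": "submit_search(page, query)",
    "result_list": "links = collect_links(page)",
    "pagination": "page = next_page(page)",
    "detail_page": "record = extract_fields(page, fields)",
}


def cluster_by_navigation(sites):
    """Group site URLs whose navigation step sequences are identical.

    Assumption: a site's navigation structure is reduced to an ordered
    list of step-type names, and two sites match only on exact equality.
    """
    clusters = defaultdict(list)
    for url, steps in sites.items():
        clusters[tuple(steps)].append(url)
    return dict(clusters)


def generate_script(signature):
    """Assemble a script body by concatenating one snippet per step."""
    return "\n".join(SNIPPETS[step] for step in signature)


# Hypothetical input: three sites in one domain, two sharing a structure.
sites = {
    "https://a.example": ["search_form", "result_list", "detail_page"],
    "https://b.example": ["search_form", "result_list", "detail_page"],
    "https://c.example": ["result_list", "pagination", "detail_page"],
}

clusters = cluster_by_navigation(sites)
scripts = {signature: generate_script(signature) for signature in clusters}

for signature, urls in clusters.items():
    print(urls)
    print(scripts[signature])
```

In this sketch one script is generated per cluster rather than per site, which is the reuse the abstract points to: a new site whose navigation matches an existing cluster would need no new code.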

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.