Patent · US Active

System and method for extracting structured data from classified websites

US8682881B1 · kind B1 · utility

Cited by: 3
References: 14
Claims: 24
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 7, 2011
Grant date: Mar 25, 2014
Priority date
Expiry date: Feb 12, 2032

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/958
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Systems, methods, and computer-readable storage mediums are provided for automatically extracting data from a classified website. A website is determined to be a classified website based on a set of heuristics. Page models for other classified websites are then accessed. The page models may include listing page models, detail page models, and/or city page models. A listing page in the classified website is determined based on similarity of the listing page to the page models for the other classified websites. A listing page model for the listing page in the classified website is then created. After the model has been created, data is extracted from the classified website based at least in part on the listing page model. Similar processes are performed for determining a details page, creating a details page model, and extracting data from the classified website using a details page model.
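
The listing-page step described in the abstract can be illustrated with a small sketch. The abstract does not say how similarity to existing page models is measured, so everything here is an assumption made for illustration: a page model is reduced to the set of tag paths found in a page, similarity is the Jaccard index of two such sets, and the names `page_signature`, `jaccard`, `pick_listing_page`, and the `threshold` parameter are hypothetical. It is not the patented method.

```python
from html.parser import HTMLParser


class TagPathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths seen in an HTML page."""

    # Elements that never have a closing tag, so they are not pushed on the stack.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass


def page_signature(html):
    """Assumed stand-in for a 'page model': the set of tag paths in the page."""
    collector = TagPathCollector()
    collector.feed(html)
    return frozenset(collector.paths)


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pick_listing_page(candidates, known_models, threshold=0.5):
    """Return (url, score) for the candidate page most similar to any known
    listing page model, or (None, score) if no candidate reaches the threshold.

    candidates: dict mapping URL -> HTML of pages on the new website.
    known_models: signatures built from listing pages of other websites.
    """
    best_url, best_score = None, 0.0
    for url, html in candidates.items():
        sig = page_signature(html)
        score = max((jaccard(sig, m) for m in known_models), default=0.0)
        if score > best_score:
            best_url, best_score = url, score
    if best_score >= threshold:
        return best_url, best_score
    return None, best_score


if __name__ == "__main__":
    known = page_signature(
        "<html><body><ul><li><a href='x'>A</a></li></ul></body></html>"
    )
    pages = {
        "/list": "<html><body><ul><li><a href='1'>Car</a></li></ul></body></html>",
        "/about": "<html><body><p>About us</p></body></html>",
    }
    print(pick_listing_page(pages, [known]))
```

In this sketch the page selected as the listing page would itself be turned into a new model through `page_signature`, loosely mirroring the abstract's "create a listing page model" step. A practical system would presumably use richer features than tag paths.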

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.