System and method for extracting structured data from classified websites
US8682881B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 7, 2011 |
| Grant date | Mar 25, 2014 |
| Priority date | — |
| Expiry date | Feb 12, 2032 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/958
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Systems, methods, and computer-readable storage mediums are provided for automatically extracting data from a classified website. A website is determined to be a classified website based on a set of heuristics. Then page models for other classified websites are accessed. The page models may include listing page models, detail page models, and/or city page models. A listing page in the classified website is determined based on similarity of the listing page to the page models for the other classified websites. Then a listing page model for the listing page in the classified website is created. After the model has been created, data from the classified website is extracted based at least in part on the listing page model. Similar processes are performed for determining a details page, creating a details page model, and extracting data from the classified website using a details page model.
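The pipeline in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say which heuristics, similarity measure, or model representation are used, so everything here is an assumption made for illustration. A page model is assumed to be a set of tag paths, similarity is assumed to be Jaccard overlap, and the keyword list, `threshold`, and all function names (`looks_like_classified_site`, `find_listing_page`, `build_listing_model`, `extract`) are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}
# Hypothetical keyword heuristics; the abstract does not specify the real ones.
CLASSIFIED_HINTS = ("for sale", "post an ad", "listings")


class PathCollector(HTMLParser):
    """Records the tag path (e.g. 'html/body/ul/li') of each element and text node."""

    def __init__(self):
        super().__init__()
        self.stack = []         # currently open tags
        self.paths = Counter()  # tag path -> number of occurrences
        self.texts = []         # (tag path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        self.paths["/".join(self.stack)] += 1

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.texts.append(("/".join(self.stack), text))


def parse(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector


def looks_like_classified_site(html, min_hits=2):
    """Assumed heuristic: enough classified-ad phrases appear on the page."""
    text = html.lower()
    return sum(hint in text for hint in CLASSIFIED_HINTS) >= min_hits


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def find_listing_page(pages, known_models, threshold=0.5):
    """Return the URL whose tag paths best match any known listing page model."""
    best_url, best_score = None, threshold
    for url, html in pages.items():
        paths = parse(html).paths
        score = max((jaccard(paths, m) for m in known_models), default=0.0)
        if score >= best_score:
            best_url, best_score = url, score
    return best_url


def build_listing_model(html):
    """Assumed model: the most frequently repeated tag path that carries text."""
    text_paths = Counter(path for path, _ in parse(html).texts)
    item_path, _ = text_paths.most_common(1)[0]
    return {"item_path": item_path}


def extract(html, model):
    """Pull the text of every node that sits on the model's item path."""
    return [t for path, t in parse(html).texts if path == model["item_path"]]


# Page models learned from other classified websites (made-up example data).
known_models = [{"html", "html/body", "html/body/ul",
                 "html/body/ul/li", "html/body/ul/li/a"}]
home = "<html><body><h1>Cars for sale</h1><a>Post an ad</a></body></html>"
pages = {
    "/about": "<html><body><p>About us</p></body></html>",
    "/cars": "<html><body><ul><li><a>Honda Civic</a></li>"
             "<li><a>Ford Focus</a></li></ul></body></html>",
}

if looks_like_classified_site(home):
    listing_url = find_listing_page(pages, known_models)
    model = build_listing_model(pages[listing_url])
    items = extract(pages[listing_url], model)
    print(listing_url, model, items)
```

Under the same assumptions, the detail-page steps could reuse `find_listing_page` with detail page models in place of listing page models, followed by a model builder suited to single-record pages.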
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.