Document crawling systems and methods
US8285703B1 · kind B1 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | May 25, 2010 |
| Grant date | Oct 9, 2012 |
| Priority date | — |
| Expiry date | Sep 7, 2030 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G16H30/20
- WIPO field: Medical technology
- WIPO sector: Instruments
Abstract
Systems and methods are provided for crawling and indexing documents stored in a data storage system. A crawler system processes multiple jobs, each of which corresponds to crawling documents in the data storage system. Each job includes priority data and crawling instructions. The crawler system stores each job in a priority queue in a sequence based on the priority data, and assigns each job in the priority queue to the next available processing module according to that stored sequence. Before processing a job, the crawler system determines, based on the job's crawling instructions, whether to segment the job into smaller steps. If the job is segmented, one of the smaller steps is processed to crawl a group of the documents in the data storage system, and the remaining steps are stored back in the priority queue to await processing.
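The queue-and-segment flow in the abstract can be illustrated with a minimal sketch. It assumes a lower priority value means "process sooner", models the crawling instruction that controls segmentation as a hypothetical `max_docs_per_step` field, and simulates "next available processing module" with round-robin worker ids rather than real concurrency. All names (`CrawlJob`, `CrawlerQueue`, `segment`, `run`) are illustrative and are not taken from the patent, which does not specify how segmentation is decided.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CrawlJob:
    # Hypothetical job record: lower priority value is processed sooner.
    priority: int
    name: str
    documents: List[str]
    # Hypothetical "crawling instruction": 0 means never segment.
    max_docs_per_step: int


class CrawlerQueue:
    """Priority queue of jobs; a counter breaks ties in insertion order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, CrawlJob]] = []
        self._counter = itertools.count()

    def push(self, job: CrawlJob) -> None:
        # The unique counter keeps heapq from ever comparing CrawlJob objects.
        heapq.heappush(self._heap, (job.priority, next(self._counter), job))

    def pop(self) -> CrawlJob:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def segment(job: CrawlJob) -> List[CrawlJob]:
    """Split a job into smaller steps if its instructions call for it."""
    n = job.max_docs_per_step
    if n <= 0 or len(job.documents) <= n:
        return [job]  # no segmentation needed
    return [
        CrawlJob(job.priority, f"{job.name}[{i // n}]", job.documents[i:i + n], n)
        for i in range(0, len(job.documents), n)
    ]


def run(queue: CrawlerQueue, num_workers: int) -> List[Tuple[int, str, List[str]]]:
    """Drain the queue, returning (worker_id, step_name, documents) in order."""
    log = []
    worker = 0
    while queue:
        job = queue.pop()
        first, *rest = segment(job)
        for step in rest:
            queue.push(step)  # remaining steps wait in the priority queue
        log.append((worker, first.name, list(first.documents)))  # "crawl" one group
        worker = (worker + 1) % num_workers  # stand-in for "next available" module
    return log
```

For example, a large job pushed alongside a smaller job of equal priority is crawled in one-document steps, and the smaller job runs between those steps:

```python
q = CrawlerQueue()
q.push(CrawlJob(1, "A", ["a1", "a2", "a3"], 1))
q.push(CrawlJob(1, "B", ["b1"], 0))
print([name for _, name, _ in run(q, 2)])  # ['A[0]', 'B', 'A[1]', 'A[2]']
```

Because each re-queued step receives a fresh tie-breaking counter, jobs of the same priority that were already waiting are served before the rest of a segmented job. This is one plausible reading of why segmentation helps (a long crawl does not monopolize a processing module), not a behavior the abstract states explicitly.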
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.