Patent · US Active

Document crawling systems and methods

US8285703B1 · kind B1 · utility

Cited by: 9
References: 7
Claims: 31
Family size: 0

Assignee

Inventor

Key dates

Filing date: May 25, 2010
Grant date: Oct 9, 2012
Priority date
Expiry date: Sep 7, 2030

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G16H30/20
  • WIPO field: Medical technology
  • WIPO sector: Instruments

Abstract

Systems and methods are provided for crawling and indexing documents stored in a data storage system. A crawler system processes multiple jobs that each correspond to crawling documents in the data storage system. Each job includes priority data and crawling instructions. The crawler system stores each job in a priority queue in a sequence based on the priority data. The crawler system assigns each job in the priority queue to the next available processing module for processing based on the stored sequence. Before processing each job, the crawler system determines whether to segment the job into smaller steps based on the corresponding crawling instructions. If the job is segmented, one of the smaller steps is processed to crawl a group of the documents in the data storage system. The remaining steps are stored in the priority queue to wait for processing.
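
The queueing and segmentation flow described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `Job`, `Crawler`, `submit`, `run_next`, and `batch_size` are hypothetical, and the sketch makes several assumptions: a lower priority number is served first, the "crawling instructions" reduce to a single batch size, "crawling" a document means recording its ID, and a single loop stands in for the pool of processing modules. The remaining steps are also re-queued as one job that is segmented again on its next turn, rather than as separate step entries.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int   # assumption: lower value is processed sooner
    seq: int        # tie-breaker that preserves submission order
    doc_ids: list = field(compare=False)
    batch_size: int = field(compare=False, default=0)  # 0 means "do not segment"


class Crawler:
    """Hypothetical single-worker stand-in for the crawler system."""

    def __init__(self):
        self._queue = []                   # heap ordered by (priority, seq)
        self._counter = itertools.count()
        self.crawled = []                  # document IDs in the order they were crawled

    def submit(self, priority, doc_ids, batch_size=0):
        job = Job(priority, next(self._counter), list(doc_ids), batch_size)
        heapq.heappush(self._queue, job)

    def run_next(self):
        """Process one job or one step of a job; return False if the queue is empty."""
        if not self._queue:
            return False
        job = heapq.heappop(self._queue)
        if job.batch_size and len(job.doc_ids) > job.batch_size:
            # Segment: crawl the first group now, re-queue the rest to wait its turn.
            step = job.doc_ids[:job.batch_size]
            self.submit(job.priority, job.doc_ids[job.batch_size:], job.batch_size)
        else:
            step = job.doc_ids
        self.crawled.extend(step)          # placeholder for real crawling and indexing
        return True


crawler = Crawler()
crawler.submit(1, ["d1", "d2", "d3", "d4"], batch_size=2)
crawler.run_next()                         # crawls only d1 and d2
crawler.submit(0, ["urgent"])              # higher-priority job arrives mid-crawl
while crawler.run_next():
    pass
print(crawler.crawled)
```

Under these assumptions, the higher-priority job is served between the two steps of the large job instead of waiting for it to finish, which is the practical effect of returning the remaining steps to the priority queue.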

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.