Patent · US Active

Scalable indexing for layout based document retrieval and ranking

US7953679B2 · kind B2 · utility

Cited by: 306
References: 14
Claims: 25
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 9, 2009
Grant date: May 31, 2011
Priority date
Expiry date: Dec 8, 2029

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/414
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A computer-based method and a system for indexing, querying, and ranking documents based on layout are provided. The method includes providing a plurality of documents to computer memory, extracting layout blocks from the provided documents, clustering the layout blocks into a plurality of layout block clusters, computing a representative block for each of the layout block clusters, generating a document index for each provided document based on the layout blocks of the document and the computed representative blocks, clustering the created document indexes into a plurality of document index clusters, and generating a representative cluster index for each of the document index clusters. The generated indexes, together with the representative blocks and document index clusters, can be stored and used for retrieval of documents responsive to a layout query.
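
The two-level indexing described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the abstract does not name a clustering algorithm, a block representation, or an index format, so everything concrete below is an assumption made for illustration. Layout blocks are assumed to be normalized (x, y, width, height) tuples, clustering is assumed to be plain k-means with farthest-point seeding, a representative block is assumed to be a cluster centroid, and a document index is assumed to be a normalized histogram over representative blocks. The names kmeans, document_index, build_index, and query are hypothetical.

```python
from math import dist  # Python 3.8+


def nearest(p, centers):
    """Position of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: dist(p, centers[j]))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point seeding (an assumption)."""
    points = [tuple(p) for p in points]
    centroids = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        if far in centroids:  # fewer than k distinct points
            break
        centroids.append(far)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = list(centroids)
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def document_index(blocks, representatives):
    """Normalized histogram of a document's blocks over the representative blocks."""
    hist = [0.0] * len(representatives)
    for b in blocks:
        hist[nearest(b, representatives)] += 1.0
    total = sum(hist) or 1.0
    return tuple(h / total for h in hist)


def build_index(documents, n_block_clusters, n_doc_clusters):
    """documents maps a document id to its list of (x, y, w, h) layout blocks."""
    all_blocks = [b for blocks in documents.values() for b in blocks]
    representatives, _ = kmeans(all_blocks, n_block_clusters)
    doc_ids = list(documents)
    doc_indexes = [document_index(documents[d], representatives) for d in doc_ids]
    cluster_indexes, labels = kmeans(doc_indexes, n_doc_clusters)
    clusters = {j: [] for j in range(len(cluster_indexes))}
    for d, idx, lab in zip(doc_ids, doc_indexes, labels):
        clusters[lab].append((d, idx))
    return representatives, cluster_indexes, clusters


def query(layout_blocks, index):
    """Find the closest document index cluster, then rank its documents by distance."""
    representatives, cluster_indexes, clusters = index
    q = document_index(layout_blocks, representatives)
    best = nearest(q, cluster_indexes)
    return [d for d, _ in sorted(clusters[best], key=lambda e: dist(q, e[1]))]


# Toy corpus: two letter-like pages (header + body) and two two-column pages.
documents = {
    "letter_a": [(0.1, 0.1, 0.8, 0.1), (0.1, 0.3, 0.8, 0.6)],
    "letter_b": [(0.1, 0.12, 0.8, 0.1), (0.1, 0.32, 0.8, 0.58)],
    "two_col_a": [(0.05, 0.1, 0.4, 0.8), (0.55, 0.1, 0.4, 0.8)],
    "two_col_b": [(0.06, 0.1, 0.4, 0.8), (0.54, 0.1, 0.4, 0.8)],
}
index = build_index(documents, n_block_clusters=4, n_doc_clusters=2)
two_col_hits = query([(0.05, 0.1, 0.42, 0.8), (0.55, 0.1, 0.4, 0.78)], index)
letter_hits = query([(0.1, 0.1, 0.8, 0.12), (0.1, 0.3, 0.8, 0.6)], index)
print(two_col_hits, letter_hits)
```

In this sketch a query only compares against the representative cluster indexes and then against the documents of the single best cluster, which is one plausible reading of why the second clustering level helps scalability. A real system would likely use richer block features and might search more than one cluster.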

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.