Patent · US · Expired

Document processing method, system and medium

US7046847B2 · kind B2 · utility

Cited by: 3
References: 9
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 25, 2001
Grant date: May 16, 2006
Priority date
Expiry date: Jun 2, 2024

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/131
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A technique for extracting meaningful text blocks from a document in which tables, itemized lists, multiple columns, and the like are arbitrarily laid out. A document laid out using blanks or the like is input, and symbols associated with spatial coordinates of the document are acquired. Consecutive characters of the same type are extracted from the symbols to generate tokens and spaces. A stream is generated from consecutive spaces in the column direction, and text blocks are generated from the streams and tokens. Links are generated between the text blocks to form a document graph. The validity of each connection (link) between text blocks in the document graph is evaluated using a language model, and the text blocks are merged if the connection is valid.
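
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: every function name here (`tokenize`, `find_streams`, `build_blocks`, `merge_blocks`, `extract`) is hypothetical, "same type" is simplified to whitespace versus non-whitespace, a stream is assumed to be a run of at least two columns that are blank on every row, the document graph is reduced to links between horizontally adjacent blocks, and the language model is replaced by a toy set of allowed word pairs.

```python
from itertools import groupby

def tokenize(lines):
    """Split each row into runs of same-type characters (tokens and spaces),
    each tagged with its spatial coordinates (row, start column, end column)."""
    runs = []
    for row, line in enumerate(lines):
        col = 0
        for is_space, chars in groupby(line, key=str.isspace):
            text = "".join(chars)
            kind = "space" if is_space else "token"
            runs.append((row, col, col + len(text), kind, text))
            col += len(text)
    return runs

def find_streams(runs, n_rows, width, min_width=2):
    """Assumption: a stream is a column range that is blank on every row."""
    blank_rows = [0] * width
    for _, start, end, kind, _ in runs:
        if kind == "space":
            for c in range(start, end):
                blank_rows[c] += 1
    streams, c = [], 0
    while c < width:
        if blank_rows[c] == n_rows:
            start = c
            while c < width and blank_rows[c] == n_rows:
                c += 1
            # Ignore narrow gaps (word spacing) and outer margins.
            if c - start >= min_width and start > 0 and c < width:
                streams.append((start, c))
        else:
            c += 1
    return streams

def build_blocks(runs, streams, width):
    """Group tokens into one text block per column band between streams."""
    edges = [0] + [x for s in streams for x in s] + [width]
    bands = list(zip(edges[0::2], edges[1::2]))
    blocks = [[] for _ in bands]
    for _, start, _, kind, text in runs:  # runs are already in reading order
        if kind == "token":
            for i, (lo, hi) in enumerate(bands):
                if lo <= start < hi:
                    blocks[i].append(text)
    return blocks

def merge_blocks(blocks, is_valid):
    """Follow the link between neighbouring blocks; merge when it is valid."""
    merged = []
    for block in blocks:
        if merged and is_valid(merged[-1], block):
            merged[-1] = merged[-1] + block
        else:
            merged.append(list(block))
    return [" ".join(b) for b in merged]

def extract(lines, bigrams):
    """Run the whole sketch; `bigrams` is a toy stand-in for a language model."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    runs = tokenize(padded)
    streams = find_streams(runs, len(padded), width)
    blocks = build_blocks(runs, streams, width)

    def is_valid(a, b):
        return bool(a) and bool(b) and (a[-1].lower(), b[0].lower()) in bigrams

    return merge_blocks(blocks, is_valid)

# Hypothetical inputs: a two-column sentence and a small table.
BIGRAMS = {("fox", "jumps")}
two_column = ["The quick brown".ljust(18) + "jumps over the",
              "fox".ljust(18) + "lazy dog"]
table = ["Name".ljust(8) + "Price", "Apple".ljust(8) + "3"]

print(extract(two_column, BIGRAMS))  # ['The quick brown fox jumps over the lazy dog']
print(extract(table, BIGRAMS))       # ['Name Apple', 'Price 3']
```

In the two-column input the link from the first block to the second passes the toy validity check, so the blocks are merged into one sentence. In the table input the link fails the check, so the two columns stay separate. A real system would score the connection with a statistical language model rather than a fixed set of word pairs.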

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.