Patent · US · Expired

Method and apparatus for computer understanding and manipulation of minimally formatted text documents

US5164899A · kind A · utility

Cited by: 58
References: 6
Claims: 12
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 1, 1989
Grant date: Nov 17, 1992
Priority date:
Expiry date: May 1, 2009

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/253
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and apparatus which enables a computer to understand and manipulate minimally formatted text of such documents as resumes, purchase order forms, insurance forms, bank statements and similar items is disclosed. The documents are digitized by an optical scanner and translated into ASCII text by an optical character reader. The invention manipulates the digital image of the document to find blocks of contiguous text. After separating the text by block, each block is converted into an ASCII character file. Next, these files are processed by a Grammar, which uses pattern matching techniques and syntax rules to enable the host computer to understand the text. After further manipulation by the invention, the text is either stored or outputted in a form which greatly facilitates its use and readability. In this manner documents whose information content is partially location dependent can be understood despite the fact that the documents' text is written using English language phrases with little or no grammatical structure.
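The flow described in the abstract (separate the text into blocks, then apply pattern matching and rules to each block) can be illustrated with a minimal Python sketch. This is not the patented method: the patent locates blocks in the scanned image, whereas this sketch assumes the text has already been recognized and uses blank lines as a stand-in for block boundaries. The function names, the heading keywords, and the two regular expressions are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical field patterns; a real grammar would need many more rules.
FIELD_PATTERNS = {
    "phone": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
    "years": re.compile(r"\b19\d{2}\s*-\s*19\d{2}\b"),
}

# Hypothetical heading keywords used to label a block.
HEADINGS = ("education", "experience")


def split_blocks(page_text):
    """Group non-empty lines into blocks separated by blank lines.

    Assumption: blank lines approximate the block boundaries that the
    patent derives from the document image.
    """
    blocks, current = [], []
    for line in page_text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def classify_block(lines, fields):
    """Label a block from its first line, falling back on extracted fields."""
    head = lines[0].lower()
    for keyword in HEADINGS:
        if keyword in head:
            return keyword
    if "phone" in fields:
        return "contact"
    return "other"


def parse(page_text):
    """Return one record per block: its label and the fields matched in it."""
    records = []
    for lines in split_blocks(page_text):
        text = " ".join(lines)
        fields = {}
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.search(text)
            if match:
                fields[name] = match.group(0)
        records.append({"kind": classify_block(lines, fields), "fields": fields})
    return records


# Made-up sample text standing in for OCR output of a short resume.
SAMPLE = """Jane Doe
(555) 123-4567

EDUCATION
State University 1980 - 1984

EXPERIENCE
Acme Corp 1984 - 1988
"""

if __name__ == "__main__":
    for record in parse(SAMPLE):
        print(record)
```

The sketch keeps block separation and field extraction as separate steps, which mirrors the abstract's point that meaning depends partly on where text sits in the document: the same year range is read as education or experience depending on the block it falls in.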

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.