Patent · US Expired

Method and system for recovering text from a damaged electronic file

US5964885A · kind A · utility

Cited by: 18
References: 3
Claims: 22
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 14, 1997
Grant date: Oct 12, 1999
Priority date
Expiry date: Jul 14, 2017

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99953
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Recovering text from a damaged electronic file by scanning an arbitrary stream of bytes and extracting text that is encoded as ASCII or Unicode. A byte of the damaged file is read. The read byte may be interpreted using the ASCII encoding standard. The read byte and the immediately preceding read byte may also be interpreted using the Unicode character encoding standard. The interpreted byte(s) is classified based upon the likelihood that the byte(s) is actually text for the particular character set rather than a control character, damaged data, or an element other than a textual character. The classifications are used to adjust a likelihood counter for each character type. The likelihood counter may be an integer value that indicates the probability that a text run has been detected. A text run is a sequence of bytes that is believed to be undamaged text. Each likelihood counter is then examined to determine whether there is a text run for one of the character types. If there is a text run, then the starting position of the text run is saved. The entire text run is output when the text run ends.
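
The counter-based scan described in the abstract can be illustrated with a minimal Python sketch covering only the ASCII interpretation. The abstract does not give the actual classifications or weights, so the function names (is_ascii_text, recover_ascii_runs) and the parameters threshold, ceiling, and penalty are all hypothetical choices made for illustration. A Unicode counter would presumably be maintained alongside the ASCII one in the same way; it is omitted here.

```python
def is_ascii_text(b: int) -> bool:
    """Assumed classification: printable ASCII plus tab, LF and CR count as text."""
    return 0x20 <= b <= 0x7E or b in (0x09, 0x0A, 0x0D)


def recover_ascii_runs(data: bytes, threshold: int = 4,
                       ceiling: int = 8, penalty: int = 4) -> list[str]:
    """Scan arbitrary bytes and return the ASCII text runs that were detected.

    threshold, ceiling and penalty are illustrative values, not taken
    from the patent.
    """
    runs = []
    counter = 0            # likelihood counter for the ASCII character type
    candidate_start = 0    # where the current streak of likely text began
    run_start = None       # saved starting position once a run is detected
    last_text = -1         # index of the most recent byte classified as text

    def emit(start: int, end: int) -> None:
        # Output the whole run, dropping stray non-text bytes inside it.
        span = bytes(b for b in data[start:end] if is_ascii_text(b))
        runs.append(span.decode("ascii"))

    for i, b in enumerate(data):
        if is_ascii_text(b):
            if counter == 0:
                candidate_start = i
            counter = min(counter + 1, ceiling)
            last_text = i
            # Enough evidence: a text run is detected, save its start.
            if run_start is None and counter >= threshold:
                run_start = candidate_start
        else:
            counter = max(counter - penalty, 0)
            # The run ends once the counter has dropped back to zero.
            if run_start is not None and counter == 0:
                emit(run_start, last_text + 1)
                run_start = None

    if run_start is not None:   # flush a run still open at end of input
        emit(run_start, last_text + 1)
    return runs


damaged = b"\x00\x01Hello, world\xff\xfe\x00\x02ab\x00\x03Recovered text\x00"
print(recover_ascii_runs(damaged))
```

With these assumed settings, the two-byte fragment "ab" never lifts the counter to the threshold and is discarded, while the two longer sequences are reported as runs. Capping the counter at a ceiling means a long run still ends after only a couple of non-text bytes.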

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.