Patent · US · Expired

Method and system for recovering text from a damaged electronic file

US5964885A · kind A · utility

Cited by	18
References	3
Claims	22
Family size	0

Assignee

Microsoft Corporation · US

Inventors

Robert Little · Redmond, US
Stephan Mueller · Celle, DE

Key dates

Filing date	Jul 14, 1997
Grant date	Oct 12, 1999
Priority date	—
Expiry date	Jul 14, 2017

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y10S707/99953
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Recovering text from a damaged electronic file by scanning an arbitrary stream of bytes and extracting text that is encoded as ASCII or Unicode. A byte of the damaged file is read. The read byte may be interpreted using the ASCII encoding standard. The read byte and the immediately preceding read byte may also be interpreted using the Unicode character encoding standard. The interpreted byte(s) is classified based upon the likelihood that the byte(s) is actually text for the particular character set rather than a control character, damaged data, or an element other than a textual character. The classifications are used to adjust a likelihood counter for each character type. The likelihood counter may be an integer value that indicates the probability that a text run has been detected. A text run is a sequence of bytes that is believed to be undamaged text. Each likelihood counter is then examined to determine whether there is a text run for one of the character types. If there is a text run, then the starting position of the text run is saved. The entire text run is output when the text run ends.
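The scanning loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the threshold of 3 characters, the reset-to-zero counter rule, and the restriction of "Unicode" to UTF-16LE code units whose high byte is zero are all assumptions made for illustration; the abstract does not specify any of them.

```python
def is_text_byte(b: int) -> bool:
    """Assumed classifier: printable ASCII plus tab, LF, and CR count as text."""
    return 0x20 <= b <= 0x7E or b in (0x09, 0x0A, 0x0D)


def recover_text(data: bytes, threshold: int = 3) -> list[tuple[int, str, str]]:
    """Return (start offset, character type, text) for each detected text run.

    One likelihood counter is kept per character type. Assumption: a counter
    gains 1 per text-like character and resets to 0 otherwise; a run counts
    as detected once its counter reaches `threshold`.
    """
    runs: list[tuple[int, str, str]] = []
    a_count = a_start = 0  # ASCII counter and saved start position
    u_count = u_start = 0  # Unicode (UTF-16LE) counter and saved start position

    def flush(kind: str, codec: str, start: int, end: int, count: int) -> None:
        # Output the entire run once it has ended, if it was long enough.
        if count >= threshold:
            runs.append((start, kind, data[start:end].decode(codec)))

    for i, b in enumerate(data):
        # Interpret the byte just read as ASCII.
        if is_text_byte(b):
            if a_count == 0:
                a_start = i
            a_count += 1
        else:
            flush("ascii", "ascii", a_start, i, a_count)
            a_count = 0

        # Interpret this byte and the preceding one as a UTF-16LE code unit.
        # While a candidate run is open, only pairs aligned to its start count.
        if i == 0 or (u_count and (i - u_start) % 2 == 0):
            continue
        if b == 0 and is_text_byte(data[i - 1]):
            if u_count == 0:
                u_start = i - 1
            u_count += 1
        else:
            flush("unicode", "utf-16-le", u_start, u_start + 2 * u_count, u_count)
            u_count = 0

    # End of input also ends any run still in progress.
    flush("ascii", "ascii", a_start, len(data), a_count)
    flush("unicode", "utf-16-le", u_start, u_start + 2 * u_count, u_count)
    return sorted(runs)


# Hypothetical damaged stream: junk bytes around one ASCII and one UTF-16LE run.
damaged = (b"\x00\x01\xffHello, world\x00\x02"
           + "Hi there".encode("utf-16-le") + b"\xff\xfe\x80")
for start, kind, text in recover_text(damaged):
    print(start, kind, text)
# prints:
# 3 ascii Hello, world
# 17 unicode Hi there
```

The sketch keeps the two counters independent and does not arbitrate when both character types report overlapping runs. A fuller implementation would presumably need graded classifications and a rule for choosing between competing interpretations.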

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.