Patent · US Expired

Method and apparatus providing capitalization recovery for text

US6922809B2 · kind B2 · utility

Cited by: 10
References: 2
Claims: 17
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 27, 2001
Grant date: Jul 26, 2005
Priority date
Expiry date: Sep 2, 2023

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/232
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for capitalizing text in a document includes processing a reference corpus to construct a plurality of dictionaries of capitalized terms, where the plurality of dictionaries include a singleton dictionary and a phrase dictionary. Each record in the singleton dictionary contains a word in lowercase, a range of phrase lengths m:n for capitalized phrases that the word begins, where m is a minimum phrase length and n is a maximum phrase length, and where each record in the phrase dictionary includes a multi-word phrase in lowercase. The method adds proper capitalization to an input monocase document by capitalizing words found in mandatory capitalization positions; and by looking up each word in the singleton dictionary and, if the word is found in the singleton dictionary, testing the corresponding phrase length range. If the phrase length range indicates that the word does not start a multi-word phrase, the method capitalizes the word, while if the phrase length range indicates that the word does start a multi-word phrase, the method tests the word and an indicated plurality of next words as a candidate phrase to determine if the candidate phrase is found in the phrase dicti…
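
The two-dictionary lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`build_dictionaries`, `recover_capitalization`), the whitespace tokenization, the rule that only the first word and words after `.`, `!` or `?` are mandatory capitalization positions, and the choice to try the longest candidate phrase first are all assumptions made for illustration. Because the abstract is truncated, what happens after the phrase lookup (capitalizing every word of a matched phrase, and falling back to capitalizing the single word only when the range starts at 1) is also an assumption.

```python
def build_dictionaries(capitalized_terms):
    """Build the singleton and phrase dictionaries from capitalized terms.

    `capitalized_terms` stands in for the terms extracted from a reference
    corpus (hypothetical input; the extraction step is not shown).
    """
    singleton = {}   # lowercase first word -> (m, n) phrase-length range
    phrases = set()  # lowercase multi-word phrases
    for term in capitalized_terms:
        words = term.lower().split()
        length = len(words)
        if length == 0:
            continue
        if length > 1:
            phrases.add(" ".join(words))
        m, n = singleton.get(words[0], (length, length))
        singleton[words[0]] = (min(m, length), max(n, length))
    return singleton, phrases


def _cap(token):
    # Assumption: "capitalizing" means upper-casing the first character.
    return token[:1].upper() + token[1:]


def recover_capitalization(text, singleton, phrases):
    """Add capitalization to monocase text using the two dictionaries."""
    tokens = text.split()
    # Lookup keys ignore surrounding punctuation (simplistic handling).
    keys = [t.strip(".,;:!?\"'()").lower() for t in tokens]
    out = list(tokens)
    i = 0
    sentence_start = True  # the first word is a mandatory position
    while i < len(tokens):
        step = 1
        if sentence_start:
            out[i] = _cap(out[i])
        length_range = singleton.get(keys[i])
        if length_range is not None:
            m, n = length_range
            matched = False
            # Test candidate phrases, longest first (assumed order).
            longest = min(n, len(tokens) - i)
            for length in range(longest, max(m, 2) - 1, -1):
                if " ".join(keys[i:i + length]) in phrases:
                    for j in range(i, i + length):
                        out[j] = _cap(out[j])
                    step = length
                    matched = True
                    break
            # m == 1 means the word is also capitalized on its own.
            if not matched and m == 1:
                out[i] = _cap(out[i])
        sentence_start = tokens[i + step - 1].endswith((".", "!", "?"))
        i += step
    return " ".join(out)


terms = ["New York", "New York City", "Paris", "United Nations"]
singleton, phrases = build_dictionaries(terms)
monocase = ("the meeting was in new york city. delegates from paris "
            "visited the united nations and a new office.")
result = recover_capitalization(monocase, singleton, phrases)
print(result)
```

In this sketch the range stored for `new` is 2:3, so the final `new` in `a new office` is left lowercase: no candidate phrase of length 2 or 3 starting there appears in the phrase dictionary, and the range does not include length 1.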

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.