Patent · US Active

Hierarchical alignment of character sequences representing text of same source

US8170289B1 · kind B1 · utility

Cited by: 5
References: 11
Claims: 27
Family size: 0

Key dates

Filing date: Sep 21, 2005
Grant date: May 1, 2012
Expiry date: Sep 1, 2027

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/194
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Systems and methods are disclosed for aligning two character sequences character by character (for example, OCR output from a scanned document and an electronic version of the same document) by applying a Hidden Markov Model (HMM) hierarchically. The method may include aligning the two character sequences over multiple hierarchical levels. At each level above the final level, the aligning may include parsing character subsequences from the two character sequences, aligning those subsequences, and designating the aligned subsequences as anchors; at every such level below the first, the parsing and alignment are performed between the anchors generated at the immediately previous level. At the final level, the aligning includes a character-by-character alignment of the characters between the anchors generated at the immediately previous level. At each level, an HMM may be constructed and the Viterbi algorithm may be used to solve for the alignment.
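The hierarchical scheme in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not from the patent. The coarse levels use regex tokenizers (for example, lines and then words). Only aligned tokens that match exactly are kept as anchors. Each level uses a simplified pair-HMM with fixed, state-independent step probabilities, so its Viterbi recursion reduces to a weighted edit-distance dynamic program in log space. The names `viterbi_align` and `hierarchical_align` and the probability values `P_MATCH`, `P_SUB` and `P_GAP` are hypothetical.

```python
import math
import re

# Hypothetical step probabilities for a simplified pair-HMM (not from the patent).
P_MATCH, P_SUB, P_GAP = 0.9, 0.05, 0.05


def viterbi_align(xs, ys, p_match=P_MATCH, p_sub=P_SUB, p_gap=P_GAP):
    """Most probable monotone alignment path of two token sequences.

    Returns (i, j) index pairs; None marks a gap on that side. Because the
    step costs do not depend on the previous state, the Viterbi recursion
    collapses to a weighted edit-distance dynamic program in log space.
    """
    lm, ls, lg = math.log(p_match), math.log(p_sub), math.log(p_gap)
    n, m = len(xs), len(ys)
    score = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            candidates = []
            if i > 0 and j > 0:  # emit one token from each sequence
                emit = lm if xs[i - 1] == ys[j - 1] else ls
                candidates.append((score[i - 1][j - 1] + emit, "diag"))
            if i > 0:  # token of xs aligned to a gap
                candidates.append((score[i - 1][j] + lg, "up"))
            if j > 0:  # token of ys aligned to a gap
                candidates.append((score[i][j - 1] + lg, "left"))
            if candidates:  # max() keeps the first candidate on ties
                score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
    # Trace the best path back from the bottom-right cell.
    path, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            i, j = i - 1, j - 1
            path.append((i, j))
        elif move == "up":
            i -= 1
            path.append((i, None))
        else:
            j -= 1
            path.append((None, j))
    path.reverse()
    return path


def hierarchical_align(a, b, levels):
    """Align strings a and b; levels lists regex patterns, coarsest first.

    Returns (char_a, char_b) pairs, with None marking a gap.
    """
    if not levels:  # final level: character-by-character alignment
        return [(a[i] if i is not None else None, b[j] if j is not None else None)
                for i, j in viterbi_align(a, b)]
    spans_a = [mt.span() for mt in re.finditer(levels[0], a)]
    spans_b = [mt.span() for mt in re.finditer(levels[0], b)]
    toks_a = [a[s:e] for s, e in spans_a]
    toks_b = [b[s:e] for s, e in spans_b]
    pairs, pos_a, pos_b = [], 0, 0
    for i, j in viterbi_align(toks_a, toks_b):
        # Assumption: only aligned tokens that match exactly become anchors.
        if i is None or j is None or toks_a[i] != toks_b[j]:
            continue
        (sa, ea), (sb, eb) = spans_a[i], spans_b[j]
        # Text between the previous anchor and this one goes to the next level.
        pairs += hierarchical_align(a[pos_a:sa], b[pos_b:sb], levels[1:])
        pairs += list(zip(a[sa:ea], b[sb:eb]))  # identical anchor tokens
        pos_a, pos_b = ea, eb
    # Trailing text after the last anchor also goes to the next level.
    pairs += hierarchical_align(a[pos_a:], b[pos_b:], levels[1:])
    return pairs


if __name__ == "__main__":
    ocr, truth = "Tne quick brown f0x", "The quick brown fox"
    for x, y in hierarchical_align(ocr, truth, [r"[^\n]+", r"\S+"]):
        print(repr(x), repr(y))
```

In this sketch, the coarse levels run the Viterbi recursion over far fewer tokens than there are characters. The quadratic character-level pass therefore only covers the short stretches between anchors. This is one plausible reading of why anchoring helps, not a statement of the patent's own cost analysis.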

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.