Hierarchical alignment of character sequences representing text of same source
US8170289B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 21, 2005 |
| Grant date | May 1, 2012 |
| Priority date | — |
| Expiry date | Sep 1, 2027 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/194
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Systems and methods are disclosed for character-by-character alignment of two character sequences (such as OCR output from a scanned document and an electronic version of the same document) using a Hidden Markov Model (HMM) in a hierarchical fashion. The method may include aligning the two character sequences over multiple hierarchical levels. For each hierarchical level above the final hierarchical level, the aligning may include parsing character subsequences from the two character sequences, performing an alignment of the character subsequences, and designating the aligned character subsequences as anchors. At any level below the first hierarchical level, the parsing and the alignment are performed between the anchors generated at the immediately previous hierarchical level. For the final hierarchical level, the aligning includes performing a character-by-character alignment of the characters between the anchors generated at the immediately previous hierarchical level. At each hierarchical level, an HMM may be constructed and the Viterbi algorithm may be employed to solve for the alignment.
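The two-stage idea in the abstract (coarse anchors first, then character-by-character alignment between them) can be illustrated with a minimal sketch. This is not the patented method: it assumes only two levels, uses whitespace-separated words as the hypothetical subsequences, treats only exactly matching aligned words as anchors, and substitutes a plain edit-distance dynamic program for the HMM and Viterbi decoding. The names `align`, `words_with_offsets`, and `hierarchical_align` are invented for illustration.

```python
import re


def align(a, b, sub_cost=1, gap_cost=1):
    """Minimum-cost global alignment of two sequences (edit-distance DP,
    used here as a stand-in for HMM/Viterbi decoding).
    Returns (i, j) index pairs; None marks a gap on that side."""
    n, m = len(a), len(b)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_cost
    for j in range(1, m + 1):
        cost[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else sub_cost)
            cost[i][j] = min(diag, cost[i - 1][j] + gap_cost, cost[i][j - 1] + gap_cost)
    # Trace back from the bottom-right corner to recover one optimal path.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        step = 0 if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else sub_cost
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + step:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap_cost:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def words_with_offsets(text):
    """Split on whitespace, keeping each word's starting character offset."""
    return [(m.group(), m.start()) for m in re.finditer(r"\S+", text)]


def hierarchical_align(ocr, truth):
    """Level 1: align words and keep exact matches as anchors.
    Final level: align characters only in the gaps between anchors."""
    wa, wb = words_with_offsets(ocr), words_with_offsets(truth)
    word_pairs = align([w for w, _ in wa], [w for w, _ in wb], sub_cost=2)
    anchors = [
        (wa[i][1], wb[j][1], len(wa[i][0]))
        for i, j in word_pairs
        if i is not None and j is not None and wa[i][0] == wb[j][0]
    ]
    result, pa, pb = [], 0, 0
    # A zero-length sentinel anchor at the end flushes the trailing segment.
    for sa, sb, length in anchors + [(len(ocr), len(truth), 0)]:
        for i, j in align(ocr[pa:sa], truth[pb:sb]):
            result.append((None if i is None else pa + i,
                           None if j is None else pb + j))
        # Characters inside an anchor word map one-to-one.
        result.extend((sa + k, sb + k) for k in range(length))
        pa, pb = sa + length, sb + length
    return result
```

For example, `hierarchical_align("hello wrld", "hello world")` anchors on `hello` and then aligns only the short remainder character by character, reporting the missing `o` as a gap on the OCR side. Restricting the character-level pass to the spans between anchors is what keeps the quadratic alignment cost confined to short segments, which is the practical motivation for working hierarchically.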
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.