Patent · US Active

Identification of reading order text segments with a probabilistic language model

US10372821B2 · kind B2 · utility

Cited by	5

References	11

Claims	16

Family size	0

Assignee

Adobe Inc. · US

Inventors

Walter Chang · San Jose, US
Trung Bui · San Jose, US
Pranjal Daga · West Lafayette, US
Michael Kraley · Lexington, US
Hung Bui · Sunnyvale, US

Key dates

Filing date	Mar 17, 2017
Grant date	Aug 6, 2019
Priority date	—
Expiry date	Apr 16, 2037

Classification

Technology area (CPC G)	Physics
CPC primary	G06V30/416
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Certain embodiments identify a correct structured reading-order sequence of text segments extracted from a file. A probabilistic language model is generated from a large text corpus to comprise observed word sequence patterns for a given language. The language model measures whether splicing together a first text segment with another continuation text segment results in a phrase that is more likely than a phrase resulting from splicing together the first text segment with other continuation text segments. Sets of text segments, which include a first set with a first text segment and a first continuation text segment as well as a second set with the first text segment and a second continuation text segment, are provided to the probabilistic model. A score indicative of a likelihood of the set providing a correct structured reading-order sequence is obtained for each set of text segments.
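
The scoring step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which kind of probabilistic language model is used, so the example below assumes a word-level bigram model with add-one smoothing as a stand-in, trained on a toy three-sentence corpus rather than a large one. All names (train_bigram, avg_log_prob, score_continuations, the window parameter) are hypothetical and chosen only for illustration. Each candidate set pairs the same first text segment with a different continuation segment, and the score is the average log-probability of the words around the splice point.

```python
import math
from collections import Counter


def train_bigram(corpus_sentences):
    """Count unigrams and bigrams over a list of sentences (toy stand-in for a large corpus)."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus_sentences:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def avg_log_prob(tokens, unigrams, bigrams):
    """Average add-one-smoothed bigram log-probability over the transitions in tokens."""
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, cur)] + 1
        denominator = unigrams[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total / max(len(tokens) - 1, 1)


def score_continuations(first_segment, candidates, unigrams, bigrams, window=3):
    """Score each (first segment, continuation) set by the likelihood of the spliced phrase.

    Only the last `window` words of the first segment and the first `window`
    words of each continuation are spliced together (an assumption of this sketch).
    """
    head = first_segment.lower().split()[-window:]
    scores = {}
    for candidate in candidates:
        tail = candidate.lower().split()[:window]
        scores[candidate] = avg_log_prob(head + tail, unigrams, bigrams)
    return scores


# Toy corpus; a real model would be built from a large text corpus.
corpus = [
    "the quick brown fox jumps over the lazy dog",
    "the form must be signed by the applicant",
    "please enter your name and address",
]
unigrams, bigrams = train_bigram(corpus)

first = "The form must be"
candidates = ["signed by the applicant", "name and address"]

scores = score_continuations(first, candidates, unigrams, bigrams)
best = max(scores, key=scores.get)  # continuation with the highest score
```

Here the first set (first segment plus "signed by the applicant") scores higher than the second set because every word transition across the splice was observed in the toy corpus, whereas "be name" was not. Averaging over transitions keeps scores comparable when candidates contribute different numbers of words.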

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.