Patent · US Active

System and method for adaptive sentence boundary disambiguation

US8131546B1 · kind B1 · utility

Cited by: 4
References: 4
Claims: 18
Family size: 0

Assignee

Inventor

Key dates

Filing date: Dec 28, 2007
Grant date: Mar 6, 2012
Priority date
Expiry date: Jan 3, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/183
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Embodiments disclosed herein provide a system and method useful for pre-processing non-sentence text extracted from business documents (e.g., malformed bulleted lists, runaway sentence identification, spatially separated data, etc.). One embodiment includes two heuristic algorithms: one searches for sentences in a document and the other looks for non-sentences (e.g., lists, tables, tabs, names of people, addresses, etc.) in the same document. In one embodiment, when malformed text is encountered, a particular character (e.g., “?”) is inserted to signal to a natural language processing layer that this set of “words” represents a logical construct and should be evaluated independently of other sentences. Embodiments disclosed herein allow non-sentence text, which is linguistically dry but contextually rich, to be included in the natural language processing. Embodiments disclosed herein also help reduce false-positive concept extraction assertions by the natural language processing layer.
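To make the two-heuristic idea concrete, here is a minimal sketch, assuming the document is processed line by line and that simple surface cues stand in for the patent's heuristics. The functions `looks_like_sentence`, `looks_like_non_sentence`, and `segment`, the regular expressions, and the `min_words` threshold are all hypothetical illustrations, not the claimed method. The only detail taken from the abstract is the idea of closing each non-sentence fragment with a marker character, and even there “?” is only the abstract's example.

```python
import re

# Hypothetical marker: the abstract gives "?" only as an example character.
BOUNDARY_MARKER = "?"

# Assumed cues: leading bullets/numbering, and terminal punctuation
# optionally followed by closing quotes or brackets.
_BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")
_TERMINAL = re.compile(r"[.!?][\"')\]]*$")


def looks_like_sentence(line: str, min_words: int = 4) -> bool:
    """Heuristic 1 (assumed): capitalized start, terminal punctuation, enough words."""
    text = line.strip()
    if not text or _BULLET.match(text) or "\t" in text:
        return False
    return (text[0].isupper()
            and _TERMINAL.search(text) is not None
            and len(text.split()) >= min_words)


def looks_like_non_sentence(line: str) -> bool:
    """Heuristic 2 (assumed): bullets, tab-separated cells, or unpunctuated fragments."""
    text = line.strip()
    if not text:
        return False
    if _BULLET.match(text) or "\t" in text:
        return True
    return _TERMINAL.search(text) is None


def segment(document: str, marker: str = BOUNDARY_MARKER) -> list[str]:
    """Split a document into units for an NLP layer, closing non-sentences with marker."""
    units = []
    for line in document.splitlines():
        text = line.strip()
        if not text:
            continue  # blank lines carry no content
        if looks_like_sentence(text):
            units.append(text)
        elif looks_like_non_sentence(text):
            # Drop the bullet/number prefix, flatten tab-separated cells,
            # then append the marker so the fragment is evaluated on its own.
            fragment = _BULLET.sub("", text).replace("\t", " ")
            units.append(f"{fragment} {marker}")
        else:
            # Punctuated but short or lowercase: pass through unchanged.
            units.append(text)
    return units


doc = "Payment is due within thirty days.\n- Acme Corp\nJohn Smith\t555-0100\n\nNet 30"
print(segment(doc))
```

Run on the sample, the full sentence passes through unchanged, while the bulleted name, the tab-separated contact row, and the bare "Net 30" line each become a separately terminated unit. The design choice of running both tests on every line mirrors the abstract's description of one algorithm looking for sentences and another looking for non-sentences in the same document; a real system would likely use far richer cues (layout position, named-entity patterns, address formats).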

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.