System and method for adaptive sentence boundary disambiguation
US8131546B1 · kind B1 · utility
Assignee
Inventor
Key dates
| Filing date | Dec 28, 2007 |
| Grant date | Mar 6, 2012 |
| Priority date | — |
| Expiry date | Jan 3, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/183
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Embodiments disclosed herein provide a system and method useful for pre-processing non-sentence text extracted from business documents (e.g., malformed bulleted lists, runaway sentence identification, spatially separated data, etc.). One embodiment includes two heuristic algorithms: one searches for sentences in a document and the other looks for non-sentences (e.g., lists, tables, tabs, names of people, addresses, etc.) in the same document. In one embodiment, when malformed text is encountered, a particular character (e.g., “?”) is inserted to signal to a natural language processing layer that this set of “words” represents a logical construct and should be evaluated independently of other sentences. Embodiments disclosed herein allow non-sentence text, which is linguistically dry but contextually rich, to be included in the natural language processing. Embodiments disclosed herein also help reduce false positive concept extraction assertions by the natural language processing layer.
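The two-heuristic idea in the abstract can be illustrated with a minimal Python sketch. This is not the patented method: the function names (`looks_like_sentence`, `looks_like_non_sentence`, `mark_fragments`), the four-word threshold, and the specific bullet, tab, and wide-gap rules are all assumptions chosen for illustration. Only the general scheme comes from the abstract: one check for sentences, one for non-sentences, and a marker character such as "?" added to malformed text so a downstream layer can treat it as a standalone unit.

```python
import re

# Assumed rule: a sentence ends in terminal punctuation, optionally
# followed by a closing quote or bracket.
SENTENCE_END = re.compile(r'[.!?]["\')\]]?$')
# Assumed rule: a list item starts with -, *, a bullet dot, or "1." / "1)".
BULLET = re.compile(r'^\s*(?:[-*\u2022]|\d+[.)])\s+')


def looks_like_sentence(line: str) -> bool:
    """Heuristic 1 (hypothetical): capitalised, 4+ words, terminal punctuation."""
    text = line.strip()
    words = text.split()
    return (
        len(words) >= 4
        and text[0].isupper()
        and SENTENCE_END.search(text) is not None
    )


def looks_like_non_sentence(line: str) -> bool:
    """Heuristic 2 (hypothetical): bullets, tabs, or wide gaps suggest
    list or table residue rather than running prose."""
    return (
        BULLET.match(line) is not None
        or "\t" in line
        or re.search(r" {3,}", line.strip()) is not None
    )


def mark_fragments(lines, marker="?"):
    """Return the lines with non-sentence fragments terminated by `marker`."""
    out = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if looks_like_sentence(line) and not looks_like_non_sentence(line):
            out.append(line.strip())
            continue
        # Drop the bullet prefix, collapse tabs and runs of spaces,
        # and remove trailing punctuation before adding the marker.
        text = " ".join(BULLET.sub("", line).split()).rstrip(".!?;:,")
        out.append(text + marker)
    return out


sample = [
    "The supplier shall deliver the goods within ten days.",
    "- Net 30 payment terms",
    "Name\tJohn Smith",
]
marked = mark_fragments(sample)
```

In this sketch the full sentence passes through unchanged, while the bullet item and the tab-separated pair each become a separately terminated unit. A real system would need much richer rules for names, addresses, and tables than these three patterns.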
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.