Natural language processing of formatted documents
US10628525B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | May 17, 2017 |
| Grant date | Apr 21, 2020 |
| Priority date | — |
| Expiry date | May 17, 2037 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06V30/10
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Detecting and incorporating formatting characteristics within natural language processing analytics. Source documents are ingested and the markup formatting language is identified by the program. Once identified, the markup language is parsed and examined for formatting characteristics, embedded notes, comments and other metadata. The formatting characteristics of the plain text are extracted, along with the plain text, and converted into a common analysis structure (CAS), or CAS-equivalent structure, which annotates the natural language text together with its respective formatting characteristics. The CAS or CAS-equivalent structures are stored and sent to a natural language processing pipeline for further analysis via complex algorithms and rules. The natural language processing results data are curated to reflect meaningful analysis of the extracted CAS or CAS-equivalent structure.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.