Patent · US Active

Natural language processing of formatted documents

US10628525B2 · kind B2 · utility

Cited by: 2
References: 7
Claims: 11
Family size: 0

Key dates

Filing date: May 17, 2017
Grant date: Apr 21, 2020
Expiry date: May 17, 2037

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/10
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Detecting and incorporating formatting characteristics within natural language processing analytics. Source documents are ingested, and the program identifies their markup formatting language. The identified markup is then parsed and examined for formatting characteristics, embedded notes, comments, and other metadata. The plain text is extracted together with its formatting characteristics and converted into a common analysis structure (CAS), or CAS-equivalent structure, which annotates the natural language text with its respective formatting characteristics. The CAS or CAS-equivalent structures are stored and sent to a natural language processing pipeline for further analysis using complex algorithms and rules. The natural language processing results are curated to reflect meaningful analysis of the extracted CAS or CAS-equivalent structure.
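The ingestion steps in the abstract (identify the markup, parse it, pull out plain text plus formatting and comments, store the result as a CAS-like structure) can be illustrated with a minimal sketch. It assumes HTML is the only markup recognized, uses a crude regex for markup detection, and models a "CAS-equivalent" as plain text plus standoff annotations with `[begin, end)` character offsets. `SimpleCAS`, `Annotation`, `detect_markup`, and `ingest` are hypothetical names; this is neither Apache UIMA's CAS API nor the patented method, only an illustration of the general idea using Python's standard `html.parser`.

```python
import re
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Annotation:
    """A typed span over the CAS text, using [begin, end) character offsets."""
    type: str
    begin: int
    end: int
    features: dict = field(default_factory=dict)


@dataclass
class SimpleCAS:
    """Hypothetical CAS-equivalent: plain text plus standoff annotations."""
    text: str
    markup: str
    annotations: list = field(default_factory=list)


def detect_markup(source: str) -> str:
    """Crude heuristic detection (assumption): HTML if common tags appear, else plain."""
    if re.search(r"<\s*(html|body|p|b|i|u|strong|em|div|h1|h2)\b", source, re.IGNORECASE):
        return "html"
    return "plain"


class _FormattingExtractor(HTMLParser):
    # Tags treated as formatting characteristics in this sketch (assumption).
    FORMAT_TAGS = {"b": "bold", "strong": "bold", "i": "italic", "em": "italic",
                   "u": "underline", "h1": "heading", "h2": "heading"}

    def __init__(self):
        super().__init__()
        self.parts = []        # plain-text fragments in document order
        self.length = 0        # running length of the extracted plain text
        self.open_tags = []    # stack of (tag, begin offset) for open formatting tags
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        if tag in self.FORMAT_TAGS:
            self.open_tags.append((tag, self.length))

    def handle_endtag(self, tag):
        # Close the most recently opened matching tag; stray end tags are ignored.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                _, begin = self.open_tags.pop(i)
                self.annotations.append(Annotation(self.FORMAT_TAGS[tag], begin, self.length))
                break

    def handle_data(self, data):
        self.parts.append(data)
        self.length += len(data)

    def handle_comment(self, data):
        # Embedded comments become zero-width metadata annotations at their position.
        self.annotations.append(
            Annotation("comment", self.length, self.length, {"text": data.strip()}))


def ingest(source: str) -> SimpleCAS:
    """Identify the markup, extract text and formatting, and build the CAS-like structure."""
    markup = detect_markup(source)
    if markup == "plain":
        return SimpleCAS(source, markup)
    parser = _FormattingExtractor()
    parser.feed(source)
    parser.close()  # flush any buffered trailing text; unclosed tags are dropped
    text = "".join(parser.parts)
    annotations = sorted(parser.annotations, key=lambda a: (a.begin, a.end))
    return SimpleCAS(text, markup, annotations)


if __name__ == "__main__":
    cas = ingest("<p>The <b>termination</b> clause <!-- check --> is <i>final</i>.</p>")
    for ann in cas.annotations:
        print(ann.type, repr(cas.text[ann.begin:ann.end]), ann.features)
```

Keeping formatting as offset-based annotations rather than inline tags leaves the plain text clean for downstream NLP steps (tokenizers, entity recognizers) while still letting a later stage ask, for example, whether an extracted entity was bold or sat next to a reviewer comment. Supporting other markup languages would mean adding a detector branch and a parser that emits the same `Annotation` shape.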

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.