Patent · US Active

Method and system of creating and summarizing unstructured natural language sentence clusters for efficient tagging

US11604926B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 14
Family size: 0

Inventors

Key dates

Filing date: Feb 21, 2020
Grant date: Mar 14, 2023
Priority date
Expiry date: Jul 9, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/10
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A computerized method for reducing domain noise and for creating and summarizing clusters of human-written sentences for efficient tagging in natural language processing, comprising: receiving a typed, handwritten, or printed text; implementing an optical character recognition (OCR) process on the human-written text to generate a digital version of the human-written text; splitting the digital version of the typed, handwritten, or printed text into an array of sentences, using a sentence splitter, to generate a split-sentence version; determining a domain of the human-written text; based on the domain, implementing a domain noise reduction process on the split-sentence version; hierarchically clustering the split-sentence version after the domain noise reduction process; and summarizing the clustered sentences and reducing the amount of data to be tagged.
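
The steps named in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: it assumes the OCR step has already produced digital text, and every name in it (`DOMAIN_NOISE`, `split_sentences`, `reduce_noise`, `cluster`, `summarize`, `pipeline`), the per-domain noise word list, the Jaccard similarity measure, the single-linkage merge rule, and the 0.5 threshold are hypothetical choices made only for illustration.

```python
import re

# Hypothetical per-domain noise terms; the abstract does not list any.
DOMAIN_NOISE = {"medical": {"patient", "the", "a", "was", "is"}}


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def reduce_noise(sentence, domain):
    """Lowercase, tokenize, and drop the noise words assumed for the domain."""
    noise = DOMAIN_NOISE.get(domain, set())
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return {t for t in tokens if t not in noise}


def jaccard(a, b):
    """Jaccard similarity of two token sets (two empty sets count as equal)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(token_sets, threshold=0.5):
    """Agglomerative single-linkage clustering over sentence indices.

    Repeatedly merges the two most similar clusters until the best
    similarity falls below the (assumed) threshold.
    """
    clusters = [[i] for i in range(len(token_sets))]
    while len(clusters) > 1:
        best, pair = 0.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = max(
                    jaccard(token_sets[i], token_sets[j])
                    for i in clusters[x]
                    for j in clusters[y]
                )
                if sim > best:
                    best, pair = sim, (x, y)
        if pair is None or best < threshold:
            break
        x, y = pair
        clusters[x] += clusters.pop(y)
    return clusters


def summarize(clusters, sentences):
    """Represent each cluster by its first sentence and its size."""
    return [(sentences[c[0]], len(c)) for c in clusters]


def pipeline(text, domain):
    """Split, de-noise, cluster, and summarize already-digitized text."""
    sentences = split_sentences(text)
    token_sets = [reduce_noise(s, domain) for s in sentences]
    return summarize(cluster(token_sets), sentences)


if __name__ == "__main__":
    sample = (
        "Patient reports severe headache. "
        "The patient reports a severe headache. "
        "Blood pressure was normal."
    )
    for representative, size in pipeline(sample, "medical"):
        print(size, representative)
```

In this sketch the two headache sentences become identical once the assumed noise words are removed, so they merge into a single cluster and only two representative sentences remain to be tagged instead of three.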

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.