Patent · US Active

Automatically labeling data using natural language processing

US11544795B2 · kind B2 · utility

Cited by: 1
References: 5
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Feb 7, 2022
Grant date: Jan 3, 2023
Priority date
Expiry date: Feb 7, 2042

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06Q10/10
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

In an illustrative embodiment, methods and systems for automatically labeling unstructured data include accessing unstructured data representing data entry and analyzing the unstructured data by applying natural language processing to a text component of the unstructured data to obtain a set of term counts of words and/or phrases identified in the text component. Analyzing may include applying at least one clustering algorithm to the set of term counts to determine a term cluster, identifying a preexisting term cluster most closely matching the term cluster, and applying, to the unstructured data, a predefined label corresponding to the preexisting term cluster. The unstructured data may also be analyzed to obtain formatting counts of formatting elements; a formatting cluster may then be determined and matched to a preexisting formatting cluster, thus deriving a predefined label corresponding to the preexisting formatting cluster.
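The count-then-match flow described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. A regex tokenizer producing word and bigram counts stands in for the natural language processing step. The formatting elements (lines, bullets, colons, digits) are hypothetical choices. The cluster names and centroid counts in `TERM_CLUSTERS` and `FORMAT_CLUSTERS` are invented examples. In place of the unspecified clustering algorithm, each record's count vector is treated as its own cluster and matched to the nearest preexisting centroid by cosine similarity.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Count lowercase words and adjacent-word bigrams (crude stand-in for NLP)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts


def formatting_counts(text: str) -> Counter:
    """Count a few hypothetical formatting elements of a record."""
    lines = text.splitlines()
    return Counter({
        "lines": len(lines),
        "bullets": sum(1 for ln in lines if ln.lstrip().startswith(("-", "*", "•"))),
        "colons": text.count(":"),
        "digits": sum(ch.isdigit() for ch in text),
    })


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_label(counts: Counter, clusters: dict[str, Counter]) -> str:
    """Return the label of the preexisting cluster most similar to `counts`."""
    return max(clusters, key=lambda label: cosine(counts, clusters[label]))


# Hypothetical preexisting clusters, each keyed by its predefined label.
TERM_CLUSTERS = {
    "invoice": term_counts("invoice amount due payment total invoice number"),
    "meeting_notes": term_counts("meeting agenda attendees action items discussed"),
}
FORMAT_CLUSTERS = {
    "bulleted_list": Counter({"lines": 6, "bullets": 5, "colons": 1, "digits": 0}),
    "form_fields": Counter({"lines": 6, "bullets": 0, "colons": 6, "digits": 12}),
}


def label_record(text: str) -> tuple[str, str]:
    """Apply one term-based label and one formatting-based label to a record."""
    return (
        closest_label(term_counts(text), TERM_CLUSTERS),
        closest_label(formatting_counts(text), FORMAT_CLUSTERS),
    )


if __name__ == "__main__":
    print(label_record("Invoice number: 1042\nAmount due: 250.00\nPayment total: 250.00"))
```

For the sample record, the words overlap only with the `invoice` centroid, and the colon- and digit-heavy layout sits closest to `form_fields`, so the call returns `('invoice', 'form_fields')`. A real system would derive the centroids by clustering a labeled corpus rather than hard-coding them.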

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.