Automatically labeling data using natural language processing
US11816741B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 30, 2022 |
| Grant date | Nov 14, 2023 |
| Priority date | — |
| Expiry date | Dec 30, 2042 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06Q10/10
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
In an illustrative embodiment, methods and systems for automatically labeling unstructured data include accessing unstructured data representing data entry and analyzing the unstructured data by applying natural language processing to a text component of the unstructured data to obtain a set of term counts of words and/or phrases identified in the text component. Analyzing may include applying at least one clustering algorithm to the set of term counts to determine a term cluster, identifying a preexisting term cluster most closely matching the term cluster, and applying, to the unstructured data, a predefined label corresponding to the preexisting term cluster. The unstructured data may be analyzed to obtain formatting counts of formatting elements, and a formatting cluster may be determined and applied to match to a preexisting formatting cluster, thus deriving a predefined label corresponding to the preexisting formatting cluster.
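The labeling flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a simple regex tokenizer, and it swaps the unspecified clustering algorithm for a plain nearest-match step that compares count vectors by cosine similarity. The cluster contents, the labels, the set of formatting elements, and all function names (`term_counts`, `formatting_counts`, `closest_label`) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical preexisting clusters: predefined label -> representative counts.
PREEXISTING_TERM_CLUSTERS = {
    "invoice": Counter({"invoice": 3, "payment": 2, "due": 2, "amount": 1}),
    "claim": Counter({"claim": 3, "loss": 2, "policy": 2, "damage": 1}),
}
PREEXISTING_FORMATTING_CLUSTERS = {
    "list": Counter({"newline": 3, "bullet": 3}),
    "financial_table": Counter({"digit": 10, "currency": 3, "newline": 2}),
}


def term_counts(text):
    """Count lowercase word tokens in the text component (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def formatting_counts(text):
    """Count a few formatting elements (an assumed, illustrative set)."""
    return Counter({
        "newline": text.count("\n"),
        "bullet": len(re.findall(r"^\s*[-*]\s", text, flags=re.MULTILINE)),
        "digit": sum(ch.isdigit() for ch in text),
        "currency": len(re.findall(r"[$€£]", text)),
    })


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_label(counts, clusters):
    """Return the label of the best-matching cluster, or None if nothing overlaps."""
    best = max(clusters, key=lambda label: cosine_similarity(counts, clusters[label]))
    if cosine_similarity(counts, clusters[best]) == 0.0:
        return None
    return best


if __name__ == "__main__":
    entry = "Payment for invoice 1042 is due Friday; amount attached."
    print(closest_label(term_counts(entry), PREEXISTING_TERM_CLUSTERS))
    bullets = "- item one\n- item two\n"
    print(closest_label(formatting_counts(bullets), PREEXISTING_FORMATTING_CLUSTERS))
```

In this sketch the same `closest_label` helper serves both the term path and the formatting path, which mirrors the abstract's parallel treatment of term clusters and formatting clusters. A fuller version would first cluster many entries and then compare each derived cluster with the preexisting ones.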
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.