Patent · US Active

Unifying terms of interest from a dataset of electronic documents

US11651001B2 · kind B2 · utility

Cited by: 0
References: 5
Claims: 29
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 14, 2018
Grant date: May 16, 2023
Priority date
Expiry date: Feb 28, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F2216/03
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method is provided for analyzing and interpreting a dataset composed of electronic documents including free-form text. The method includes unifying terms of interest in the collection of terms of interest to identify variants of the terms of interest. This includes identifying candidate variants of a term of interest based on semantic similarity between the term of interest and other terms in the database, determined using an unsupervised machine learning algorithm. Linguistic features and contextual features of the term of interest and its candidate variants are extracted, at least the contextual features being extracted using the unsupervised machine learning algorithm. And a supervised machine learning algorithm is used with the linguistic features and contextual features to identify variants of the term of interest from the candidate variants, such as for application to generate features of the documents for data analytics performed thereon.
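
The three stages named in the abstract (unsupervised candidate generation, feature extraction, supervised selection) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the toy corpus `DOCS`, the co-occurrence vectors standing in for the unsupervised algorithm, the string-similarity and prefix features standing in for the linguistic features, the hand labels in `LABELED`, and the 1-nearest-neighbour rule standing in for the supervised algorithm.

```python
import math
from collections import Counter, defaultdict
from difflib import SequenceMatcher

# Hypothetical toy corpus standing in for the free-form text documents.
DOCS = [
    "replaced hydraulic pump after leak found",
    "replaced hyd pump after leak found",
    "inspected hydraulic pump for leak",
    "insp hyd pump for leak",
    "inspected landing gear tire for wear",
    "insp landing gear tire for wear",
]

def context_vectors(docs, window=2):
    """Unsupervised step (assumed): count the words seen within `window` of each term."""
    vecs = defaultdict(Counter)
    for doc in docs:
        toks = doc.split()
        for i, tok in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    vecs[tok][toks[j]] += 1
    return dict(vecs)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def candidates(term, vecs, k=3):
    """Candidate variants: the k terms whose contexts are most similar to `term`."""
    scored = sorted(
        ((cosine(vecs[term], v), other) for other, v in vecs.items() if other != term),
        reverse=True,
    )
    return [other for _, other in scored[:k]]

def features(term, cand, vecs):
    """Two linguistic features (string similarity, prefix match) and one contextual feature."""
    ratio = SequenceMatcher(None, term, cand).ratio()
    prefix = 1.0 if term.startswith(cand) or cand.startswith(term) else 0.0
    return [ratio, prefix, cosine(vecs[term], vecs[cand])]

def nearest_label(x, train_X, train_y):
    """Supervised step (assumed): copy the label of the closest labeled example."""
    dists = [math.dist(x, t) for t in train_X]
    return train_y[dists.index(min(dists))]

vecs = context_vectors(DOCS)

# Hypothetical hand labels for candidates of "hydraulic": 1 = variant, 0 = not a variant.
LABELED = {"hyd": 1, "leak": 0, "replaced": 0}
train_X = [features("hydraulic", c, vecs) for c in LABELED]
train_y = list(LABELED.values())

# Apply the trained rule to the candidates of a different term of interest.
variants = [
    c for c in candidates("inspected", vecs)
    if nearest_label(features("inspected", c, vecs), train_X, train_y) == 1
]
print(variants)
```

The two-stage design shows why the abstract combines both kinds of learning: context similarity alone ranks "leak" close to "hydraulic" because the two share neighbouring words, so a second, labeled stage that also looks at the surface form is needed to reject such candidates.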

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.