Unifying terms of interest from a dataset of electronic documents
US11651001B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 14, 2018 |
| Grant date | May 16, 2023 |
| Priority date | — |
| Expiry date | Feb 28, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F2216/03
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method is provided for analyzing and interpreting a dataset composed of electronic documents including free-form text. The method includes unifying terms of interest in the collection of terms of interest to identify variants of the terms of interest. This includes identifying candidate variants of a term of interest based on semantic similarity between the term of interest and other terms in the database, determined using an unsupervised machine learning algorithm. Linguistic features and contextual features of the term of interest and its candidate variants are extracted, at least the contextual features being extracted using the unsupervised machine learning algorithm. And a supervised machine learning algorithm is used with the linguistic features and contextual features to identify variants of the term of interest from the candidate variants, such as for application to generate features of the documents for data analytics performed thereon.
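The three stages named in the abstract (unsupervised candidate generation, feature extraction, supervised filtering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: co-occurrence count vectors stand in for the unsupervised algorithm, a character-level similarity ratio stands in for the linguistic features, a small hand-written logistic regression stands in for the supervised algorithm, and the toy maintenance-log corpus, the labeled pairs, the 0.3 threshold, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from difflib import SequenceMatcher

def context_vectors(docs, window=2):
    """Unsupervised stand-in: describe each term by counts of nearby words."""
    vecs = defaultdict(Counter)
    for doc in docs:
        toks = doc.lower().split()
        for i, tok in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[tok][toks[j]] += 1
    return dict(vecs)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def candidate_variants(term, vecs, threshold=0.3):
    """Terms whose contexts resemble the term's contexts (hypothetical threshold)."""
    target = vecs.get(term, Counter())
    return [t for t in vecs if t != term and cosine(target, vecs[t]) >= threshold]

def pair_features(term, cand, vecs):
    """One linguistic feature (string similarity) and one contextual feature."""
    linguistic = SequenceMatcher(None, term, cand).ratio()
    contextual = cosine(vecs.get(term, Counter()), vecs.get(cand, Counter()))
    return [linguistic, contextual]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, epochs=2000, lr=0.5):
    """Supervised stand-in: logistic regression fitted by per-example gradient steps."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def is_variant(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5

# Hypothetical free-text records; "hyd" is an abbreviation of "hydraulic".
docs = [
    "replaced hydraulic pump on left engine",
    "replaced hyd pump on right engine",
    "inspected hydraulic line near left engine",
    "inspected hyd line near right engine",
    "replaced fuel filter on left wing",
]
vecs = context_vectors(docs)
candidates = candidate_variants("hydraulic", vecs)

# Hypothetical hand-labeled pairs: 1 = true variant, 0 = not a variant.
labeled = [("hydraulic", "hyd", 1), ("hydraulic", "fuel", 0), ("hydraulic", "pump", 0)]
X = [pair_features(t, c, vecs) for t, c, _ in labeled]
y = [label for _, _, label in labeled]
w, b = train_logistic(X, y)

variants = [c for c in candidates
            if is_variant(w, b, pair_features("hydraulic", c, vecs))]
print(variants)
```

In this toy setup the similarity stage deliberately over-generates: "fuel" and "pump" share enough context with "hydraulic" to become candidates, and the supervised stage is what separates them from the abbreviation "hyd". The classifier here is only checked against the pairs it was trained on, so the sketch shows the data flow between stages, not a measure of accuracy.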
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.