Text mining a dataset of electronic documents to discover terms of interest
US10540444B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Jun 20, 2017 |
| Grant date | Jan 21, 2020 |
| Priority date | — |
| Expiry date | Oct 13, 2037 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N7/01
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method is provided for analyzing and interpreting a dataset of electronic documents that include free-form text. The method includes text mining the documents for terms of interest, which includes receiving a set of seed nouns as input to an iterative process. An iteration of this process includes searching for multiword terms that have seed nouns as their head words; at least some of these terms define a training set for a machine learning algorithm that identifies additional multiword terms, at least some of which have nouns outside the set of seed nouns as their head words. The iteration also includes adding those nouns to the set, thereby identifying a new set of seed nouns for the next iteration. The method further includes unifying the terms of interest to produce normalized terms of interest, which are applied to generate features of the documents for the data analytics performed on them.
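The seed-noun loop in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the machine learning algorithm, the candidate-extraction rule, or the unification method, so every choice below is a hypothetical stand-in: candidates are stopword-free 2 to 3 token n-grams, the head word is assumed to be the last token (head-final English compounds), the "classifier" simply learns which modifier words co-occur with seed heads, and unification is a naive plural strip. The names `bootstrap`, `normalize`, `document_features`, and `STOPWORDS` are illustrative only.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical stopword list, used only to prune obvious non-term n-grams.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "on", "after", "during", "was"}

Term = tuple[str, ...]


def tokenize(doc: str) -> list[str]:
    return re.findall(r"[a-z]+", doc.lower())


def ngrams(tokens: list[str], max_len: int = 3) -> Iterable[Term]:
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def candidate_terms(docs: Iterable[str]) -> set[Term]:
    """Collect stopword-free n-grams as candidate multiword terms."""
    return {g for doc in docs for g in ngrams(tokenize(doc)) if not STOPWORDS.intersection(g)}


def head(term: Term) -> str:
    # Assumption: compounds are head-final, so the last token is the head noun.
    return term[-1]


def train(training_terms: set[Term]) -> set[str]:
    # Stand-in for the unspecified ML algorithm: remember which modifier
    # words appear in front of seed-noun heads.
    return {w for t in training_terms for w in t[:-1]}


def predict(term: Term, modifiers: set[str]) -> bool:
    # Accept a candidate only if every one of its modifiers was seen in training.
    return all(w in modifiers for w in term[:-1])


def bootstrap(docs: list[str], seeds: set[str], max_iter: int = 5) -> tuple[set[Term], set[str]]:
    seeds = set(seeds)
    candidates = candidate_terms(docs)
    found: set[Term] = set()
    for _ in range(max_iter):
        # 1. Search for multiword terms headed by a current seed noun.
        training = {t for t in candidates if head(t) in seeds}
        found |= training
        # 2. Train on those terms and classify the remaining candidates.
        modifiers = train(training)
        new_terms = {t for t in candidates - found if predict(t, modifiers)}
        found |= new_terms
        # 3. Heads outside the seed set become seeds for the next iteration.
        new_heads = {head(t) for t in new_terms} - seeds
        if not new_heads:
            break
        seeds |= new_heads
    return found, seeds


def normalize(term: Term) -> str:
    """Unify surface variants; here only a naive plural strip on the head (assumption)."""
    *mods, h = term
    if len(h) > 3 and h.endswith("s") and not h.endswith("ss"):
        h = h[:-1]
    return " ".join([*mods, h])


def document_features(docs: list[str], terms: set[Term]) -> list[Counter]:
    """Bag-of-terms feature vector per document, keyed by normalized term."""
    return [Counter(normalize(g) for g in ngrams(tokenize(doc)) if g in terms) for doc in docs]


if __name__ == "__main__":
    docs = [
        "Hydraulic pump leaked during taxi",
        "Hydraulic valve replaced after inspection",
        "Fuel pump failed",
        "Fuel filter clogged",
    ]
    terms, seeds = bootstrap(docs, {"pump"})
    print(sorted(seeds))  # ['filter', 'pump', 'valve']
    print(document_features(docs, terms))
```

Starting from the single seed noun "pump", the first iteration learns the modifiers "hydraulic" and "fuel", which in turn surface "hydraulic valve" and "fuel filter"; their heads "valve" and "filter" join the seed set, and the loop stops once no new head nouns appear. A real system would replace `train`/`predict` with an actual classifier and use part-of-speech tagging to identify nouns.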
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.