Patent · US Active

Text mining a dataset of electronic documents to discover terms of interest

US10540444B2 · kind B2 · utility

Cited by	0

References	8

Claims	27

Family size	0

Assignee

The Boeing Company · US

Inventors

Anne Kao · Bellevue, US
Nobal B. Niraula · Memphis, US
Daniel I. Whyatt · Huntsville, US

Key dates

Filing date	Jun 20, 2017
Grant date	Jan 21, 2020
Priority date	—
Expiry date	Oct 13, 2037

Classification

Technology area (CPC G)	Physics
CPC primary	G06N7/01
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method is provided for analyzing and interpreting a dataset composed of electronic documents including free-form text. The method includes text mining the documents for terms of interest, including receiving a set of seed nouns as input to an iterative process. An iteration of the process includes searching for multiword terms having seed nouns as their head words, at least some of which define a training set of a machine learning algorithm. The algorithm is used to identify additional multiword terms, at least some of which have nouns outside the set of seed nouns as their head words. The iteration also includes adding the nouns outside the set of seed nouns to the set, thereby identifying a new set of seed nouns for a next iteration. The method includes unifying terms of interest to produce normalized terms of interest for application to generate features of the documents for data analytics performed thereon.
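
The iterative loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name here (`candidate_terms`, `mine_terms`, `unify`, `STOPWORDS`) is hypothetical, and several simplifications are assumed: candidates are two-word terms with the last word treated as the head, no part-of-speech tagging is performed, the machine learning algorithm is replaced by a trivial stand-in that accepts any candidate whose modifier already appears in the training set, and unification is reduced to naive plural stripping.

```python
import re

# Assumption: a tiny hand-picked stopword list stands in for real linguistic filtering.
STOPWORDS = {"the", "a", "an", "and", "of", "near"}


def candidate_terms(doc):
    """Yield (modifier, head) pairs from adjacent non-stopword tokens.

    Assumption: the last word of a two-word term is its head word.
    """
    tokens = re.findall(r"[a-z]+", doc.lower())
    for left, right in zip(tokens, tokens[1:]):
        if left not in STOPWORDS and right not in STOPWORDS:
            yield (left, right)


def mine_terms(docs, seed_nouns, max_iters=5):
    """Grow the seed noun set iteratively; return (terms, final_seeds)."""
    seeds = set(seed_nouns)
    candidates = {c for doc in docs for c in candidate_terms(doc)}
    terms = set()
    for _ in range(max_iters):
        # Multiword terms whose head word is a seed noun form the training set.
        training = {c for c in candidates if c[1] in seeds}
        # Stand-in for the learned model (an assumption, not the patent's
        # algorithm): remember which modifiers occur in the training set.
        known_modifiers = {modifier for modifier, _ in training}
        # Additional terms: known modifier, head word outside the seed set.
        additional = {
            c for c in candidates
            if c[0] in known_modifiers and c[1] not in seeds
        }
        terms |= training | additional
        new_heads = {head for _, head in additional}
        if not new_heads:
            break  # no new nouns found, so the seed set has stabilized
        seeds |= new_heads  # new seed set for the next iteration
    return terms, seeds


def unify(terms):
    """Normalize terms; here only naive plural stripping on the head word."""
    normalized = set()
    for modifier, head in terms:
        if head.endswith("s") and not head.endswith("ss"):
            head = head[:-1]
        normalized.add(f"{modifier} {head}")
    return normalized


docs = [
    "Hydraulic pump leaking near the fuel valve.",
    "Replaced hydraulic actuator and fuel pumps.",
    "Fuel actuator inspected; fuel pump replaced.",
]
terms, seeds = mine_terms(docs, {"pump"})
print(sorted(seeds))
print(sorted(unify(terms)))
```

Starting from the single seed noun "pump", the sketch picks up head words such as "valve" and "actuator" because they share modifiers with the training terms, and `unify` then merges "fuel pumps" into "fuel pump". A realistic system would presumably use a part-of-speech tagger or parser to find head nouns and a trained classifier in place of the modifier lookup.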

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.