Patent · US Active

Gap identification in corpora

US10095775B1 · kind B1 · utility

Cited by: 1
References: 6
Claims: 1
Family size: 0

Assignee

Inventors

Key dates

Filing date: Feb 5, 2018
Grant date: Oct 9, 2018
Priority date
Expiry date: Feb 5, 2038

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N5/02
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Embodiments of the present invention disclose a method, a computer program product, and a computer system for identifying information gaps in corpora. A computer receives a document and extracts keywords from it while filtering out trivial keywords. Using a topic modelling approach, the computer identifies and extracts the top keywords detailed by the document, then determines whether the extracted top keywords exceed a threshold use frequency. Based on determining that the top keywords exceed the threshold use frequency, the computer determines whether the top keywords have a relation to other entities within the document and, if so, whether the top keywords are defined within the document. Based on determining that the top keywords are not defined in the document, the computer adds the top keywords to a list and defines them.
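
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it (STOPWORDS, extract_keywords, top_keywords, has_relation, is_defined, find_gaps) is hypothetical. It makes three simplifying assumptions: raw frequency ranking stands in for the topic modelling approach, "relation to other entities" is approximated by a top keyword sharing a sentence with another top keyword, and "defined" is approximated by a pattern such as "X is ..." or "X refers to ...". The final step of supplying a definition for each listed keyword is left out.

```python
import re
from collections import Counter

# Assumption: a tiny stopword set stands in for "filtering trivial keywords".
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to",
             "in", "for", "that", "it", "this", "with"}

WORD = re.compile(r"[a-z][a-z0-9-]+")


def extract_keywords(text):
    """Tokenize the document and drop trivial (stopword) tokens."""
    return [t for t in WORD.findall(text.lower()) if t not in STOPWORDS]


def top_keywords(keywords, n=5):
    """Rank keywords by raw count (a stand-in for topic modelling)."""
    return Counter(keywords).most_common(n)


def has_relation(term, text, others):
    """True if the term shares a sentence with another top keyword."""
    for sentence in re.split(r"[.!?]", text.lower()):
        words = set(WORD.findall(sentence))
        if term in words and words & (others - {term}):
            return True
    return False


def is_defined(term, text):
    """True if the text contains a simple definitional pattern for the term."""
    pattern = rf"\b{re.escape(term)}\s+(is|are|means|refers to)\b"
    return re.search(pattern, text.lower()) is not None


def find_gaps(text, threshold=1, n=5):
    """List frequent, related top keywords that the document never defines."""
    ranked = top_keywords(extract_keywords(text), n)
    frequent = [term for term, count in ranked if count > threshold]
    others = set(frequent)
    return [term for term in frequent
            if has_relation(term, text, others) and not is_defined(term, text)]


doc = ("A tokenizer is a component that splits text. "
       "The tokenizer feeds the lexer. "
       "The lexer emits symbols for the parser. "
       "The lexer and the tokenizer run together.")
gaps = find_gaps(doc, threshold=1)
print(gaps)
```

In this example both "tokenizer" and "lexer" pass the frequency and relation checks, but only "tokenizer" is defined in the text, so "lexer" is the keyword flagged as an information gap.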

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.