Patent · US Active

Document analysis and multi-word term detector

US8090724B1 · kind B1 · utility

Cited by: 51
References: 14
Claims: 23
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 28, 2007
Grant date: Jan 3, 2012
Priority date
Expiry date: Dec 1, 2029

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/917
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A term analyzer receives an ordered collection of text-based terms. The ordered collection can contain terms from a document that have been filtered to remove “noise” such as stopwords. The term analyzer analyzes groupings of consecutive text-based terms in the ordered collection to identify occurrences of different combinations of text-based terms in the ordered collection. In addition, the term analyzer maintains frequency information representing the occurrences of the different combinations of text-based terms in the collection. The frequency information can then be used to determine relatively significant keywords and/or keyword phrases in the document. In an example configuration, the term analyzer creates a tree in which a first term in a given grouping of the groupings is defined as a parent node in the tree and a second term in the given grouping is defined as a child node of the parent node in the tree. The method of the analyzer generalizes to create a tree of multi-word terms in which the terms can be efficiently ranked by occurrence.
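
The tree described in the abstract can be illustrated with a minimal sketch. This is an illustration under stated assumptions, not the patented implementation: the stopword list, the whitespace tokenizer, the names TermNode, filter_terms, build_tree and ranked_phrases, and the max_len and min_len parameters are all hypothetical. Each grouping of consecutive filtered terms becomes a path in the tree (first term as parent, next term as child), and every node keeps a count of how often its path occurred.

```python
# Hypothetical stopword list; the abstract does not specify one.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "to", "is"}


class TermNode:
    """One term in the tree; count = occurrences of the path ending here."""

    def __init__(self):
        self.count = 0
        self.children = {}


def filter_terms(text):
    """Lowercase, split on whitespace, and drop stopwords, keeping order."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def build_tree(terms, max_len=2):
    """Insert every run of up to max_len consecutive terms into the tree."""
    root = TermNode()
    for start in range(len(terms)):
        node = root
        for term in terms[start:start + max_len]:
            # The earlier term is the parent, the following term its child.
            node = node.children.setdefault(term, TermNode())
            node.count += 1
    return root


def ranked_phrases(root, min_len=2):
    """Return (phrase, count) pairs with at least min_len terms, most frequent first."""
    found = []

    def walk(node, path):
        for term, child in node.children.items():
            new_path = path + [term]
            if len(new_path) >= min_len:
                found.append((" ".join(new_path), child.count))
            walk(child, new_path)

    walk(root, [])
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(found, key=lambda pc: (-pc[1], pc[0]))


if __name__ == "__main__":
    text = ("the term analyzer builds a tree and "
            "the term analyzer ranks terms in the tree")
    tree = build_tree(filter_terms(text))
    for phrase, count in ranked_phrases(tree):
        print(count, phrase)
```

Raising max_len in this sketch extends the same structure to phrases of three or more terms, which is one way to read the abstract's statement that the method generalizes to a tree of multi-word terms. Note that filtering happens before grouping, so terms separated only by stopwords are treated as consecutive.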

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.