Patent · US Expired

Method and system for performing phrase/word clustering and cluster merging

US7519590B2 · kind B2 · utility

Cited by: 16
References: 9
Claims: 34
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 9, 2003
Grant date: Apr 14, 2009
Priority date
Expiry date: Jun 8, 2024

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99938
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Text classification has become an important aspect of information technology. Present text classification techniques range from simple text matching to more complex clustering methods. Clustering describes a process of discovering structure in a collection of characters. The invention automatically analyzes a text string and either updates an existing cluster or creates a new cluster. To that end, the invention may use a character n-gram matching process in addition to other heuristic-based clustering techniques. In the character n-gram matching process, each text string is first normalized using several heuristics. It is then divided into a set of overlapping character n-grams, where n is the number of adjacent characters. If the commonality between the text string and the existing cluster members satisfies a pre-defined threshold, the text string is added to the cluster. If, on the other hand, the commonality does not satisfy the pre-defined threshold, a new cluster may be created. Each cluster may have a selected topic name. The topic name allows whole clusters to be compared in a similar way to the individual clusters, and merged when a predetermined level of commonality exists…
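The character n-gram matching and cluster merging described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The function names, the trigram size (n = 3), the Dice coefficient as the commonality measure, the 0.5 threshold, the normalization rule (lowercasing and stripping punctuation), and the use of a cluster's first member as its topic name are all assumptions made for illustration. The abstract does not specify any of them.

```python
def normalize(text):
    """Assumed heuristic: lowercase, turn punctuation into spaces, collapse spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def char_ngrams(text, n=3):
    """Return the set of overlapping character n-grams of the normalized text."""
    s = normalize(text)
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def commonality(a, b):
    """Dice coefficient between two n-gram sets (an assumed commonality measure)."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def assign(text, clusters, threshold=0.5, n=3):
    """Add text to the best-matching cluster, or create a new cluster.

    clusters is a list of lists of strings and is modified in place.
    Returns the index of the cluster that received the text.
    """
    grams = char_ngrams(text, n)
    best_idx, best_score = None, 0.0
    for idx, members in enumerate(clusters):
        # Score against the closest existing member of this cluster.
        score = max(commonality(grams, char_ngrams(m, n)) for m in members)
        if score > best_score:
            best_idx, best_score = idx, score
    if best_idx is not None and best_score >= threshold:
        clusters[best_idx].append(text)
        return best_idx
    clusters.append([text])
    return len(clusters) - 1


def merge_clusters(clusters, threshold=0.5, n=3):
    """Merge clusters whose topic names are similar enough.

    Assumption: the topic name of a cluster is simply its first member.
    """
    merged = []
    for members in clusters:
        topic = char_ngrams(members[0], n)
        for target in merged:
            if commonality(topic, char_ngrams(target[0], n)) >= threshold:
                target.extend(members)
                break
        else:
            merged.append(list(members))
    return merged
```

Calling assign once per incoming string builds the clusters incrementally, and merge_clusters can then be run over the result to combine clusters whose topic names overlap. Comparing against the closest member rather than an aggregate of all members is one possible design choice among several.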

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.